handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
134
audit-runs/phase-c17-keWait-native-object/broad-impact.md
Normal file
@@ -0,0 +1,134 @@
# Phase C+17 — Broad-impact catalog (2026-05-14)

The C+17 fix touches a widely-used primitive (`ensure_dispatcher_object`,
called by `Ke{Wait,Set,Reset,Pulse}Event`, `Ke{Wait,Release}Semaphore`, etc.).
This catalog enumerates the surfaced divergences post-fix per chain.

## Resolved (3 of 5 catalogued in C+15-α)

### D-2 / D-3 / D-4 — KeWait*ForSingleObject native-obj handle (all 5 chains)

Class E asymmetry. Canary's `xeKeWaitForSingleObject` and
`KeWaitForMultipleObjects_entry` call `XObject::GetNativeObject`, which
emits `handle.create` for the synthesized wrapper. Our
`ensure_dispatcher_object` did the same shadow synthesis but never emitted
the schema event. Fix: emit `handle.create` (with the appropriate
`object_type` from `KernelObject::schema_object_type`) on first
adoption, and register the SID so that subsequent `wait.begin` events resolve
non-zero `handles_semantic_ids[]`.

Observed: on all 5 chains the divergence moves past the `wait.begin` idx that was
previously blocked at SID=0.

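The first-adoption rule described for D-2 / D-3 / D-4 can be sketched roughly as follows. This is a minimal illustration, not the code in exports.rs: the `State` struct, the `Event` enum, the SID counter, and `wait_single` are hypothetical stand-ins, and only the name `ensure_dispatcher_object` comes from the notes above.

```rust
use std::collections::HashMap;

/// Schema events recorded by this sketch (hypothetical shape).
#[derive(Debug, PartialEq)]
pub enum Event {
    HandleCreate { sid: u32, object_type: &'static str },
    WaitBegin { handles_semantic_ids: Vec<u32> },
}

/// Hypothetical stand-in for the shared kernel state: guest pointer -> SID.
#[derive(Default)]
pub struct State {
    adopted: HashMap<u32, u32>,
    next_sid: u32,
    pub log: Vec<Event>,
}

impl State {
    /// Adopt a guest dispatcher pointer. `handle.create` is emitted only on
    /// the first adoption; later calls return the already-registered SID.
    pub fn ensure_dispatcher_object(&mut self, guest_ptr: u32, object_type: &'static str) -> u32 {
        if let Some(&sid) = self.adopted.get(&guest_ptr) {
            return sid;
        }
        self.next_sid += 1;
        let sid = self.next_sid;
        self.adopted.insert(guest_ptr, sid);
        self.log.push(Event::HandleCreate { sid, object_type });
        sid
    }

    /// A wait on a raw dispatcher pointer: adopt first, then emit `wait.begin`
    /// with the (now non-zero) SID.
    pub fn wait_single(&mut self, guest_ptr: u32, object_type: &'static str) {
        let sid = self.ensure_dispatcher_object(guest_ptr, object_type);
        self.log.push(Event::WaitBegin { handles_semantic_ids: vec![sid] });
    }
}
```

Two waits on the same pointer produce one `HandleCreate` followed by two `WaitBegin` events carrying the same SID, which is the pairing the diff now aligns on.
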
## Advanced

### Main tid=6→1 (+382)

102,171 → 102,553. The 382 new matching events between the two indexes are
mostly `kernel.{call,return}`, `import.call`, `RtlEnter/LeaveCriticalSection`,
plus the now-aligned `handle.create`+`wait.begin` pairs from
`KeWaitForSingleObject` and `KeWaitForMultipleObjects` calls. Several
new shadow `handle.create` events fire on first encounter of
specific PKEVENT/PKSEMAPHORE pointers in the game's init path.

### Sister chains (+3 / +2 / +1 / +39)

- tid=4→11 +3: matches all 11 emitted events.
- tid=7→2 +2: matches all 32 events.
- tid=12→7 +1: matches through `handle.create` at idx=2.
- tid=14→9 +39: walks past all the now-aligned `KeWait*` framing into the
  audio subsystem.

## Persisted (pre-existing bugs unaffected)

None of the C+15-α catalog's other groups are touched.

## NEW divergences (cataloged for future iterates)

### D-NEW-1 (HIGH) — main idx=102,553: `NtDuplicateObject` no `handle.create`

Canary's `NtDuplicateObject_entry` → `ObjectTable::DuplicateHandle`
allocates a new slot via `AddHandle(object, &new_handle)`
(util/object_table.cc:148-201), which fires the C+15-α-wired
`phase_a::EmitHandleCreateAuto`. Our `nt_duplicate_object` (exports.rs)
implements the AUDIT-062 alias-on-dup semantics: `dup_id = source_id`,
i.e. a refcount-bumped reuse of the same slot, so no new `handle.create`
fires.

This is a genuine engine-architectural difference. Mirror options:
- (a) Make ours allocate a fresh handle on `NtDuplicateObject` and emit
  `handle.create` (mirror canary). ~30-40 LOC; the downstream impact on
  every existing AUDIT-062-dependent code path needs an audit.
- (b) Suppress this `handle.create` site in the diff tool. Band-aid.

Recommendation: (a). C+18 target. Trade-off: AUDIT-062's "alias on dup"
was implemented to handle a specific worker-cluster handle-aliasing
issue; undoing it may surface a different regression. The risk
profile is similar to C+15-α: invisible state divergences become
visible. ~30 LOC fix or ~30 LOC tactical revert.

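The difference between the two duplication policies in D-NEW-1 can be made concrete with a toy handle table. Everything here is an assumption for illustration: `HandleTable`, `DupPolicy`, the handle stride of 4, and the per-slot refcount are hypothetical and do not mirror the real `ObjectTable` or our `KernelState`.

```rust
use std::collections::HashMap;

/// Which duplication behaviour to model (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DupPolicy {
    /// Alias-on-dup: return the source handle and bump its refcount.
    Alias,
    /// Fresh slot: allocate a new handle, which emits `handle.create`.
    Fresh,
}

#[derive(Default)]
pub struct HandleTable {
    /// handle -> (object id, refcount); a deliberately simplified model.
    slots: HashMap<u32, (u32, u32)>,
    next_handle: u32,
    /// Handles for which a `handle.create` event would have been emitted.
    pub handle_create_events: Vec<u32>,
}

impl HandleTable {
    /// Allocate a new slot and record the `handle.create` emit.
    pub fn add_handle(&mut self, object_id: u32) -> u32 {
        self.next_handle += 4;
        let handle = self.next_handle;
        self.slots.insert(handle, (object_id, 1));
        self.handle_create_events.push(handle);
        handle
    }

    pub fn refcount(&self, handle: u32) -> Option<u32> {
        self.slots.get(&handle).map(|slot| slot.1)
    }

    /// Duplicate `source` under the chosen policy; `None` if it is unknown.
    pub fn duplicate(&mut self, source: u32, policy: DupPolicy) -> Option<u32> {
        let object_id = self.slots.get(&source)?.0;
        match policy {
            DupPolicy::Alias => {
                let slot = self.slots.get_mut(&source)?;
                slot.1 += 1; // no new slot, so no new handle.create
                Some(source)
            }
            DupPolicy::Fresh => Some(self.add_handle(object_id)),
        }
    }
}
```

Under `Alias` the event log does not grow on duplication, which is the gap the diff reports at idx=102,553; under `Fresh` it grows by one per duplicate, matching the canary-side behaviour described above.
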
### D-NEW-2 (MEDIUM) — tid=12→7 idx=3: `wait.begin.timeout_ns` mismatch

```
canary: wait.begin handles_semantic_ids=[SID-A] timeout_ns=-30000000
ours:   wait.begin handles_semantic_ids=[SID-B] timeout_ns=429466729600
```

The SIDs differ (skipped per diff policy). The `timeout_ns` is the issue:
canary reports a 30 ms relative timeout (negative value), while ours reports
a positive value of about 429.47 s, which reads as an absolute-time
encoding. One candidate cause is that our `decode_timeout_ns`
(exports.rs:4890) returns the raw
`mem.read_u64(timeout_ptr) as i64 * 100` without applying the
"negative=relative / positive=absolute" semantics consistently with
canary. However, canary's threading.cc emit code also passes
`(*timeout_ptr) * 100` directly without sign conversion, so the
divergence is more likely upstream, in how each engine **writes** the
TIMEOUT* struct. Probably ε-class (game-side state encoding).

C+19 target estimate. ~10-30 LOC investigation.

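One arithmetic observation may be worth checking before settling on the ε-class reading of D-NEW-2. The value 429466729600 equals (2^32 - 300000) × 100, and -300000 ticks of 100 ns is exactly canary's -30000000 ns. That is the number one would get if only the low 32 bits of the 64-bit tick count were kept and zero-extended somewhere along the path. Whether that actually happens, and where, is not established by these notes; the helper names below are hypothetical and only reproduce the arithmetic.

```rust
/// Convert a signed count of 100 ns ticks to nanoseconds (sign preserved).
pub fn ticks_to_ns(raw_ticks: i64) -> i64 {
    raw_ticks * 100
}

/// Same conversion, but modelling a hypothetical bug in which only the low
/// 32 bits of the tick count survive and are zero-extended.
pub fn low32_zero_extended_to_ns(raw_ticks: i64) -> i64 {
    let low = (raw_ticks as u64) & 0xFFFF_FFFF;
    (low as i64) * 100
}

/// The relative 30 ms timeout, expressed in 100 ns ticks (negative = relative).
pub const RELATIVE_30_MS_TICKS: i64 = -300_000;
```

With `RELATIVE_30_MS_TICKS` as input, `ticks_to_ns` yields the canary value and `low32_zero_extended_to_ns` yields the value logged by ours, so a 32-bit read, store, or mask on the TIMEOUT* path is a cheap hypothesis to rule in or out.
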
### D-NEW-3 (LOW) — tid=15→10 idx=2: `handle.create` ordering on shared dispatcher

Canary's `GetNativeObject` is **process-global**: once any thread adopts
a dispatcher pointer (stashing `kXObjSignature` in the wait_list), all
subsequent threads find the existing handle and do NOT re-emit. Canary's
`handle.create` for the semaphore at guest pointer `0x828a3230` (XAudio
voice volume changemask?) was emitted earlier on a different thread, so on
tid=15 the first wait skips straight to `wait.begin`.

Our `ensure_dispatcher_object` is also process-global (the `state.objects`
map is shared in `KernelState`). However, the **timing of first adoption**
differs because thread interleaving / boot ordering between the two engines
isn't bit-identical. Our tid=10 happens to be the first to touch
`0x828a3230`, so it emits `handle.create` at idx=2; canary's tid=15
arrived after another thread (probably tid=6 or tid=10) had already
adopted it.

This is a **timing-induced ordering** divergence, not a state-model
asymmetry. It's the inverse of the typical D-1/D-2 class: both engines
emit the SAME total number of `handle.create` events; the issue is which
thread happens to be the "first toucher". The diff tool currently treats
this as a divergence because it compares per-tid sequences strictly.

Three possible mitigations:
- (a) Diff-tool: relax ordering for `handle.create` emits when the
  "next thread" event is `wait.begin` on the same dispatcher. Complex.
- (b) Suppress `handle.create` from the per-thread sequence entirely;
  treat it as a global emit and only diff `wait.begin` SIDs against a
  process-global SID-registry. Could work via a `SKIP_PAYLOAD_FIELDS_BY_KIND`
  extension to drop the event from per-tid alignment.
- (c) Live with the +0/-14 trade-off on tid=15→10; the main chain
  improvement dwarfs it.

Recommendation: (c) for now; C+20+ if the chain becomes load-bearing.

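Mitigation (b) for D-NEW-3 amounts to filtering one event kind out of per-thread alignment and checking it globally instead. A rough sketch under stated assumptions: the `Event` struct, `GLOBAL_KINDS`, and both functions are hypothetical, and the real diff tool's data model is not shown in these notes.

```rust
/// Minimal event record for the sketch (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub tid: u32,
    pub kind: &'static str,
}

/// Kinds treated as process-global and dropped from per-tid alignment.
pub const GLOBAL_KINDS: &[&str] = &["handle.create"];

/// The per-thread sequence of event kinds used for strict alignment,
/// with process-global kinds removed.
pub fn per_tid_sequence(events: &[Event], tid: u32) -> Vec<&'static str> {
    events
        .iter()
        .filter(|e| e.tid == tid && !GLOBAL_KINDS.contains(&e.kind))
        .map(|e| e.kind)
        .collect()
}

/// Process-wide count of one event kind, compared across engines instead
/// of being compared per thread.
pub fn global_count(events: &[Event], kind: &str) -> usize {
    events.iter().filter(|e| e.kind == kind).count()
}
```

With this split, a trace where tid=6 emits `handle.create` and tid=15 only waits aligns with a trace where tid=10 does both, as long as the global `handle.create` totals agree. The cost is that a genuinely missing per-thread emit would only show up in the global count.
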
## Reading-error register

- **Reading-error #28 (verify framing first)**: FOLLOWED. Canary's
  `GetNativeObject` was read end-to-end before any code change.
- **Reading-error #23 (widely-used primitive flip)**: MITIGATED. Cold-vs-cold
  gate caught no main-chain regression; the minor sister-chain regression on
  tid=15→10 is documented as D-NEW-3.
- **Reading-error #19 (host-side emits)**: FOLLOWED. `event_log::is_enabled()`
  guards every new emit; the default-off cost is one relaxed atomic-bool
  check (near-zero cost when disabled).

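The guard pattern named under Reading-error #19 can be sketched as below. Only the name `is_enabled` comes from the notes; the statics, `set_enabled`, `emit_with`, and the `emitted` counter are hypothetical, and the real `event_log` module may be structured differently.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Global on/off switch for event logging; off by default.
static ENABLED: AtomicBool = AtomicBool::new(false);
/// Count of events actually emitted (stands in for the real sink).
static EMITTED: AtomicUsize = AtomicUsize::new(0);

/// One relaxed load; this is the whole cost on the disabled path.
#[inline]
pub fn is_enabled() -> bool {
    ENABLED.load(Ordering::Relaxed)
}

pub fn set_enabled(on: bool) {
    ENABLED.store(on, Ordering::Relaxed);
}

/// Emit an event whose payload is built lazily, so that no formatting or
/// allocation happens when logging is disabled.
pub fn emit_with<F: FnOnce() -> String>(build_payload: F) {
    if !is_enabled() {
        return;
    }
    let _payload = build_payload();
    EMITTED.fetch_add(1, Ordering::Relaxed);
}

pub fn emitted() -> usize {
    EMITTED.load(Ordering::Relaxed)
}
```

Passing the payload as a closure keeps the disabled path down to the single atomic load, because the payload is never constructed unless the flag is set.
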