Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):

- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
135 lines
6.3 KiB
Markdown
# Phase C+17 — Broad-impact catalog (2026-05-14)

The C+17 fix touches a widely used primitive, `ensure_dispatcher_object`,
which is called by `Ke{Wait,Set,Reset,Pulse}Event`,
`Ke{Wait,Release}Semaphore`, and others. This catalog lists, per chain, the
divergences that surfaced after the fix.
## Resolved (3 of 5 catalogued in C+15-α)

### D-2 / D-3 / D-4 — `KeWait*ForSingleObject` native-obj handle (all 5 chains)

Class E asymmetry. Canary's `xeKeWaitForSingleObject` /
`KeWaitForMultipleObjects_entry` calls `XObject::GetNativeObject`, which
emits `handle.create` for the synthesized wrapper. Our
`ensure_dispatcher_object` performed the same shadow synthesis but never
emitted the schema event. Fix: on first adoption, emit `handle.create`
(with the appropriate `object_type` from `KernelObject::schema_object_type`)
and register the SID so that subsequent `wait.begin` events resolve to
non-zero `handles_semantic_ids[]`.

Observed: on all 5 chains, the first divergence now moves past the
`wait.begin` index that was previously blocked at SID=0.
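A minimal sketch of the first-adoption rule follows. It assumes a heavily simplified kernel state. `KernelState`, `ensure_dispatcher_object` and the `objects` map are names from these notes. The field layout, the handle/SID allocation scheme and the `Event` type are hypothetical and do not reflect the real exports.rs code. The point it illustrates: `handle.create` fires, and the SID is registered, only when a guest pointer is adopted for the first time. A later `wait.begin` lookup therefore resolves to a non-zero SID.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema event; only `handle.create` is modeled.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    HandleCreate { handle: u32, sid: u64, object_type: &'static str },
}

/// Hypothetical, heavily simplified kernel state (not the real layout).
#[derive(Default)]
struct KernelState {
    objects: HashMap<u32, u32>, // guest dispatcher pointer -> handle
    sids: HashMap<u32, u64>,    // handle -> semantic id
    next_handle: u32,
    next_sid: u64,
    log: Vec<Event>,
}

impl KernelState {
    /// Returns the handle for `guest_ptr`, synthesizing a shadow object on
    /// first adoption. `handle.create` is emitted only on that first adoption.
    fn ensure_dispatcher_object(&mut self, guest_ptr: u32, object_type: &'static str) -> u32 {
        if let Some(&handle) = self.objects.get(&guest_ptr) {
            return handle; // already adopted by some thread: no re-emit
        }
        self.next_handle += 4;
        self.next_sid += 1;
        let handle = self.next_handle;
        self.objects.insert(guest_ptr, handle);
        self.sids.insert(handle, self.next_sid); // register the SID
        self.log.push(Event::HandleCreate { handle, sid: self.next_sid, object_type });
        handle
    }

    /// SID lookup a later `wait.begin` would use; 0 means "unregistered".
    fn sid_for(&self, handle: u32) -> u64 {
        self.sids.get(&handle).copied().unwrap_or(0)
    }
}

fn main() {
    let mut state = KernelState::default();
    let first = state.ensure_dispatcher_object(0x828a_3230, "semaphore");
    let again = state.ensure_dispatcher_object(0x828a_3230, "semaphore");
    assert_eq!(first, again);
    assert_eq!(state.log.len(), 1); // only the first adoption emitted
    println!("handle={first:#x} sid={}", state.sid_for(first));
}
```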
## Advanced

### Main tid=6→1 (+382)

First-divergence idx 102,171 → 102,553. The 382 newly matching events
between the two indexes are mostly `kernel.{call,return}`, `import.call`,
and `RtlEnter/LeaveCriticalSection`, plus the now-aligned
`handle.create`+`wait.begin` pairs from `KeWaitForSingleObject` and
`KeWaitForMultipleObjects` calls. Several new shadow `handle.create`
events fire on the first encounter of specific PKEVENT/PKSEMAPHORE
pointers in the game's init path.
### Sister chains (+3 / +2 / +1 / +39)

- tid=4→11 +3: matches all 11 emitted events.
- tid=7→2 +2: matches all 32 events.
- tid=12→7 +1: matches through `handle.create` at idx=2.
- tid=14→9 +39: walks past all the now-aligned `KeWait*` framing into the
  audio subsystem.
## Persisted (pre-existing bugs unaffected)

None of the C+15-α catalog's other groups are touched.
## NEW divergences (cataloged for future iterates)

### D-NEW-1 (HIGH) — main idx=102,553: `NtDuplicateObject` no `handle.create`

Canary's `NtDuplicateObject_entry` → `ObjectTable::DuplicateHandle`
allocates a new slot via `AddHandle(object, &new_handle)`
(util/object_table.cc:148-201), which fires the C+15-α-wired
`phase_a::EmitHandleCreateAuto`. Our `nt_duplicate_object` (exports.rs)
implements the AUDIT-062 alias-on-dup semantics: `dup_id = source_id`, so
the duplicate reuses the same slot with a bumped refcount. No new
`handle.create` fires.

This is a genuine engine-architectural difference. Mirror options:

- (a) Make ours allocate a fresh handle on `NtDuplicateObject` and emit
  `handle.create` (mirroring canary). ~30-40 LOC; every existing
  AUDIT-062-dependent code path needs an audit for downstream impact.
- (b) Suppress this `handle.create` site in the diff tool. Band-aid.

Recommendation: (a), targeted for C+18. Trade-off: AUDIT-062's "alias on
dup" was implemented to handle a specific worker-cluster handle-aliasing
issue, so undoing it may surface a different regression. The risk profile
is similar to C+15-α: invisible state divergences become visible. Either
path (fix or tactical revert) is ~30 LOC.
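The two mirror options differ only in whether duplication allocates a slot. The toy handle table below makes the observable difference concrete. Its names and handle-numbering scheme are hypothetical; it is neither canary's `ObjectTable` nor our exports.rs. Alias-on-dup returns the source handle and bumps a refcount, so nothing is emitted. Fresh-slot dup goes through the same `add_handle` path as any new handle, so it emits `handle.create`.

```rust
use std::collections::HashMap;

/// Hypothetical handle-table slot: one kernel object plus a refcount.
#[derive(Debug)]
struct Slot {
    object_id: u32,
    refcount: u32,
}

#[derive(Default)]
struct HandleTable {
    slots: HashMap<u32, Slot>,
    next_handle: u32,
    handle_creates: u32, // number of `handle.create` emits so far
}

impl HandleTable {
    /// Every new slot emits `handle.create` (counted here).
    fn add_handle(&mut self, object_id: u32) -> u32 {
        self.next_handle += 4;
        self.slots.insert(self.next_handle, Slot { object_id, refcount: 1 });
        self.handle_creates += 1;
        self.next_handle
    }

    /// Alias-on-dup (AUDIT-062 style, as described above): bump the source
    /// slot's refcount and hand back the same handle. No new slot, no emit.
    fn duplicate_alias(&mut self, source: u32) -> Option<u32> {
        let slot = self.slots.get_mut(&source)?;
        slot.refcount += 1;
        Some(source)
    }

    /// Fresh-slot dup (canary style, as described above): allocate a new
    /// handle for the same object, which emits via `add_handle`.
    fn duplicate_fresh(&mut self, source: u32) -> Option<u32> {
        let object_id = self.slots.get(&source)?.object_id;
        Some(self.add_handle(object_id))
    }
}

fn main() {
    let mut table = HandleTable::default();
    let source = table.add_handle(7); // emit #1
    let alias = table.duplicate_alias(source).unwrap(); // no emit
    let fresh = table.duplicate_fresh(source).unwrap(); // emit #2
    println!(
        "source={source:#x} alias={alias:#x} fresh={fresh:#x} creates={}",
        table.handle_creates
    );
}
```

Under option (a), `fresh != source`. Any AUDIT-062-dependent path that relies on a duplicate comparing equal to its source handle would see a different value. That is one concrete thing the audit would need to check.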
### D-NEW-2 (MEDIUM) — tid=12→7 idx=3: `wait.begin.timeout_ns` mismatch

```
canary: wait.begin handles_semantic_ids=[SID-A] timeout_ns=-30000000
ours:   wait.begin handles_semantic_ids=[SID-B] timeout_ns=429466729600
```

The SIDs differ (skipped per diff policy). The issue is `timeout_ns`:
canary has a 30 ms relative timeout (negative value), while ours has a
positive, i.e. absolute-time, value of ≈429.47 s. The initial suspect was
our `decode_timeout_ns` (exports.rs:4890), which returns the raw
`mem.read_u64(timeout_ptr) as i64 * 100` without applying the
"negative = relative / positive = absolute" semantics. However, canary's
threading.cc emit code also passes `(*timeout_ptr) * 100` through without
sign conversion. The divergence must therefore be upstream, in how each
engine **writes** the TIMEOUT* struct. Probably ε-class (game-side state
encoding).

C+19 target estimate: ~10-30 LOC investigation.
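The ours value is not arbitrary. 429466729600 / 100 = 4294667296 = 2^32 - 300000. That is exactly the low dword of -300000 (the -30 ms relative tick count) read back with a zero high dword. The arithmetic check below assumes that zero-high-dword hypothesis. `zero_extend_low_dword` and `sign_extend_low_dword` are hypothetical helpers. The check does not show which side (game state or engine) produces the zero high dword.

```rust
/// Conversion both engines reportedly apply at the emit site:
/// raw 100 ns ticks * 100 = nanoseconds, with no sign handling.
fn timeout_ns(raw_ticks: i64) -> i64 {
    raw_ticks * 100
}

/// Hypothesized failure mode: only the low 32 bits carry the value and the
/// high dword of the 64-bit timeout is 0 (zero-extension).
fn zero_extend_low_dword(raw_ticks: i64) -> i64 {
    (raw_ticks as u32) as i64
}

/// Correct widening of a 32-bit signed tick count (sign-extension).
fn sign_extend_low_dword(raw_ticks: i64) -> i64 {
    (raw_ticks as i32) as i64
}

fn main() {
    let relative_30ms: i64 = -300_000; // -30 ms in 100 ns ticks
    println!("{}", timeout_ns(relative_30ms)); // -30000000 (canary)
    println!("{}", timeout_ns(zero_extend_low_dword(relative_30ms))); // 429466729600 (ours)
    println!("{}", timeout_ns(sign_extend_low_dword(relative_30ms))); // -30000000
}
```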
### D-NEW-3 (LOW) — tid=15→10 idx=2: `handle.create` ordering on shared dispatcher

Canary's `GetNativeObject` is **process-global**: once any thread adopts
a dispatcher pointer (stashing `kXObjSignature` in the wait_list), all
subsequent threads find the existing handle and do NOT re-emit. Canary's
`handle.create` for the semaphore at guest pointer `0x828a3230` (XAudio
voice volume changemask?) was emitted earlier on a different thread, so on
tid=15 the first wait skips straight to `wait.begin`.

Our `ensure_dispatcher_object` is also process-global (the `state.objects`
map is shared in `KernelState`). However, the **timing of first adoption**
differs because thread interleaving and boot ordering are not bit-identical
between the two engines. Our tid=10 happens to be the first to touch
`0x828a3230`, so it emits `handle.create` at idx=2; canary's tid=15
arrived after another thread (probably tid=6 or tid=10) had already
adopted it.

This is a **timing-induced ordering** divergence, not a state-model
asymmetry. It is the inverse of the typical D-1/D-2 class. Both engines
emit the SAME total number of `handle.create` events; the only question is
which thread happens to be the "first toucher". The diff tool currently
flags this as a divergence because it compares per-tid sequences strictly.

Possible mitigations:

- (a) Diff tool: relax ordering for `handle.create` emits when the
  "next thread" event is `wait.begin` on the same dispatcher. Complex.
- (b) Remove `handle.create` from the per-thread sequence entirely. Treat
  it as a global emit and diff `wait.begin` SIDs only against a
  process-global SID registry. Could work via a `SKIP_PAYLOAD_FIELDS_BY_KIND`
  extension that drops the event from per-tid alignment.
- (c) Live with the +0/-14 trade-off on tid=15→10; the main-chain
  improvement dwarfs it.

Recommendation: (c) for now; revisit in C+20+ if the chain becomes
load-bearing.
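Option (b) amounts to filtering one event kind out of the per-tid sequences before alignment. The sketch below assumes traces reduced to bare event-kind strings. `first_divergence` and `filtered` are hypothetical helpers, not the diff tool's actual code or its `SKIP_PAYLOAD_FIELDS_BY_KIND` mechanism. The example sequences are illustrative, not taken from a real trace. The sketch also does not model the process-global SID-registry check that (b) would need.

```rust
/// Drop every event whose kind is listed in `skip_kinds`.
fn filtered<'a>(seq: &[&'a str], skip_kinds: &[&str]) -> Vec<&'a str> {
    seq.iter()
        .copied()
        .filter(|k| !skip_kinds.iter().any(|s| s == k))
        .collect()
}

/// Index (in the filtered sequences) of the first mismatching event kind,
/// or None if the filtered per-tid sequences are identical.
fn first_divergence(canary: &[&str], ours: &[&str], skip_kinds: &[&str]) -> Option<usize> {
    let a = filtered(canary, skip_kinds);
    let b = filtered(ours, skip_kinds);
    match a.iter().zip(b.iter()).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        // One filtered trace is a strict prefix of the other.
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

fn main() {
    // Illustrative canary tid=15 vs ours tid=10 prefixes (kinds only).
    let canary = ["kernel.call", "import.call", "wait.begin", "kernel.return"];
    let ours = ["kernel.call", "import.call", "handle.create", "wait.begin", "kernel.return"];
    println!("{:?}", first_divergence(&canary, &ours, &[])); // Some(2)
    println!("{:?}", first_divergence(&canary, &ours, &["handle.create"])); // None
}
```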
## Reading-error register

- **Reading-error #28 (verify framing first)**: FOLLOWED. Canary's
  `GetNativeObject` was read end-to-end before any code change.
- **Reading-error #23 (widely-used primitive flip)**: MITIGATED. The
  cold-vs-cold gate caught no main-chain regression; the minor sister-chain
  regression on tid=15→10 is documented as D-NEW-3.
- **Reading-error #19 (host-side emits)**: FOLLOWED. Every new emit is
  guarded by `event_log::is_enabled()`. With logging off (the default), the
  only cost is one relaxed atomic-bool load per emit site.
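Below is a sketch of the guard pattern behind Reading-error #19. Only `event_log::is_enabled()` comes from these notes. `set_enabled`, `emit`, `len`, the `Mutex<Vec<String>>` sink and `emit_handle_create` are hypothetical. The flag check runs before the payload is formatted, so the disabled path is a single relaxed load and a branch.

```rust
/// Hypothetical minimal `event_log` module (illustrative, not the real one).
mod event_log {
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Mutex;

    static ENABLED: AtomicBool = AtomicBool::new(false); // default-off
    static SINK: Mutex<Vec<String>> = Mutex::new(Vec::new());

    pub fn set_enabled(on: bool) {
        ENABLED.store(on, Ordering::Relaxed);
    }

    /// One relaxed atomic load: the flag needs no ordering with other memory.
    pub fn is_enabled() -> bool {
        ENABLED.load(Ordering::Relaxed)
    }

    pub fn emit(line: String) {
        SINK.lock().unwrap().push(line);
    }

    pub fn len() -> usize {
        SINK.lock().unwrap().len()
    }
}

/// Emit-site pattern: check the flag before building the payload, so the
/// disabled path never allocates or takes the lock.
fn emit_handle_create(handle: u32) {
    if event_log::is_enabled() {
        event_log::emit(format!("handle.create handle={handle:#x}"));
    }
}

fn main() {
    emit_handle_create(0x10); // logging off by default: dropped
    event_log::set_enabled(true);
    emit_handle_create(0x14); // recorded
    println!("recorded events: {}", event_log::len()); // 1
}
```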