Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase C+17 — Broad-impact catalog (2026-05-14)
The C+17 fix touches a widely-used primitive (ensure_dispatcher_object,
called by Ke{Wait,Set,Reset,Pulse}Event, Ke{Wait,Release}Semaphore, etc.).
This catalog enumerates the surfaced divergences post-fix per chain.
Resolved (3 of 5 catalogued in C+15-α)
D-2 / D-3 / D-4 — KeWait*ForSingleObject native-obj handle (all 5 chains)
Class E asymmetry. Canary's xeKeWaitForSingleObject /
KeWaitForMultipleObjects_entry calls XObject::GetNativeObject which
emits handle.create for the synthesized wrapper; ours's
ensure_dispatcher_object did the same shadow synthesis but never emitted
the schema event. Fix: emit handle.create (with the appropriate
object_type from KernelObject::schema_object_type) on first
adoption, and register the SID so subsequent wait.begin events resolve
non-zero handles_semantic_ids[].
Observed: all 5 chains' divergences move past the wait-begin idx that was previously blocked at SID=0.
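The first-adoption behaviour described above can be sketched as follows. This is a minimal illustration, not the code in exports.rs: `KernelState`, `Event`, `wait_single` and the SID counter here are hypothetical stand-ins, and the only property carried over from the note is that handle.create is emitted once, on first adoption, so that later wait.begin events carry a non-zero SID.

```rust
use std::collections::HashMap;

/// Hypothetical schema event; the real event_log types are not shown in this note.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    HandleCreate { sid: u32, object_type: &'static str },
    WaitBegin { handles_semantic_ids: Vec<u32> },
}

/// Hypothetical stand-in for the shared kernel state.
#[derive(Default)]
pub struct KernelState {
    /// Guest dispatcher pointer -> semantic ID of the shadow object.
    objects: HashMap<u32, u32>,
    next_sid: u32,
    pub log: Vec<Event>,
}

impl KernelState {
    /// Adopt a guest dispatcher pointer. handle.create is emitted only on
    /// first adoption; later calls return the already-registered SID.
    pub fn ensure_dispatcher_object(&mut self, guest_ptr: u32, object_type: &'static str) -> u32 {
        if let Some(&sid) = self.objects.get(&guest_ptr) {
            return sid;
        }
        self.next_sid += 1; // SIDs start at 1, so 0 never appears in wait.begin
        let sid = self.next_sid;
        self.objects.insert(guest_ptr, sid);
        self.log.push(Event::HandleCreate { sid, object_type });
        sid
    }

    /// Simplified wait entry point: resolve the SID, then log wait.begin.
    pub fn wait_single(&mut self, guest_ptr: u32, object_type: &'static str) {
        let sid = self.ensure_dispatcher_object(guest_ptr, object_type);
        self.log.push(Event::WaitBegin { handles_semantic_ids: vec![sid] });
    }
}
```

Two waits on the same pointer produce one handle.create followed by two wait.begin events with the same non-zero SID.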
Advanced
Main tid=6→1 (+382)
102,171 → 102,553. The 382 new matching events between the two indexes are
mostly kernel.{call,return}, import.call, RtlEnter/LeaveCriticalSection,
plus the now-aligned handle.create+wait.begin pairs from
KeWaitForSingleObject and KeWaitForMultipleObjects calls. Several
new shadow handle.create events fire on first encounter of
specific PKEVENT/PKSEMAPHORE pointers in the game's init path.
Sister chains (+3 / +2 / +1 / +39)
- tid=4→11 +3: matches all 11 emitted events.
- tid=7→2 +2: matches all 32 events.
- tid=12→7 +1: matches through handle.create at idx=2.
- tid=14→9 +39: walks past all the now-aligned KeWait* framing into the audio subsystem.
Persisted (pre-existing bugs unaffected)
None of the C+15-α catalog's other groups are touched.
NEW divergences (cataloged for future iterates)
D-NEW-1 (HIGH) — main idx=102,553: NtDuplicateObject no handle.create
Canary's NtDuplicateObject_entry → ObjectTable::DuplicateHandle
allocates a new slot via AddHandle(object, &new_handle)
(util/object_table.cc:148-201), which fires the C+15-α-wired
phase_a::EmitHandleCreateAuto. Ours's nt_duplicate_object
(exports.rs nt_duplicate_object) implements per-AUDIT-062 alias-on-dup
semantics: dup_id = source_id, so the duplicate is a refcount-bumped
re-use of the same slot and no new handle.create fires.
This is a genuine engine-architectural difference. Mirror options:
- (a) Make ours allocate a fresh handle on NtDuplicateObject and emit handle.create (mirror canary). ~30-40 LOC; downstream impact on every existing AUDIT-062-dependent code path needs audit.
- (b) Diff-tool suppress this handle.create site. Band-aid.
Recommendation: (a). C+18 target. Trade-off: AUDIT-062's "alias on dup" was implemented to handle a specific worker-cluster handle-aliasing issue; un-doing it may surface a different regression. The risk profile is similar to C+15-α: invisible state divergences become visible. ~30 LOC fix or ~30 LOC tactical revert.
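The two duplication strategies can be contrasted in a small sketch. `HandleTable`, its slot layout and the `created` vector (standing in for emitted handle.create events) are hypothetical; neither canary's ObjectTable nor ours's nt_duplicate_object is reproduced here.

```rust
use std::collections::HashMap;

/// Hypothetical handle table; object IDs and handle values are plain u32s.
#[derive(Default)]
pub struct HandleTable {
    /// handle -> (object id, refcount held through this handle)
    slots: HashMap<u32, (u32, u32)>,
    next_handle: u32,
    /// Handles for which a handle.create event would have been emitted.
    pub created: Vec<u32>,
}

impl HandleTable {
    /// Allocate a new slot and record the handle.create emit.
    pub fn add_handle(&mut self, object: u32) -> u32 {
        self.next_handle += 4;
        let handle = self.next_handle;
        self.slots.insert(handle, (object, 1));
        self.created.push(handle);
        handle
    }

    /// Alias-on-dup (current behaviour as described): same handle value,
    /// refcount bumped, no handle.create.
    pub fn duplicate_alias(&mut self, source: u32) -> Option<u32> {
        let slot = self.slots.get_mut(&source)?;
        slot.1 += 1;
        Some(source)
    }

    /// Option (a): a fresh slot for the same object, which goes through
    /// add_handle and therefore records handle.create.
    pub fn duplicate_fresh(&mut self, source: u32) -> Option<u32> {
        let object = self.slots.get(&source)?.0;
        Some(self.add_handle(object))
    }
}
```

The sketch leaves out close/refcount teardown, which is where an audit of AUDIT-062-dependent paths would most likely need to look.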
D-NEW-2 (MEDIUM) — tid=12→7 idx=3: wait.begin.timeout_ns mismatch
canary: wait.begin handles_semantic_ids=[SID-A] timeout_ns=-30000000
ours: wait.begin handles_semantic_ids=[SID-B] timeout_ns=429466729600
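The SIDs differ (skipped per diff policy). The timeout_ns is the issue:
canary reports a 30 ms relative timeout (-30,000,000 ns); ours reports
429,466,729,600 ns, about 429.47 s, which reads as an absolute-time
encoding. Note that 429,466,729,600 = (2^32 - 300,000) * 100, i.e. the
low 32 bits of canary's raw -300,000 (100 ns units) zero-extended, so the
high dword of the 64-bit timeout appears to be lost or never sign-extended
somewhere on ours's path. Suspected cause: ours's decode_timeout_ns returns
the raw mem.read_u64(timeout_ptr) as i64 * 100 without applying the
"negative=relative / positive=absolute" semantics consistently with
canary. Inspect decode_timeout_ns (exports.rs:4890). Canary's
threading.cc emit code also passes (*timeout_ptr) * 100 directly without
sign conversion, so the divergence is likely upstream, in how each engine
writes the TIMEOUT* struct. Probably ε-class (game-side state
encoding).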
C+19 target estimate. ~10-30 LOC investigation.
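The two logged values are related by a 32-bit truncation, which the following arithmetic check makes explicit. It only demonstrates the numeric relationship between the two timeout_ns values; `zero_extend_low32` is a hypothetical helper, and where (or whether) such a truncation actually happens in either engine is not established by this note.

```rust
/// Scale a raw timeout in 100 ns units to nanoseconds, as both emit sites
/// are described as doing.
pub fn timeout_ns(raw_100ns: i64) -> i64 {
    raw_100ns * 100
}

/// Hypothetical failure mode: keep only the low 32 bits of the raw value
/// and zero-extend them back to 64 bits, dropping the sign.
pub fn zero_extend_low32(raw_100ns: i64) -> i64 {
    (raw_100ns as u32) as i64
}
```

With a raw value of -300,000, `timeout_ns` yields canary's -30,000,000, and `timeout_ns(zero_extend_low32(..))` yields ours's 429,466,729,600.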
D-NEW-3 (LOW) — tid=15→10 idx=2: handle.create ordering on shared dispatcher
Canary's GetNativeObject is process-global: once any thread adopts
a dispatcher pointer (stashing kXObjSignature in the wait_list), all
subsequent threads find the existing handle and do NOT re-emit. Canary's
handle.create for the semaphore at guest pointer 0x828a3230 (XAudio
voice volume changemask?) was emitted earlier on a different thread, so on
tid=15 the first wait skips straight to wait.begin.
Ours's ensure_dispatcher_object is also process-global (the state.objects
map is shared in KernelState). However, the timing of first adoption
differs because thread interleaving / boot ordering between the two engines
isn't bit-identical. Ours's tid=10 happens to be the first to touch
0x828a3230, so it emits handle.create at idx=2; canary's tid=15
arrived after another thread (probably tid=6 or tid=10) had already
adopted it.
This is a timing-induced ordering divergence, not a state-model
asymmetry. It's the inverse of the typical D-1/D-2 class — both engines
emit the SAME total number of handle.create events; the issue is which
thread happens to be the "first toucher". The diff tool currently treats
this as a divergence because it compares per-tid sequences strictly.
Three possible mitigations:
- (a) Diff-tool: relax ordering for handle.create emits when the "next thread" event is wait.begin on the same dispatcher. Complex.
- (b) Suppress handle.create from the per-thread sequence entirely; treat it as a global emit and only diff wait.begin SIDs against a process-global SID-registry. Could work via SKIP_PAYLOAD_FIELDS_BY_KIND extension to drop the event from per-tid alignment.
- (c) Live with the +0/-14 trade-off on tid=15→10; the main chain improvement dwarfs it.
Recommendation: (c) for now; C+20+ if the chain becomes load-bearing.
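For reference, mitigation (b) could look roughly like the sketch below: handle.create is dropped from the per-thread sequences and compared only as a process-global count. Everything here is illustrative. `TraceEvent`, `GLOBAL_KINDS` and both functions are hypothetical names, the diff tool's real data model is not shown in this note, and the example assumes the tids have already been mapped between engines.

```rust
/// Hypothetical, heavily simplified trace event.
#[derive(Debug, Clone, PartialEq)]
pub struct TraceEvent {
    pub tid: u32,
    pub kind: &'static str,
}

/// Event kinds treated as process-global and excluded from per-thread
/// alignment (hypothetical list).
const GLOBAL_KINDS: &[&str] = &["handle.create"];

/// Per-thread sequence of event kinds with the global kinds filtered out.
pub fn per_tid_sequence(events: &[TraceEvent], tid: u32) -> Vec<&'static str> {
    events
        .iter()
        .filter(|e| e.tid == tid && !GLOBAL_KINDS.contains(&e.kind))
        .map(|e| e.kind)
        .collect()
}

/// Process-wide count of one event kind, regardless of emitting thread.
pub fn global_count(events: &[TraceEvent], kind: &str) -> usize {
    events.iter().filter(|e| e.kind == kind).count()
}
```

Under this scheme a first-toucher swap between two threads leaves every per-thread sequence and the global handle.create count unchanged, at the cost of no longer checking which thread performed the adoption.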
Reading-error register
- Reading-error #28 (verify framing first): FOLLOWED. Canary's GetNativeObject was read end-to-end before any code change.
- Reading-error #23 (widely-used primitive flip): MITIGATED. Cold-vs-cold gate caught no main-chain regression; minor sister-chain regression on tid=15→10 is documented as D-NEW-3.
- Reading-error #19 (host-side emits): FOLLOWED. event_log::is_enabled() guards every new emit; the default-off cost is one relaxed atomic-bool load per emit site (negligible when disabled).
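The guard pattern referred to under Reading-error #19 can be sketched as below. This is an assumed shape, not the actual event_log module: the static flag, `set_enabled` and `emit_handle_create` are hypothetical, and the closure parameter is one way to keep payload construction off the disabled path.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical global switch; off by default.
static ENABLED: AtomicBool = AtomicBool::new(false);

/// One relaxed load; no ordering with other memory is needed for an on/off flag.
pub fn is_enabled() -> bool {
    ENABLED.load(Ordering::Relaxed)
}

pub fn set_enabled(on: bool) {
    ENABLED.store(on, Ordering::Relaxed);
}

/// Guarded emit: the payload closure runs only when logging is enabled,
/// so the disabled path costs just the flag check.
pub fn emit_handle_create(build_payload: impl FnOnce() -> String) -> Option<String> {
    if !is_enabled() {
        return None;
    }
    Some(build_payload())
}
```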