Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase C+18 Investigation — Shared-global first-toucher race (2026-05-14)
Framing verification (reading-error #28 discipline)
C+17 result: the main chain's matched prefix advanced 102,171 → 102,553 (+382) once
ours's ensure_dispatcher_object started emitting handle.create for
synthesized shadows. But the matched prefix of sister chain tid=15→10
(canary tid=15 vs ours tid=10) REGRESSED from 16 → 2:
    canary tid=15:                             ours tid=10:
    [0] import.call KeWaitForSingleObject      [0] import.call KeWaitForSingleObject
    [1] kernel.call KeWaitForSingleObject      [1] kernel.call KeWaitForSingleObject
    [2] wait.begin sid=66ae1b598f928969        [2] handle.create sid=b9e6799594b746ee
    [3] kernel.return                          [3] wait.begin sid=b9e6799594b746ee
                                               [4] kernel.return
The two engines disagree at idx=2: canary's tid=15 has wait.begin,
ours's tid=10 has handle.create. The SIDs are different too
(66ae1b598f928969 vs b9e6799594b746ee) but the diff tool already
SKIPS SID fields per C+15-α schema-v1.
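To make the idx=2 report concrete, here is a minimal Python sketch of a per-tid sequence diff that drops SID fields before comparing. Event, SKIP_FIELDS, comparable and first_divergence are hypothetical stand-ins for the real diff tool; in particular, the C+15-α SKIP_PAYLOAD_FIELDS_BY_KIND policy is keyed per kind, while this sketch assumes one flat set of SID field names.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical, simplified event record; real schema-v1 events carry more
# fields (tid, tid_event_idx, raw_handle_id, ...).
@dataclass
class Event:
    kind: str
    payload: dict = field(default_factory=dict)


# Assumed stand-in for SKIP_PAYLOAD_FIELDS_BY_KIND: SID-bearing fields are
# ignored regardless of kind.
SKIP_FIELDS = {"sid", "handle_semantic_id", "handles_semantic_ids"}


def comparable(ev: Event) -> tuple:
    """Project an event onto the fields the strict per-tid diff compares."""
    kept = sorted((k, v) for k, v in ev.payload.items() if k not in SKIP_FIELDS)
    return (ev.kind, tuple(kept))


def first_divergence(canary: list, ours: list) -> Optional[int]:
    """Index of the first position where two per-tid streams disagree, else None."""
    for idx, (a, b) in enumerate(zip(canary, ours)):
        if comparable(a) != comparable(b):
            return idx
    if len(canary) != len(ours):
        return min(len(canary), len(ours))  # one stream is a strict prefix
    return None


canary_tid15 = [Event("import.call"), Event("kernel.call"),
                Event("wait.begin", {"sid": "66ae1b598f928969"}),
                Event("kernel.return")]
ours_tid10 = [Event("import.call"), Event("kernel.call"),
              Event("handle.create", {"sid": "b9e6799594b746ee"}),
              Event("wait.begin", {"sid": "b9e6799594b746ee"}),
              Event("kernel.return")]
print(first_divergence(canary_tid15, ours_tid10))  # 2
```

A SID-only difference never trips this diff; the kind mismatch at idx=2 does.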
Root cause: shared-global first-toucher race
The dispatcher at guest pointer 0x828a3230 is a process-global
KSEMAPHORE (object_type=3) that's touched by MULTIPLE guest threads
during boot:
- Canary: some thread other than tid=15 (likely the main boot thread, tid=6) touches it first and emits handle.create there. By the time tid=15 reaches KeWaitForSingleObject, the wrapper exists, so XObject::GetNativeObject short-circuits via the kXObjSignature marker and emits NO additional event. Canary tid=15's stream is 4 events long: import → kernel.call → wait.begin → kernel.return.
- Ours: tid=10 happens to be the first toucher, so ours's ensure_dispatcher_object emits handle.create on tid=10. Ours tid=10's stream is 5 events long: import → kernel.call → handle.create → wait.begin → kernel.return.
Both engines do the right thing semantically; whichever thread wins the "first toucher" race depends on thread scheduling, which is NOT bit-identical across engines (different host schedulers, JIT, etc.). The diff tool sees one extra event on one side and reports it as a divergence — but it's observation-side, not behavioral.
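The race can be reproduced with a toy model: a process-global table keyed by guest pointer, plus a per-pointer idempotent wrapper that emits handle.create only on first touch. Kernel, run and the method names are an illustrative Python model, not either engine's code, and the toy assumes every thread reaches the dispatcher through KeWaitForSingleObject.

```python
class Kernel:
    """Toy model: process-global object table plus per-tid event streams."""

    def __init__(self):
        self.objects = {}  # guest pointer -> object_type (process-global)
        self.streams = {}  # tid -> list of event kinds

    def emit(self, tid, kind):
        self.streams.setdefault(tid, []).append(kind)

    def ensure_dispatcher_object(self, tid, pointer, object_type):
        # Per-pointer idempotent: only the first toucher records the object
        # and emits handle.create; later callers short-circuit silently.
        if pointer not in self.objects:
            self.objects[pointer] = object_type
            self.emit(tid, "handle.create")

    def ke_wait_for_single_object(self, tid, pointer):
        self.emit(tid, "import.call")
        self.emit(tid, "kernel.call")
        self.ensure_dispatcher_object(tid, pointer, object_type=3)  # KSEMAPHORE
        self.emit(tid, "wait.begin")
        self.emit(tid, "kernel.return")


def run(schedule, pointer=0x828A3230):
    """Replay one KeWaitForSingleObject per tid, in scheduling order."""
    kernel = Kernel()
    for tid in schedule:
        kernel.ke_wait_for_single_object(tid, pointer)
    return kernel.streams


print(run([6, 15])[15])  # tid 15 loses the race: 4 events
print(run([15, 6])[15])  # tid 15 wins the race: 5 events, incl. handle.create
```

In either order exactly one handle.create is emitted for the pointer; only the tid that carries it changes, which is all the per-tid diff can see.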
This is C+17 D-NEW-3.
Verified via static + dynamic evidence
- Both ours's ensure_dispatcher_object (exports.rs:4363) and canary's XObject::GetNativeObject (xobject.cc:397-483) are per-pointer idempotent: re-entry on a pointer that already has the kXObjSignature marker short-circuits without emitting.
- The shared objects table is process-global in both engines (ours's KernelState::objects map; canary's KernelState::object_table()).
- In the ours-cold log, 0x828a3230 appears in exactly ONE handle.create (on tid=10), confirming the per-pointer idempotence:

      $ grep '"raw_handle_id":"0x828a3230"' ours-cold.jsonl
      {"kind":"handle.create","tid":10,"tid_event_idx":2,...}

- The canary diff side reports [2] wait.begin with a SID that refers to a dispatcher whose handle.create was already emitted elsewhere (likely on canary's tid=6 main chain or a worker).
- The SID computation in both engines uses semantic_id(create_site_pc=0, creating_tid, idx_at_creation, object_type). Both creating_tid and idx_at_creation depend on WHICH thread did the first touch, so even if both engines wrapped the same dispatcher, their SIDs would differ whenever the race is won by a different thread (or at a different event index).
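The dependence on the first toucher follows directly from the recipe's inputs. The Python sketch below assumes the SID is a 64-bit FNV-1a over the four fields, each encoded as a little-endian u64; the real field encoding is not stated in these notes, so its outputs will not reproduce the SIDs in the trace.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a64(data: bytes) -> int:
    """Standard 64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def semantic_id(create_site_pc, creating_tid, idx_at_creation, object_type):
    """Per-thread SID. Assumed encoding: each input as a little-endian u64."""
    fields = (create_site_pc, creating_tid, idx_at_creation, object_type)
    data = b"".join(f.to_bytes(8, "little") for f in fields)
    return format(fnv1a64(data), "016x")


# Same dispatcher, same event index, different first toucher -> different SID.
canary_like = semantic_id(0, 6, 2, 3)
ours_like = semantic_id(0, 10, 2, 3)
print(canary_like != ours_like)  # True
```

Because each FNV-1a step (XOR with a byte, multiply by an odd prime mod 2^64) is a bijection, two equal-length inputs that differ in a single byte, as the tid fields do here, always hash differently.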
Class of bug
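Class η: harness observation-side asymmetry on scheduling-nondeterministic process-global state. Not a real engine bug; both engines are doing the right thing. The harness (per-tid sequence diff) is the wrong abstraction for this class of event, because it assumes each event lands on the same tid in both engines, and the tid of a first-touch event is decided by the scheduler.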
Fix shape
Two coordinated changes, both small and additive:
(A) Engine: scheduling-invariant SID for process-global dispatchers
Add event_log::semantic_id_shared_global(pointer, object_type) (ours
and canary) — a SID recipe keyed only on (pointer, object_type).
Inputs to the existing FNV-1a:
create_site_pc = SHARED_GLOBAL_SID_MARKER (= 0xC01AB005, fixed sentinel)
creating_tid = 0
tid_event_idx = pointer as u64
object_type = object_type
The marker constant sits outside any plausible guest-PC range (PPC text 0x82000000-0x82FFFFFF; XEX header 0x3001xxxx; heap 0x4xxxxxxx), so no regular per-thread SID (computed from a real PC) can share its input tuple. The two recipes could only collide through a 64-bit FNV-1a hash collision.
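A Python sketch of the shared-global recipe and the marker-range check. semantic_id_shared_global mirrors the name proposed above; the FNV-1a byte encoding (one little-endian u64 per field) is an assumption, as is this Python rendering of either engine's helper.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1
SHARED_GLOBAL_SID_MARKER = 0xC01AB005

# Guest address ranges quoted in the notes.
GUEST_PC_RANGES = [
    (0x82000000, 0x82FFFFFF),  # PPC text
    (0x30010000, 0x3001FFFF),  # XEX header (0x3001xxxx)
    (0x40000000, 0x4FFFFFFF),  # heap (0x4xxxxxxx)
]


def fnv1a64(data: bytes) -> int:
    h = FNV64_OFFSET
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return h


def semantic_id(create_site_pc, creating_tid, tid_event_idx, object_type):
    """Assumed encoding: FNV-1a 64 over four little-endian u64 fields."""
    fields = (create_site_pc, creating_tid, tid_event_idx, object_type)
    return format(fnv1a64(b"".join(f.to_bytes(8, "little") for f in fields)), "016x")


def semantic_id_shared_global(pointer, object_type):
    # Keyed only on (pointer, object_type): the sentinel replaces the PC,
    # the tid is pinned to 0, and the pointer fills the event-index slot.
    return semantic_id(SHARED_GLOBAL_SID_MARKER, 0, pointer, object_type)


assert not any(lo <= SHARED_GLOBAL_SID_MARKER <= hi for lo, hi in GUEST_PC_RANGES)
print(semantic_id_shared_global(0x828A3230, 3))
```

Changing only object_type changes a single input byte, so under this encoding two dispatchers at the same pointer with different types always get distinct SIDs.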
ensure_dispatcher_object (ours) and XObject::GetNativeObject
(canary) route their handle.create emit through this recipe instead
of the per-thread semantic_id. Both engines compute the same SID
for the same dispatcher pointer regardless of which guest thread wins
the first-toucher race.
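The routing change can be checked against a toy first-toucher race: whichever tid wins, the emitted handle.create carries the same SID. run and the event dict layout are hypothetical, and the recipe reuses the assumed little-endian FNV-1a encoding.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1
SHARED_GLOBAL_SID_MARKER = 0xC01AB005


def semantic_id_shared_global(pointer, object_type):
    """Assumed encoding: FNV-1a 64 over four little-endian u64 fields."""
    h = FNV64_OFFSET
    for f in (SHARED_GLOBAL_SID_MARKER, 0, pointer, object_type):
        for byte in f.to_bytes(8, "little"):
            h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return format(h, "016x")


def run(schedule, pointer=0x828A3230, object_type=3):
    """Replay touches in scheduling order; return the handle.create events."""
    objects, creates = set(), []
    for tid in schedule:
        if pointer not in objects:  # per-pointer idempotent wrapper
            objects.add(pointer)
            creates.append({
                "kind": "handle.create",
                "tid": tid,
                # The SID ignores tid, so the race winner does not affect it.
                "handle_semantic_id": semantic_id_shared_global(pointer, object_type),
            })
    return creates


canary_like = run([6, 15])  # tid 6 wins the race
ours_like = run([10, 6])    # tid 10 wins the race
print(canary_like[0]["tid"], ours_like[0]["tid"])  # 6 10
print(canary_like[0]["handle_semantic_id"] == ours_like[0]["handle_semantic_id"])  # True
```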
(B) Diff tool: cross-tid floating handle.create matching
Pre-pass: collect the set of shared-global SIDs across BOTH engines and
ALL tids. A handle.create event is detected as shared-global by
recomputing the deterministic SID from its (raw_handle_id, object_type) payload and matching against handle_semantic_id.
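A Python sketch of that pre-pass, assuming events are dicts with the kind, raw_handle_id (hex string, as in the grep output), object_type and handle_semantic_id fields named in these notes. collect_floating_sids and fnv_sid are hypothetical helpers, and the SID encoding is the assumed little-endian FNV-1a.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1
SHARED_GLOBAL_SID_MARKER = 0xC01AB005


def fnv_sid(*fields):
    """Assumed SID encoding: FNV-1a 64 over little-endian u64 fields, as hex."""
    h = FNV64_OFFSET
    for f in fields:
        for byte in f.to_bytes(8, "little"):
            h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return format(h, "016x")


def semantic_id_shared_global(pointer, object_type):
    return fnv_sid(SHARED_GLOBAL_SID_MARKER, 0, pointer, object_type)


def collect_floating_sids(engines):
    """engines: {engine_name: {tid: [event, ...]}}, covering BOTH engines."""
    floating = set()
    for per_tid in engines.values():
        for events in per_tid.values():
            for ev in events:
                if ev["kind"] != "handle.create":
                    continue
                # Recompute the deterministic SID from the payload; keep the
                # event only if its recorded SID matches the shared recipe.
                pointer = int(ev["raw_handle_id"], 16)
                expected = semantic_id_shared_global(pointer, ev["object_type"])
                if ev["handle_semantic_id"] == expected:
                    floating.add(expected)
    return floating


shared = {"kind": "handle.create", "raw_handle_id": "0x828a3230", "object_type": 3,
          "handle_semantic_id": semantic_id_shared_global(0x828A3230, 3)}
# Hypothetical regular create whose SID came from the per-(tid, idx) recipe.
regular = {"kind": "handle.create", "raw_handle_id": "0x828a3230", "object_type": 3,
           "handle_semantic_id": fnv_sid(0x82001234, 10, 7, 3)}
engines = {"canary": {6: [shared]}, "ours": {10: [dict(shared)], 12: [regular]}}
print(len(collect_floating_sids(engines)))  # 1
```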
When per-tid comparison finds a kind mismatch where one side has a
handle.create whose SID is in the floating set:
- Advance only that side's stream pointer past the floating event.
- Re-compare at the same canonical position.
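The absorption rule can be sketched in Python as a two-pointer walk. diff_with_floating, is_floating and ev are hypothetical; this sketch compares events by kind only (the real tool compares full payloads minus skipped SID fields), and "sg-1" is a placeholder for a shared-global SID.

```python
def is_floating(ev, floating):
    """Absorbable only if it is a handle.create whose SID is in the floating set."""
    return ev["kind"] == "handle.create" and ev.get("handle_semantic_id") in floating


def diff_with_floating(a, b, floating):
    """Compare two per-tid streams; return the first divergent (i, j) or None."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i]["kind"] == b[j]["kind"]:
            i, j = i + 1, j + 1
        elif is_floating(a[i], floating):
            i += 1  # advance only this side, then re-compare
        elif is_floating(b[j], floating):
            j += 1
        else:
            return (i, j)
    # End-of-stream handling is not specified in the notes; this sketch
    # reports any leftover events as a divergence.
    return None if (i, j) == (len(a), len(b)) else (i, j)


def ev(kind, sid=None):
    return {"kind": kind, "handle_semantic_id": sid}


canary_tid15 = [ev("import.call"), ev("kernel.call"), ev("wait.begin"), ev("kernel.return")]
ours_tid10 = [ev("import.call"), ev("kernel.call"), ev("handle.create", "sg-1"),
              ev("wait.begin"), ev("kernel.return")]
print(diff_with_floating(canary_tid15, ours_tid10, {"sg-1"}))  # None
print(diff_with_floating(canary_tid15, ours_tid10, set()))     # (2, 2)
```

The second call is the empty-floating-set case, where the walk falls back to the legacy strict behavior that test_strict_alignment_without_floating checks.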
This handles the "extra event on tid=10 but not tid=15" case
symmetrically, whichever engine carries the extra event. Subsequent
wait.begin events are unaffected: their SID fields (including
handles_semantic_ids) are already skipped under the C+15-α
SKIP_PAYLOAD_FIELDS_BY_KIND policy, so they keep aligning under the
schema-v1 strict-equality rule on the remaining fields. Because the
recipe is deterministic, a handles_semantic_ids element that names a
shared-global dispatcher also refers to the same object in both engines,
which preserves the underlying object alignment for future passes that
re-enable SID comparison.
Why this is the right fix (not over-suppression)
- Pointer-derived SIDs are unique per (pointer, object_type), up to hash collisions. Two distinct dispatchers at the same pointer with different object_type values get distinct SIDs (defense in depth).
- Regular per-thread handle.create events keep strict alignment. Only events whose SID matches the deterministic shared-global recipe are eligible for cross-tid absorption. A regular file-handle create (allocated via alloc_handle_for/AddHandle) uses the per-(tid, idx) SID recipe; since its inputs never include the sentinel PC, it can match the shared-global hash only through a 64-bit hash collision.
- The diff tool still reports real divergences. Tests confirm:
  - test_non_floating_real_divergence_still_caught: an unrelated extra event on ours's side IS reported.
  - test_strict_alignment_without_floating: when the floating set is empty, legacy strict behavior holds.