xenia-rs/audit-runs/phase-c15a-schema-wiring/new-divergences.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+15-α — New Divergence Catalog (2026-05-14)

Surfaced by the schema-v1.1 wiring of handle.create/destroy, thread.create/exit, wait.begin in both engines.

Cold-vs-cold matched-prefix table (post-wiring)

| canary_tid | ours_tid | matched | first_divergence_at | divergence kind |
|---|---|---|---|---|
| 6 | 1 | 102,168 | 102,168 | extra handle.destroy in ours (XamTaskCloseHandle refcount mismatch) |
| 15 | 10 | 16 | (none) | no divergence in 16 evts (canary 3.6M, ours stalls) |
| 7 | 2 | 30 | 30 | KeWaitForSingleObject native-obj handle (class E) |
| 4 | 11 | 8 | 8 | KeWaitForMultipleObjects native-obj handle (class E) |
| 12 | 7 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |
| 14 | 9 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |

Main matched prefix dropped from 104,574 (C+13/C+14) to 102,168 — a regression of ~2,400 events. This is the expected outcome: invisible state divergences are now visible.
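To make the "matched prefix" metric concrete, here is a minimal sketch of how such a length could be computed for one chain. The `Event` struct and its fields are assumptions for illustration; the real diff tool's event schema is richer and is not reproduced here.

```rust
/// Minimal stand-in for one trace event; field names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub kind: String,
    pub name: String,
}

/// Returns the number of leading events that agree in both streams.
/// When the streams diverge, this is also the index of the first divergence.
pub fn matched_prefix_len(canary: &[Event], ours: &[Event]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(c, o)| c == o)
        .count()
}

/// Convenience constructor used only to keep call sites short.
pub fn ev(kind: &str, name: &str) -> Event {
    Event {
        kind: kind.to_string(),
        name: name.to_string(),
    }
}
```

Under this definition, wiring a new event kind into both engines can only shorten or preserve the prefix: any state difference that previously produced no event now produces one on a single side, which ends the match earlier.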

Cataloged divergences (priority-ordered for future iterate)

D-1 (HIGH) — main chain idx=102,168: extra handle.destroy on XamTaskCloseHandle

  • Chain: canary tid=6 ↔ ours tid=1.
  • Event:
    • ours: handle.destroy sid=b53a312c0ac30f49 then kernel.return XamTaskCloseHandle return=1
    • canary: kernel.return XamTaskCloseHandle return=1 (no handle.destroy)
  • Hypothesis: Ours's xam_task_close_handle (xam.rs:300-344) decrements the refcount and destroys the handle when it reaches 0. Canary's XamTaskCloseHandle_entry → NtClose → ObjectTable::ReleaseHandle path likewise destroys only when the refcount reaches 0, but canary's spawned thread keeps an additional ref on the thread handle (object->Retain() in XThread::Create line 408 via RetainHandle()). Ours's refcount of 1 at this point is wrong; it should be 2 (user ref + spawned-thread ref), so ours destroys prematurely.
  • Impact: leaks downstream divergences; spawned thread now has a dangling handle reference.
  • Fix scope: ~20 LOC in xam_task_schedule / ex_create_thread — add explicit state.handle_refcount[handle] += 1 after spawn for the XThread's own ref. Verify against canary's RetainHandle() semantics.
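The proposed D-1 fix can be sketched as follows. This is an illustration under assumptions, not the xenia-rs code: `HandleTable`, `schedule_task`, and the `destroyed` vector (standing in for emitted handle.destroy events) are hypothetical names, and the real state layout may differ.

```rust
use std::collections::HashMap;

/// Hypothetical handle table; not the actual xenia-rs state layout.
#[derive(Default)]
pub struct HandleTable {
    refcount: HashMap<u32, u32>,
    /// Stands in for emitted handle.destroy events.
    pub destroyed: Vec<u32>,
}

impl HandleTable {
    /// Creates a handle holding the caller's single reference.
    pub fn create(&mut self, handle: u32) {
        self.refcount.insert(handle, 1);
    }

    /// Adds a reference, e.g. the one a spawned thread keeps on itself.
    pub fn retain(&mut self, handle: u32) {
        if let Some(rc) = self.refcount.get_mut(&handle) {
            *rc += 1;
        }
    }

    /// Drops one reference and destroys only when the count reaches zero.
    /// Returns true if this call destroyed the handle.
    pub fn release(&mut self, handle: u32) -> bool {
        let remaining = match self.refcount.get_mut(&handle) {
            Some(rc) => {
                *rc -= 1;
                *rc
            }
            None => return false, // unknown or already destroyed
        };
        if remaining == 0 {
            self.refcount.remove(&handle);
            self.destroyed.push(handle);
            return true;
        }
        false
    }
}

/// Models task scheduling: create the handle, then take the thread's own ref.
pub fn schedule_task(table: &mut HandleTable, handle: u32) {
    table.create(handle);
    table.retain(handle); // the extra reference the D-1 fix proposes
}
```

With the extra `retain`, the user's close drops the count from 2 to 1 and emits nothing, which would match canary's trace at idx=102,168; the destroy happens only when the thread's own reference is released.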

D-2 (HIGH) — chain tid=4 / canary, tid=11 / ours: ours stops at idx=8

  • Chain: canary tid=4 ↔ ours tid=11.
  • Event:
    • ours: kernel.return KeWaitForMultipleObjects status=0 at idx=8, then stream ends (9 total events).
    • canary: handle.create sid=bcaf14d76932b128 (Event) at idx=8, then handle.create sid=0760e947bacff199 at idx=9, then continues for 151,690 events.
  • Hypothesis (class E asymmetry): Canary's KeWaitForMultipleObjects_entry iterates the object pointer array and calls XObject::GetNativeObject<XObject>(kernel_state, object_ptr, -1, true) for each — when the object has not yet been wrapped in an XObject*, this CREATES a new XObject (and thus a new handle). Ours's do_wait_multiple uses resolve_pseudo_handle which does NOT create a new XObject — it looks up the existing handle. The "handle for the native dispatcher object" is an engine-architectural difference: canary lazily wraps, ours pre-registers.
  • Impact: every KeWait that takes object pointers (not handles) creates N extra handle.create events on the canary side. Ours emits none.
  • Fix scope: this is class E (intentional asymmetry). Recommended action: add Ke{Wait,Set,Reset,...}*Object* exports that take object pointers to a diff-tool suppress-handle-create-side-effect list, OR have ours emit a synthetic handle.create when resolve_pseudo_handle first encounters a new pointer. Latter aligns canary's view better. ~30-50 LOC.
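The second option (a synthetic handle.create on first sight of a pointer) might look roughly like the sketch below. `NativeObjectResolver`, its fields, and the event string format are all hypothetical; only the lazy-wrap behaviour mirrors what the hypothesis attributes to canary's GetNativeObject.

```rust
use std::collections::HashMap;

/// Hypothetical resolver; names and layout are illustrative, not xenia-rs code.
#[derive(Default)]
pub struct NativeObjectResolver {
    /// Guest object pointer -> handle.
    by_ptr: HashMap<u32, u32>,
    next_handle: u32,
    /// Stands in for the event log.
    pub emitted: Vec<String>,
}

impl NativeObjectResolver {
    /// Resolves a guest object pointer to a handle, emitting a synthetic
    /// handle.create the first time a given pointer is seen.
    pub fn resolve(&mut self, object_ptr: u32) -> u32 {
        if let Some(&h) = self.by_ptr.get(&object_ptr) {
            return h; // already wrapped: no event
        }
        self.next_handle += 1;
        let h = self.next_handle;
        self.by_ptr.insert(object_ptr, h);
        self.emitted
            .push(format!("handle.create ptr={object_ptr:#010x}"));
        h
    }
}
```

Keying on the pointer means a wait over N previously unseen dispatcher objects emits N events, matching the per-object behaviour described in the Impact bullet, while repeated waits on the same objects stay silent.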

D-3 (HIGH) — same class on chains 7→2 (idx=30), 12→7 (idx=2), 14→9 (idx=2)

Same root cause as D-2 — KeWaitForSingleObject with raw object pointer. Canary's xeKeWaitForSingleObject calls GetNativeObject which creates a handle for the dispatcher; ours's resolve_pseudo_handle does not.

Group all 4 chains under one fix in D-2.

D-4 (MEDIUM) — wait.begin SID 0000000000000000 on tid=10 of ours

  • Chain: canary tid=15 ↔ ours tid=10 (the only thread where prefix didn't regress — but ours stalls at idx=16).
  • Event at idx=2: both engines emit wait.begin but ours's handles_semantic_ids = ["0000000000000000"] while canary's is real.
  • Hypothesis: SID = 0 means lookup_handle_semantic_id returned 0 (handle not registered). The handle being waited on must have been created before the event_log SID registry was active (during boot / init), OR it's a pseudo-handle from resolve_pseudo_handle. Pseudo-handles aren't real handles in our model.
  • Fix scope: when lookup_handle_semantic_id(h) == 0, lazy-emit a synthetic handle.create for h (with a default object_type per state.objects[h]'s schema kind). Aligns with D-2 fix. ~10 LOC.

D-5 (LOW) — chains 7→2, 12→7, 14→9: ours streams truncated

  • Ours's tid=2/7/9/10 streams are 32/4/76/16 events long; the matching canary streams (tid=7/12/14/15) are 32/27,834/4,733,192/3,610,535 events long. Ours's worker threads stall early.
  • Hypothesis: Downstream of D-2 / D-1 — once the main thread or peer workers diverge, downstream threads block on signals that never come.
  • Fix scope: deferred until D-1/D-2 land; likely no separate fix needed.

Acceptance gate status

  • Gate 1 (default-off digest): PASS — 3× reproducible at e1dfcb1559f987b35012a7f2dc6d93f5 (unchanged from C+13 baseline).
  • Gate 2 (cvar-on emit): PASS — both engines produce 14M+ / 121K events respectively; JSONL parses cleanly; all new kinds present.
  • Gate 3 (diff tool): PASS — diff tool consumes new kinds, produces 6-chain divergence report. Cross-engine SID skip-comparison documented in SKIP_PAYLOAD_FIELDS_BY_KIND.
  • Gate 4 (cold-vs-cold): PASS (with regression as designed) — main chain prefix 104,574 → 102,168 (-2,406 events). Divergence catalog produced.
  • Gate 5 (build clean): PASS — canary + ours both build.
  • Gate 6 (tests): PASS — 181 → 181 passing (no new tests added; existing unchanged).
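As an illustration of the skip-comparison mentioned under Gate 3, a per-kind field filter could be sketched like this. The contents of the skip list below are an assumption made for the example; the actual entries of SKIP_PAYLOAD_FIELDS_BY_KIND are not shown in this note, and `skipped_fields` and `payloads_match` are hypothetical names.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical per-kind skip list; the real SKIP_PAYLOAD_FIELDS_BY_KIND
/// contents are not shown in this document.
pub fn skipped_fields(kind: &str) -> HashSet<&'static str> {
    match kind {
        // Assumed: SIDs are engine-local, so they are not compared.
        "handle.create" | "handle.destroy" => HashSet::from(["sid"]),
        _ => HashSet::new(),
    }
}

/// Compares two payloads of the same kind, ignoring skipped fields.
pub fn payloads_match(
    kind: &str,
    canary: &BTreeMap<String, String>,
    ours: &BTreeMap<String, String>,
) -> bool {
    let skip = skipped_fields(kind);
    // Project a payload onto the fields that take part in the comparison.
    let keep = |m: &BTreeMap<String, String>| -> Vec<(String, String)> {
        m.iter()
            .filter(|(k, _)| !skip.contains(k.as_str()))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    };
    keep(canary) == keep(ours)
}
```

Filtering before comparing keeps the event's presence and its remaining fields significant, so an extra handle.destroy still counts as a divergence even though its SID value is ignored.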

Reading-error class avoided

Class #29 — per-host-thread tid_event_idx counter for shared synthetic tids: canary's pre-session thread_local uint64_t t_tid_event_idx was correct for guest-tid events (1 tid : 1 host_thread) but broken for boot-time emissions with tid=0 because boot init runs on multiple host threads. Symptom: the diff tool rejected the canary log with "events out of order at index 8". Fixed via tid-keyed global map (matches ours's design).
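The tid-keyed design can be sketched in Rust as below. This is a minimal illustration of the idea, not either engine's code: `TID_EVENT_IDX` and `next_event_idx` are hypothetical names, and a single mutex is assumed to be acceptable for the emit rate.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Global per-tid event index, shared by every host thread.
/// A sketch of the tid-keyed map idea, not the engines' actual code.
static TID_EVENT_IDX: OnceLock<Mutex<HashMap<u32, u64>>> = OnceLock::new();

/// Returns the next event index for `tid`, starting at 0.
/// Indices for one tid are unique and gap-free regardless of which
/// host thread asks, because the counter lives in the shared map.
pub fn next_event_idx(tid: u32) -> u64 {
    let map = TID_EVENT_IDX.get_or_init(|| Mutex::new(HashMap::new()));
    let mut guard = map.lock().unwrap();
    let slot = guard.entry(tid).or_insert(0);
    let idx = *slot;
    *slot += 1;
    idx
}
```

A thread-local counter would instead restart at 0 on each host thread, so several boot threads emitting under tid=0 would produce duplicate indices, which is consistent with the "events out of order" rejection described above.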