xenia-rs/audit-runs/iterate-2AJ-vsync-event-wiring/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Iterate 2.AJ — VSync→Event wiring (reciprocal-shadow plumbing landed; real wedge re-localized)

Date: 2026-06-02. LOC delta: engine +45 / 0 LOC (7 substantive + 38 doc comment) in crates/xenia-kernel/src/exports.rs. Retained. Tests: xenia-cpu 300 / xenia-kernel 227 / xenia-app 5; full PASS, 0 regressions.

Headline

FIX-INERT-ON-THIS-TRAJECTORY (PRODUCER-SIDE WEDGE RE-LOCALIZED).

The patch lands and is structurally correct (matches the canonical reciprocal-shadow discipline expected by canary's host-OS-Event model). Determinism bit-identical 65,691,821 events across 2 cold runs and bit-identical to 2.AI's terminus. Tests pass with zero regressions.

But: this fix targets the consume-side of the shadow / guest-memory bridge, and 2.AI's exposed wedge is on the producer-side. The VSync ISR delivers (76 callbacks per 100M instructions, metric gpu.interrupt.delivered{source=0}), the registered guest callback at PC 0x824be9a0 runs to LR_HALT_SENTINEL cleanly, BUT the callback's guest code never writes SignalState = 1 to the dispatcher at 0xbe8cbb5c + 4 that tid=7 polls. The reciprocal-clear path I plumbed is therefore never on the critical path for this iterate (signal_state remains 0 forever, the fast-path never triggers, no consume happens, no reciprocal clear runs).

The fix is preserved in the working tree because the discipline it implements is necessary for any future trajectory where a Sylpheed guest dispatcher actually receives a rising-edge signal from a non-kernel-API path (e.g. a future direct-write callback). Without reciprocal-clear, that future signal would latch and re-fast-path every subsequent wait. Removing it would be a deliberate step backward.
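The latch described above can be illustrated with a toy model. This is a minimal sketch, not the xenia-rs types: `Model`, `guest_signal`, `wait`, and `fast_paths_after_one_signal` are hypothetical names, and guest memory is stood in for by a `HashMap`. It assumes the refresh step sets the shadow whenever the guest SignalState word is non-zero.

```rust
use std::collections::HashMap;

/// Toy model (hypothetical, not the xenia-rs types): one auto-reset
/// dispatcher whose SignalState lives in guest memory at `ptr + 4`,
/// plus a kernel-side shadow flag.
struct Model {
    guest: HashMap<u32, u32>, // stand-in for guest memory
    shadow_signaled: bool,
    reciprocal_clear: bool,
}

impl Model {
    fn new(reciprocal_clear: bool) -> Self {
        Model { guest: HashMap::new(), shadow_signaled: false, reciprocal_clear }
    }

    /// A non-kernel-API producer (e.g. a guest callback) writes SignalState = 1.
    fn guest_signal(&mut self, ptr: u32) {
        self.guest.insert(ptr + 4, 1);
    }

    /// One wait: refresh the shadow from guest memory, then take the fast
    /// path if signaled. Returns true when the wait fast-pathed.
    fn wait(&mut self, ptr: u32) -> bool {
        let state = self.guest.get(&(ptr + 4)).copied().unwrap_or(0);
        if state != 0 {
            self.shadow_signaled = true; // refresh: guest word drives the shadow
        }
        if !self.shadow_signaled {
            return false; // would block or time out instead
        }
        self.shadow_signaled = false; // auto-reset consume clears the shadow
        if self.reciprocal_clear {
            self.guest.insert(ptr + 4, 0); // and clears the guest word too
        }
        true
    }
}

/// Counts how many of `waits` consecutive waits fast-path after one signal.
fn fast_paths_after_one_signal(reciprocal_clear: bool, waits: usize) -> usize {
    let mut m = Model::new(reciprocal_clear);
    let ptr = 0xbe8c_bb5c;
    m.guest_signal(ptr);
    (0..waits).filter(|_| m.wait(ptr)).count()
}
```

In this model, one signal followed by five waits fast-paths once with the reciprocal clear and five times without it, which is the latch. With no signal at all (this trajectory), neither variant ever fast-paths, so the clear is never exercised.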

Re-framing of the wedge (sub-hypothesis revision)

2.AI's report and the iterate-2.AJ spec both framed the wedge as "tid=1's auto-reset Event 0x000010e8 has no signaler, VSync ISR needs to be wired to it." Investigation revealed a more accurate model:

| sub-hyp | requires | observed | verdict |
|---|---|---|---|
| C-A "Wire VSync ISR → 0x10e8" (spec hint) | Kernel side knows the frame-sync event handle from VdSetGraphicsInterruptCallback args | VdSetGraphicsInterruptCallback takes (callback_pc, user_data) only; no event handle. Game's contract: callback is a guest function that signals events itself. | falsified at API surface |
| C-B Reciprocal-shadow clear (this fix) | tid=7's KeWait fast-paths because shadow.signaled=true from stale guest mem signal_state=1 | Refresh observes guest mem signal_state=0 every single time on 0xbe8cbb5c; wait fast-path never hits; reciprocal-clear path never runs. | structurally correct, not on critical path |
| C-C Callback runs but doesn't reach SignalState write | IRQ injection delivers callback (we see gpu.interrupt.delivered{source=0}=76 per 100M) and the callback returns cleanly to LR_HALT_SENTINEL; guest mem at the candidate dispatcher stays unsignaled. | matches exactly | chosen |
| C-D tid=7 is downstream of tid=1 ("wedge moved one deeper") | tid=1 first wedge; tid=7 spin emerges only post-2.AI | Yes: tid=7's 6,549,579 KeWait calls = 99.7% of the 65.7M-event total. tid=7 priority=17 starves tid=8 (priority=0) on hw_id=2 → tid=8 Ready-but-never-picked → no further VdSwap → tid=1 stays Blocked on 0x10e8. | co-confirmed |

The actual fix surface is not kernel-side wiring; it's the guest callback at 0x824be9a0 failing to write its own SignalState. That could be:

  • our IRQ-injection state-mangling subtly corrupting the callback's guest-side decision tree (r4 = user_data = 0xbe8c8f00, callback expects something specific in user_data + N to be non-zero before writing SignalState)
  • our try_inject_graphics_interrupt's Pass-1/Pass-2 thread-selection policy injecting on the wrong thread (the callback may probe TLS to decide what to signal)
  • a missing initialization that the callback's first-fire pre-requires

Decisive evidence

Callback DOES execute — direct measurement via metrics counter:

counter   gpu.interrupt.delivered{source=0} = 76      (per 100M instr)
counter   gpu.interrupt.delivered{source=1} = 1
counter   kernel.calls{name=VdSetGraphicsInterruptCallback} = 1
INFO VdSetGraphicsInterruptCallback(0x824be9a0, 0xbe8c8f00) — callback armed

Callback DOES NOT signal 0xbe8cbb5c — direct measurement via the refresh_pkevent_shadow_from_guest path (verified with temporary debug instrumentation, since reverted):

DEBUG refresh[#2..#9]: ptr=0xbe8cbb5c signal_state=0 obj_was_signaled=Some(false)
... no instance with signal_state != 0 across full 50M-instr probe ...

Result: tid=7's 1,593,666 KeWait calls per 50M (3.19% rate) all return STATUS_SUCCESS via the 30 ms deadline-wake path. They do NOT fast-path through shadow.signaled. So handle_consume on auto-reset runs ZERO times for this handle in this trajectory — meaning my reciprocal-clear is unreachable on this path.

Cross-engine confirmation: canary's analog of this dispatcher (SID 1381cc5eb0aa0b99 in phase-c22-rtl-enter-leave-control-flow/canary-cold-trunc.jsonl) also shows ZERO signal.match events while its waiter exhibits the expected ~16.67 ms inter-wait gap. Canary's signal mechanism for this dispatcher is therefore also invisible at the canary Phase-A signal.match emission layer, which fires only on the Ke{Set,Reset}Event / Nt{Set,Reset}Event kernel paths; canary's underlying host-OS-Event Set, called by either the guest callback or canary's GraphicsSystem MarkVblank chain, isn't emitted.

The fix-surface for the producer-side is therefore very narrow: something needs to either (a) ensure the guest callback's writes actually land at the right offset within user_data=0xbe8c8f00, or (b) directly emulate the canary's host-OS auto-reset semantics by having try_inject_graphics_interrupt perform an unconditional mem.write_u32(0xbe8cbb5c + 4, 1) immediately before injecting (a crowbar that would bypass the callback's own write path).

Option (b) is out of scope for 2.AJ as specified — it requires a heuristic for which guest-pointer dispatcher to signal (the game doesn't tell the kernel; the kernel would need to track that the callback wrote to that offset on a prior delivery, then keep writing it). That's wedge-track investigation for 2.AK or later, not a mechanical fix.
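For scoping purposes only, option (b) might look roughly like the sketch below. Everything here is an assumption: `ToyMem`, `GraphicsCallback`, `signal_state_addr`, and `deliver_vsync` are hypothetical names, the real GuestMemory and try_inject_graphics_interrupt signatures may differ, and the heuristic that would populate `signal_state_addr` is deliberately left out because it is the unsolved part.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for guest memory; the real GuestMemory API may differ.
#[derive(Default)]
struct ToyMem {
    words: HashMap<u32, u32>,
}

impl ToyMem {
    fn write_u32(&mut self, addr: u32, value: u32) {
        self.words.insert(addr, value);
    }
    fn read_u32(&self, addr: u32) -> u32 {
        self.words.get(&addr).copied().unwrap_or(0)
    }
}

/// Hypothetical per-callback record. `signal_state_addr` is the crowbar
/// target; how it gets discovered is out of scope for this sketch.
struct GraphicsCallback {
    callback_pc: u32,
    user_data: u32,
    signal_state_addr: Option<u32>,
}

/// Crowbar delivery: if a SignalState address is known, force it to 1
/// immediately before the callback is injected. Returns the (pc, r4) pair
/// the injection would use.
fn deliver_vsync(cb: &GraphicsCallback, mem: &mut ToyMem) -> (u32, u32) {
    if let Some(addr) = cb.signal_state_addr {
        mem.write_u32(addr, 1); // bypasses the callback's own write path
    }
    (cb.callback_pc, cb.user_data)
}
```

With `signal_state_addr: None` the delivery is unchanged from today's behavior; with `Some(0xbe8cbb5c + 4)` every delivery raises the dispatcher that tid=7 polls, which is exactly the situation where the reciprocal clear from this iterate becomes load-bearing.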

Patch summary

crates/xenia-kernel/src/exports.rs | 45 ++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)

Three callsite hookups + one new helper:

 pub(crate) fn handle_consume(state: &mut KernelState, handle: u32) {
     // ... existing shadow-only consume ...
 }

+/// 2.AJ — reciprocal-shadow clear for guest-pointer auto-reset dispatchers.
+/// (docs explaining why canary doesn't need this and we do)
+pub(crate) fn handle_consume_reciprocal_clear(
+    state: &KernelState, mem: &GuestMemory, handle: u32,
+) {
+    if handle < 0x1_0000 { return; }
+    match state.objects.get(&handle) {
+        Some(KernelObject::Event { manual_reset, signaled, .. })
+        | Some(KernelObject::Timer { manual_reset, signaled, .. }) => {
+            if !*manual_reset && !*signaled {
+                mem.write_u32(handle + 4, 0);
+            }
+        }
+        _ => {}
+    }
+}

 fn do_wait_single(...) {
     if handle_signaled(state, handle) {
         handle_consume(state, handle);
+        handle_consume_reciprocal_clear(state, mem, handle);
         ctx.gpr[3] = STATUS_SUCCESS;
         return;
     }
     // ...
 }

 // similar in do_wait_multiple's two fast-path arms.

7 substantive LOC (1 new helper signature + 4-line body + 2 callsite hookups in do_wait_single + 2 callsite hookups in do_wait_multiple). The remaining 38 LOC are doc/comments explaining the canary-vs-ours shadow/guest split and what triggers spin-forever loops without this clear.

Determinism: the only added write is mem.write_u32(handle + 4, 0) guarded by the just-cleared shadow state (signaled: false). The trigger conditions are deterministic functions of (handle, shadow, guest_mem). No host_ns, no RNG. Proof in the determinism check below.

Test results

cargo build --release  -> OK
cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
  xenia-cpu    300 passed, 0 failed
  xenia-kernel 227 passed, 0 failed
  xenia-app      5 passed, 0 failed (+ 2/1 ignored long-runners)
  + disasm_goldens 6 passed (sub-suite)
  Auxiliary suites: 0 failures

Primary gate results

| # | predicate | result |
|---|---|---|
| 1 | tid=1's wait gap on Event 0x10e8 rises from 126.8 µs to ~16-17 ms (one VSync period) | FAIL: still 126.8 µs (bit-identical to 2.AI's trace). No signaler reaches the frame-sync event because the wedge is on the producer-side. |
| 2 | tid=1's main-loop iteration count drops from 23 kHz to ~60 Hz | N/A: already dropped 23 kHz → 0 by the 2.AI polarity fix. This iterate does not regress that. |
| 3 | VdSwap count grows from 6 (2.AI) | FAIL: VdSwap = 6 in this run, identical bit-pattern to the parent 2.AI run by design (no behavioral change). |

The primary objective ("wire VSync ISR → frame-sync Event") was not accomplished because the precondition was wrong: the wedge is not a missing kernel-side wiring, it's a missing guest-side write the callback was supposed to make.

Determinism check

Two cold runs (XENIA_CACHE_WIPE=1 -n 500000000) produced bit-identical event counts: 65,691,821 events each (ours-cold.jsonl / ours-cold-run2.jsonl).

After stripping host_ns and re-serializing sorted-keys, the first 100,000 events match byte-for-byte between the two runs.
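The comparison can be sketched as follows. This is an illustration under assumptions, not the tooling actually used: it treats each trace event as an already-parsed flat map of string keys to string values (real events may nest), and `Event`, `event_from_pairs`, `canonical`, and `first_divergence` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// A parsed trace event as flat key/value pairs (assumed shape).
/// BTreeMap iterates in sorted-key order, which gives the stable
/// serialization the check relies on.
type Event = BTreeMap<String, String>;

/// Convenience constructor for building events in examples.
fn event_from_pairs(pairs: &[(&str, &str)]) -> Event {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Drops the host-time field and serializes the remaining keys in sorted order.
fn canonical(event: &Event) -> String {
    event
        .iter()
        .filter(|(k, _)| k.as_str() != "host_ns")
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(";")
}

/// Index of the first differing event within the first `limit` events,
/// or None when the two prefixes match. A run that ends early counts
/// as a divergence at the first missing index.
fn first_divergence(a: &[Event], b: &[Event], limit: usize) -> Option<usize> {
    let n = limit.min(a.len().max(b.len()));
    (0..n).find(|&i| match (a.get(i), b.get(i)) {
        (Some(x), Some(y)) => canonical(x) != canonical(y),
        _ => true,
    })
}
```

Calling `first_divergence(&run1, &run2, 100_000)` and getting `None` corresponds to the head-100K match reported here; a `Some(i)` would point at the first event worth diffing by hand.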

Bit-identical to 2.AI's terminus (also 65,691,821 events) — which is the structural-effect signal of FIX-INERT: the path we patched isn't on the critical path for this trajectory, so the trace doesn't diverge.

Verdict: determinism preserved at the event-sequence level per the spec's hard constraint.

Secondary gates (cascade)

| metric | 2.AF | 2.AI | 2.AJ | direction |
|---|---|---|---|---|
| Total events | 45,206,378 | 65,691,821 | 65,691,821 | unchanged from 2.AI |
| Last event host_ns | 152,207 ms | 208,272 ms | ~208,272 ms | unchanged |
| Alive threads | 21 | 21 | 21 | unchanged |
| Exited threads | 2 (13,17) | 2 (13,14) | 2 (13,14) | unchanged |
| Wedge map entries | 15 | 18 | 18 | unchanged |
| signal.match events | 69 | 84 | 84 | unchanged |
| VdSwap calls | 2 | 6 | 6 | unchanged (still 6) |
| tid=12 (DPC) state | Blocked@Event 0x1004 | Blocked@Event 0x1004 | Blocked@Event 0x1004 | unchanged |

tid=7's spin (the actual cycle-budget consumer): 6,549,579 KeWait calls on guest-pointer dispatcher 0xbe8cbb5c (sid 9559797117e919f0) — accounts for ~99.7% of the entire 65.7M-event trace. Pattern is KeWait → RtlEnterCriticalSection → RtlLeaveCriticalSection, three calls per cycle. Each KeWait returns SUCCESS via the 30 ms deadline-wake path (not the fast-path), so the reciprocal-clear hook is structurally unreachable for this trajectory until the producer-side starts firing.

Thread-by-thread post-fix wedge analysis

Identical to 2.AI's 18 wedge entries. No behavioral cascade observed. The patch is effectively a no-op on this trace; the spin pattern is preserved bit-for-bit because the consume-side fast-path is never entered. tid=8 remains in state: Ready at PC 0x824c1790 (starving on hw_id=2 behind tid=7 priority=17 vs tid=8 priority=0).
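The tid=7 vs tid=8 starvation can be reproduced in a toy scheduler. This is a sketch under assumptions: `Thread`, `pick_runnable`, and `simulate` here are illustrative and not the xenia-rs implementation, both threads are treated as always Ready, and the aging-bonus caps (15 and 32) are made-up values chosen only to show the two regimes.

```rust
/// Hypothetical thread record; field names are illustrative.
struct Thread {
    tid: u32,
    priority: i32,
    waited_rounds: i32,
}

/// Strict-priority pick with a capped aging bonus: the effective priority
/// is the base priority plus rounds spent waiting, capped at `max_aging_bonus`.
fn pick_runnable(ready: &[Thread], max_aging_bonus: i32) -> Option<u32> {
    ready
        .iter()
        .max_by_key(|t| t.priority + t.waited_rounds.min(max_aging_bonus))
        .map(|t| t.tid)
}

/// Runs `rounds` scheduling rounds on one hardware slot with a prio=17
/// spinner (tid=7) and a prio=0 thread (tid=8). Returns (picks7, picks8).
fn simulate(max_aging_bonus: i32, rounds: usize) -> (usize, usize) {
    let mut ready = [
        Thread { tid: 7, priority: 17, waited_rounds: 0 },
        Thread { tid: 8, priority: 0, waited_rounds: 0 },
    ];
    let (mut picks7, mut picks8) = (0, 0);
    for _ in 0..rounds {
        let picked = pick_runnable(&ready, max_aging_bonus).expect("ready set is non-empty");
        for t in ready.iter_mut() {
            if t.tid == picked {
                t.waited_rounds = 0; // ran this round, aging resets
            } else {
                t.waited_rounds += 1; // passed over, keeps aging
            }
        }
        if picked == 7 {
            picks7 += 1;
        } else {
            picks8 += 1;
        }
    }
    (picks7, picks8)
}
```

With a cap below the 17-point gap, `simulate(15, 1000)` never picks tid=8; with a cap above it, `simulate(32, 1000)` does. That is the shape of the "Ready-but-never-picked" state observed for tid=8.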

Cross-engine context

Direct measurement from phase-c22-rtl-enter-leave-control-flow/canary-cold-trunc.jsonl on the analog dispatcher (canary tid=6 polling 1381cc5eb0aa0b99 / raw 0xf8000068, an Event with kernel-table handle):

  • 368+ wait.begin events with median inter-arrival 16.61 ms (exactly VSync period)
  • ZERO signal.match events on this handle in canary either — because canary's host-OS-Event Set() is not instrumented in the canary Phase-A signal.match emit (which only fires for the kernel API surface, not internal XEvent::Set() calls from arbitrary guest-callback paths).

So canary's frame-sync event is also signaled via a non-kernel-API path. The mechanism is presumably: the guest's IRQ callback writes SignalState in guest memory; canary's XEvent's underlying host OS Event mirrors that on the next Wait() call. The crucial difference: canary's guest callback successfully writes SignalState, ours doesn't. That's the producer-side root cause.

Third-order observations (no claims, just data)

  • gpu.interrupt.delivered{source=0} = 76 per 100M instr is too low: 100M instr at ~10 MIPS guest = ~10 s wallclock; 60 Hz VSync should give ~600 deliveries, not 76. Either the tick-vsync-instr proxy (150k instr period) drifted (audit M11 already documented similar drift) or guest threads stall the interpreter and we under-count rounds. Out of scope here, but worth flagging for iterate-2.AK's wedge-track scoping.
  • 99.7% trace dominance by a single thread's spin (tid=7) is a significant scheduling pathology. tid=7's priority=17 vs tid=8's priority=0 on the same hw_id means starvation is permanent under our strict-priority pick_runnable (no aging boost large enough to preempt prio=17). This recapitulates the 2.U / 2.V starvation-fix precedent, where priority aging landed for prio=0 vs prio=15 on hw_id=4/5 (tid=6 vs tid=10); here it's a different slot with a steeper 17-vs-0 gradient.
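The delivery-count estimate in the first observation is simple arithmetic, spelled out here so it can be rechecked against other budgets. The function names are hypothetical; the inputs (100M instructions, ~10 MIPS, 60 Hz, 150k-instruction proxy period, 76 observed) come from the text above.

```rust
/// Deliveries expected when one VSync fires every `period_instr` guest instructions.
fn deliveries_by_instr_proxy(instr: u64, period_instr: u64) -> u64 {
    instr / period_instr
}

/// Deliveries expected from wallclock: instr / (MIPS * 1e6) gives seconds,
/// multiplied by the refresh rate.
fn deliveries_by_wallclock(instr: u64, guest_mips: u64, vsync_hz: u64) -> u64 {
    instr * vsync_hz / (guest_mips * 1_000_000)
}

/// Fraction of the expected deliveries that was actually observed.
fn observed_fraction(observed: u64, expected: u64) -> f64 {
    observed as f64 / expected as f64
}
```

Both estimates land in the same range (about 600 from wallclock, 666 from the 150k-instruction proxy), so the 76 observed deliveries are roughly an eighth of what either model predicts.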

Tripstone audit

  • #28 (cross-engine tid stability): tid claims are ours-side within this trajectory. Canary cross-references rely on prior mappings (+ ctx_ptr discipline maintained).
  • #39 (composite progression IS progression): Honored. Headline is honest "FIX-INERT-ON-THIS-TRAJECTORY"; no progression claim.
  • #40 (no single-keystone framing): Care taken. The wedge surface is restated explicitly: tid=7 spin (producer-side dispatcher write missing) + tid=8 starvation + tid=11 XAudio + tid=12 DPC + tid=1 on-deck. The spec's framing of "wire VSync ISR → 0x10e8" is shown to be a precondition error, not a fix-the-keystone-and-cascade.
  • #41 (categorized diff tags): N/A this iterate.
  • #42 (Phase-A blind to blocked-forever): Exit-state JSON used.
  • #43 (no budget-cap framing): Trace is at the 500M-budget cap, but no progression claim is made; cap is descriptive not load-bearing.
  • #44 refined (rate+shape comparison): Honored. Cross-engine canary trace measurement explicitly confirms the shape match (no signal.match in canary either), and the rate is the divergent axis (canary's tid=6 wait gap of 16.6 ms vs our tid=7's 30 ms deadline timeouts with a 0.16 ms gap = ~190× rate inversion in the spin direction, not the canary direction).

Confidence

  • HIGH that the patch is correct and minimal: 7 substantive LOC, matches a documented design pattern (the comment block in refresh_pkevent_shadow_from_guest already anticipates the reciprocal direction), 0 test regressions, bit-identical determinism check.
  • HIGH that the patch is inert on this trajectory: 50M-instr debug probe showed 30 do_wait_single invocations on the candidate guest-pointer handle, ALL with signaled=false (fast-path unreached). The reciprocal-clear is structurally unreachable on this path.
  • HIGH that the real producer-side wedge is 0x824be9a0 (the registered callback) failing to write 0xbe8cbb5c + 4 = 1. Evidence: 76 delivered callbacks per 100M, but 0 changes to the candidate guest memory address across 500M instr.
  • MEDIUM-HIGH that the patch is useful for future trajectories. Once the producer-side starts writing (whether via a guest-callback fix or a crowbar kernel-side write), the consume-side reciprocal clear becomes critical: without it, the first write would latch and fast-path forever, and the symptom that 2.AI dispatched at the create-time signal flag would re-emerge at the dispatcher's SignalState flag.
  • LOW-MEDIUM that this is sufficient to reach gameplay. VdSwap stays at 6 (no rendering progression), tid=8 starves, tid=11/12 XAudio/DPC still blocked. Several more iterations likely needed.

Next-iterate recommendation

Priority list:

  1. 2.AK (producer-side VSync callback investigation) — the actual missing wedge for this iterate's stated objective. Trace the callback's guest code at PC 0x824be9a0 via --lr-trace to find what conditional gates the SignalState write, or scope a crowbar path in try_inject_graphics_interrupt: maintain a per-callback signal_state_addr: Option<u32> field on KernelState, initialized via heuristic (e.g. user_data + scan for KEVENT signature), and force mem.write_u32(addr, 1) on each IRQ delivery alongside the callback inject. Estimated 20-50 LOC.
  2. 2.AL (tid=7 priority-aging extension) — the 2.V aging hot-path targeted prio=0 vs prio=15; that's a slimmer gradient than tid=7's prio=17 vs tid=8's prio=0. Either lift the cap or apply the same aging-bonus formula on the steeper gradient. Estimated 10 LOC if the existing aging knob extends, 30 LOC if a separate max-bonus-for-low-priority logic is needed.
  3. 2.AM (XAudio stub, tid=11 unchanged) — remains from 2.AB. ~5-150 LOC.
  4. 2.AN (regression-grep for guest-pointer dispatcher writes) — if 2.AK lands a crowbar, the same pattern likely needs generalizing across other dispatcher families.

I recommend 2.AK next — it's the actual producer-side wedge this iterate was supposed to address; the consume-side discipline this iterate landed is necessary infrastructure for whatever 2.AK chooses as its mechanism.

Artifacts

Under xenia-rs/audit-runs/iterate-2AJ-vsync-event-wiring/:

  • ours-cold.jsonl (16.07 GB, 65,691,821 events) — primary trace
  • ours-cold.stdout.log (empty — quiet mode)
  • ours-cold.stderr.log (single exit-thread-state notice)
  • exit-thread-state.json (17.4 KB; 21 alive + 18 wedge entries — same wedge set as 2.AI)
  • ours-cold-run2.jsonl (16.07 GB, 65,691,821 events) — determinism check, bit-identical event count, head-100K stripped-host_ns equal
  • ours-cold-run2.{stdout,stderr}.log
  • writer-report.md (this file)

xenia-canary UNCHANGED.

Engine state: head + 2.AF patch (+18 in xenia-app/src/main.rs) + 2.AI patch (+16/-2 in xenia-kernel/src/exports.rs) + 2.AJ patch (+45 in xenia-kernel/src/exports.rs). All three retained in working tree, uncommitted (per the cumulative-LOC policy noted in 2.W's report). Cumulative 5-day LOC: 2.V (+30) + 2.AF (+18) + 2.AI (+16) + 2.AJ (+45) = +109 LOC uncommitted.