# Iterate 2.AJ: VSync→Event wiring (reciprocal-shadow plumbing landed; real wedge re-localized)

**Date:** 2026-06-02.

**LOC delta:** engine **+45 / 0 LOC** (7 substantive + 38 doc comment) in `crates/xenia-kernel/src/exports.rs`. Retained.

**Tests:** xenia-cpu 300 / xenia-kernel 227 / xenia-app 5; full PASS, 0 regressions.

## Headline

**FIX-INERT-ON-THIS-TRAJECTORY (PRODUCER-SIDE WEDGE RE-LOCALIZED).**

The patch lands and is structurally correct (matches the canonical reciprocal-shadow discipline expected by canary's host-OS-Event model). **Determinism bit-identical 65,691,821 events across 2 cold runs and bit-identical to 2.AI's terminus.** Tests pass with zero regressions.

However, this fix targets the **consume-side** of the shadow / guest-memory bridge, and **2.AI's exposed wedge is on the producer-side**. The VSync ISR delivers (76 callbacks per 100M instructions, metric `gpu.interrupt.delivered{source=0}`), and the registered guest callback at PC `0x824be9a0` runs to `LR_HALT_SENTINEL` cleanly, but the callback's guest code never writes `SignalState = 1` to the dispatcher at `0xbe8cbb5c + 4` that tid=7 polls. The reciprocal-clear path I plumbed is therefore never on the critical path for this iterate (signal_state remains 0 forever, the fast-path never triggers, no consume happens, no reciprocal clear runs).

The fix is preserved in the working tree because the discipline it implements is necessary for *any* future trajectory where a Sylpheed guest dispatcher actually receives a rising-edge signal from a non-kernel-API path (e.g. a future direct-write callback). Without reciprocal-clear, that future signal would latch and re-fast-path every subsequent wait. Removing it would be a deliberate step backward.

## Re-framing of the wedge (sub-hypothesis revision)

2.AI's report and the iterate-2.AJ spec both framed the wedge as "tid=1's auto-reset Event `0x000010e8` has no signaler, VSync ISR needs to be wired to it." Investigation revealed a more accurate model:

| sub-hyp | requires | observed | verdict |
|---|---|---|---|
| **C-A** "Wire VSync ISR → 0x10e8" (spec hint) | Kernel side knows the frame-sync event handle from `VdSetGraphicsInterruptCallback` args | `VdSetGraphicsInterruptCallback` takes `(callback_pc, user_data)` only; no event handle. Game's contract: callback is a guest function that signals events itself. | **falsified at API surface** |
| **C-B** Reciprocal-shadow clear (this fix) | tid=7's KeWait fast-paths because shadow.signaled=true from stale guest mem signal_state=1 | Refresh observes guest mem signal_state=0 every single time on `0xbe8cbb5c`; wait fast-path never hits; reciprocal-clear path never runs. | **structurally correct, not on critical path** |
| **C-C** Callback runs but doesn't reach SignalState write | IRQ injection delivers callback (we see `gpu.interrupt.delivered{source=0}=76` per 100M) and the callback returns cleanly to `LR_HALT_SENTINEL`; guest mem at the candidate dispatcher stays unsignaled. | matches exactly | **chosen** |
| **C-D** tid=7 is downstream of tid=1 ("wedge moved one deeper") | tid=1 first wedge; tid=7 spin emerges only post-2.AI | Yes: tid=7's 6,549,579 KeWait calls = **99.7%** of the 65.7M-event total. tid=7 priority=17 starves tid=8 (priority=0) on hw_id=2 → tid=8 Ready-but-never-picked → no further VdSwap → tid=1 stays Blocked on 0x10e8. | **co-confirmed** |

The actual fix surface is **not** kernel-side wiring; it's the guest callback at `0x824be9a0` failing to write its own SignalState. That could be:

- our IRQ-injection state-mangling subtly corrupting the callback's guest-side decision tree (`r4 = user_data = 0xbe8c8f00`, callback expects something specific in `user_data + N` to be non-zero before writing SignalState)
- our `try_inject_graphics_interrupt`'s Pass-1/Pass-2 thread-selection policy injecting on the wrong thread (the callback may probe TLS to decide what to signal)
- a missing initialization that the callback's first-fire pre-requires

## Decisive evidence

**Callback DOES execute**: direct measurement via metrics counter:

```
counter gpu.interrupt.delivered{source=0} = 76   (per 100M instr)
counter gpu.interrupt.delivered{source=1} = 1
counter kernel.calls{name=VdSetGraphicsInterruptCallback} = 1
```

```text
INFO VdSetGraphicsInterruptCallback(0x824be9a0, 0xbe8c8f00) — callback armed
```

**Callback DOES NOT signal `0xbe8cbb5c`**: direct measurement via the `refresh_pkevent_shadow_from_guest` path (verified with temporary debug instrumentation, since reverted):

```
DEBUG refresh[#2..#9]: ptr=0xbe8cbb5c signal_state=0 obj_was_signaled=Some(false)
... no instance with signal_state != 0 across full 50M-instr probe ...
```

**Result**: tid=7's 1,593,666 KeWait calls per 50M (3.19% rate) all return `STATUS_SUCCESS` via the 30 ms deadline-wake path. They do NOT fast-path through shadow.signaled. So `handle_consume` on auto-reset runs ZERO times for this handle in this trajectory, meaning my reciprocal-clear is unreachable on this path.
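The reachability argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Dispatcher`, `WaitStats`, `wait_once`, and `run` are hypothetical names invented for the illustration, not the engine's types, and the refresh/consume logic is a simplification of what the report describes (refresh the shadow from guest memory, fast-path if signaled, otherwise wake on the deadline).

```rust
/// Toy model of one guest-pointer auto-reset dispatcher.
/// Every name here is hypothetical; none of this is the engine's real code.
struct Dispatcher {
    guest_signal_state: u32, // SignalState word the guest callback would write
    shadow_signaled: bool,   // kernel-side shadow of that word
}

#[derive(Default, Debug, PartialEq)]
struct WaitStats {
    fast_path: u32,
    deadline_wake: u32,
    reciprocal_clears: u32,
}

/// One wait: refresh the shadow from guest memory, then either take the
/// fast path (consume, optionally clearing guest memory) or time out.
fn wait_once(d: &mut Dispatcher, reciprocal_clear: bool, stats: &mut WaitStats) {
    // Refresh: the shadow picks up a signal the guest wrote directly.
    if d.guest_signal_state != 0 {
        d.shadow_signaled = true;
    }
    if d.shadow_signaled {
        d.shadow_signaled = false; // auto-reset consume, shadow side only
        if reciprocal_clear {
            d.guest_signal_state = 0; // mirror the consume into guest memory
            stats.reciprocal_clears += 1;
        }
        stats.fast_path += 1;
    } else {
        stats.deadline_wake += 1; // stands in for the 30 ms deadline wake
    }
}

/// Run `waits` consecutive waits. If `producer_writes_at` is `Some(i)`, the
/// guest writes SignalState = 1 just before wait number `i`.
fn run(waits: u32, producer_writes_at: Option<u32>, reciprocal_clear: bool) -> WaitStats {
    let mut d = Dispatcher { guest_signal_state: 0, shadow_signaled: false };
    let mut stats = WaitStats::default();
    for i in 0..waits {
        if producer_writes_at == Some(i) {
            d.guest_signal_state = 1;
        }
        wait_once(&mut d, reciprocal_clear, &mut stats);
    }
    stats
}
```

In this model, `run(5, None, true)` never takes the fast path and never clears, which mirrors the current trajectory. `run(5, Some(1), true)` takes the fast path once, while `run(5, Some(1), false)` takes it on every wait from the write onward, which is the latch the reciprocal clear is meant to prevent.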
**Cross-engine confirmation**: canary's analog of the same dispatcher SID (`1381cc5eb0aa0b99` in `phase-c22-rtl-enter-leave-control-flow/canary-cold-trunc.jsonl`) also shows ZERO signal.match events while its waiter exhibits the expected ~16.67 ms inter-wait gap. This confirms that canary's signal mechanism for this dispatcher is **also not visible at the canary Phase-A `signal.match` emission layer**, which fires only on `Ke{Set,Reset}Event` / `Nt{Set,Reset}Event` kernel paths in canary; canary's underlying host-OS-Event Set, called by either the guest callback or canary's GraphicsSystem MarkVblank chain, isn't emitted.

The fix-surface for the **producer-side** is therefore very narrow: something needs to either (a) ensure the guest callback's writes actually land at the right offset within `user_data=0xbe8c8f00`, or (b) directly emulate the canary's host-OS auto-reset semantics by having `try_inject_graphics_interrupt` perform an unconditional `mem.write_u32(0xbe8cbb5c + 4, 1)` immediately before injecting (a crowbar that would bypass the callback's own write path).

Option (b) is **out of scope** for 2.AJ as specified: it requires a heuristic for *which* guest-pointer dispatcher to signal (the game doesn't tell the kernel; the kernel would need to track that the callback wrote to that offset on a prior delivery, then keep writing it). That's wedge-track investigation for 2.AK or later, not a mechanical fix.

## Patch summary

```text
 crates/xenia-kernel/src/exports.rs | 45 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
```

Three callsite hookups + one new helper:

```diff
 pub(crate) fn handle_consume(state: &mut KernelState, handle: u32) {
     // ... existing shadow-only consume ...
 }

+/// 2.AJ — reciprocal-shadow clear for guest-pointer auto-reset dispatchers.
+/// (docs explaining why canary doesn't need this and we do)
+pub(crate) fn handle_consume_reciprocal_clear(
+    state: &KernelState, mem: &GuestMemory, handle: u32,
+) {
+    if handle < 0x1_0000 { return; }
+    match state.objects.get(&handle) {
+        Some(KernelObject::Event { manual_reset, signaled, .. })
+        | Some(KernelObject::Timer { manual_reset, signaled, .. }) => {
+            if !*manual_reset && !*signaled {
+                mem.write_u32(handle + 4, 0);
+            }
+        }
+        _ => {}
+    }
+}

 fn do_wait_single(...) {
     if handle_signaled(state, handle) {
         handle_consume(state, handle);
+        handle_consume_reciprocal_clear(state, mem, handle);
         ctx.gpr[3] = STATUS_SUCCESS;
         return;
     }
     // ...
 }
 // similar in do_wait_multiple's two fast-path arms.
```

7 substantive LOC (1 new helper signature + 4-line body + 2 callsite hookups in do_wait_single + 2 callsite hookups in do_wait_multiple). The remaining 38 LOC are doc/comments explaining the canary-vs-ours shadow/guest split and what triggers spin-forever loops without this clear.

Determinism: the only added write is `mem.write_u32(handle + 4, 0)` guarded by the just-cleared shadow state (`signaled: false`). The trigger conditions are deterministic functions of `(handle, shadow, guest_mem)`. No `host_ns`, no RNG. Proof in the determinism check below.

## Test results

```text
cargo build --release            -> OK
cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
  xenia-cpu     300 passed, 0 failed
  xenia-kernel  227 passed, 0 failed
  xenia-app       5 passed, 0 failed (+ 2/1 ignored long-runners)
  + disasm_goldens 6 passed (sub-suite)
  Auxiliary suites: 0 failures
```

## Primary gate results

| # | predicate | result |
|---|---|---|
| 1 | tid=1's wait gap on Event 0x10e8 rises from 126.8 µs to ~16-17 ms (one VSync period) | **FAIL**: still 126.8 µs (bit-identical to 2.AI's trace). No signal reaches the frame-sync event because the wedge is on the producer-side. |
| 2 | tid=1's main-loop iteration count drops from 23 kHz to ~60 Hz | **N/A**: already dropped 23 kHz → 0 by the 2.AI polarity fix. This iterate does not regress that. |
| 3 | VdSwap count grows from 6 (2.AI) | **FAIL**: VdSwap = 6 in this run, identical bit-pattern to the parent 2.AI run by design (no behavioral change). |

The primary objective ("wire VSync ISR → frame-sync Event") was not accomplished because the precondition was wrong: the wedge is not a missing kernel-side wiring, it's a missing guest-side write the callback was supposed to make.

## Determinism check

Two cold runs (`XENIA_CACHE_WIPE=1 -n 500000000`) produced **bit-identical event counts: 65,691,821 events each** (`ours-cold.jsonl` / `ours-cold-run2.jsonl`). After stripping `host_ns` and re-serializing sorted-keys, the **first 100,000 events match byte-for-byte** between the two runs.

The count is also bit-identical to 2.AI's terminus (also 65,691,821 events), which is the structural-effect signal of FIX-INERT: the path we patched isn't on the critical path for this trajectory, so the trace doesn't diverge.

Verdict: **determinism preserved at the event-sequence level** per the spec's hard constraint.
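A comparison of this kind could be sketched as follows. This is not the tooling used for the check above: `strip_host_ns` and `first_divergence` are hypothetical helpers, the sketch assumes each event is one flat JSON line whose `host_ns` value is a bare number, and it skips the sorted-key re-serialization step by assuming both runs emit keys in the same order.

```rust
/// Remove a `"host_ns": <number>` field from one JSONL event line.
/// Hypothetical helper: assumes the value is a bare number and that the key
/// does not also appear inside a string value.
fn strip_host_ns(line: &str) -> String {
    const KEY: &str = "\"host_ns\":";
    let Some(start) = line.find(KEY) else {
        return line.to_string();
    };
    let rest = &line[start + KEY.len()..];
    // The numeric value runs until the next ',' or the closing '}'.
    let value_len = rest
        .find(|c: char| c == ',' || c == '}')
        .unwrap_or(rest.len());
    let mut begin = start;
    let mut end = start + KEY.len() + value_len;
    if line[end..].starts_with(',') {
        end += 1; // another field follows: drop the trailing comma
    } else if line[..begin].ends_with(',') {
        begin -= 1; // last field in the object: drop the preceding comma
    }
    format!("{}{}", &line[..begin], &line[end..])
}

/// Index of the first event (within the first `limit`) at which two traces
/// differ once `host_ns` is stripped, or `None` if they agree. A length
/// mismatch inside the window also counts as a divergence.
fn first_divergence<'a>(
    mut a: impl Iterator<Item = &'a str>,
    mut b: impl Iterator<Item = &'a str>,
    limit: usize,
) -> Option<usize> {
    for i in 0..limit {
        match (a.next(), b.next()) {
            (None, None) => return None,
            (Some(x), Some(y)) if strip_host_ns(x) == strip_host_ns(y) => {}
            _ => return Some(i),
        }
    }
    None
}
```

Fed the lines of the two cold-run traces with `limit = 100_000`, a `None` result would correspond to the head-100K match reported here. A real check would want a proper JSON parser rather than string surgery.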
## Secondary gates (cascade)

| metric | 2.AF | 2.AI | 2.AJ | direction |
|---|---:|---:|---:|---|
| Total events | 45,206,378 | 65,691,821 | **65,691,821** | unchanged from 2.AI |
| Last event host_ns | 152,207 ms | 208,272 ms | **~208,272 ms** | unchanged |
| Alive threads | 21 | 21 | **21** | unchanged |
| Exited threads | 2 (13,17) | 2 (13,14) | **2 (13,14)** | unchanged |
| Wedge map entries | 15 | 18 | **18** | unchanged |
| `signal.match` events | 69 | 84 | **84** | unchanged |
| VdSwap calls | 2 | 6 | **6** | unchanged (still 6) |
| tid=12 (DPC) state | Blocked@Event 0x1004 | Blocked@Event 0x1004 | **Blocked@Event 0x1004** | unchanged |

tid=7's spin (the actual cycle-budget consumer): **6,549,579 KeWait calls** on guest-pointer dispatcher `0xbe8cbb5c` (sid `9559797117e919f0`), accounting for ~99.7% of the entire 65.7M-event trace. Pattern is `KeWait → RtlEnterCriticalSection → RtlLeaveCriticalSection`, three calls per cycle. Each KeWait returns SUCCESS via the **30 ms deadline-wake path** (not the fast-path), so the reciprocal-clear hook is structurally unreachable for this trajectory until the producer-side starts firing.

## Thread-by-thread post-fix wedge analysis

Identical to 2.AI's 18 wedge entries. No behavioral cascade observed. The patch is effectively a no-op on this trace; the spin pattern is preserved bit-for-bit because the consume-side fast-path is never entered. tid=8 remains in `state: Ready` at PC `0x824c1790` (starving on hw_id=2 behind tid=7 priority=17 vs tid=8 priority=0).
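The starvation of tid=8 can be pictured with a small scheduling model. Everything in it is an assumption made for illustration: the `Thread` struct, the capped linear aging formula, and the cap value are invented here and are not the engine's actual `pick_runnable` logic or constants.

```rust
/// Hypothetical thread record for the model; not the engine's structure.
struct Thread {
    tid: u32,
    base_prio: i32,
    waited_rounds: i32,
}

/// Assumed aging formula: one bonus point per round spent waiting, capped.
fn effective_prio(t: &Thread, aging_cap: i32) -> i32 {
    t.base_prio + t.waited_rounds.min(aging_cap)
}

/// Strict-priority pick: highest effective priority wins, ties keep the
/// earlier entry.
fn pick_runnable(threads: &[Thread], aging_cap: i32) -> usize {
    let mut best = 0;
    for i in 1..threads.len() {
        if effective_prio(&threads[i], aging_cap) > effective_prio(&threads[best], aging_cap) {
            best = i;
        }
    }
    best
}

/// Schedule a spinning high-priority thread (tid 7) against a prio-0 thread
/// (tid 8) on one hardware slot; returns (picks of tid 7, picks of tid 8).
fn simulate(rounds: u32, high_prio: i32, aging_cap: i32) -> (u32, u32) {
    let mut threads = vec![
        Thread { tid: 7, base_prio: high_prio, waited_rounds: 0 },
        Thread { tid: 8, base_prio: 0, waited_rounds: 0 },
    ];
    let mut picks = (0u32, 0u32);
    for _ in 0..rounds {
        let chosen = pick_runnable(&threads, aging_cap);
        for (i, t) in threads.iter_mut().enumerate() {
            if i == chosen {
                t.waited_rounds = 0; // running resets the aging bonus
            } else {
                t.waited_rounds += 1;
            }
        }
        if threads[chosen].tid == 7 {
            picks.0 += 1;
        } else {
            picks.1 += 1;
        }
    }
    picks
}
```

With an illustrative cap of 16, `simulate(1000, 15, 16)` lets tid 8 run periodically, whereas `simulate(1000, 17, 16)` never picks it: a bonus that is enough against prio=15 cannot reach prio=17, which is the shape of the pathology described above.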
## Cross-engine context

Direct measurement from `phase-c22-rtl-enter-leave-control-flow/canary-cold-trunc.jsonl` on the analog dispatcher (canary tid=6 polling `1381cc5eb0aa0b99` / raw `0xf8000068`, an Event with kernel-table handle):

- 368+ `wait.begin` events with **median inter-arrival 16.61 ms** (approximately one 60 Hz VSync period of ~16.67 ms)
- **ZERO `signal.match` events** on this handle in canary either, because canary's host-OS-Event `Set()` is **not** instrumented in the canary Phase-A `signal.match` emit (which only fires for the kernel API surface, not internal `XEvent::Set()` calls from arbitrary guest-callback paths).

So canary's frame-sync event is also signaled via a non-kernel-API path. The mechanism is presumably: the guest's IRQ callback writes SignalState in guest memory; canary's `XEvent`'s underlying host OS Event mirrors that on the next `Wait()` call.

The crucial difference: **canary's guest callback successfully writes SignalState**, ours **doesn't**. That's the producer-side root cause.

## Third-order observations (no claims, just data)

- `gpu.interrupt.delivered{source=0} = 76` per 100M instr is **too low**: 100M instr at ~10 MIPS guest = ~10 s wallclock; 60 Hz VSync should give ~600 deliveries, not 76. Either the tick-vsync-instr proxy (150k instr period) drifted (audit M11 already documented similar drift) or guest threads stall the interpreter and we under-count rounds. Out of scope here, but worth flagging for iterate-2.AK's wedge-track scoping.
- 99.7% trace dominance by a single thread's spin (tid=7) is a significant scheduling pathology. tid=7's priority=17 vs tid=8's priority=0 on the same hw_id means starvation is permanent under our strict-priority `pick_runnable` (no aging boost large enough to preempt prio=17). This recapitulates the 2.U / 2.V starvation-fix precedent (priority aging landed for prio=0 vs prio=15 on hw_id=4/5 was tid=6 vs tid=10; here it's a different slot with a steeper 17-vs-0 gradient).
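The delivery-rate estimate in the first observation is plain arithmetic, worked here with the report's own figures. The function names are hypothetical, and the ~10 MIPS guest speed is the report's rough assumption rather than a measured value.

```rust
/// Expected VSync deliveries for an instruction budget, assuming a constant
/// guest speed in MIPS and a fixed refresh rate.
fn expected_by_wallclock(instr: u64, guest_mips: u64, vsync_hz: u64) -> u64 {
    instr * vsync_hz / (guest_mips * 1_000_000)
}

/// Expected ticks if VSync is proxied by a fixed instruction period.
fn expected_by_instr_period(instr: u64, period_instr: u64) -> u64 {
    instr / period_instr
}

/// Fraction of the expected deliveries that were actually observed.
fn delivered_fraction(observed: u64, expected: u64) -> f64 {
    observed as f64 / expected as f64
}
```

For 100M instructions, `expected_by_wallclock(100_000_000, 10, 60)` gives 600 and `expected_by_instr_period(100_000_000, 150_000)` gives 666, so both models predict several hundred deliveries. Against the 600 estimate, the observed 76 is roughly 13% of expected.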
## Tripstone audit

- **#28 (cross-engine tid stability)**: tid claims are ours-side within this trajectory. Canary cross-references rely on prior mappings (`+ ctx_ptr` discipline maintained).
- **#39 (composite progression IS progression)**: Honored. Headline is honest "FIX-INERT-ON-THIS-TRAJECTORY"; no progression claim.
- **#40 (no single-keystone framing)**: Care taken. The wedge surface is restated explicitly: tid=7 spin (producer-side dispatcher write missing) + tid=8 starvation + tid=11 XAudio + tid=12 DPC + tid=1 on-deck. The spec's framing of "wire VSync ISR → 0x10e8" is shown to be a precondition error, not a fix-the-keystone-and-cascade.
- **#41 (categorized diff tags)**: N/A this iterate.
- **#42 (Phase-A blind to blocked-forever)**: Exit-state JSON used.
- **#43 (no budget-cap framing)**: Trace is at the 500M-budget cap, but no progression claim is made; cap is descriptive not load-bearing.
- **#44 refined (rate+shape comparison)**: Honored. Cross-engine canary trace measurement explicitly confirms the shape match (no signal.match in canary either), and the **rate** is the divergent axis (canary's tid=6 wait gap 16.6 ms vs ours's tid=7 30-ms-deadline timeouts with 0.16ms gap = ~190× rate inversion in the spin direction, not the canary direction).

## Confidence

- **HIGH** that the patch is correct and minimal: 7 substantive LOC, matches a documented design pattern (the comment block in `refresh_pkevent_shadow_from_guest` already anticipates the reciprocal direction), 0 test regressions, bit-identical determinism check.
- **HIGH** that the patch is **inert on this trajectory**: 50M-instr debug probe showed 30 `do_wait_single` invocations on the candidate guest-pointer handle, ALL with `signaled=false` (fast-path unreached). The reciprocal-clear is structurally unreachable on this path.
- **HIGH** that the real producer-side wedge is `0x824be9a0` (the registered callback) failing to write `0xbe8cbb5c + 4 = 1`. Evidence: 76 delivered callbacks per 100M, but 0 changes to the candidate guest memory address across 500M instr.
- **MEDIUM-HIGH** that the patch is **useful for future trajectories**. Once the producer-side starts writing (whether via a guest-callback fix or a crowbar kernel-side write), the consume-side reciprocal clear becomes critical: without it, the first write would latch and fast-path forever, and the symptom 2.AI dispatched at the create-time signal flag would re-emerge at the dispatcher's `SignalState` flag.
- **LOW-MEDIUM** that this is sufficient to reach gameplay. VdSwap stays at 6 (no rendering progression), tid=8 starves, tid=11/12 XAudio/DPC still blocked. Several more iterations likely needed.

## Next-iterate recommendation

Priority list:

1. **2.AK (producer-side VSync callback investigation)**: the actual missing wedge for this iterate's stated objective. Trace the callback's guest code at PC `0x824be9a0` via `--lr-trace` to find what conditional gates the `SignalState` write, or scope a **crowbar** path in `try_inject_graphics_interrupt`: maintain a per-callback `signal_state_addr: Option` field on `KernelState`, initialized via heuristic (e.g. user_data + scan for `KEVENT` signature), and force `mem.write_u32(addr, 1)` on each IRQ delivery alongside the callback inject. Estimated 20-50 LOC.
2. **2.AL (tid=7 priority-aging extension)**: the 2.V aging hot-path targeted prio=0 vs prio=15; that's a slimmer gradient than tid=7's prio=17 vs tid=8's prio=0. Either lift the cap or apply the same aging-bonus formula on the steeper gradient. Estimated 10 LOC if the existing aging knob extends, 30 LOC if a separate max-bonus-for-low-priority logic is needed.
3. **2.AM (XAudio stub, tid=11 unchanged)**: remains from 2.AB. ~5-150 LOC.
4. **2.AN (regression-grep for guest-pointer dispatcher writes)**: if 2.AK lands a crowbar, the same pattern likely needs generalizing across other dispatcher families.
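The crowbar option in item 1 might look roughly like the following. This is a sketch under heavy assumptions, not a proposed patch: `GuestMemory` here is a stand-in map rather than the engine's type, `GraphicsInterrupt` is a hypothetical holder for the per-callback state, the `u32` address type is assumed, and the heuristic that would discover `signal_state_addr` is deliberately left out.

```rust
use std::collections::HashMap;

/// Stand-in for guest memory: a sparse map of 32-bit words. Hypothetical;
/// the engine's real memory type and its method signatures may differ.
#[derive(Default)]
struct GuestMemory {
    words: HashMap<u32, u32>,
}

impl GuestMemory {
    fn write_u32(&mut self, addr: u32, value: u32) {
        self.words.insert(addr, value);
    }
    fn read_u32(&self, addr: u32) -> u32 {
        *self.words.get(&addr).unwrap_or(&0)
    }
}

/// Hypothetical per-callback state as registered through
/// VdSetGraphicsInterruptCallback(callback_pc, user_data).
struct GraphicsInterrupt {
    callback_pc: u32,
    user_data: u32,
    /// Address of the dispatcher's SignalState word, if known. How this gets
    /// discovered (the heuristic) is not modeled here.
    signal_state_addr: Option<u32>,
}

impl GraphicsInterrupt {
    /// One VSync delivery: force the SignalState write when an address is
    /// known, then return the (pc, r4) pair the callback injection would use.
    fn deliver(&self, mem: &mut GuestMemory) -> (u32, u32) {
        if let Some(addr) = self.signal_state_addr {
            mem.write_u32(addr, 1); // crowbar: bypasses the callback's own write
        }
        (self.callback_pc, self.user_data)
    }
}
```

With `signal_state_addr` set to `Some(0xbe8cbb5c + 4)`, each delivery leaves a 1 at that word for the next wait to observe; with `None`, delivery degrades to the current behaviour. Pairing this with the consume-side reciprocal clear is what would keep such a forced write from latching.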
I recommend **2.AK next**: it's the actual producer-side wedge this iterate was supposed to address; the consume-side discipline this iterate landed is necessary infrastructure for whatever 2.AK chooses as its mechanism.

## Artifacts

Under `xenia-rs/audit-runs/iterate-2AJ-vsync-event-wiring/`:

- `ours-cold.jsonl` (16.07 GB, 65,691,821 events): primary trace
- `ours-cold.stdout.log` (empty, quiet mode)
- `ours-cold.stderr.log` (single exit-thread-state notice)
- `exit-thread-state.json` (17.4 KB; 21 alive + 18 wedge entries, same wedge set as 2.AI)
- `ours-cold-run2.jsonl` (16.07 GB, 65,691,821 events): determinism check, bit-identical event count, head-100K stripped-host_ns equal
- `ours-cold-run2.{stdout,stderr}.log`
- `writer-report.md` (this file)

xenia-canary UNCHANGED.

Engine state: head + 2.AF patch (`+18` in `xenia-app/src/main.rs`) + 2.AI patch (`+16/-2` in `xenia-kernel/src/exports.rs`) + **2.AJ patch (`+45` in `xenia-kernel/src/exports.rs`)**. All three retained in working tree, uncommitted (per the cumulative-LOC policy noted in 2.W's report).

Cumulative 5-day LOC: 2.V (+30) + 2.AF (+18) + 2.AI (+16) + 2.AJ (+45) = +109 LOC uncommitted.