xenia-rs/audit-runs/iterate-2AJ-vsync-event-wiring/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Iterate 2.AJ — VSync→Event wiring (reciprocal-shadow plumbing landed; real wedge re-localized)
**Date:** 2026-06-02. **LOC delta:** engine **+45 / 0 LOC** (7 substantive
+ 38 doc comment) in `crates/xenia-kernel/src/exports.rs`. Retained.
**Tests:** xenia-cpu 300 / xenia-kernel 227 / xenia-app 5 — full PASS,
0 regressions.
## Headline
**FIX-INERT-ON-THIS-TRAJECTORY (PRODUCER-SIDE WEDGE RE-LOCALIZED).**
The patch lands and is structurally correct (matches the canonical
reciprocal-shadow discipline expected by canary's host-OS-Event model).
**Determinism bit-identical 65,691,821 events across 2 cold runs and
bit-identical to 2.AI's terminus.** Tests pass with zero regressions.
But: this fix targets the **consume-side** of the shadow / guest-memory
bridge, and **2.AI's exposed wedge is on the producer-side**. The
VSync ISR delivers (76 callbacks per 100M instructions, metric
`gpu.interrupt.delivered{source=0}`), the registered guest callback at
PC `0x824be9a0` runs to `LR_HALT_SENTINEL` cleanly, BUT the callback's
guest code never writes `SignalState = 1` to the dispatcher at
`0xbe8cbb5c + 4` that tid=7 polls. The reciprocal-clear path I plumbed
is therefore never on the critical path for this iterate (signal_state
remains 0 forever, the fast-path never triggers, no consume happens,
no reciprocal clear runs).
The fix is preserved in the working tree because the discipline it
implements is necessary for *any* future trajectory where a Sylpheed
guest dispatcher actually receives a rising-edge signal from a
non-kernel-API path (e.g. a future direct-write callback). Without
reciprocal-clear, that future signal would latch and re-fast-path every
subsequent wait. Removing it would be a deliberate step backward.
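The latch described above can be illustrated with a toy model. This is a minimal sketch, not the engine's code: the `Dispatcher` struct, its fields, and both functions are hypothetical stand-ins for the guest-memory `SignalState` word and the kernel-side shadow flag.

```rust
/// Toy model of one guest-pointer auto-reset dispatcher (hypothetical type).
/// `guest_signal_state` stands in for the u32 at `dispatcher + 4` in guest
/// memory; `shadow_signaled` stands in for the kernel-side shadow flag.
#[derive(Debug, Default)]
struct Dispatcher {
    guest_signal_state: u32,
    shadow_signaled: bool,
}

impl Dispatcher {
    /// Refresh: copy the guest-memory state into the shadow (guest -> shadow).
    fn refresh_from_guest(&mut self) {
        self.shadow_signaled = self.guest_signal_state != 0;
    }

    /// One wait attempt. Returns true if the wait fast-paths (is satisfied
    /// immediately). `reciprocal_clear` toggles the shadow -> guest write-back.
    fn try_wait(&mut self, reciprocal_clear: bool) -> bool {
        self.refresh_from_guest();
        if !self.shadow_signaled {
            return false; // would block or time out instead
        }
        // Auto-reset consume: clear the shadow.
        self.shadow_signaled = false;
        if reciprocal_clear {
            // Reciprocal clear: also clear the guest-memory word.
            self.guest_signal_state = 0;
        }
        true
    }
}

/// Count how many of `waits` consecutive waits fast-path after a single
/// direct write of SignalState = 1 by a non-kernel-API path.
fn fast_paths_after_one_signal(reciprocal_clear: bool, waits: usize) -> usize {
    let mut d = Dispatcher::default();
    d.guest_signal_state = 1;
    (0..waits).filter(|_| d.try_wait(reciprocal_clear)).count()
}
```

In this model, `fast_paths_after_one_signal(true, 5)` consumes the signal once, while `fast_paths_after_one_signal(false, 5)` fast-paths on every wait because each refresh re-reads the stale `1` from guest memory.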
## Re-framing of the wedge (sub-hypothesis revision)
2.AI's report and the iterate-2.AJ spec both framed the wedge as
"tid=1's auto-reset Event `0x000010e8` has no signaler, VSync ISR needs
to be wired to it." Investigation revealed a more accurate model:
| sub-hyp | requires | observed | verdict |
|---|---|---|---|
| **C-A** "Wire VSync ISR → 0x10e8" (spec hint) | Kernel side knows the frame-sync event handle from `VdSetGraphicsInterruptCallback` args | `VdSetGraphicsInterruptCallback` takes `(callback_pc, user_data)` only; no event handle. Game's contract: callback is a guest function that signals events itself. | **falsified at API surface** |
| **C-B** Reciprocal-shadow clear (this fix) | tid=7's KeWait fast-paths because shadow.signaled=true from stale guest mem signal_state=1 | Refresh observes guest mem signal_state=0 every single time on `0xbe8cbb5c`; wait fast-path never hits; reciprocal-clear path never runs. | **structurally correct, not on critical path** |
| **C-C** Callback runs but doesn't reach SignalState write | IRQ injection delivers callback (we see `gpu.interrupt.delivered{source=0}=76` per 100M) and the callback returns cleanly to `LR_HALT_SENTINEL`; guest mem at the candidate dispatcher stays unsignaled. | matches exactly | **chosen** |
| **C-D** tid=7 is downstream of tid=1 ("wedge moved one deeper") | tid=1 first wedge; tid=7 spin emerges only post-2.AI | Yes: tid=7's spin (6,549,579 KeWait calls) accounts for **~99.7%** of the 65.7M-event total. tid=7 priority=17 starves tid=8 (priority=0) on hw_id=2 → tid=8 Ready-but-never-picked → no further VdSwap → tid=1 stays Blocked on 0x10e8. | **co-confirmed** |
The actual fix surface is **not** kernel-side wiring; it's the guest
callback at `0x824be9a0` failing to write its own SignalState. That
could be:
- our IRQ-injection state-mangling subtly corrupting the callback's
guest-side decision tree (`r4 = user_data = 0xbe8c8f00`, callback
expects something specific in `user_data + N` to be non-zero before
writing SignalState)
- our `try_inject_graphics_interrupt`'s Pass-1/Pass-2 thread-selection
policy injecting on the wrong thread (the callback may probe TLS to
decide what to signal)
- a missing initialization that the callback's first-fire pre-requires
## Decisive evidence
**Callback DOES execute** — direct measurement via metrics counter:
```
counter gpu.interrupt.delivered{source=0} = 76 (per 100M instr)
counter gpu.interrupt.delivered{source=1} = 1
counter kernel.calls{name=VdSetGraphicsInterruptCallback} = 1
```
```text
INFO VdSetGraphicsInterruptCallback(0x824be9a0, 0xbe8c8f00) — callback armed
```
**Callback DOES NOT signal `0xbe8cbb5c`** — direct measurement via the
`refresh_pkevent_shadow_from_guest` path (verified with temporary debug
instrumentation, since reverted):
```
DEBUG refresh[#2..#9]: ptr=0xbe8cbb5c signal_state=0 obj_was_signaled=Some(false)
... no instance with signal_state != 0 across full 50M-instr probe ...
```
**Result**: tid=7's 1,593,666 KeWait calls per 50M (3.19% rate) all
return `STATUS_SUCCESS` via the 30 ms deadline-wake path. They do NOT
fast-path through shadow.signaled. So `handle_consume` on auto-reset
runs ZERO times for this handle in this trajectory — meaning my
reciprocal-clear is unreachable on this path.
**Cross-engine confirmation:** canary's analog of this dispatcher (SID
`1381cc5eb0aa0b99` in `phase-c22-rtl-enter-leave-control-flow/canary-cold-trunc.jsonl`)
also shows ZERO `signal.match` events while its waiter exhibits the
expected ~16.67 ms inter-wait gap. Canary's signal mechanism for this
dispatcher is therefore **also not visible at the canary Phase-A
`signal.match` emission layer**, which fires only on the
`Ke{Set,Reset}Event` / `Nt{Set,Reset}Event` kernel paths in canary;
canary's underlying host-OS-Event Set, called by either the guest
callback or canary's GraphicsSystem MarkVblank chain, isn't emitted.
The fix-surface for the **producer-side** is therefore very narrow:
something needs to either (a) ensure the guest callback's writes
actually land at the right offset within `user_data=0xbe8c8f00`, or
(b) directly emulate the canary's host-OS auto-reset semantics by
having `try_inject_graphics_interrupt` perform an unconditional
`mem.write_u32(0xbe8cbb5c + 4, 1)` immediately before injecting (a
crowbar that would bypass the callback's own write path).
Option (b) is **out of scope** for 2.AJ as specified — it requires a
heuristic for *which* guest-pointer dispatcher to signal (the game
doesn't tell the kernel; the kernel would need to track that the
callback wrote to that offset on a prior delivery, then keep writing
it). That's wedge-track investigation for 2.AK or later, not a
mechanical fix.
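For scoping purposes, option (b) could look roughly like the sketch below. Everything here is hypothetical: `ToyMem` is a stand-in rather than the engine's `GuestMemory`, and `Crowbar` with its `dispatcher` field is an assumed shape for whatever state would track the target dispatcher. How that address gets discovered is exactly the open heuristic question and is not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for guest memory; not the engine's `GuestMemory`.
#[derive(Default)]
struct ToyMem {
    words: HashMap<u32, u32>,
}

impl ToyMem {
    fn write_u32(&mut self, addr: u32, value: u32) {
        self.words.insert(addr, value);
    }

    /// Unwritten addresses read back as 0 in this toy model.
    fn read_u32(&self, addr: u32) -> u32 {
        *self.words.get(&addr).unwrap_or(&0)
    }
}

/// Offset of SignalState within the dispatcher, as used in this report.
const SIGNAL_STATE_OFFSET: u32 = 4;

/// Hypothetical per-callback crowbar state: the dispatcher to force-signal,
/// if one has been identified.
struct Crowbar {
    dispatcher: Option<u32>,
}

impl Crowbar {
    /// Force SignalState = 1 immediately before the callback is injected.
    /// Returns the address written, or None when no dispatcher is tracked
    /// (or the address computation would overflow).
    fn pre_inject(&self, mem: &mut ToyMem) -> Option<u32> {
        let addr = self.dispatcher?.checked_add(SIGNAL_STATE_OFFSET)?;
        mem.write_u32(addr, 1);
        Some(addr)
    }
}
```

With `dispatcher: Some(0xbe8cbb5c)`, `pre_inject` writes `1` at `0xbe8cbb60`; with `None` it does nothing, which keeps the crowbar inert until a dispatcher has been identified.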
## Patch summary
```text
crates/xenia-kernel/src/exports.rs | 45 ++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
```
Three callsite hookups + one new helper:
```diff
 pub(crate) fn handle_consume(state: &mut KernelState, handle: u32) {
     // ... existing shadow-only consume ...
 }
+/// 2.AJ: reciprocal-shadow clear for guest-pointer auto-reset dispatchers.
+/// (docs explaining why canary doesn't need this and we do)
+pub(crate) fn handle_consume_reciprocal_clear(
+    state: &KernelState, mem: &GuestMemory, handle: u32,
+) {
+    if handle < 0x1_0000 { return; }
+    match state.objects.get(&handle) {
+        Some(KernelObject::Event { manual_reset, signaled, .. })
+        | Some(KernelObject::Timer { manual_reset, signaled, .. }) => {
+            if !*manual_reset && !*signaled {
+                mem.write_u32(handle + 4, 0);
+            }
+        }
+        _ => {}
+    }
+}
 fn do_wait_single(...) {
     if handle_signaled(state, handle) {
         handle_consume(state, handle);
+        handle_consume_reciprocal_clear(state, mem, handle);
         ctx.gpr[3] = STATUS_SUCCESS;
         return;
     }
     // ...
 }
 // similar in do_wait_multiple's two fast-path arms.
```
7 substantive LOC: the new helper (signature plus 4-line body) and
three callsite hookups (one in `do_wait_single`, two in
`do_wait_multiple`'s fast-path arms). The remaining 38 LOC are
doc/comments explaining the canary-vs-ours shadow/guest split and what
triggers spin-forever loops without this clear.
Determinism: the only added write is `mem.write_u32(handle + 4, 0)`
guarded by the just-cleared shadow state (`signaled: false`). The
trigger conditions are deterministic functions of `(handle, shadow,
guest_mem)`. No `host_ns`, no RNG. Proof in the determinism check
below.
## Test results
```text
cargo build --release -> OK
cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
xenia-cpu 300 passed, 0 failed
xenia-kernel 227 passed, 0 failed
xenia-app 5 passed, 0 failed (+ 2/1 ignored long-runners)
+ disasm_goldens 6 passed (sub-suite)
Auxiliary suites: 0 failures
```
## Primary gate results
| # | predicate | result |
|---|---|---|
| 1 | tid=1's wait gap on Event 0x10e8 rises from 126.8 µs to ~16-17 ms (one VSync period) | **FAIL**: still 126.8 µs (bit-identical to 2.AI's trace). No signaler reaches the frame-sync event because the wedge is on the producer side. |
| 2 | tid=1's main-loop iteration count drops from 23 kHz to ~60 Hz | **N/A**: already dropped 23 kHz → 0 by the 2.AI polarity fix. This iterate does not regress that. |
| 3 | VdSwap count grows from 6 (2.AI) | **FAIL**: VdSwap = 6 in this run, bit-identical to the parent 2.AI run by design (no behavioral change). |
The primary objective ("wire VSync ISR → frame-sync Event") was not
accomplished because the precondition was wrong: the wedge is not a
missing kernel-side wiring, it's a missing guest-side write the
callback was supposed to make.
## Determinism check
Two cold runs (`XENIA_CACHE_WIPE=1 -n 500000000`) produced
**bit-identical event counts: 65,691,821 events each**
(`ours-cold.jsonl` / `ours-cold-run2.jsonl`).
After stripping `host_ns` and re-serializing sorted-keys, the
**first 100,000 events match byte-for-byte** between the two runs.
Bit-identical to 2.AI's terminus (also 65,691,821 events) — which is
the structural-effect signal of FIX-INERT: the path we patched isn't
on the critical path for this trajectory, so the trace doesn't
diverge.
Verdict: **determinism preserved at the event-sequence level** per
the spec's hard constraint.
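The comparison procedure can be sketched as follows. This is an illustration under stated assumptions, not the script that was actually run: it assumes each JSONL event has already been parsed into flat key/value string pairs (the `Event` alias is hypothetical), and it uses a simple `key=value` serialization in place of sorted-key JSON.

```rust
use std::collections::BTreeMap;

/// A trace event as flat key/value pairs (an assumption for this sketch; the
/// real traces are JSONL and would need to be parsed first).
type Event = Vec<(String, String)>;

/// Drop `host_ns` and re-serialize with sorted keys so that two runs can be
/// compared byte-for-byte.
fn canonicalize(event: &Event) -> String {
    let sorted: BTreeMap<&str, &str> = event
        .iter()
        .filter(|(k, _)| k.as_str() != "host_ns")
        .map(|(k, v)| (k.as_str(), v.as_str()))
        .collect();
    sorted
        .iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(",")
}

/// Index of the first differing event among the first `limit` events, or
/// None if the compared prefixes match. A trace that ends early counts as a
/// divergence at the first missing index.
fn first_divergence(a: &[Event], b: &[Event], limit: usize) -> Option<usize> {
    let n = limit.min(a.len().max(b.len()));
    (0..n).find(|&i| match (a.get(i), b.get(i)) {
        (Some(x), Some(y)) => canonicalize(x) != canonicalize(y),
        _ => true,
    })
}
```

Calling `first_divergence(&run1, &run2, 100_000)` and getting `None` corresponds to the "first 100,000 events match" result reported above.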
## Secondary gates (cascade)
| metric | 2.AF | 2.AI | 2.AJ | direction |
|---|---:|---:|---:|---|
| Total events | 45,206,378 | 65,691,821 | **65,691,821** | unchanged from 2.AI |
| Last event host_ns | 152,207 ms | 208,272 ms | **~208,272 ms** | unchanged |
| Alive threads | 21 | 21 | **21** | unchanged |
| Exited threads | 2 (13,17) | 2 (13,14) | **2 (13,14)** | unchanged |
| Wedge map entries | 15 | 18 | **18** | unchanged |
| `signal.match` events | 69 | 84 | **84** | unchanged |
| VdSwap calls | 2 | 6 | **6** | unchanged (still 6) |
| tid=12 (DPC) state | Blocked@Event 0x1004 | Blocked@Event 0x1004 | **Blocked@Event 0x1004** | unchanged |
tid=7's spin (the actual cycle-budget consumer): **6,549,579 KeWait
calls** on guest-pointer dispatcher `0xbe8cbb5c` (sid
`9559797117e919f0`) — accounts for ~99.7% of the entire 65.7M-event
trace. Pattern is `KeWait → RtlEnterCriticalSection →
RtlLeaveCriticalSection`, three calls per cycle. Each KeWait returns
SUCCESS via the **30 ms deadline-wake path** (not the fast-path), so
the reciprocal-clear hook is structurally unreachable for this
trajectory until the producer-side starts firing.
## Thread-by-thread post-fix wedge analysis
Identical to 2.AI's 18 wedge entries. No behavioral cascade observed.
The patch is effectively a no-op on this trace; the spin pattern is
preserved bit-for-bit because the consume-side fast-path is never
entered. tid=8 remains in `state: Ready` at PC `0x824c1790`
(starving on hw_id=2 behind tid=7 priority=17 vs tid=8 priority=0).
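The starvation mechanism can be reproduced in a toy scheduler. This is a sketch under assumptions: the `Runnable` struct, the linear capped aging bonus, and `simulate` are all hypothetical and are not taken from the engine's `pick_runnable` or its 2.V aging logic. It only shows why a bonus cap below the priority gap leaves the low-priority thread unpicked.

```rust
/// Hypothetical runnable-thread record; field names are illustrative.
#[derive(Debug, Clone)]
struct Runnable {
    tid: u32,
    priority: i32,
    /// Rounds spent Ready without being picked.
    waited_rounds: u32,
}

/// Pick the thread with the highest effective priority, where each waited
/// round adds `bonus_per_round`, capped at `max_bonus`. With `max_bonus = 0`
/// this degenerates to strict priority.
fn pick_runnable(threads: &[Runnable], bonus_per_round: i32, max_bonus: i32) -> Option<u32> {
    threads
        .iter()
        .max_by_key(|t| {
            let bonus = (t.waited_rounds as i32)
                .saturating_mul(bonus_per_round)
                .min(max_bonus);
            t.priority + bonus
        })
        .map(|t| t.tid)
}

/// Run `rounds` scheduling rounds in which every thread is always Ready, and
/// return how many rounds each tid was picked, in input order.
fn simulate(
    mut threads: Vec<Runnable>,
    rounds: u32,
    bonus_per_round: i32,
    max_bonus: i32,
) -> Vec<(u32, u32)> {
    let mut picks: Vec<(u32, u32)> = threads.iter().map(|t| (t.tid, 0)).collect();
    for _ in 0..rounds {
        let Some(chosen) = pick_runnable(&threads, bonus_per_round, max_bonus) else {
            break;
        };
        for (t, p) in threads.iter_mut().zip(picks.iter_mut()) {
            if t.tid == chosen {
                t.waited_rounds = 0;
                p.1 += 1;
            } else {
                t.waited_rounds += 1;
            }
        }
    }
    picks
}
```

In this model, a prio=17 thread and a prio=0 thread that are both always Ready give the prio=0 thread zero picks under strict priority and under any bonus cap below the 17-point gap; it only gets scheduled once the cap reaches the gap.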
## Cross-engine context
Direct measurement from `phase-c22-rtl-enter-leave-control-flow/canary-cold-trunc.jsonl`
on the analog dispatcher (canary tid=6 polling `1381cc5eb0aa0b99` /
raw `0xf8000068`, an Event with kernel-table handle):
- 368+ `wait.begin` events with **median inter-arrival 16.61 ms**
(exactly VSync period)
- **ZERO `signal.match` events** on this handle in canary either —
because canary's host-OS-Event `Set()` is **not** instrumented in
the canary Phase-A `signal.match` emit (which only fires for the
kernel API surface, not internal `XEvent::Set()` calls from
arbitrary guest-callback paths).
So canary's frame-sync event is also signaled via a non-kernel-API
path. The mechanism is presumably: the guest's IRQ callback writes
SignalState in guest memory; canary's `XEvent`'s underlying host OS
Event mirrors that on the next `Wait()` call. The crucial difference:
**canary's guest callback successfully writes SignalState**, ours
**doesn't**. That's the producer-side root cause.
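A median inter-arrival figure like the one quoted above can be computed as in the following sketch. It assumes the `wait.begin` timestamps for one dispatcher have already been extracted from the trace as nanosecond values; the function name is illustrative and the extraction step is not shown.

```rust
/// Median gap between consecutive timestamps (nanoseconds), or None when
/// fewer than two timestamps are given. Input need not be sorted.
fn median_inter_arrival_ns(timestamps_ns: &[u64]) -> Option<u64> {
    let mut ts = timestamps_ns.to_vec();
    ts.sort_unstable();
    // Gaps between neighbouring events after sorting.
    let mut gaps: Vec<u64> = ts.windows(2).map(|w| w[1] - w[0]).collect();
    if gaps.is_empty() {
        return None;
    }
    gaps.sort_unstable();
    let mid = gaps.len() / 2;
    Some(if gaps.len() % 2 == 1 {
        gaps[mid]
    } else {
        // Average of the two middle gaps, written to avoid overflow.
        gaps[mid - 1] + (gaps[mid] - gaps[mid - 1]) / 2
    })
}
```

The median is used rather than the mean so that a few long stalls between waits do not mask the steady per-frame cadence.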
## Third-order observations (no claims, just data)
- `gpu.interrupt.delivered{source=0} = 76` per 100M instr is **too
low**: 100M instr at ~10 MIPS guest = ~10 s wallclock; 60 Hz VSync
should give ~600 deliveries, not 76. Either the tick-vsync-instr
proxy (150k instr period) drifted (audit M11 already documented
similar drift) or guest threads stall the interpreter and we
under-count rounds. Out of scope here, but worth flagging for
iterate-2.AK's wedge-track scoping.
- 99.7% trace dominance by a single thread's spin (tid=7) is a
significant scheduling pathology. tid=7's priority=17 vs tid=8's
priority=0 on the same hw_id means starvation is permanent under
our strict-priority `pick_runnable` (no aging boost large enough to
preempt prio=17). This recapitulates the 2.U / 2.V starvation-fix
precedent: priority aging landed there for prio=0 vs prio=15 on
hw_id=4/5 (tid=6 vs tid=10); here it's a different slot with a
steeper 17-vs-0 gradient.
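The delivery-rate arithmetic in the first observation can be checked with a few lines. The figures passed in are the ones quoted above (100M instructions, ~10 MIPS, 60 Hz, 150k-instruction proxy period, 76 observed); the function names are illustrative.

```rust
/// Expected VSync deliveries for an instruction budget when VSync is driven
/// by an instruction-count proxy (one tick every `period_instr`).
fn expected_by_instr_proxy(budget_instr: u64, period_instr: u64) -> u64 {
    budget_instr / period_instr
}

/// Expected deliveries when estimated from guest speed and refresh rate:
/// (budget / instructions-per-second) seconds times `refresh_hz`.
fn expected_by_wallclock(budget_instr: u64, guest_ips: u64, refresh_hz: u64) -> u64 {
    budget_instr * refresh_hz / guest_ips
}

/// How many times fewer deliveries were observed than expected.
fn shortfall_factor(expected: u64, observed: u64) -> f64 {
    expected as f64 / observed as f64
}
```

Both estimates land in the same range (600 by wallclock, 666 by the 150k-instruction proxy), so the observed 76 is roughly 8× short under either assumption.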
## Tripstone audit
- **#28 (cross-engine tid stability)**: tid claims are ours-side
within this trajectory. Canary cross-references rely on prior
mappings (`+ ctx_ptr` discipline maintained).
- **#39 (composite progression IS progression)**: Honored. Headline
is honest "FIX-INERT-ON-THIS-TRAJECTORY"; no progression claim.
- **#40 (no single-keystone framing)**: Care taken. The wedge surface
is restated explicitly: tid=7 spin (producer-side dispatcher write
missing) + tid=8 starvation + tid=11 XAudio + tid=12 DPC + tid=1
on-deck. The spec's framing of "wire VSync ISR → 0x10e8" is shown
to be a precondition error, not a fix-the-keystone-and-cascade.
- **#41 (categorized diff tags)**: N/A this iterate.
- **#42 (Phase-A blind to blocked-forever)**: Exit-state JSON used.
- **#43 (no budget-cap framing)**: Trace is at the 500M-budget cap,
but no progression claim is made; cap is descriptive not
load-bearing.
- **#44 refined (rate+shape comparison)**: Honored. Cross-engine
canary trace measurement explicitly confirms the shape match (no
signal.match in canary either), and the **rate** is the divergent
axis (canary's tid=6 wait gap 16.6 ms vs our tid=7's 30-ms-deadline
timeouts with 0.16 ms gap = ~190× rate inversion in the spin
direction, not the canary direction).
## Confidence
- **HIGH** that the patch is correct and minimal: 7 substantive LOC,
matches a documented design pattern (the comment block in
`refresh_pkevent_shadow_from_guest` already anticipates the
reciprocal direction), 0 test regressions, bit-identical
determinism check.
- **HIGH** that the patch is **inert on this trajectory**: 50M-instr
debug probe showed 30 `do_wait_single` invocations on the candidate
guest-pointer handle, ALL with `signaled=false` (fast-path
unreached). The reciprocal-clear is structurally unreachable on
this path.
- **HIGH** that the real producer-side wedge is `0x824be9a0` (the
registered callback) failing to write `0xbe8cbb5c + 4 = 1`.
Evidence: 76 delivered callbacks per 100M, but 0 changes to the
candidate guest memory address across 500M instr.
- **MEDIUM-HIGH** that the patch is **useful for future trajectories**.
Once the producer-side starts writing (whether via a guest-callback
fix or a crowbar kernel-side write), the consume-side reciprocal
clear becomes critical: without it, the first write would latch and
fast-path forever, and the symptom 2.AI dispatched at the create-time
signal flag would re-emerge at the dispatcher's `SignalState` flag.
- **LOW-MEDIUM** that this is sufficient to reach gameplay. VdSwap
stays at 6 (no rendering progression), tid=8 starves, tid=11/12
XAudio/DPC still blocked. Several more iterations likely needed.
## Next-iterate recommendation
Priority list:
1. **2.AK (producer-side VSync callback investigation)** — the actual
missing wedge for this iterate's stated objective. Trace the
callback's guest code at PC `0x824be9a0` via `--lr-trace` to find
what conditional gates the `SignalState` write, or scope a
**crowbar** path in `try_inject_graphics_interrupt`: maintain a
per-callback `signal_state_addr: Option<u32>` field on
`KernelState`, initialized via heuristic (e.g. user_data + scan
for `KEVENT` signature), and force `mem.write_u32(addr, 1)` on
each IRQ delivery alongside the callback inject. Estimated
20-50 LOC.
2. **2.AL (tid=7 priority-aging extension)** — the 2.V aging hot-path
targeted prio=0 vs prio=15; that's a slimmer gradient than tid=7's
prio=17 vs tid=8's prio=0. Either lift the cap or apply the same
aging-bonus formula on the steeper gradient. Estimated 10 LOC if
the existing aging knob extends, 30 LOC if a separate
max-bonus-for-low-priority logic is needed.
3. **2.AM (XAudio stub, tid=11 unchanged)** — remains from 2.AB. ~5-150 LOC.
4. **2.AN (regression-grep for guest-pointer dispatcher writes)**:
if 2.AK lands a crowbar, the same pattern likely needs
generalizing across other dispatcher families.
I recommend **2.AK next** — it's the actual producer-side wedge this
iterate was supposed to address; the consume-side discipline this
iterate landed is necessary infrastructure for whatever 2.AK chooses
as its mechanism.
## Artifacts
Under `xenia-rs/audit-runs/iterate-2AJ-vsync-event-wiring/`:
- `ours-cold.jsonl` (16.07 GB, 65,691,821 events) — primary trace
- `ours-cold.stdout.log` (empty — quiet mode)
- `ours-cold.stderr.log` (single exit-thread-state notice)
- `exit-thread-state.json` (17.4 KB; 21 alive + 18 wedge entries —
same wedge set as 2.AI)
- `ours-cold-run2.jsonl` (16.07 GB, 65,691,821 events) — determinism
check, bit-identical event count, head-100K stripped-host_ns equal
- `ours-cold-run2.{stdout,stderr}.log`
- `writer-report.md` (this file)
xenia-canary UNCHANGED.
Engine state: head + 2.AF patch (`+18` in `xenia-app/src/main.rs`)
+ 2.AI patch (`+16/-2` in `xenia-kernel/src/exports.rs`) + **2.AJ
patch (`+45` in `xenia-kernel/src/exports.rs`)**. All three
retained in working tree, uncommitted (per the cumulative-LOC
policy noted in 2.W's report). Cumulative 5-day LOC: 2.V (+30) +
2.AF (+18) + 2.AI (+16) + 2.AJ (+45) = +109 LOC uncommitted.