Iterate 2.AO — VBLANK MMIO Hardcode (C-1 candidate from 2.AN)
Headline: FIX-INERT-C2-CONFIRMED.
The 2.AN Angle-A fix (hardcode D1MODE_VBLANK_VLINE_STATUS / reg 0x1951
to return 1 on read, matching xenia-canary graphics_system.cc:309-310)
is applied, builds, passes all tests, preserves determinism — and is
fully inert. VdSwap stays at 6, the total event trace is bit-identical to
the 2.AI/2.AJ baseline (65,691,821 events), and the exit-thread-state /
wedge map are byte-for-byte identical to 2.AJ. C-1 (the VBLANK read
asymmetry) was not the active blocker. The deeper bottleneck C-2
(opt_callback at user_data+15144 never installed) is confirmed as the
prime suspect.
Patch summary
| File | Change | LOC | Notes |
|---|---|---|---|
| `crates/xenia-gpu/src/mmio_region.rs` | read arm `reg::D1MODE_VBLANK_VLINE_STATUS` now returns `1` unconditionally instead of `read_vblank_status.load(Relaxed)` | +9 / -1 (1 substantive + 8 doc/`let _` keep-alive) | single match arm |
| `crates/xenia-kernel/src/exports.rs` | untouched (2.AJ reciprocal-shadow patch) | +45 (pre-existing) | left in place as instructed |
- Diff (`git diff --numstat`): `9 1 crates/xenia-gpu/src/mmio_region.rs`, under the 10-LOC hard cap.
- The captured `read_vblank_status` clone is held with `let _ = &read_vblank_status;` so the closure still moves it and compiles clean.
- The write closure's W1TC path and `tick_vsync_instr` are untouched (`write_vblank_status` still used there). No refactor.
- Branch `chore/portable-snapshot`, HEAD `acd1656`. Patch UNCOMMITTED in working tree (as required). 2.AJ `exports.rs` patch verified intact (+45).
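The shape of the change can be sketched as follows. This is a minimal illustration under assumptions, not the contents of `mmio_region.rs`: the handler constructors, their signatures, and the fall-through value for other registers are hypothetical; only the register number `0x1951`, the `read_vblank_status` name, the `load(Relaxed)` read it replaces, and the `let _ = &read_vblank_status;` keep-alive come from the notes above.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

// Register number as stated in the notes (reg 0x1951).
const D1MODE_VBLANK_VLINE_STATUS: u32 = 0x1951;

/// Hypothetical "before" handler: the read arm reflects the shared status word.
fn read_handler_before(vblank_status: &Arc<AtomicU32>) -> impl Fn(u32) -> u32 {
    let read_vblank_status = Arc::clone(vblank_status);
    move |reg| match reg {
        D1MODE_VBLANK_VLINE_STATUS => read_vblank_status.load(Ordering::Relaxed),
        _ => 0, // assumption: other registers are out of scope for this sketch
    }
}

/// Hypothetical "after" handler: the read arm returns 1 unconditionally.
fn read_handler_after(vblank_status: &Arc<AtomicU32>) -> impl Fn(u32) -> u32 {
    let read_vblank_status = Arc::clone(vblank_status);
    move |reg| {
        // Reference the clone so it is still used inside the closure,
        // mirroring the keep-alive line described in the patch summary.
        let _ = &read_vblank_status;
        match reg {
            D1MODE_VBLANK_VLINE_STATUS => 1,
            _ => 0, // assumption: other registers are out of scope for this sketch
        }
    }
}
```

With the shared word cleared, the "before" handler reads 0 and the "after" handler reads 1; the write side (W1TC, `tick_vsync_instr`) is deliberately not modelled here.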
Source confirmation
- `reg::D1MODE_VBLANK_VLINE_STATUS == 0x1951` at `gpu_system.rs:1430`.
- Canary `case 0x1951: return 1; // vblank` at `graphics_system.cc:309-310`: exact match.
- Our own source comment at `gpu_system.rs:224-232` independently documents 2.AN's premise: the Sylpheed vsync callback "gates all its work on reading bit 0 as set: `lwz; rlwinm. r,r,0,31,31; bc 12,2,skip`".
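To see why `rlwinm. r,r,0,31,31` followed by `bc 12,2,skip` amounts to "skip unless bit 0 is set", the following sketch models the rotate-and-mask in Rust. The function names are illustrative and not taken from the engine. PowerPC numbers bits from the most significant end, so mask bits 31..31 select the least significant bit, and `bc 12,2` branches when CR0[EQ] is set, that is, when the recorded result is zero.

```rust
/// 32-bit PowerPC mask covering bits mb..=me in IBM numbering (bit 0 = MSB).
/// Wraps around when mb > me, as the instruction definition allows.
fn ppc_mask(mb: u32, me: u32) -> u32 {
    let from_mb = u32::MAX >> mb; // ones from bit mb down to bit 31
    let upto_me = u32::MAX << (31 - me); // ones from bit 0 down to bit me
    if mb <= me {
        from_mb & upto_me
    } else {
        from_mb | upto_me
    }
}

/// rlwinm: rotate left by `sh`, then AND with the mb..=me mask.
fn rlwinm(rs: u32, sh: u32, mb: u32, me: u32) -> u32 {
    rs.rotate_left(sh) & ppc_mask(mb, me)
}

/// Models the guest gate: the branch to `skip` is taken when the masked
/// value is zero (CR0[EQ] set by the record form of rlwinm).
fn vsync_callback_skips(status: u32) -> bool {
    rlwinm(status, 0, 31, 31) == 0
}
```

Under this model a read of 1 (the canary-parity hardcode) never takes the skip branch, while any value with a clear low bit does.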
Verification gates
Build / Test (PRIMARY)
- `cargo build --release`: SUCCEEDS (incremental, 0.88s). Only a pre-existing unrelated `dead_code` warning in `phase_b_snapshot.rs:245` (`walk_committed_regions`), not from this patch.
- `cargo test -p xenia-gpu -p xenia-kernel -p xenia-app -p xenia-cpu`: 687 pass, 0 fail, 0 regressions (xenia-app 300; xenia-kernel 227 + 149; xenia-cpu 6; xenia-gpu 5; + ignored doctests). Matches historical baseline exactly.
Determinism (PRIMARY) — PASS
- run1 `ours-cold.jsonl`: 65,691,821 events.
- run2 `ours-cold-run2.jsonl`: 65,691,821 events.
- Identical line count across two cold runs (`XENIA_CACHE_WIPE=1`, `-n 500000000`). (The ~763 KB byte-size delta between the two files is trailing-buffer noise, not an event-count divergence; line counts are exactly equal.)
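The determinism gate compares event counts rather than file sizes. A minimal sketch of that kind of check, assuming one event per line in the `.jsonl` trace (the helper names are hypothetical and this is not the project's tooling):

```rust
use std::io::{self, BufRead};

/// Count records in a line-delimited trace. Works on raw bytes, so it does
/// not depend on the contents of each line being valid UTF-8.
fn count_events<R: BufRead>(mut reader: R) -> io::Result<u64> {
    let mut buf = Vec::new();
    let mut n = 0u64;
    loop {
        buf.clear();
        // read_until returns 0 only at end of input.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        n += 1;
    }
    Ok(n)
}

/// True when both traces contain the same number of events,
/// regardless of how many bytes each file occupies.
fn same_event_count<A: BufRead, B: BufRead>(a: A, b: B) -> io::Result<bool> {
    Ok(count_events(a)? == count_events(b)?)
}
```

For real trace files the readers would typically be `std::io::BufReader::new(std::fs::File::open(path)?)`.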
VdSwap (PRIMARY) — NO CHANGE → C-1 not the gate
- run1 VdSwap: 6. run2 VdSwap: 6.
- 2.AI/2.AJ baseline: 6. No progression. Per the gate definition, an unchanged VdSwap means C-1 was not the active blocker.
Total event count vs baseline — IDENTICAL
- 2.AO = 65,691,821. 2.AJ baseline = 65,691,821. Exactly equal. The hardcode produced zero observable divergence in the execution trace.
Exit-state (tid=1 / tid=12) — byte-identical to 2.AJ
- `diff exit-thread-state.json` (2AJ vs 2AO): BYTE-IDENTICAL. Same 21 alive threads, same 18 wedge entries.
- tid=1: `Blocked` @ PC `0x824ac578`, waiting on Event `0x000010e8` (sig=false, no signaler). Unchanged: the 2.AI/2.AJ wedge.
- tid=12: `Blocked` @ PC `0x824ac578`, waiting on Event `0x00001004` (sig=false, no signaler). Unchanged: the DPC-dispatcher wedge (2.AC/2.AM).
tid=1 wait gap on Event 0x10e8 (SECONDARY) — no improvement
- Event `0x000010e8` ↔ semantic SID `9ad1bebb6cae28c4` (handle.create at host_ns 819,544,956).
- tid=1 issues exactly 2 `wait.begin` on this SID, at host_ns ~6.660s, 128.595 µs apart, then blocks permanently (no 3rd wait, never woken). This is the same two-wait-then-permanent-block pattern 2.AJ reported (~126.8 µs). The expected secondary effect ("wait gap may rise as more callbacks succeed") did not occur: the gate is downstream of C-2, so nothing changed.
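The wait-gap figure is the difference between consecutive `wait.begin` timestamps on the same SID. A small sketch of that arithmetic, with hypothetical helper names; the absolute timestamps used in the usage note are illustrative placeholders, and only the 128,595 ns spacing reflects the reported 128.595 µs gap:

```rust
/// Gaps between consecutive timestamps, in nanoseconds.
/// Assumes the input is in ascending order; an out-of-order pair yields 0.
fn wait_gaps_ns(host_ns: &[u64]) -> Vec<u64> {
    host_ns
        .windows(2)
        .map(|pair| pair[1].saturating_sub(pair[0]))
        .collect()
}

/// Convert nanoseconds to microseconds for reporting.
fn ns_to_us(ns: u64) -> f64 {
    ns as f64 / 1_000.0
}
```

For example, two waits at the placeholder times `6_660_000_000` and `6_660_128_595` give a single gap of `128_595` ns, and a thread that waits only twice always produces exactly one gap.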
gpu.interrupt.delivered rate (SECONDARY) — N/A
- The engine emits no `gpu.interrupt.delivered` event kind (the 11 kinds in the trace are: import.call, kernel.call/return, wait.begin, handle.create/destroy, wake.requested, signal.match, thread.create/exit, schema_version). `VdSetGraphicsInterruptCallback` is called 3× (callback IS registered), consistent with 2.AJ's 76 ISR firings/100M. Not measurable from this trace; no regression.
Why the fix is inert (C-2 mechanism)
The hardcode correctly removes the read asymmetry 2.AN identified: the guest
VSync callback sub_824BE9A0 @ PC 0x824BEA38-0x824BEA44 now always reads
bit 0 = 1 and would take its frame-counter branch instead of the
beq loc_824BEAAC skip. But the trace is bit-identical to the
bit-clear baseline — meaning the frame-counter branch produces no
downstream observable signal either way.
Per 2.AN's C-2: the real signaller is the dynamically-installed
opt_callback stored at user_data+15144 (tail-called by
sub_824BE9A0 → sub_824BEA80). In the 65.7M-event run that opt_callback
is never installed (its setter sub_824C1920, reached only via
sub_822F1F20 ← sub_822F1EE0 ← dispatch-table slot 0x822F1AFC, requires a
deeper game-state event that does not fire). So even with the VBLANK gate
forced open, there is no installed callback to write SignalState=1 on
Event 0x10e8 — tid=1 stays wedged. C-1 was a real divergence-vs-canary but
not on the critical path; C-2 gates it.
This is consistent with the 5-iterate methodology lesson logged in 2.AN (variant #44): the "missing signal" is three layers below "what does the wait depend on" — and C-1, one layer up, was correctly fixed but is inert because layer-3 (opt_callback install) never happens.
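The layering can be illustrated with a toy model. This is a deliberately simplified sketch, not guest or engine code: the types and function names are hypothetical, and the only behaviour taken from the analysis is that the VSync callback first gates on bit 0 of the status read and then defers to an optional callback that may never have been installed.

```rust
/// Toy stand-in for the guest event that tid=1 waits on.
#[derive(Default)]
struct Event {
    signaled: bool,
}

type OptCallback = fn(&mut Event);

/// Toy stand-in for the guest state holding the optional callback slot.
struct GuestState {
    opt_callback: Option<OptCallback>,
}

/// Layer 1 (C-1): gate on bit 0 of the status read.
/// Layer 2 (C-2): only an installed callback can signal the event.
fn vsync_callback(vblank_status: u32, state: &GuestState, event: &mut Event) {
    if vblank_status & 1 == 0 {
        return; // gate closed: skip all work
    }
    if let Some(callback) = state.opt_callback {
        callback(event);
    }
}

fn signal(event: &mut Event) {
    event.signaled = true;
}

/// Run one callback invocation and report whether the event got signalled.
fn event_signaled_after(vblank_status: u32, installed: bool) -> bool {
    let state = GuestState {
        opt_callback: if installed { Some(signal as OptCallback) } else { None },
    };
    let mut event = Event::default();
    vsync_callback(vblank_status, &state, &mut event);
    event.signaled
}
```

In this model, forcing the status read to 1 changes the outcome only when a callback is installed; with the slot empty, both status values leave the event unsignalled, which is the pattern the identical traces suggest.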
Confidence + next-iterate recommendation
Confidence: HIGH that C-1 is inert and C-2 is the prime suspect. Evidence is decisive (bit-identical event count + byte-identical exit state + unchanged VdSwap across two deterministic cold runs). The fix is a correct canary-parity hardening (keep it; it eliminates a latent race) but not a cascade win.
Disposition of this patch: KEEP uncommitted as dormant correctness/parity infra (like the 2.AJ reciprocal-shadow patch). It costs nothing, matches canary exactly, and closes a real (if currently unreachable) race window.
Next iterate — make C-2 the explicit target. Recommended (in priority order, mirroring 2.AN's Angle B/C):
- 2.AP: opt_callback install/clear probe (~5-15 LOC tooling, 0 engine). `--lr-trace 0x824C1920` (setter `sub_824C1920`) over a 500M run to confirm install count == 0 and identify the nearest reached frame on the `0x822F1AFC` dispatch chain. This is the single highest-value next step: it pins down which upstream game-state event must fire.
- 2.AQ: dispatch-chain reachability walk (~10-30 LOC tooling). `--lr-trace 0x822F1EE0` / `0x822F1F20` to find where the `0x822F1AFC` dispatch slot stalls, i.e. the deeper game-state predicate that never evaluates true. Three layers up from the wait, this is the actual wedge root.
- (Deprioritized) The bilateral tid=12 DPC wedge (Event 0x1004, 2.AM) and tid=11 XAudio wedge (2.AL) remain independent and should follow C-2 resolution, not precede it.
Do not chase any further "force the signal" / "force the install" crowbars before 2.AP/2.AQ identify the gating game-state event — that has been the #44 reading-error trap five iterates running.