xenia-rs/audit-runs/iterate-2AO-vsync-mmio-hardcode/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Iterate 2.AO — VBLANK MMIO Hardcode (C-1 candidate from 2.AN)
**Headline: FIX-INERT-C2-CONFIRMED.**
The 2.AN Angle-A fix (hardcode `D1MODE_VBLANK_VLINE_STATUS` / reg `0x1951`
to return `1` on read, matching xenia-canary `graphics_system.cc:309-310`)
is **applied, builds, passes all tests, preserves determinism, and is
fully inert**. VdSwap stays at 6, the total event count is identical to
the 2.AI/2.AJ baseline (65,691,821 events), and the exit-thread-state /
wedge map are byte-for-byte identical to 2.AJ. C-1 (the VBLANK read
asymmetry) was **not** the active blocker. The deeper bottleneck C-2
(`opt_callback` at `user_data+15144` never installed) is confirmed as the
prime suspect.
---
## Patch summary
| File | Change | LOC | Notes |
|------|--------|-----|-------|
| `crates/xenia-gpu/src/mmio_region.rs` | read arm `reg::D1MODE_VBLANK_VLINE_STATUS` now returns `1` unconditionally instead of `read_vblank_status.load(Relaxed)` | **+9 / -1** (1 substantive + 8 doc/`let _` keep-alive) | single match arm |
| `crates/xenia-kernel/src/exports.rs` | **untouched** — 2.AJ reciprocal-shadow patch | +45 (pre-existing) | left in place as instructed |
- Diff (`git diff --numstat`): `9 1 crates/xenia-gpu/src/mmio_region.rs` — under the 10-LOC hard cap.
- The captured `read_vblank_status` clone is held with `let _ = &read_vblank_status;`
so the closure still moves it and compiles clean.
- The write closure's W1TC path and `tick_vsync_instr` are untouched
(`write_vblank_status` still used there). No refactor.
- Branch `chore/portable-snapshot`, HEAD `acd1656`. Patch UNCOMMITTED in
working tree (as required). 2.AJ exports.rs patch verified intact (+45).
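As a rough illustration of the shape of this change, the sketch below models the read arm as a standalone closure. The register value `0x1951` comes from this report; `make_read_handler`, its signature, and the fallback value for other registers are assumptions made for the example, since the real closure in `mmio_region.rs` is not reproduced here.

```rust
use std::sync::atomic::AtomicU32;
use std::sync::Arc;

/// Register index as stated in this report (`reg::D1MODE_VBLANK_VLINE_STATUS`).
pub const D1MODE_VBLANK_VLINE_STATUS: u32 = 0x1951;

/// Hypothetical stand-in for the MMIO read closure; the real signature in
/// `mmio_region.rs` is not shown in this report.
pub fn make_read_handler(read_vblank_status: Arc<AtomicU32>) -> impl Fn(u32) -> u32 {
    move |reg| {
        // Keep-alive: reference the captured clone so the closure still owns it,
        // mirroring the `let _ = &read_vblank_status;` line in the patch.
        let _ = &read_vblank_status;
        match reg {
            // Before the patch this arm loaded the atomic with Relaxed ordering.
            // After the patch it returns 1 unconditionally.
            D1MODE_VBLANK_VLINE_STATUS => 1,
            // Assumption for this sketch only: other registers read as 0.
            _ => 0,
        }
    }
}
```

With this shape, the value stored in the atomic no longer influences what a guest read of `0x1951` observes, which is the canary-parity behavior the patch aims for.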
### Source confirmation
- `reg::D1MODE_VBLANK_VLINE_STATUS == 0x1951` at `gpu_system.rs:1430`.
- Canary `case 0x1951: return 1; // vblank` at `graphics_system.cc:309-310` — exact match.
- Our own source comment at `gpu_system.rs:224-232` independently documents
2.AN's premise: the Sylpheed vsync callback "gates *all* its work on
reading bit 0 as set: `lwz; rlwinm. r,r,0,31,31; bc 12,2,skip`".
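The quoted guest sequence can be modelled in a few lines to show why it is a bit-0 test. In PowerPC (IBM) bit numbering, bit 31 is the least-significant bit, so `rlwinm. r,r,0,31,31` keeps only that bit, and `bc 12,2,skip` branches when CR0[EQ] is set, i.e. when the result is zero. The helper names below are invented for this sketch and are not engine functions.

```rust
/// PowerPC `rlwinm` mask covering bits MB..=ME in IBM numbering (bit 0 is the MSB).
/// A mask with MB > ME wraps around, per the instruction definition.
fn ppc_mask(mb: u32, me: u32) -> u32 {
    let from_mb = u32::MAX >> mb;
    let up_to_me = u32::MAX << (31 - me);
    if mb <= me {
        from_mb & up_to_me
    } else {
        from_mb | up_to_me
    }
}

/// `rlwinm rA, rS, SH, MB, ME`: rotate left by SH, then AND with the mask.
pub fn rlwinm(rs: u32, sh: u32, mb: u32, me: u32) -> u32 {
    rs.rotate_left(sh) & ppc_mask(mb, me)
}

/// Models `rlwinm. r,r,0,31,31; bc 12,2,skip`: the callback skips its work
/// when the least-significant bit of the status word is clear.
pub fn callback_skips_work(status: u32) -> bool {
    rlwinm(status, 0, 31, 31) == 0
}
```

Under this model a read of `0` takes the skip branch and a read of `1` falls through, which is why returning `1` from the register read opens the gate.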
---
## Verification gates
### Build / Test (PRIMARY)
- `cargo build --release`: **SUCCEEDS** (incremental, 0.88s). Only a
pre-existing unrelated `dead_code` warning in
`phase_b_snapshot.rs:245` (`walk_committed_regions`) — not from this patch.
- `cargo test -p xenia-gpu -p xenia-kernel -p xenia-app -p xenia-cpu`:
**687 pass, 0 fail, 0 regressions** (xenia-app 300; xenia-kernel 227 +
149; xenia-cpu 6; xenia-gpu 5; + ignored doctests). Matches historical
baseline exactly.
### Determinism (PRIMARY) — **PASS**
- run1 `ours-cold.jsonl`: **65,691,821** events.
- run2 `ours-cold-run2.jsonl`: **65,691,821** events.
- Identical line count across two cold runs (`XENIA_CACHE_WIPE=1`,
`-n 500000000`). The ~763 KB byte-size delta between the two files is
attributed to trailing-buffer noise, not an event-count divergence: the
line counts are exactly equal.
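A minimal sketch of the line-count comparison behind this gate, assuming one event per non-empty line in the `.jsonl` trace. The function names are hypothetical and this is not the tooling actually used for the runs above.

```rust
use std::io::{self, BufRead};

/// Counts events in a JSONL stream.
/// Assumption: each non-empty line is exactly one event.
pub fn count_events<R: BufRead>(reader: R) -> io::Result<u64> {
    let mut n = 0u64;
    for line in reader.lines() {
        if !line?.trim().is_empty() {
            n += 1;
        }
    }
    Ok(n)
}

/// Returns true when both traces contain the same number of events.
/// This compares counts only; it does not compare event contents.
pub fn same_event_count<A: BufRead, B: BufRead>(a: A, b: B) -> io::Result<bool> {
    Ok(count_events(a)? == count_events(b)?)
}
```

Note that equal counts are a weaker property than equal bytes: two files can differ in size while having the same number of lines, which is the situation described for run1 and run2.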
### VdSwap (PRIMARY) — **NO CHANGE → C-1 not the gate**
- run1 VdSwap: **6**. run2 VdSwap: **6**.
- 2.AI/2.AJ baseline: 6. **No progression.** Per the gate definition, an
unchanged VdSwap means C-1 was not the active blocker.
### Total event count vs baseline — **IDENTICAL**
- 2.AO = 65,691,821. 2.AJ baseline = 65,691,821. **Exactly equal.** The
hardcode produced zero observable divergence in the execution trace.
### Exit-state (tid=1 / tid=12) — **byte-identical to 2.AJ**
- `diff exit-thread-state.json` (2AJ vs 2AO): **BYTE-IDENTICAL**. Same 21
alive threads, same 18 wedge entries.
- **tid=1**: `Blocked` @ PC `0x824ac578`, waiting on **Event `0x000010e8`**
(sig=false, no signaler). Unchanged — the 2.AI/2.AJ wedge.
- **tid=12**: `Blocked` @ PC `0x824ac578`, waiting on **Event `0x00001004`**
(sig=false, no signaler). Unchanged — the DPC-dispatcher wedge
(2.AC/2.AM).
### tid=1 wait gap on Event 0x10e8 (SECONDARY) — **no improvement**
- Event `0x000010e8` ↔ semantic SID `9ad1bebb6cae28c4` (handle.create at
host_ns 819,544,956).
- tid=1 issues exactly **2** `wait.begin` on this SID, at host_ns ~6.660s,
**128.595 µs** apart, then **blocks permanently** (no 3rd wait, never
woken). This is the same two-wait-then-permanent-block pattern 2.AJ
reported (~126.8 µs). The expected secondary effect ("wait gap may rise
as more callbacks succeed") **did not occur** — the gate is downstream of
C-2, so nothing changed.
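The wait-gap figure is simply the difference between consecutive `wait.begin` timestamps on the same SID. The sketch below shows that arithmetic on a plain slice of `host_ns` values; extracting those values from the trace is out of scope, and the function names are assumptions made for illustration.

```rust
/// Gaps in nanoseconds between consecutive `wait.begin` timestamps.
/// Input is assumed to be sorted ascending; out-of-order pairs clamp to 0.
pub fn wait_gaps_ns(wait_begin_ns: &[u64]) -> Vec<u64> {
    wait_begin_ns
        .windows(2)
        .map(|w| w[1].saturating_sub(w[0]))
        .collect()
}

/// Converts nanoseconds to microseconds for reporting.
pub fn ns_to_us(ns: u64) -> f64 {
    ns as f64 / 1_000.0
}
```

For example, two waits whose timestamps differ by 128,595 ns yield a single gap that `ns_to_us` reports as 128.595 µs; a thread that issues only two waits produces exactly one gap and nothing after it.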
### gpu.interrupt.delivered rate (SECONDARY) — **N/A**
- The engine emits no `gpu.interrupt.delivered` event kind (the 11 kinds in
the trace are: import.call, kernel.call/return, wait.begin,
handle.create/destroy, wake.requested, signal.match, thread.create/exit,
schema_version). `VdSetGraphicsInterruptCallback` is called 3× (callback
IS registered) — consistent with 2.AJ's 76 ISR firings/100M. Not
measurable from this trace; no regression.
---
## Why the fix is inert (C-2 mechanism)
The hardcode correctly removes the read asymmetry 2.AN identified: the guest
VSync callback `sub_824BE9A0` @ PC `0x824BEA38-0x824BEA44` now always reads
bit 0 = 1 and would take its frame-counter branch instead of the
`beq loc_824BEAAC` skip. But the trace matches the bit-clear baseline in
event count, VdSwap count, and exit state, meaning the frame-counter branch
produces no downstream observable signal either way.
Per 2.AN's C-2: the real signaller is the dynamically-installed
`opt_callback` stored at `user_data+15144` (tail-called by
`sub_824BE9A0` → `sub_824BEA80`). In the 65.7M-event run that `opt_callback`
is **never installed** (its setter `sub_824C1920`, reached only via
`sub_822F1F20 ← sub_822F1EE0 ← dispatch-table slot 0x822F1AFC`, requires a
deeper game-state event that does not fire). So even with the VBLANK gate
forced open, there is no installed callback to write `SignalState=1` on
Event 0x10e8 — tid=1 stays wedged. C-1 was a real divergence-vs-canary but
**not on the critical path**; C-2 gates it.
This is consistent with the 5-iterate methodology lesson logged in 2.AN
(variant #44): the "missing signal" is three layers below "what does the
wait depend on" — and C-1, one layer up, was correctly fixed but is inert
because layer-3 (opt_callback install) never happens.
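The two-gate structure described above can be sketched as a toy model. Every type and function here is invented for illustration (the guest code is PowerPC, not Rust); the point is only that opening the first gate changes nothing while the optional callback slot is empty.

```rust
/// Toy stand-in for the guest event that tid=1 waits on.
#[derive(Default)]
pub struct Event {
    pub signaled: bool,
}

/// Toy stand-in for the callback's user data; `opt_callback` models the
/// slot at `user_data+15144` described in this report.
pub struct UserData {
    pub opt_callback: Option<fn(&mut Event)>,
}

/// Hypothetical signaller that an installed callback would run.
pub fn signal(ev: &mut Event) {
    ev.signaled = true;
}

/// Simplified model of the VSync callback's two gates.
pub fn vsync_callback(vblank_status: u32, ud: &UserData, ev: &mut Event) {
    // Gate C-1: bit 0 of the VBLANK status must be set.
    if vblank_status & 1 == 0 {
        return;
    }
    // Gate C-2: a callback must have been installed.
    if let Some(cb) = ud.opt_callback {
        cb(ev);
    }
}
```

In this model, forcing `vblank_status` to `1` with `opt_callback: None` leaves the event unsignaled, which corresponds to the inert result observed in this iterate.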
---
## Confidence + next-iterate recommendation
**Confidence: HIGH** that C-1 is inert and C-2 is the prime suspect.
Evidence is decisive (identical event count + byte-identical exit
state + unchanged VdSwap across two deterministic cold runs). The fix is a
correct canary-parity hardening (keep it; it eliminates a latent race) but
not a cascade win.
**Disposition of this patch:** KEEP uncommitted as dormant
correctness/parity infra (like the 2.AJ reciprocal-shadow patch). It costs
nothing, matches canary exactly, and closes a real (if currently
unreachable) race window.
**Next iterate — make C-2 the explicit target.** Recommended (in priority
order, mirroring 2.AN's Angle B/C):
1. **2.AP — opt_callback install/clear probe (~5-15 LOC tooling, 0 engine).**
`--lr-trace 0x824C1920` (setter `sub_824C1920`) over a 500M run to
confirm install count == 0 and identify the nearest reached frame on the
`0x822F1AFC` dispatch chain. This is the single highest-value next step:
it pins down *which* upstream game-state event must fire.
2. **2.AQ — dispatch-chain reachability walk (~10-30 LOC tooling).**
`--lr-trace 0x822F1EE0` / `0x822F1F20` to find where the
`0x822F1AFC` dispatch slot stalls — i.e. the deeper game-state predicate
that never evaluates true. Three layers up from the wait, this is the
actual wedge root.
3. (Deprioritized) The bilateral tid=12 DPC wedge (Event 0x1004, 2.AM) and
tid=11 XAudio wedge (2.AL) remain independent and should follow C-2
resolution, not precede it.
Do **not** chase any further "force the signal" / "force the install"
crowbars before 2.AP/2.AQ identify the gating game-state event — that has
been the #44 reading-error trap five iterates running.