handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

View File

@@ -0,0 +1,158 @@
# Iterate 2.AO — VBLANK MMIO Hardcode (C-1 candidate from 2.AN)
**Headline: FIX-INERT-C2-CONFIRMED.**
The 2.AN Angle-A fix (hardcode `D1MODE_VBLANK_VLINE_STATUS` / reg `0x1951`
to return `1` on read, matching xenia-canary `graphics_system.cc:309-310`)
is **applied, builds, passes all tests, preserves determinism — and is
fully inert**. VdSwap stays at 6, the total event trace is bit-identical to
the 2.AI/2.AJ baseline (65,691,821 events), and the exit-thread-state /
wedge map are byte-for-byte identical to 2.AJ. C-1 (the VBLANK read
asymmetry) was **not** the active blocker. The deeper bottleneck C-2
(`opt_callback` at `user_data+15144` never installed) is confirmed as the
prime suspect.
---
## Patch summary
| File | Change | LOC | Notes |
|------|--------|-----|-------|
| `crates/xenia-gpu/src/mmio_region.rs` | read arm `reg::D1MODE_VBLANK_VLINE_STATUS` now returns `1` unconditionally instead of `read_vblank_status.load(Relaxed)` | **+9 / -1** (1 substantive + 8 doc/`let _` keep-alive) | single match arm |
| `crates/xenia-kernel/src/exports.rs` | **untouched** — 2.AJ reciprocal-shadow patch | +45 (pre-existing) | left in place as instructed |
- Diff (`git diff --numstat`): `9 1 crates/xenia-gpu/src/mmio_region.rs` — under the 10-LOC hard cap.
- The captured `read_vblank_status` clone is held with `let _ = &read_vblank_status;`
so the closure still moves it and compiles clean.
- The write closure's W1TC path and `tick_vsync_instr` are untouched
(`write_vblank_status` still used there). No refactor.
- Branch `chore/portable-snapshot`, HEAD `acd1656`. Patch UNCOMMITTED in
working tree (as required). 2.AJ exports.rs patch verified intact (+45).
### Source confirmation
- `reg::D1MODE_VBLANK_VLINE_STATUS == 0x1951` at `gpu_system.rs:1430`.
- Canary `case 0x1951: return 1; // vblank` at `graphics_system.cc:309-310` — exact match.
- The ours source comment at `gpu_system.rs:224-232` independently documents
2.AN's premise: the Sylpheed vsync callback "gates *all* its work on
reading bit 0 as set: `lwz; rlwinm. r,r,0,31,31; bc 12,2,skip`".
---
## Verification gates
### Build / Test (PRIMARY)
- `cargo build --release`: **SUCCEEDS** (incremental, 0.88s). Only a
pre-existing unrelated `dead_code` warning in
`phase_b_snapshot.rs:245` (`walk_committed_regions`) — not from this patch.
- `cargo test -p xenia-gpu -p xenia-kernel -p xenia-app -p xenia-cpu`:
**687 pass, 0 fail, 0 regressions** (xenia-app 300; xenia-kernel 227 +
149; xenia-cpu 6; xenia-gpu 5; + ignored doctests). Matches historical
baseline exactly.
### Determinism (PRIMARY) — **PASS**
- run1 `ours-cold.jsonl`: **65,691,821** events.
- run2 `ours-cold-run2.jsonl`: **65,691,821** events.
- Bit-identical line count across two cold runs (`XENIA_CACHE_WIPE=1`,
`-n 500000000`). (The ~763 KB byte-size delta between the two files is
trailing-buffer noise, not an event-count divergence — line counts are
exactly equal.)
### VdSwap (PRIMARY) — **NO CHANGE → C-1 not the gate**
- run1 VdSwap: **6**. run2 VdSwap: **6**.
- 2.AI/2.AJ baseline: 6. **No progression.** Per the gate definition, an
unchanged VdSwap means C-1 was not the active blocker.
### Total event count vs baseline — **IDENTICAL**
- 2.AO = 65,691,821. 2.AJ baseline = 65,691,821. **Exactly equal.** The
hardcode produced zero observable divergence in the execution trace.
### Exit-state (tid=1 / tid=12) — **byte-identical to 2.AJ**
- `diff exit-thread-state.json` (2AJ vs 2AO): **BYTE-IDENTICAL**. Same 21
alive threads, same 18 wedge entries.
- **tid=1**: `Blocked` @ PC `0x824ac578`, waiting on **Event `0x000010e8`**
(sig=false, no signaler). Unchanged — the 2.AI/2.AJ wedge.
- **tid=12**: `Blocked` @ PC `0x824ac578`, waiting on **Event `0x00001004`**
(sig=false, no signaler). Unchanged — the DPC-dispatcher wedge
(2.AC/2.AM).
### tid=1 wait gap on Event 0x10e8 (SECONDARY) — **no improvement**
- Event `0x000010e8` ↔ semantic SID `9ad1bebb6cae28c4` (handle.create at
host_ns 819,544,956).
- tid=1 issues exactly **2** `wait.begin` on this SID, at host_ns ~6.660s,
**128.595 µs** apart, then **blocks permanently** (no 3rd wait, never
woken). This is the same two-wait-then-permanent-block pattern 2.AJ
reported (~126.8 µs). The expected secondary effect ("wait gap may rise
as more callbacks succeed") **did not occur** — the gate is downstream of
C-2, so nothing changed.
### gpu.interrupt.delivered rate (SECONDARY) — **N/A**
- The engine emits no `gpu.interrupt.delivered` event kind (the 11 kinds in
the trace are: import.call, kernel.call/return, wait.begin,
handle.create/destroy, wake.requested, signal.match, thread.create/exit,
schema_version). `VdSetGraphicsInterruptCallback` is called 3× (callback
IS registered) — consistent with 2.AJ's 76 ISR firings/100M. Not
measurable from this trace; no regression.
---
## Why the fix is inert (C-2 mechanism)
The hardcode correctly removes the read asymmetry 2.AN identified: the guest
VSync callback `sub_824BE9A0` @ PC `0x824BEA38-0x824BEA44` now always reads
bit 0 = 1 and would take its frame-counter branch instead of the
`beq loc_824BEAAC` skip. But the trace is **bit-identical** to the
bit-clear baseline — meaning the frame-counter branch produces no
downstream observable signal either way.
Per 2.AN's C-2: the real signaller is the dynamically-installed
`opt_callback` stored at `user_data+15144` (tail-called by
`sub_824BE9A0``sub_824BEA80`). In the 65.7M-event run that opt_callback
is **never installed** (its setter `sub_824C1920`, reached only via
`sub_822F1F20 ← sub_822F1EE0 ← dispatch-table slot 0x822F1AFC`, requires a
deeper game-state event that does not fire). So even with the VBLANK gate
forced open, there is no installed callback to write `SignalState=1` on
Event 0x10e8 — tid=1 stays wedged. C-1 was a real divergence-vs-canary but
**not on the critical path**; C-2 gates it.
This is consistent with the 5-iterate methodology lesson logged in 2.AN
(variant #44): the "missing signal" is three layers below "what does the
wait depend on" — and C-1, one layer up, was correctly fixed but is inert
because layer-3 (opt_callback install) never happens.
---
## Confidence + next-iterate recommendation
**Confidence: HIGH** that C-1 is inert and C-2 is the prime suspect.
Evidence is decisive (bit-identical event count + byte-identical exit
state + unchanged VdSwap across two deterministic cold runs). The fix is a
correct canary-parity hardening (keep it; it eliminates a latent race) but
not a cascade win.
**Disposition of this patch:** KEEP uncommitted as dormant
correctness/parity infra (like the 2.AJ reciprocal-shadow patch). It costs
nothing, matches canary exactly, and closes a real (if currently
unreachable) race window.
**Next iterate — make C-2 the explicit target.** Recommended (in priority
order, mirroring 2.AN's Angle B/C):
1. **2.AP — opt_callback install/clear probe (~5-15 LOC tooling, 0 engine).**
`--lr-trace 0x824C1920` (setter `sub_824C1920`) over a 500M run to
confirm install count == 0 and identify the nearest reached frame on the
`0x822F1AFC` dispatch chain. This is the single highest-value next step:
it pins down *which* upstream game-state event must fire.
2. **2.AQ — dispatch-chain reachability walk (~10-30 LOC tooling).**
`--lr-trace 0x822F1EE0` / `0x822F1F20` to find where the
`0x822F1AFC` dispatch slot stalls — i.e. the deeper game-state predicate
that never evaluates true. Three layers up from the wait, this is the
actual wedge root.
3. (Deprioritized) The bilateral tid=12 DPC wedge (Event 0x1004, 2.AM) and
tid=11 XAudio wedge (2.AL) remain independent and should follow C-2
resolution, not precede it.
Do **not** chase any further "force the signal" / "force the install"
crowbars before 2.AP/2.AQ identify the gating game-state event — that has
been the #44 reading-error trap five iterates running.