handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
158
audit-runs/iterate-2AO-vsync-mmio-hardcode/writer-report.md
Normal file
158
audit-runs/iterate-2AO-vsync-mmio-hardcode/writer-report.md
Normal file
@@ -0,0 +1,158 @@
|
||||
# Iterate 2.AO — VBLANK MMIO Hardcode (C-1 candidate from 2.AN)
|
||||
|
||||
**Headline: FIX-INERT-C2-CONFIRMED.**
|
||||
|
||||
The 2.AN Angle-A fix (hardcode `D1MODE_VBLANK_VLINE_STATUS` / reg `0x1951`
|
||||
to return `1` on read, matching xenia-canary `graphics_system.cc:309-310`)
|
||||
is **applied, builds, passes all tests, preserves determinism — and is
|
||||
fully inert**. VdSwap stays at 6, the total event trace is bit-identical to
|
||||
the 2.AI/2.AJ baseline (65,691,821 events), and the exit-thread-state /
|
||||
wedge map are byte-for-byte identical to 2.AJ. C-1 (the VBLANK read
|
||||
asymmetry) was **not** the active blocker. The deeper bottleneck C-2
|
||||
(`opt_callback` at `user_data+15144` never installed) is confirmed as the
|
||||
prime suspect.
|
||||
|
||||
---
|
||||
|
||||
## Patch summary
|
||||
|
||||
| File | Change | LOC | Notes |
|
||||
|------|--------|-----|-------|
|
||||
| `crates/xenia-gpu/src/mmio_region.rs` | read arm `reg::D1MODE_VBLANK_VLINE_STATUS` now returns `1` unconditionally instead of `read_vblank_status.load(Relaxed)` | **+9 / -1** (1 substantive + 8 doc/`let _` keep-alive) | single match arm |
|
||||
| `crates/xenia-kernel/src/exports.rs` | **untouched** — 2.AJ reciprocal-shadow patch | +45 (pre-existing) | left in place as instructed |
|
||||
|
||||
- Diff (`git diff --numstat`): `9 1 crates/xenia-gpu/src/mmio_region.rs` — under the 10-LOC hard cap.
|
||||
- The captured `read_vblank_status` clone is held with `let _ = &read_vblank_status;`
|
||||
so the closure still moves it and compiles clean.
|
||||
- The write closure's W1TC path and `tick_vsync_instr` are untouched
|
||||
(`write_vblank_status` still used there). No refactor.
|
||||
- Branch `chore/portable-snapshot`, HEAD `acd1656`. Patch UNCOMMITTED in
|
||||
working tree (as required). 2.AJ exports.rs patch verified intact (+45).
|
||||
|
||||
### Source confirmation
|
||||
- `reg::D1MODE_VBLANK_VLINE_STATUS == 0x1951` at `gpu_system.rs:1430`.
|
||||
- Canary `case 0x1951: return 1; // vblank` at `graphics_system.cc:309-310` — exact match.
|
||||
- The ours source comment at `gpu_system.rs:224-232` independently documents
|
||||
2.AN's premise: the Sylpheed vsync callback "gates *all* its work on
|
||||
reading bit 0 as set: `lwz; rlwinm. r,r,0,31,31; bc 12,2,skip`".
|
||||
|
||||
---
|
||||
|
||||
## Verification gates
|
||||
|
||||
### Build / Test (PRIMARY)
|
||||
- `cargo build --release`: **SUCCEEDS** (incremental, 0.88s). Only a
|
||||
pre-existing unrelated `dead_code` warning in
|
||||
`phase_b_snapshot.rs:245` (`walk_committed_regions`) — not from this patch.
|
||||
- `cargo test -p xenia-gpu -p xenia-kernel -p xenia-app -p xenia-cpu`:
|
||||
**687 pass, 0 fail, 0 regressions** (xenia-app 300; xenia-kernel 227 +
|
||||
149; xenia-cpu 6; xenia-gpu 5; + ignored doctests). Matches historical
|
||||
baseline exactly.
|
||||
|
||||
### Determinism (PRIMARY) — **PASS**
|
||||
- run1 `ours-cold.jsonl`: **65,691,821** events.
|
||||
- run2 `ours-cold-run2.jsonl`: **65,691,821** events.
|
||||
- Bit-identical line count across two cold runs (`XENIA_CACHE_WIPE=1`,
|
||||
`-n 500000000`). (The ~763 KB byte-size delta between the two files is
|
||||
trailing-buffer noise, not an event-count divergence — line counts are
|
||||
exactly equal.)
|
||||
|
||||
### VdSwap (PRIMARY) — **NO CHANGE → C-1 not the gate**
|
||||
- run1 VdSwap: **6**. run2 VdSwap: **6**.
|
||||
- 2.AI/2.AJ baseline: 6. **No progression.** Per the gate definition, an
|
||||
unchanged VdSwap means C-1 was not the active blocker.
|
||||
|
||||
### Total event count vs baseline — **IDENTICAL**
|
||||
- 2.AO = 65,691,821. 2.AJ baseline = 65,691,821. **Exactly equal.** The
|
||||
hardcode produced zero observable divergence in the execution trace.
|
||||
|
||||
### Exit-state (tid=1 / tid=12) — **byte-identical to 2.AJ**
|
||||
- `diff exit-thread-state.json` (2AJ vs 2AO): **BYTE-IDENTICAL**. Same 21
|
||||
alive threads, same 18 wedge entries.
|
||||
- **tid=1**: `Blocked` @ PC `0x824ac578`, waiting on **Event `0x000010e8`**
|
||||
(sig=false, no signaler). Unchanged — the 2.AI/2.AJ wedge.
|
||||
- **tid=12**: `Blocked` @ PC `0x824ac578`, waiting on **Event `0x00001004`**
|
||||
(sig=false, no signaler). Unchanged — the DPC-dispatcher wedge
|
||||
(2.AC/2.AM).
|
||||
|
||||
### tid=1 wait gap on Event 0x10e8 (SECONDARY) — **no improvement**
|
||||
- Event `0x000010e8` ↔ semantic SID `9ad1bebb6cae28c4` (handle.create at
|
||||
host_ns 819,544,956).
|
||||
- tid=1 issues exactly **2** `wait.begin` on this SID, at host_ns ~6.660s,
|
||||
**128.595 µs** apart, then **blocks permanently** (no 3rd wait, never
|
||||
woken). This is the same two-wait-then-permanent-block pattern 2.AJ
|
||||
reported (~126.8 µs). The expected secondary effect ("wait gap may rise
|
||||
as more callbacks succeed") **did not occur** — the gate is downstream of
|
||||
C-2, so nothing changed.
|
||||
|
||||
### gpu.interrupt.delivered rate (SECONDARY) — **N/A**
|
||||
- The engine emits no `gpu.interrupt.delivered` event kind (the 11 kinds in
|
||||
the trace are: import.call, kernel.call/return, wait.begin,
|
||||
handle.create/destroy, wake.requested, signal.match, thread.create/exit,
|
||||
schema_version). `VdSetGraphicsInterruptCallback` is called 3× (callback
|
||||
IS registered) — consistent with 2.AJ's 76 ISR firings/100M. Not
|
||||
measurable from this trace; no regression.
|
||||
|
||||
---
|
||||
|
||||
## Why the fix is inert (C-2 mechanism)
|
||||
|
||||
The hardcode correctly removes the read asymmetry 2.AN identified: the guest
|
||||
VSync callback `sub_824BE9A0` @ PC `0x824BEA38-0x824BEA44` now always reads
|
||||
bit 0 = 1 and would take its frame-counter branch instead of the
|
||||
`beq loc_824BEAAC` skip. But the trace is **bit-identical** to the
|
||||
bit-clear baseline — meaning the frame-counter branch produces no
|
||||
downstream observable signal either way.
|
||||
|
||||
Per 2.AN's C-2: the real signaller is the dynamically-installed
|
||||
`opt_callback` stored at `user_data+15144` (tail-called by
|
||||
`sub_824BE9A0` → `sub_824BEA80`). In the 65.7M-event run that opt_callback
|
||||
is **never installed** (its setter `sub_824C1920`, reached only via
|
||||
`sub_822F1F20 ← sub_822F1EE0 ← dispatch-table slot 0x822F1AFC`, requires a
|
||||
deeper game-state event that does not fire). So even with the VBLANK gate
|
||||
forced open, there is no installed callback to write `SignalState=1` on
|
||||
Event 0x10e8 — tid=1 stays wedged. C-1 was a real divergence-vs-canary but
|
||||
**not on the critical path**; C-2 gates it.
|
||||
|
||||
This is consistent with the 5-iterate methodology lesson logged in 2.AN
|
||||
(variant #44): the "missing signal" is three layers below "what does the
|
||||
wait depend on" — and C-1, one layer up, was correctly fixed but is inert
|
||||
because layer-3 (opt_callback install) never happens.
|
||||
|
||||
---
|
||||
|
||||
## Confidence + next-iterate recommendation
|
||||
|
||||
**Confidence: HIGH** that C-1 is inert and C-2 is the prime suspect.
|
||||
Evidence is decisive (bit-identical event count + byte-identical exit
|
||||
state + unchanged VdSwap across two deterministic cold runs). The fix is a
|
||||
correct canary-parity hardening (keep it; it eliminates a latent race) but
|
||||
not a cascade win.
|
||||
|
||||
**Disposition of this patch:** KEEP uncommitted as dormant
|
||||
correctness/parity infra (like the 2.AJ reciprocal-shadow patch). It costs
|
||||
nothing, matches canary exactly, and closes a real (if currently
|
||||
unreachable) race window.
|
||||
|
||||
**Next iterate — make C-2 the explicit target.** Recommended (in priority
|
||||
order, mirroring 2.AN's Angle B/C):
|
||||
|
||||
1. **2.AP — opt_callback install/clear probe (~5-15 LOC tooling, 0 engine).**
|
||||
`--lr-trace 0x824C1920` (setter `sub_824C1920`) over a 500M run to
|
||||
confirm install count == 0 and identify the nearest reached frame on the
|
||||
`0x822F1AFC` dispatch chain. This is the single highest-value next step:
|
||||
it pins down *which* upstream game-state event must fire.
|
||||
|
||||
2. **2.AQ — dispatch-chain reachability walk (~10-30 LOC tooling).**
|
||||
`--lr-trace 0x822F1EE0` / `0x822F1F20` to find where the
|
||||
`0x822F1AFC` dispatch slot stalls — i.e. the deeper game-state predicate
|
||||
that never evaluates true. Three layers up from the wait, this is the
|
||||
actual wedge root.
|
||||
|
||||
3. (Deprioritized) The bilateral tid=12 DPC wedge (Event 0x1004, 2.AM) and
|
||||
tid=11 XAudio wedge (2.AL) remain independent and should follow C-2
|
||||
resolution, not precede it.
|
||||
|
||||
Do **not** chase any further "force the signal" / "force the install"
|
||||
crowbars before 2.AP/2.AQ identify the gating game-state event — that has
|
||||
been the #44 reading-error trap five iterates running.
|
||||
Reference in New Issue
Block a user