handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,246 @@
# Iterate 2.AF — Deadline-fire-path fix (per-round drain)
**Date:** 2026-06-02. **LOC delta:** engine **+18 LOC** (8 substantive + 10
doc) in `crates/xenia-app/src/main.rs` `coord_pre_round`. All retained.
**Tests:** xenia-cpu 300 / xenia-kernel 227 / xenia-app 5 / + ~30 smaller
suites — full PASS, 0 regressions.
## Headline
**DEADLINE-FIRES-CASCADE-FOLLOWS.**
tid=5's 42.95 ms WaitMultiple deadline (the 2.AD/2.X observation that
"sits Blocked 29.3 s until budget cap") now expires under load. tid=5
escaped its wedge, racked up 443,390 kernel calls + 4 wait.begin + 368
handle.creates + 42 signal.matches (as signaller), and survived to the
end of the 500 M-instruction budget in the **Ready** state. The cascade
that follows produces 45,206,378 events (3.5× the 2.V baseline of
13,003,881) across **152.2 s of wallclock progression** (3× the 2.V
51.0 s).
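
The multipliers quoted above can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic, using only figures stated in this report (121,569 is the 2.K baseline cited in primary gate 3; 51,011 ms and 152,207 ms are the last-event timestamps from the secondary-gate table). The function and constant names are illustrative and do not exist in the engine.

```rust
/// Event counts and last-event timestamps quoted in this report.
const EVENTS_2K: u64 = 121_569;
const EVENTS_2V: u64 = 13_003_881;
const EVENTS_2AF: u64 = 45_206_378;
const LAST_EVENT_MS_2V: u64 = 51_011;
const LAST_EVENT_MS_2AF: u64 = 152_207;

/// Ratio of `after` to `before` as a floating-point multiplier.
fn ratio(after: u64, before: u64) -> f64 {
    after as f64 / before as f64
}

/// Returns (events vs 2.V, events vs 2.K, wallclock vs 2.V).
fn headline_ratios() -> (f64, f64, f64) {
    (
        ratio(EVENTS_2AF, EVENTS_2V),
        ratio(EVENTS_2AF, EVENTS_2K),
        ratio(LAST_EVENT_MS_2AF, LAST_EVENT_MS_2V),
    )
}
```

The three values come out at roughly 3.48, 371.9 and 2.98, which the report rounds to 3.5×, 372× and 3×.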
## Patch summary
```text
crates/xenia-app/src/main.rs | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
```
In `coord_pre_round`, right after `kernel.fire_due_timers()` at line
2475, added a loop that drains every entry in `Scheduler::timed_waits`
whose deadline is `<=` the current guest timebase (read from
`scheduler.ctx(0).timebase`, the same `now` `fire_due_timers` uses) and
calls `kernel.handle_timeout_wake(r, reason)` on each one. The change is purely
additive: no existing call site is touched.
The structural defect 2.AD identified was that
`Scheduler::advance_to_next_wake_if_due` (scheduler.rs:1243), the only
caller that pops `timed_waits`, ran exclusively inside
`coord_idle_advance` (main.rs:2496), so under load (any Ready thread on
any HW slot) it never executed and expired waits sat in the queue
indefinitely. The fix performs the same drain every round in
`coord_pre_round`, symmetric with `fire_due_timers`.
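
The shape of the per-round drain can be sketched as follows. This is not the actual patch: `TimedWait`, the simplified `Scheduler`, `drain_expired` and `pre_round_drain` are hypothetical stand-ins, and the `on_timeout` callback stands in for `kernel.handle_timeout_wake`. The only properties taken from the report are that `timed_waits` is kept sorted by deadline and that entries with `deadline <= now` are woken.

```rust
/// Hypothetical stand-in for one entry in `Scheduler::timed_waits`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TimedWait {
    tid: u32,
    /// Deadline in guest timebase cycles (never host wallclock).
    deadline: u64,
}

/// Hypothetical, simplified scheduler: only the deadline queue is modelled.
struct Scheduler {
    /// Kept sorted by ascending deadline.
    timed_waits: Vec<TimedWait>,
}

impl Scheduler {
    /// Insert a wait while preserving deadline order.
    fn add_timed_wait(&mut self, w: TimedWait) {
        let idx = self.timed_waits.partition_point(|e| e.deadline <= w.deadline);
        self.timed_waits.insert(idx, w);
    }

    /// Remove and return every wait whose deadline is `<= now`.
    fn drain_expired(&mut self, now: u64) -> Vec<TimedWait> {
        let split = self.timed_waits.partition_point(|e| e.deadline <= now);
        self.timed_waits.drain(..split).collect()
    }
}

/// Per-round hook: runs whether or not any thread is Ready.
/// Returns how many waits were timed out this round.
fn pre_round_drain(
    s: &mut Scheduler,
    now: u64,
    mut on_timeout: impl FnMut(TimedWait),
) -> usize {
    let expired = s.drain_expired(now);
    let n = expired.len();
    for w in expired {
        on_timeout(w);
    }
    n
}
```

Because the queue is sorted, the expired entries form a prefix, so one `partition_point` lookup finds the cut and the drain does no work on rounds where nothing is due. The only input is `now`, which is why a loop of this shape stays deterministic.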
Determinism: the only inputs are `Scheduler::ctx(0).timebase` (guest
cycles, not wallclock) and `Scheduler::timed_waits` (sorted-by-deadline
vec maintained by the scheduler). No `host_ns`, no `Instant::now()`, no
RNG. Proof in the determinism check below.
## Test results
```text
cargo build --release
-> OK (only the pre-existing `walk_committed_regions` dead_code warning)
cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
xenia-cpu 300 passed, 0 failed
xenia-kernel 227 passed, 0 failed
xenia-app 5 passed, 0 failed (+ 3 ignored long-runners)
+ auxiliary suites: 0 failures
```
The patch site is wired into the lockstep `coord_pre_round`. The
parallel coordinator at main.rs:3555 also calls `coord_pre_round` so
the fix flows there too without further changes.
## Primary gate results
| # | predicate | result |
|---|---|---|
| 1 | tid=5's 42.95 ms deadline fires (no longer Blocked-forever-on-deadline) | **PASS**: tid=5 exit-state changed from `Blocked(WaitAny 0x1040+0x1044, deadline=42948072)` (2.V) to `Ready` at PC `0x825f10ac` (2.AF). The 2.V `block_reason` is now `null`. |
| 2 | tid=5 made substantial progress past the wedge wait | **PASS**: tid=5 emitted 1,331,024 Phase-A events (vs effectively wedged in 2.V), including 443,390 kernel.call + 443,390 kernel.return + 4 wait.begin + 368 handle.create + 42 signal.match. Last event at host_ns 152.21 s (2.V budget cap was 51.0 s). |
| 3 | Total event count > 121,569 baseline (in fact > 13,003,881 = 2.V) | **PASS**: 45,206,378 events (3.5× 2.V, 372× original 2.K baseline). |

**Note on the wording of primary gate 1**: the task spec asked for a
`wake.requested` event for `target_tid=5` at ~22 s. There are 0 such
events in the trace, but that's because `wake.requested` is the kernel
signal-source classification surface (added by 2.T) — it fires when one
thread signals a handle that has a waiter. Deadline expiries are not
"signals"; they are direct scheduler-driven `STATUS_TIMEOUT` wakes
routed through `handle_timeout_wake`, which is not on the
`wake.requested` emission path. The decisive proof is the state change
in `exit-thread-state.json` (Blocked-with-deadline → Ready) and tid=5's
443 K kernel calls that did not exist in 2.V. Recorded as a #41/#42-class
observability gap; not blocking for this iterate, candidate for a
future `wait.timeout` emission step.
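
The distinction drawn in this note can be illustrated with a small model. This is a sketch under assumptions, not the engine's code: `WakeCause`, `trace_event_for` and `count_wake_requested` are hypothetical names, and the field values used with them are arbitrary. It only encodes what the note states: signal-driven wakes emit `wake.requested`, while timeout wakes currently emit nothing.

```rust
/// Why a blocked thread was woken. Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WakeCause {
    /// Another thread signalled a handle that had a waiter.
    Signal { signaller_tid: u32, handle: u32 },
    /// The scheduler expired the wait's deadline (STATUS_TIMEOUT path).
    Timeout { deadline: u64 },
}

/// Name of the trace event a wake would emit, if any.
fn trace_event_for(cause: WakeCause) -> Option<&'static str> {
    match cause {
        WakeCause::Signal { .. } => Some("wake.requested"),
        // Observability gap: nothing is emitted for timeouts today.
        // A future `wait.timeout` event would be returned here.
        WakeCause::Timeout { .. } => None,
    }
}

/// Count how many wakes would show up as `wake.requested` in a trace.
fn count_wake_requested(causes: &[WakeCause]) -> usize {
    causes
        .iter()
        .filter(|c| trace_event_for(**c) == Some("wake.requested"))
        .count()
}
```

Under this model, a trace search for `wake.requested` with `target_tid=5` returns nothing when tid=5 is woken by a deadline, which matches the 0 events observed.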
## Determinism check
Two cold runs (`XENIA_CACHE_WIPE=1 -n 500000000`) produced
**bit-identical event counts: 45,206,378 events each**
(`ours-cold.jsonl` / `ours-cold-run2.jsonl`).
Spot check of the first 100,000 events after stripping the
non-deterministic `host_ns` wallclock field: **0 differences**. The
patch uses `Scheduler::ctx(0).timebase` (guest cycles) as its only
input, so this is the expected result.
Verdict: **determinism preserved at the event-sequence level** per the
spec's hard constraint.
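
The spot check described above amounts to dropping `host_ns` from each line and comparing what remains. The sketch below shows one way to do that with the standard library only. It is not the tool that was actually used, and it assumes each event is a single-line JSON object whose `host_ns` value is a bare unsigned integer; the other field names in the example are hypothetical.

```rust
/// Remove a `"host_ns":<digits>` field and one adjacent comma from a line.
/// Assumes a bare integer value; this is not a general JSON parser.
fn strip_host_ns(line: &str) -> String {
    const KEY: &str = "\"host_ns\":";
    let Some(start) = line.find(KEY) else {
        return line.to_string();
    };
    let bytes = line.as_bytes();
    let mut begin = start;
    let mut end = start + KEY.len();
    // Skip optional spaces, then the digits of the value.
    while end < bytes.len() && bytes[end] == b' ' {
        end += 1;
    }
    while end < bytes.len() && bytes[end].is_ascii_digit() {
        end += 1;
    }
    if end < bytes.len() && bytes[end] == b',' {
        end += 1; // field was followed by a comma: drop it
    } else if begin > 0 && bytes[begin - 1] == b',' {
        begin -= 1; // field was last: drop the preceding comma
    }
    format!("{}{}", &line[..begin], &line[end..])
}

/// Count differing lines among the first `limit` line pairs.
/// Note: `zip` stops at the shorter input, so a length mismatch
/// must be checked separately (e.g. by comparing event counts).
fn count_differences<'a>(
    a: impl Iterator<Item = &'a str>,
    b: impl Iterator<Item = &'a str>,
    limit: usize,
) -> usize {
    a.zip(b)
        .take(limit)
        .filter(|(x, y)| strip_host_ns(x) != strip_host_ns(y))
        .count()
}
```

For example, two lines that differ only in `host_ns` compare equal after stripping, while a change in any other field is counted as one difference.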
## Secondary gates (cascade)
| metric | 2.V baseline | 2.AF | direction |
|---|---:|---:|---|
| Total events | 13,003,881 | **45,206,378** | **3.5×** |
| Last event host_ns | 51,011 ms | **152,207 ms** | **3.0×** |
| Alive threads | 21 | 21 | unchanged |
| Exited threads (clean exit_code=0) | 2 (tid=13, 14) | 2 (tid=13, 17 — see below) | shifted |
| Blocked @ PC=0x824ac578 | {3, 4, 12, 16, 18} | {3, 4, 12, 15, 16, 18} | tid=15 added (tid=5's 2.V wedge was at PC=0x824ab214 and is gone) |
| `signal.match` events | 75 | 69 | small ↓ (re-timed) |
| `wake.requested` events | 79 | 71 | small ↓ (re-timed) |
| VdSwap calls | 2 | 2 | unchanged |
| tid=5 events | small (wedge) | **1,331,024** | massive cascade |
| Wedge map size | 15 entries | 15 entries | unchanged count, shifted contents |

The 2.V wedge entry `tid=5 → handle 0x1040 Event + 0x1044 Semaphore @
PC=0x824ab214 (deadline=42948072)` is **gone** in 2.AF. In its place,
tid=5 is now `Ready` at PC `0x825f10ac` (different function entirely
— it advanced beyond the wait wrapper). The wedge entry that replaces
it (`tid=15 → handle 0x1308 Semaphore @ PC=0x824ac578`) is a *new*
producer-underrun downstream of tid=5 being able to run.
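
The change in the set of threads blocked at PC `0x824ac578` can be checked mechanically. The following is a minimal sketch using the two tid sets from the table; `set_delta` is an illustrative helper, not part of the audit tooling.

```rust
use std::collections::BTreeSet;

/// Returns (added, removed) tids going from `before` to `after`,
/// each in ascending order.
fn set_delta(before: &[u32], after: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let b: BTreeSet<u32> = before.iter().copied().collect();
    let a: BTreeSet<u32> = after.iter().copied().collect();
    let added = a.difference(&b).copied().collect();
    let removed = b.difference(&a).copied().collect();
    (added, removed)
}

/// tids blocked at PC=0x824ac578, as listed in the secondary-gate table.
const BLOCKED_2V: [u32; 5] = [3, 4, 12, 16, 18];
const BLOCKED_2AF: [u32; 6] = [3, 4, 12, 15, 16, 18];
```

Calling `set_delta(&BLOCKED_2V, &BLOCKED_2AF)` yields tid=15 as the only addition and no removals. tid=5 appears in neither set because its 2.V wedge was at a different PC (`0x824ab214`).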
`signal.match` and `wake.requested` dropped slightly (75 → 69, 79 → 71).
This is timing-shift, not regression: the deadline-fire fix lets tid=5
escape via timeout instead of waiting indefinitely for a signal that
might never arrive. Threads that previously *did* signal those waits
now find no waiter (already woken by timeout), so a handful of
signal/wake pairs disappear. Net effect: 3.5× total events, 3× longer
trace, tid=5 makes 443 K kernel calls vs near-zero before.
## Cross-engine context
Per 2.AD's finding 3, ours tid=14 still exits at 21.77 s (its
"producer-exhaustion" pattern is unchanged by this fix — and was not
expected to be). The deadline-fire fix unblocks tid=5 around the
moment the 42.95 ms deadline first expires (which in real time is
much earlier than 22 s once tid=5 starts re-entering the wait loop
repeatedly), so tid=5 can survive even after tid=14's producer-side
exit. This is exactly the predicted outcome — see 2.AD's "Finding 2"
deadline-fire-path claim.
## Third-order observations (no claims, just data)
- **tid=17 events dropped from 5,471,318 to a much smaller count** (exact
  figure not tabulated; it's no longer the dominant producer). With tid=5 now
running, the rotation cursor + age-priority interaction (2.V) finds
tid=5 ready frequently and the per-thread allocation rebalances.
- **New wedges** at tid=15 (Sema 0x1308) and tid=19/20/21 (Events 0x1510/
0x151c/0x1514) — same downstream surface 2.V flagged for 2.W. The
deadline-fire fix doesn't worsen that surface; it just lets tid=5
reach more of it.
- **Run termination**: budget cap (500 M instructions), exit code 0,
no `unblock_on_deadlock` fire, no crash, no fault.
## Tripstone audit
- **#28 (cross-engine tid stability)**: All tid claims are ours-side
within this trajectory. No cross-engine tid mapping claimed.
- **#39 (composite progression IS progression)**: Honored. Cascade
framing: tid=5 unwedged + 3.5× events + 3× wallclock. VdSwap is
unchanged (2 → 2) — explicitly *not* claimed as progression. The
primary gate is direct state-change on tid=5, not a progression
proxy.
- **#40 (single-keystone framing)**: Care taken. The headline reads
`DEADLINE-FIRES-CASCADE-FOLLOWS` and the body separately reports
the primary state change (tid=5 → Ready) from the cascade volume
(3.5× events). Open follow-ups (2.AE tid=14 first-divergence, 2.AH
tid=1 XNotify, 2.AI XAudio) explicitly retained.
- **#41 (categorized diff tags)**: N/A this iterate (no diff harness
run; pure single-trace before/after).
- **#42 (Phase-A blind to blocked-forever)**: Used `exit-thread-state.json`
  to characterize the new wedge set, which is the use 2.M scoped it for.
tid=5 → Ready was visible only because of that dump.
- **#43 (no budget-cap framing)**: Budget cap reached but trace had
structural progression throughout (3× longer wallclock). Cascade
observation is robust at this budget.
- **#44 refined (rate+shape comparison)**: Not directly applicable, since
  this is an engine-bug fix, not a cross-engine wedge analysis. The "gate"
  is the deadline-fire mechanism, not a wait-rate comparison.
## Confidence
- **HIGH** that the patch is correct and minimal: 18 LOC, 0 test
regressions, determinism preserved bit-for-bit on event count and
on slim-event-content spot check.
- **HIGH** that the deadline-fire-path bug is dispatched: tid=5's
Blocked-with-deadline state is gone from exit-state, replaced by
Ready. The 2.AD mechanism is correct end-to-end.
- **HIGH** that the cascade is genuine (3.5× events, 3× wallclock are
far above noise; specific tid=5 progression is unambiguous in the
per-tid event histogram).
- **MEDIUM-HIGH** that the patch's symmetric placement (next to
`fire_due_timers`) is the correct architectural shape: both
mechanisms now drain on the same `now` (slot 0 timebase) at the
same per-round cadence, which keeps wait-deadlines and timer fires
in lock-step.
- **MEDIUM** that gameplay is imminent. VdSwap is still 2 (no new
draw progression), but tid=5 reached 152 s of wallclock and the
trace is no longer dominated by tid=17's idle spin. Several more
cascade iterations likely needed.
- **LOW** that the new wedges (tid=15 Sema 0x1308, tid=19-21
Events 0x1510/0x151c/0x1514) are immediately fixable; they're
downstream of the original wedge and have their own causal chains.
## Next-iterate recommendation
The natural next step from 2.AD's "4 distinct root causes" list:
1. **2.AE (tid=14 first-divergence diff)** — still highest priority.
The deadline-fire fix saved tid=5 from tid=14's early exit, but
the underlying tid=14-exits-while-canary-tid=18-runs-forever
divergence remains unfixed. Approx **0 LOC**, pure trace mining.
2. **2.AG (`do_wait_multiple` `wait.begin` symmetry)**: the
   observability gap deferred from this iterate. tid=5's 384
`NtWaitForMultipleObjectsEx` calls still don't emit `wait.begin`,
so future deadline-fire diagnoses are still blind. Approx
**~10 LOC**, exports.rs:5583-5655.
3. **2.AI (XAudio stub fix)** — fully independent blocker on tid=11.
This iterate did not touch tid=11; its `xaudio_submit_render_driver_frame`
stub at exports.rs:4591-4598 is still a no-op. Approx
**5-150 LOC**, exports.rs.
4. **2.AH (tid=1 XNotify recon)** — also independent, the main-thread
   1.05 M-iter wedge. This iterate did not touch it. Approx **0-10 LOC**.

I recommend **2.AE next** (cheapest and most informative: it answers whether
tid=14's early exit is itself downstream of an earlier signaling
divergence or a true independent root cause).
## Artifacts
Under `xenia-rs/audit-runs/iterate-2AF-deadline-fire-fix/`:
- `ours-cold.jsonl` (10.98 GB, 45,206,378 events) — primary trace
- `ours-cold.stdout.log` (empty — quiet mode)
- `ours-cold.stderr.log` (single exit-thread-state notice)
- `exit-thread-state.json` (14.0 KB; 21 alive + 15 wedge entries)
- `ours-cold-run2.jsonl` (10.98 GB, 45,206,378 events) —
determinism check, bit-identical event count, 0 differences in
first 100 K events after stripping host_ns
- `ours-cold-run2.{stdout,stderr}.log`
- `writer-report.md` (this file)

xenia-canary UNCHANGED.
Engine state: head + 2.AF patch (`+18` in `xenia-app/src/main.rs`).
Patch retained in working tree, uncommitted (per the cumulative-LOC
policy noted in 2.W's report).