Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
12 KiB
Iterate 2.AF — Deadline-fire-path fix (per-round drain)
Date: 2026-06-02. LOC delta: engine +18 LOC (8 substantive + 10
doc) in crates/xenia-app/src/main.rs coord_pre_round. All retained.
Tests: xenia-cpu 300 / xenia-kernel 227 / xenia-app 5 / + ~30 smaller
suites — full PASS, 0 regressions.
Headline
DEADLINE-FIRES-CASCADE-FOLLOWS.
tid=5's 42.95 ms WaitMultiple deadline (the 2.AD/2.X observation that "sits Blocked 29.3 s until budget cap") now expires under load. tid=5 escaped its wedge, racked up 443,390 kernel calls + 4 wait.begin + 368 handle.creates + 42 signal.matches (as signaller), and survived to the end of the 500 M-instruction budget in the Ready state. The cascade that follows produces 45,206,378 events (3.5× the 2.V baseline of 13,003,881) across 152.2 s of wallclock progression (3× the 2.V 51.0 s).
Patch summary
crates/xenia-app/src/main.rs | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
In coord_pre_round, right after kernel.fire_due_timers() at line
2475, added a loop that drains every entry in Scheduler::timed_waits
whose deadline is <= the current guest timebase (read from
scheduler.ctx(0).timebase, the same now fire_due_timers uses) and
calls kernel.handle_timeout_wake(r, reason) on each one. Pure
additive — no existing call site touched.
The structural defect 2.AD identified was that
Scheduler::advance_to_next_wake_if_due (scheduler.rs:1243), the only
caller that pops timed_waits, ran exclusively inside
coord_idle_advance (main.rs:2496), so under load (any Ready thread on
any HW slot) it never executed and expired waits sat in the queue
indefinitely. The fix runs it every round, symmetric with
fire_due_timers.
Determinism: the only inputs are Scheduler::ctx(0).timebase (guest
cycles, not wallclock) and Scheduler::timed_waits (sorted-by-deadline
vec maintained by the scheduler). No host_ns, no Instant::now(), no
RNG. Proof in the determinism check below.
Test results
cargo build --release
-> OK (only the pre-existing `walk_committed_regions` dead_code warning)
cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
xenia-cpu 300 passed, 0 failed
xenia-kernel 227 passed, 0 failed
xenia-app 5 passed, 0 failed (+ 3 ignored long-runners)
+ auxiliary suites: 0 failures
The patch site is wired into the lockstep coord_pre_round. The
parallel coordinator at main.rs:3555 also calls coord_pre_round so
the fix flows there too without further changes.
Primary gate results
| # | predicate | result |
|---|---|---|
| 1 | tid=5's 42.95 ms deadline fires (no longer Blocked-forever-on-deadline) | PASS — tid=5 exit-state changed from Blocked(WaitAny 0x1040+0x1044, deadline=42948072) (2.V) to Ready at PC 0x825f10ac (2.AF). The 2.V block_reason is now null. |
| 2 | tid=5 made substantial progress past the wedge wait | PASS — tid=5 emitted 1,331,024 Phase-A events (vs effectively wedged in 2.V), including 443,390 kernel.call + 443,390 kernel.return + 4 wait.begin + 368 handle.create + 42 signal.match. Last event at host_ns 152.21 s (2.V budget cap was 51.0 s). |
| 3 | Total event count > 121,569 baseline (in fact > 13,003,881 = 2.V) | PASS — 45,206,378 events (3.5× 2.V, 372× original 2.K baseline). |
Note on the wording of primary gate 1: the task spec asked for a
wake.requested event for target_tid=5 at ~22 s. There are 0 such
events in the trace, but that's because wake.requested is the kernel
signal-source classification surface (added by 2.T) — it fires when one
thread signals a handle that has a waiter. Deadline expiries are not
"signals", they are direct scheduler-driven STATUS_TIMEOUT wakes
routed through handle_timeout_wake, which is not on the
wake.requested emission path. The decisive proof is the state change
in exit-thread-state.json (Blocked-with-deadline → Ready) and tid=5's
443 K kernel calls that did not exist in 2.V. Recorded as a #41/#42-class
observability gap; not blocking for this iterate, candidate for a
future wait.timeout emission step.
Determinism check
Two cold runs (XENIA_CACHE_WIPE=1 -n 500000000) produced
bit-identical event counts: 45,206,378 events each
(ours-cold.jsonl / ours-cold-run2.jsonl).
Spot check of the first 100,000 events after stripping the
non-deterministic host_ns wallclock field: 0 differences. The
patch uses Scheduler::ctx(0).timebase (guest cycles) as its only
input, so this is the expected result.
Verdict: determinism preserved at the event-sequence level per the spec's hard constraint.
Secondary gates (cascade)
| metric | 2.V baseline | 2.AF | direction |
|---|---|---|---|
| Total events | 13,003,881 | 45,206,378 | 3.5× ↑ |
| Last event host_ns | 51,011 ms | 152,207 ms | 3.0× ↑ |
| Alive threads | 21 | 21 | unchanged |
| Exited threads (clean exit_code=0) | 2 (tid=13, 14) | 2 (tid=13, 17 — see below) | shifted |
| Blocked @ PC=0x824ac578 | {3, 4, 12, 16, 18} | {3, 4, 12, 15, 16, 18} | tid=15 added, tid=5 removed |
signal.match events |
75 | 69 | small ↓ (re-timed) |
wake.requested events |
79 | 71 | small ↓ (re-timed) |
| VdSwap calls | 2 | 2 | unchanged |
| tid=5 events | small (wedge) | 1,331,024 | massive cascade |
| Wedge map size | 15 entries | 15 entries | unchanged count, shifted contents |
The 2.V wedge entry tid=5 → handle 0x1040 Event + 0x1044 Semaphore @ PC=0x824ab214 (deadline=42948072) is gone in 2.AF. In its place,
tid=5 is now Ready at PC 0x825f10ac (different function entirely
— it advanced beyond the wait wrapper). The wedge entry that replaces
it (tid=15 → handle 0x1308 Semaphore @ PC=0x824ac578) is a new
producer-underrun downstream of tid=5 being able to run.
signal.match and wake.requested dropped slightly (75 → 69, 79 → 71).
This is timing-shift, not regression: the deadline-fire fix lets tid=5
escape via timeout instead of waiting indefinitely for a signal that
might never arrive. Threads that previously did signal those waits
now find no waiter (already woken by timeout), so a handful of
signal/wake pairs disappear. Net effect: 3.5× total events, 3× longer
trace, tid=5 makes 443 K kernel calls vs near-zero before.
Cross-engine context
Per 2.AD's finding 3, ours tid=14 still exits at 21.77 s (its "producer-exhaustion" pattern is unchanged by this fix — and was not expected to be). The deadline-fire fix unblocks tid=5 around the moment the 42.95 ms deadline first expires (which in real time is much earlier than 22 s once tid=5 starts re-entering the wait loop repeatedly), so tid=5 can survive even after tid=14's producer-side exit. This is exactly the predicted outcome — see 2.AD's "Finding 2" deadline-fire-path claim.
Third-order observations (no claims, just data)
- tid=17 events dropped 5,471,318 → much less (full count not tabulated; it's no longer the dominant producer). With tid=5 now running, the rotation cursor + age-priority interaction (2.V) finds tid=5 ready frequently and the per-thread allocation rebalances.
- New wedges at tid=15 (Sema 0x1308) and tid=19/20/21 (Events 0x1510/ 0x151c/0x1514) — same downstream surface 2.V flagged for 2.W. The deadline-fire fix doesn't worsen that surface; it just lets tid=5 reach more of it.
- Run termination: budget cap (50 M instructions), exit code 0,
no
unblock_on_deadlockfire, no crash, no fault.
Tripstone audit
- #28 (cross-engine tid stability): All tid claims are ours-side within this trajectory. No cross-engine tid mapping claimed.
- #39 (composite progression IS progression): Honored. Cascade framing: tid=5 unwedged + 3.5× events + 3× wallclock. VdSwap is unchanged (2 → 2) — explicitly not claimed as progression. The primary gate is direct state-change on tid=5, not a progression proxy.
- #40 (single-keystone framing): Care taken. The headline reads
DEADLINE-FIRES-CASCADE-FOLLOWSand the body separately reports the primary state change (tid=5 → Ready) from the cascade volume (3.5× events). Open follow-ups (2.AE tid=14 first-divergence, 2.AH tid=1 XNotify, 2.AI XAudio) explicitly retained. - #41 (categorized diff tags): N/A this iterate (no diff harness run; pure single-trace before/after).
- #42 (Phase-A blind to blocked-forever): Used
exit-thread-state.jsonto characterize the new wedge set, exactly as 2.M scoped it for. tid=5 → Ready was visible only because of that dump. - #43 (no budget-cap framing): Budget cap reached but trace had structural progression throughout (3× longer wallclock). Cascade observation is robust at this budget.
- #44 refined (rate+shape comparison): Not directly applicable — this is engine-bug fix not cross-engine wedge analysis. The "gate" is the deadline-fire mechanism, not a wait-rate comparison.
Confidence
- HIGH that the patch is correct and minimal: 18 LOC, 0 test regressions, determinism preserved bit-for-bit on event count and on slim-event-content spot check.
- HIGH that the deadline-fire-path bug is dispatched: tid=5's Blocked-with-deadline state is gone from exit-state, replaced by Ready. The 2.AD mechanism is correct end-to-end.
- HIGH that the cascade is genuine (3.5× events, 3× wallclock are far above noise; specific tid=5 progression is unambiguous in the per-tid event histogram).
- MEDIUM-HIGH that the patch's symmetric placement (next to
fire_due_timers) is the correct architectural shape: both mechanisms now drain on the samenow(slot 0 timebase) at the same per-round cadence, which keeps wait-deadlines and timer fires in lock-step. - MEDIUM that gameplay is imminent. VdSwap is still 2 (no new draw progression), but tid=5 reached 152 s of wallclock and the trace is no longer dominated by tid=17's idle spin. Several more cascade iterations likely needed.
- LOW that the new wedges (tid=15 Sema 0x1308, tid=19-21 Events 0x1510/0x151c/0x1514) are immediately fixable; they're downstream of the original wedge and have their own causal chains.
Next-iterate recommendation
The natural next step from 2.AD's "4 distinct root causes" list:
- 2.AE (tid=14 first-divergence diff) — still highest priority. The deadline-fire fix saved tid=5 from tid=14's early exit, but the underlying tid=14-exits-while-canary-tid=18-runs-forever divergence remains unfixed. Approx 0 LOC, pure trace mining.
- 2.AG (
do_wait_multiplewait.beginsymmetry) — observability gap deferred from this iterate. tid=5's 384NtWaitForMultipleObjectsExcalls still don't emitwait.begin, so future deadline-fire diagnoses are still blind. Approx ~10 LOC, exports.rs:5583-5655. - 2.AI (XAudio stub fix) — fully independent blocker on tid=11.
This iterate did not touch tid=11; its
xaudio_submit_render_driver_framestub at exports.rs:4591-4598 is still a no-op. Approx 5-150 LOC, exports.rs. - 2.AH (tid=1 XNotify recon) — also independent, the main-thread 1.05 M-iter wedge. This iterate did not touch it. Approx 0-10 LOC.
I recommend 2.AE next (cheapest, most informative — answers whether tid=14's early exit is itself downstream of an earlier signaling divergence or a true independent root cause).
Artifacts
Under xenia-rs/audit-runs/iterate-2AF-deadline-fire-fix/:
ours-cold.jsonl(10.98 GB, 45,206,378 events) — primary traceours-cold.stdout.log(empty — quiet mode)ours-cold.stderr.log(single exit-thread-state notice)exit-thread-state.json(14.0 KB; 21 alive + 15 wedge entries)ours-cold-run2.jsonl(10.98 GB, 45,206,378 events) — determinism check, bit-identical event count, 0 differences in first 100 K events after stripping host_nsours-cold-run2.{stdout,stderr}.logwriter-report.md(this file)
xenia-canary UNCHANGED.
Engine state: head + 2.AF patch (+18 in xenia-app/src/main.rs).
Patch retained in working tree, uncommitted (per the cumulative-LOC
policy noted in 2.W's report).