# Iterate 2.AF: Deadline-fire-path fix (per-round drain)

**Date:** 2026-06-02. **LOC delta:** engine **+18 LOC** (8 substantive + 10
doc) in `crates/xenia-app/src/main.rs` `coord_pre_round`. All retained.
**Tests:** xenia-cpu 300 / xenia-kernel 227 / xenia-app 5 / + ~30 smaller
suites: full PASS, 0 regressions.

## Headline

**DEADLINE-FIRES-CASCADE-FOLLOWS.**

tid=5's 42.95 ms WaitMultiple deadline (the 2.AD/2.X observation that
"sits Blocked 29.3 s until budget cap") now expires under load. tid=5
escaped its wedge, racked up 443,390 kernel calls + 4 wait.begin + 368
handle.creates + 42 signal.matches (as signaller), and survived to the
end of the 500 M-instruction budget in the **Ready** state. The cascade
that follows produces 45,206,378 events (3.5× the 2.V baseline of
13,003,881) across **152.2 s of wallclock progression** (3× the 2.V
51.0 s).

## Patch summary

```text
 crates/xenia-app/src/main.rs | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
```

In `coord_pre_round`, right after `kernel.fire_due_timers()` at line
2475, the patch adds a loop that drains every entry in
`Scheduler::timed_waits` whose deadline is `<=` the current guest
timebase (read from `scheduler.ctx(0).timebase`, the same `now` that
`fire_due_timers` uses) and calls `kernel.handle_timeout_wake(r, reason)`
on each one. The change is purely additive: no existing call site is
touched.

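The per-round drain described above can be sketched as follows. This is a
minimal illustration, not the engine's code: `TimedWait`, `thread_ref`, and
`drain_expired` are hypothetical names, and the queue is assumed to be a
`Vec` sorted ascending by deadline, as this report describes
`Scheduler::timed_waits`.

```rust
/// Hypothetical stand-in for one entry in the scheduler's timed-wait queue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TimedWait {
    /// Illustrative thread reference; not the engine's real type.
    pub thread_ref: u32,
    /// Deadline in guest timebase ticks (not wallclock).
    pub deadline: u64,
}

/// Removes and returns every wait whose deadline is `<= now`.
///
/// Assumption: `timed_waits` is sorted ascending by `deadline`, so all
/// expired entries form a prefix of the vector.
pub fn drain_expired(timed_waits: &mut Vec<TimedWait>, now: u64) -> Vec<TimedWait> {
    // Index of the first entry that has NOT expired yet.
    let split = timed_waits.partition_point(|w| w.deadline <= now);
    // Drain the expired prefix; later entries stay queued in order.
    timed_waits.drain(..split).collect()
}
```

In the patch itself each drained entry is handed to
`kernel.handle_timeout_wake`; the sketch only returns the expired entries
to the caller. The `partition_point` shortcut depends on the sorted order,
so an unsorted queue would need a full scan instead.
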
The structural defect 2.AD identified was that
`Scheduler::advance_to_next_wake_if_due` (scheduler.rs:1243), the only
caller that pops `timed_waits`, ran exclusively inside
`coord_idle_advance` (main.rs:2496). Under load (any Ready thread on
any HW slot) it therefore never executed, and expired waits sat in the
queue indefinitely. The fix drains expired waits every round regardless
of load, symmetric with `fire_due_timers`.

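A toy model of the defect, under heavy simplifying assumptions, may make
the starvation easier to see. Everything here (`Sim`, `round`, `run`, the
1,000-tick step) is hypothetical and only mirrors the control flow
described above: the idle path pops expired waits only when nothing is
Ready, while the fixed path pops them every round.

```rust
/// Hypothetical, heavily simplified coordinator state.
pub struct Sim {
    pub now: u64,            // guest timebase ticks
    pub deadlines: Vec<u64>, // pending wait deadlines, sorted ascending
    pub any_ready: bool,     // true while some thread is Ready on any slot
    pub timeouts_fired: usize,
}

impl Sim {
    /// Pops every deadline that is `<= now` and counts it as a timeout wake.
    fn pop_expired(&mut self) {
        let now = self.now;
        let n = self.deadlines.partition_point(|&d| d <= now);
        self.timeouts_fired += self.deadlines.drain(..n).count();
    }

    /// One coordinator round. `per_round_drain` selects the fixed behaviour.
    pub fn round(&mut self, per_round_drain: bool) {
        self.now += 1_000; // guest time advances as instructions retire
        if per_round_drain {
            self.pop_expired(); // fixed: runs every round
        }
        if !self.any_ready {
            self.pop_expired(); // idle path: only when nothing is Ready
        }
    }
}

/// Runs `rounds` rounds under constant load and returns the timeouts fired.
pub fn run(per_round_drain: bool, rounds: usize) -> usize {
    let mut sim = Sim { now: 0, deadlines: vec![5_000], any_ready: true, timeouts_fired: 0 };
    for _ in 0..rounds {
        sim.round(per_round_drain);
    }
    sim.timeouts_fired
}
```

With `any_ready` permanently true, `run(false, 100)` never fires the
5,000-tick deadline, while `run(true, 100)` fires it once.
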
Determinism: the only inputs are `Scheduler::ctx(0).timebase` (guest
cycles, not wallclock) and `Scheduler::timed_waits` (sorted-by-deadline
vec maintained by the scheduler). No `host_ns`, no `Instant::now()`, no
RNG. Proof in the determinism check below.

## Test results

```text
cargo build --release
  -> OK (only the pre-existing `walk_committed_regions` dead_code warning)

cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
  xenia-cpu     300 passed, 0 failed
  xenia-kernel  227 passed, 0 failed
  xenia-app       5 passed, 0 failed (+ 3 ignored long-runners)
  + auxiliary suites: 0 failures
```

The patch site is wired into the lockstep `coord_pre_round`. The
parallel coordinator at main.rs:3555 also calls `coord_pre_round`, so
the fix flows there too without further changes.

## Primary gate results

| # | predicate | result |
|---|---|---|
| 1 | tid=5's 42.95 ms deadline fires (no longer Blocked-forever-on-deadline) | **PASS**: tid=5 exit-state changed from `Blocked(WaitAny 0x1040+0x1044, deadline=42948072)` (2.V) to `Ready` at PC `0x825f10ac` (2.AF). The 2.V `block_reason` is now `null`. |
| 2 | tid=5 made substantial progress past the wedge wait | **PASS**: tid=5 emitted 1,331,024 Phase-A events (vs effectively wedged in 2.V), including 443,390 kernel.call + 443,390 kernel.return + 4 wait.begin + 368 handle.create + 42 signal.match. Last event at host_ns 152.21 s (2.V budget cap was 51.0 s). |
| 3 | Total event count > 121,569 baseline (in fact > 13,003,881 = 2.V) | **PASS**: 45,206,378 events (3.5× 2.V, 372× original 2.K baseline). |

**Note on the wording of primary gate 1**: the task spec asked for a
`wake.requested` event for `target_tid=5` at ~22 s. There are 0 such
events in the trace, because `wake.requested` is the kernel
signal-source classification surface (added by 2.T): it fires when one
thread signals a handle that has a waiter. Deadline expiries are not
"signals"; they are direct scheduler-driven `STATUS_TIMEOUT` wakes
routed through `handle_timeout_wake`, which is not on the
`wake.requested` emission path. The decisive proof is the state change
in `exit-thread-state.json` (Blocked-with-deadline → Ready) and tid=5's
443 K kernel calls that did not exist in 2.V. Recorded as a #41/#42-class
observability gap: not blocking for this iterate, and a candidate for a
future `wait.timeout` emission step.

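The distinction between the two wake paths can be written down as a small
classification sketch. The names `WakeCause`, `event_today`, and
`event_with_timeout_step` are hypothetical, and `wait.timeout` is only the
candidate event name floated above; none of this is the engine's actual
trace API.

```rust
/// Hypothetical classification of why a blocked thread was woken.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WakeCause {
    /// Another thread signalled a handle that had a waiter.
    Signal { signaller_tid: u32 },
    /// The scheduler expired the wait's deadline (STATUS_TIMEOUT).
    Timeout,
}

/// Trace event emitted today, per this report: signal wakes are visible as
/// `wake.requested`, timeout wakes emit nothing.
pub fn event_today(cause: WakeCause) -> Option<&'static str> {
    match cause {
        WakeCause::Signal { .. } => Some("wake.requested"),
        WakeCause::Timeout => None,
    }
}

/// Trace event under the proposed (not yet implemented) `wait.timeout` step.
pub fn event_with_timeout_step(cause: WakeCause) -> &'static str {
    match cause {
        WakeCause::Signal { .. } => "wake.requested",
        WakeCause::Timeout => "wait.timeout",
    }
}
```

Under this model, searching the trace for `wake.requested` with
`target_tid=5` cannot find a timeout wake, which is why the gate had to be
judged from `exit-thread-state.json` instead.
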
## Determinism check

Two cold runs (`XENIA_CACHE_WIPE=1 -n 500000000`) produced
**bit-identical event counts: 45,206,378 events each**
(`ours-cold.jsonl` / `ours-cold-run2.jsonl`).

Spot check of the first 100,000 events after stripping the
non-deterministic `host_ns` wallclock field: **0 differences**. The
patch uses `Scheduler::ctx(0).timebase` (guest cycles) as its only
input, so this is the expected result.

Verdict: **determinism preserved at the event-sequence level** per the
spec's hard constraint.

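A spot check of this kind could look roughly like the sketch below. It
assumes, without confirmation from the trace schema, that each JSONL line
carries `host_ns` as an unsigned integer member written without internal
whitespace; the field names in the usage example (`tid`, `kind`) are
invented for illustration. The actual tooling used for the check is not
shown in this report.

```rust
/// Removes a numeric `"host_ns":<digits>` member from one JSONL line.
/// Assumption: the value is an unsigned integer with no whitespace.
pub fn strip_host_ns(line: &str) -> String {
    const KEY: &str = "\"host_ns\":";
    let start = match line.find(KEY) {
        Some(s) => s,
        None => return line.to_string(), // nothing to strip
    };
    let after_key = start + KEY.len();
    let digits = line[after_key..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    let mut begin = start;
    let mut end = after_key + digits;
    if line[end..].starts_with(',') {
        end += 1; // member was followed by another member
    } else if line[..begin].ends_with(',') {
        begin -= 1; // member was the last one in the object
    }
    format!("{}{}", &line[..begin], &line[end..])
}

/// Counts positions (up to `limit`) where two traces differ once `host_ns`
/// is stripped. Length mismatches are not detected here; the event counts
/// are compared separately.
pub fn count_differences<'a>(
    a: impl Iterator<Item = &'a str>,
    b: impl Iterator<Item = &'a str>,
    limit: usize,
) -> usize {
    a.zip(b)
        .take(limit)
        .filter(|(x, y)| strip_host_ns(x) != strip_host_ns(y))
        .count()
}
```

Feeding the lines of `ours-cold.jsonl` and `ours-cold-run2.jsonl` through
`count_differences` with a limit of 100,000 would correspond to the spot
check reported here, provided the schema assumption holds.
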
## Secondary gates (cascade)

| metric | 2.V baseline | 2.AF | direction |
|---|---:|---:|---|
| Total events | 13,003,881 | **45,206,378** | **3.5× ↑** |
| Last event host_ns | 51,011 ms | **152,207 ms** | **3.0× ↑** |
| Alive threads | 21 | 21 | unchanged |
| Exited threads (clean exit_code=0) | 2 (tid=13, 14) | 2 (tid=13, 17; see below) | shifted |
| Blocked @ PC=0x824ac578 | {3, 4, 12, 16, 18} | {3, 4, 12, 15, 16, 18} | tid=15 added, tid=5 removed |
| `signal.match` events | 75 | 69 | small ↓ (re-timed) |
| `wake.requested` events | 79 | 71 | small ↓ (re-timed) |
| VdSwap calls | 2 | 2 | unchanged |
| tid=5 events | small (wedge) | **1,331,024** | massive cascade |
| Wedge map size | 15 entries | 15 entries | unchanged count, shifted contents |

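The multipliers quoted in this report follow directly from the raw counts.
A small arithmetic sketch (the helper name `ratio` is just for
illustration) reproduces them from the figures in the tables above:

```rust
/// Ratio of a new measurement to its baseline, as a floating-point factor.
pub fn ratio(new: u64, baseline: u64) -> f64 {
    new as f64 / baseline as f64
}

/// Total events, 2.AF vs 2.V: about 3.48, reported as 3.5×.
pub fn events_vs_2v() -> f64 {
    ratio(45_206_378, 13_003_881)
}

/// Last-event host time in ms, 2.AF vs 2.V: about 2.98, reported as 3.0×.
pub fn wallclock_vs_2v() -> f64 {
    ratio(152_207, 51_011)
}

/// Total events, 2.AF vs the original 2.K baseline: about 371.9,
/// reported as 372×.
pub fn events_vs_2k() -> f64 {
    ratio(45_206_378, 121_569)
}
```
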
The 2.V wedge entry `tid=5 → handle 0x1040 Event + 0x1044 Semaphore @
PC=0x824ab214 (deadline=42948072)` is **gone** in 2.AF. In its place,
tid=5 is now `Ready` at PC `0x825f10ac` (a different function entirely:
it advanced beyond the wait wrapper). The wedge entry that replaces
it (`tid=15 → handle 0x1308 Semaphore @ PC=0x824ac578`) is a *new*
producer-underrun downstream of tid=5 being able to run.

`signal.match` and `wake.requested` dropped slightly (75 → 69, 79 → 71).
This is a timing shift, not a regression: the deadline-fire fix lets
tid=5 escape via timeout instead of waiting indefinitely for a signal
that might never arrive. Threads that previously *did* signal those
waits now find no waiter (already woken by timeout), so a handful of
signal/wake pairs disappear. Net effect: 3.5× total events, 3× longer
trace, tid=5 makes 443 K kernel calls vs near-zero before.

## Cross-engine context

Per 2.AD's finding 3, ours tid=14 still exits at 21.77 s (its
"producer-exhaustion" pattern is unchanged by this fix, and was not
expected to change). The deadline-fire fix unblocks tid=5 around the
moment the 42.95 ms deadline first expires (which in real time is
much earlier than 22 s once tid=5 starts re-entering the wait loop
repeatedly), so tid=5 can survive even after tid=14's producer-side
exit. This is exactly the predicted outcome; see 2.AD's "Finding 2"
deadline-fire-path claim.

## Third-order observations (no claims, just data)

- **tid=17 events dropped from 5,471,318 to far fewer** (full count not
  tabulated; it's no longer the dominant producer). With tid=5 now
  running, the rotation cursor + age-priority interaction (2.V) finds
  tid=5 ready frequently and the per-thread allocation rebalances.
- **New wedges** at tid=15 (Sema 0x1308) and tid=19/20/21 (Events 0x1510/
  0x151c/0x1514): the same downstream surface 2.V flagged for 2.W. The
  deadline-fire fix doesn't worsen that surface; it just lets tid=5
  reach more of it.
- **Run termination**: budget cap (500 M instructions, matching the
  `-n 500000000` runs), exit code 0, no `unblock_on_deadlock` fire, no
  crash, no fault.

## Tripstone audit

- **#28 (cross-engine tid stability)**: All tid claims are ours-side
  within this trajectory. No cross-engine tid mapping claimed.
- **#39 (composite progression IS progression)**: Honored. Cascade
  framing: tid=5 unwedged + 3.5× events + 3× wallclock. VdSwap is
  unchanged (2 → 2) and explicitly *not* claimed as progression. The
  primary gate is direct state-change on tid=5, not a progression
  proxy.
- **#40 (single-keystone framing)**: Care taken. The headline reads
  `DEADLINE-FIRES-CASCADE-FOLLOWS` and the body reports the primary
  state change (tid=5 → Ready) separately from the cascade volume
  (3.5× events). Open follow-ups (2.AE tid=14 first-divergence, 2.AH
  tid=1 XNotify, 2.AI XAudio) explicitly retained.
- **#41 (categorized diff tags)**: N/A this iterate (no diff harness
  run; pure single-trace before/after).
- **#42 (Phase-A blind to blocked-forever)**: Used `exit-thread-state.json`
  to characterize the new wedge set, exactly the use 2.M scoped it for.
  tid=5 → Ready was visible only because of that dump.
- **#43 (no budget-cap framing)**: Budget cap reached, but the trace had
  structural progression throughout (3× longer wallclock). Cascade
  observation is robust at this budget.
- **#44 refined (rate+shape comparison)**: Not directly applicable:
  this is an engine-bug fix, not a cross-engine wedge analysis. The
  "gate" is the deadline-fire mechanism, not a wait-rate comparison.

## Confidence

- **HIGH** that the patch is correct and minimal: 18 LOC, 0 test
  regressions, determinism preserved bit-for-bit on event count and
  on slim-event-content spot check.
- **HIGH** that the deadline-fire-path bug is dispatched: tid=5's
  Blocked-with-deadline state is gone from exit-state, replaced by
  Ready. The 2.AD mechanism is correct end-to-end.
- **HIGH** that the cascade is genuine (3.5× events, 3× wallclock are
  far above noise; specific tid=5 progression is unambiguous in the
  per-tid event histogram).
- **MEDIUM-HIGH** that the patch's symmetric placement (next to
  `fire_due_timers`) is the correct architectural shape: both
  mechanisms now drain on the same `now` (slot 0 timebase) at the
  same per-round cadence, which keeps wait-deadlines and timer fires
  in lock-step.
- **MEDIUM** that gameplay is imminent. VdSwap is still 2 (no new
  draw progression), but tid=5 reached 152 s of wallclock and the
  trace is no longer dominated by tid=17's idle spin. Several more
  cascade iterations are likely needed.
- **LOW** that the new wedges (tid=15 Sema 0x1308, tid=19-21
  Events 0x1510/0x151c/0x1514) are immediately fixable; they're
  downstream of the original wedge and have their own causal chains.

## Next-iterate recommendation

The natural next step from 2.AD's "4 distinct root causes" list:

1. **2.AE (tid=14 first-divergence diff)**: still highest priority.
   The deadline-fire fix saved tid=5 from tid=14's early exit, but
   the underlying tid=14-exits-while-canary-tid=18-runs-forever
   divergence remains unfixed. Approx **0 LOC**, pure trace mining.
2. **2.AG (`do_wait_multiple` `wait.begin` symmetry)**:
   observability gap deferred from this iterate. tid=5's 384
   `NtWaitForMultipleObjectsEx` calls still don't emit `wait.begin`,
   so future deadline-fire diagnoses are still blind. Approx
   **10 LOC**, exports.rs:5583-5655.
3. **2.AI (XAudio stub fix)**: fully independent blocker on tid=11.
   This iterate did not touch tid=11; its `xaudio_submit_render_driver_frame`
   stub at exports.rs:4591-4598 is still a no-op. Approx
   **5-150 LOC**, exports.rs.
4. **2.AH (tid=1 XNotify recon)**: also independent, the main-thread
   1.05 M-iter wedge. This iterate did not touch it. Approx **0-10 LOC**.

I recommend **2.AE next**: it is the cheapest and most informative
option, because it answers whether tid=14's early exit is itself
downstream of an earlier signaling divergence or a true independent
root cause.

## Artifacts

Under `xenia-rs/audit-runs/iterate-2AF-deadline-fire-fix/`:

- `ours-cold.jsonl` (10.98 GB, 45,206,378 events): primary trace
- `ours-cold.stdout.log` (empty, quiet mode)
- `ours-cold.stderr.log` (single exit-thread-state notice)
- `exit-thread-state.json` (14.0 KB; 21 alive + 15 wedge entries)
- `ours-cold-run2.jsonl` (10.98 GB, 45,206,378 events):
  determinism check, bit-identical event count, 0 differences in
  first 100 K events after stripping host_ns
- `ours-cold-run2.{stdout,stderr}.log`
- `writer-report.md` (this file)

xenia-canary UNCHANGED.

Engine state: head + 2.AF patch (`+18` in `xenia-app/src/main.rs`).
Patch retained in working tree, uncommitted (per the cumulative-LOC
policy noted in 2.W's report).