# Iterate 2.V: Scheduler fairness fix (age-priority anti-starvation)

**Date:** 2026-05-28. **LOC delta:** engine **~30 substantive added lines**
(scheduler.rs only; ~75 LOC including new doc comments). All retained.
**Option:** A (priority aging). **Tests:** xenia-cpu 300 / xenia-kernel 227
/ xenia-app 5 / xenia-path 19 + 30+ smaller suites; full workspace PASS,
0 regressions.

## Headline

**WEDGE-DISSOLVED-NEW-BLOCKER (PROGRESSION OBSERVED).**

The 18-day strict-priority starvation on CPU5 is broken. `pick_runnable`
now ranks by *effective* priority
`= base + age_bonus(rounds since last pick)`, so tid=6 (pri=0) finally runs
once its accumulated age bonus lifts it past tid=10 (pri=15). The cascade
that follows produces:

- **tid=6 signals handle 0x000012e4 exactly as predicted**, the primary
  keystone gate. 1 `signal.match` event by `NtSetEvent` on
  `target_handle:0x000012e4`, `waiter_tids:[5]`. **Was 0 at 2.T baseline.**
- **tid=6 event count 17 → 386** (~23×). Now Blocked on the wedge
  handles 0x000010b0/0x000010b4 (deadline-bounded), not Ready-stuck.
- **tid=13 EXITED** with code 0 (this was the original AUDIT-049 wedge from
  10 May 2026, stuck for 18 days).
- **Total events 121,641 → 13,003,881** (107× more events; first time
  the boot has crossed multi-second wallclock progression in this trace).
- **Alive threads 13 → 21** (8 new threads spawned: 14, 15, 16, 17,
  18, 19, 20, 21; 13 and 14 ran to completion and exited).
- **Wallclock last-event 766.86 ms → 51,011 ms** (66× longer trace).

Hard new wedges still exist (15 wedge_map entries vs 10 at baseline), but
they are *downstream* of the original wedge: the boot has structurally
advanced. The fix is **mechanism-correct and non-regressive**; the next
wedges are new territory.

## Option chosen: A (priority aging)

Justification: Option B (quantum-based round-robin to lower priority on
N-cycle timeout) requires either (a) violating priority ordering on every
expiry, which destabilizes existing tests like
`test_two_threads_same_slot_higher_priority_runs_first`, or (b) a
separate "starvation counter" that essentially reinvents aging. Option A
folds cleanly into the existing `max_by_key` shape, is fully
deterministic (it is driven by `Scheduler::round_count`, not wallclock
time), and degenerates to the strict-priority rule on round 0, so every
existing test continues to pass without modification.
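
The aging rule can be sketched as follows. This is a minimal illustration, not the engine's code: the constant names and the `last_run_round` field come from this report, while the struct layout, the flat slice standing in for an `HwSlot`, and the `simulate` driver are assumptions made for the example.

```rust
// Sketch only: names mirror the report, bodies are assumptions.
const AGING_ROUNDS_PER_BONUS: u64 = 1;
const MAX_AGE_BONUS: i32 = 31;

#[derive(Debug, Clone)]
struct GuestThread {
    tid: u32,
    base_priority: i32,
    last_run_round: u64,
}

/// Base priority plus a capped bonus for the rounds since the last pick.
fn effective_priority(t: &GuestThread, now_round: u64) -> i32 {
    let age = now_round.saturating_sub(t.last_run_round);
    let bonus = (age / AGING_ROUNDS_PER_BONUS).min(MAX_AGE_BONUS as u64) as i32;
    t.base_priority.saturating_add(bonus)
}

/// Index of the thread with the highest effective priority.
/// All threads in the slice are assumed runnable.
fn pick_runnable(threads: &[GuestThread], now_round: u64) -> Option<usize> {
    threads
        .iter()
        .enumerate()
        .max_by_key(|(_, t)| effective_priority(t, now_round))
        .map(|(i, _)| i)
}

/// Hypothetical driver: one pick per round, stamping the winner.
/// Returns the tid picked in each round.
fn simulate(threads: &mut [GuestThread], rounds: u64) -> Vec<u32> {
    let mut picks = Vec::new();
    for round in 0..rounds {
        if let Some(i) = pick_runnable(threads, round) {
            threads[i].last_run_round = round; // winner's age resets
            picks.push(threads[i].tid);
        }
    }
    picks
}
```

With a pri=15 thread and a pri=0 thread sharing a slot, round 0 picks the pri=15 thread exactly as strict priority would, and the pri=0 thread is picked after it has waited long enough for its bonus to close the 15-point gap. The exact round depends on tie-breaking and on when the real scheduler stamps `last_run_round`, which this sketch does not claim to reproduce.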

## Patch summary

File: `crates/xenia-cpu/src/scheduler.rs`. ~30 substantive added LOC
(plus ~45 LOC of doc comments). Within scope (30-80 target, 150 hard
cap).

| change | purpose | LOC |
|---|---|---:|
| `const AGING_ROUNDS_PER_BONUS: u64 = 1;` | one round of starvation = +1 effective priority | 1 |
| `const MAX_AGE_BONUS: i32 = 31;` | cap (≥ any realistic NT priority diff; ≤ i32 safety margin) | 1 |
| `GuestThread::last_run_round: u64` field + init in `default_fields` | per-thread baseline for age math | 2 |
| `fn effective_priority(t, now_round) -> i32` | helper, saturating_sub + min + saturating_add | 6 |
| `HwSlot::pick_runnable(&self, now_round: u64)` | accepts round_count, ranks by `effective_priority` | 4 |
| `Scheduler::begin_slot_visit`: pass round_count, stamp winner's `last_run_round` | activates the fix per-pick | 4 |
| `Scheduler::spawn`: initialize `last_run_round = self.round_count` | prevent fresh threads inheriting giant ages | 1 |
| `Scheduler::install_initial_thread`: same | same | 1 |
| `Scheduler::decrement_quantum`: stamp `last_run_round` on rotation hand-off | keep age math consistent with the in-tier rotation path | 1 |

Doc comments on the new const, field, helper, and `pick_runnable` total
~45 LOC explaining the determinism, scope, and link back to this iterate.

The fix is purely additive: no existing field or method is removed.
`HwSlot::pick_runnable`'s signature changed from `(&self)` to
`(&self, now_round: u64)`; the only external caller
(`Scheduler::begin_slot_visit`) was updated in lockstep.
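
The `Scheduler::spawn` row is easiest to see with numbers. The sketch below is an assumption-laden illustration rather than the patch itself: `age_bonus` and `first_pick_bonus` are hypothetical helpers, and only `MAX_AGE_BONUS` and the idea of stamping `last_run_round` at spawn time are taken from the table above.

```rust
// Sketch only: helper names are hypothetical, the cap value is from the report.
const MAX_AGE_BONUS: i32 = 31;

/// One bonus point per round since the last pick, capped at MAX_AGE_BONUS.
/// saturating_sub keeps the result at 0 if the stamp is ahead of the clock.
fn age_bonus(now_round: u64, last_run_round: u64) -> i32 {
    now_round
        .saturating_sub(last_run_round)
        .min(MAX_AGE_BONUS as u64) as i32
}

/// Bonus a thread would carry on its first pick if it is spawned at
/// `spawn_round`, with and without stamping `last_run_round` at spawn.
fn first_pick_bonus(spawn_round: u64, stamp_on_spawn: bool) -> i32 {
    let last_run_round = if stamp_on_spawn { spawn_round } else { 0 };
    age_bonus(spawn_round, last_run_round)
}
```

Under these assumptions, a thread spawned at round 1,000,000 with a zero-initialized stamp would start with the full +31 bonus and could outrank long-waiting peers immediately. Stamping it with the current round gives it a bonus of 0, so it competes on base priority alone.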

## Test results

```
cargo build --release  -> OK (1 pre-existing dead_code warning unrelated)
cargo test --release --workspace:
  xenia-cpu     300 passed, 0 failed
  xenia-kernel  227 passed, 0 failed
  xenia-app       5 passed, 0 failed (+ 3 ignored long-runners)
  xenia-path     19 passed, 0 failed
  + ~25 smaller suites, 0 failures total
```

The test that exercises strict priority
(`test_two_threads_same_slot_higher_priority_runs_first`) still passes
because at `round_count = 0`, every thread has `last_run_round = 0` ⇒
age = 0 ⇒ age_bonus = 0 ⇒ effective_priority == base_priority. The age
math only kicks in once `round_count` advances beyond a thread's last
pick, i.e. after actual starvation begins.

The quantum-rotation test
(`test_quantum_does_not_rotate_without_same_priority_peer`) still passes
because it never advances `round_count` (it only calls `decrement_quantum`
within one slot visit).

## Determinism check

Two cold runs (XENIA_CACHE_WIPE=1, -n 500000000) produced **identical
event counts: 13,003,881 events each** (`ours-cold.jsonl` /
`ours-cold-run2.jsonl`).

Diff of the two JSONL files (after stripping the `host_ns` wallclock
field, which is not deterministic in any of our runs): **6 events differ
out of 13,003,881, only in the `guest_cycle` field** (5,577,193 vs
5,577,214 on a single `KeAcquireSpinLockAtRaisedIrql` /
`KeReleaseSpinLockFromRaisedIrql` pair at idx 105,282-105,287). Kinds,
names, ords, tids, and event-idx sequence are identical. This
pre-existing tiny spinlock-cycle drift was visible in 2.T as well; it is
not introduced by this iterate and does not affect the event-stream shape.

Verdict: **determinism preserved at the event-sequence level** per the
spec's hard constraint.
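
The comparison described above can be sketched as a field-aware diff. This is an illustration under stated assumptions, not the tooling used for this report: the `Event` struct is a hypothetical, already-parsed view of one JSONL line (the real schema has more fields, such as ords), and `DiffSummary`, `compare_runs`, and `sequence_deterministic` are invented names.

```rust
/// Hypothetical parsed view of one trace event; the real schema is richer.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    idx: u64,
    kind: String,
    name: String,
    tid: u32,
    guest_cycle: u64,
    host_ns: u64, // wallclock noise: deliberately never compared
}

#[derive(Debug, Default, PartialEq)]
struct DiffSummary {
    length_mismatch: bool,
    shape_diffs: usize,      // idx, kind, name or tid differ
    guest_cycle_only: usize, // same shape, only guest_cycle differs
}

/// Walks two runs in lockstep and classifies each differing event.
fn compare_runs(a: &[Event], b: &[Event]) -> DiffSummary {
    let mut summary = DiffSummary {
        length_mismatch: a.len() != b.len(),
        ..Default::default()
    };
    for (x, y) in a.iter().zip(b) {
        let same_shape =
            x.idx == y.idx && x.kind == y.kind && x.name == y.name && x.tid == y.tid;
        if !same_shape {
            summary.shape_diffs += 1;
        } else if x.guest_cycle != y.guest_cycle {
            summary.guest_cycle_only += 1;
        }
    }
    summary
}

/// Event-sequence determinism: same length and no shape differences.
fn sequence_deterministic(summary: &DiffSummary) -> bool {
    !summary.length_mismatch && summary.shape_diffs == 0
}
```

Separating shape differences from `guest_cycle`-only differences is what lets a small cycle drift be reported without failing the event-sequence verdict.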

## Primary gate results

| gate | predicate | result |
|---|---|---|
| **tid=6 signals handle 0x000012e4** | `signal.match` for `target_handle:0x000012e4` ≥ 1 | **PASS**: 1 event by tid=6 `NtSetEvent`, `waiter_tids:[5]`, at guest_cycle=0/host_ns=844.35ms |
| **tid=6 event count > 105** | tid=6 emits >105 Phase-A events | **PASS**: 386 events (was 17) |
| **tid=6 NOT Ready-stuck on exit** | exit-thread-state shows tid=6 in Blocked/Exited, not Ready | **PASS**: `state:"Blocked"`, WaitAny on handles 0x000010b0 (Event) + 0x000010b4 (Semaphore), `deadline_ns_or_inf:42948072` |

All 3 primary gates pass. The mechanism is confirmed end-to-end:
tid=10 ages out → tid=6 picked → tid=6 progresses through prior wait
→ tid=6 advances past `NtSetEvent` (the missing signal in 2.T) → wakes
tid=5 → cascade unfolds.
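
The three predicates in the table can be written as one small checker. This is a hedged sketch: `TraceEvent`, `ThreadState`, `GateReport`, and `evaluate_primary_gates` are hypothetical names and shapes chosen for illustration, and only the predicates themselves come from the table above.

```rust
/// Hypothetical minimal view of a trace event.
struct TraceEvent {
    tid: u32,
    kind: &'static str,
    target_handle: Option<u32>,
}

/// Hypothetical mirror of the `state` field in exit-thread-state.json.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ThreadState {
    Ready,
    Blocked,
    Exited,
}

#[derive(Debug)]
struct GateReport {
    signals_keystone: bool,
    event_count_ok: bool,
    not_ready_stuck: bool,
}

impl GateReport {
    fn all_pass(&self) -> bool {
        self.signals_keystone && self.event_count_ok && self.not_ready_stuck
    }
}

/// Evaluates the three primary gates for one thread and one keystone handle.
fn evaluate_primary_gates(
    events: &[TraceEvent],
    tid: u32,
    keystone_handle: u32,
    min_events: usize,
    exit_state: ThreadState,
) -> GateReport {
    // Gate 1: at least one signal.match on the keystone handle.
    let signals = events
        .iter()
        .filter(|e| e.kind == "signal.match" && e.target_handle == Some(keystone_handle))
        .count();
    // Gate 2: the thread emitted strictly more than `min_events` events.
    let tid_events = events.iter().filter(|e| e.tid == tid).count();
    GateReport {
        signals_keystone: signals >= 1,
        event_count_ok: tid_events > min_events,
        // Gate 3: Blocked or Exited at exit, anything but Ready.
        not_ready_stuck: exit_state != ThreadState::Ready,
    }
}
```

For this iterate the inputs would be tid 6, handle `0x000012e4`, a threshold of 105, and the `Blocked` exit state.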

## Secondary gates (cascade)

| gate | 2.T baseline | 2.V | direction |
|---|---:|---:|---|
| Total events | 121,641 | **13,003,881** | **107× ↑** |
| Last event host_ns | 767 ms | **51,011 ms** | **66× ↑** |
| Alive threads | 13 | **21** | **+8 spawned** |
| Exited threads (clean exit_code=0) | 0 | **2** (tid=13, tid=14) | new |
| Blocked @ PC=0x824ac578 (the AUDIT-049 set) | {1,3,4,5,13} | **{3,4,12,16,18}** | tid=1/5/13 unblocked; new tids appear |
| `signal.match` events | 36 | **75** | **+108%** |
| `wake.requested` events | 36 | **79** | **+119%** |
| Unique signal.match handles | small | **20+** | broader signaling surface |
| VdSwap calls (`import.call` count) | 1 | **2** | **+1** |
| Audio tid=10 events | 1 | **17** | **+16** (modest; aging works but tid=10 stays mostly CPU-bound between yields) |
| tid=6 events | 17 | **386** | **+23×** |
| tid=17 events (new worker) | n/a | **5,471,318** | massive new producer |

The originally-blocked set {1, 3, 4, 5, 13} at PC=0x824ac578 has
changed substantially. tid=1 is now Ready, tid=5 has advanced to
PC=0x824ab214 (a different wait wrapper), and tid=13 has exited cleanly.
Three of the original five threads are no longer parked on that PC;
tid=3 and tid=4 remain, joined by tids 12, 16, and 18.

VdSwap reached 2 (vs 1 baseline): small in absolute terms, but a definite
gameplay progression marker per tripstone #39. The second swap fires on
tid=8 at ~1.22 s wallclock, vs the first on tid=1 at ~494 ms.

## Third-order observations (no claims, just data)

- **New wedge surface (15 entries vs 10)**. The new wedges include
  several handles (0x14dc, 0x151c, 0x1510, 0x1514, 0x1020, 0x1004, 0x1308)
  that didn't exist in the baseline trace. They correspond to handles
  created by the new worker threads (15-21) that only exist post-cascade.
  Not regressions; they are the next *natural* blocking point now that
  the original blocker is dissolved.
- **One semaphore wedge with multiple waiters** (handle 0x00001308,
  `count=0/max=2^31-1`, `waiters_tid:[15, 16]`): classic
  producer-underrun shape (AUDIT-069 family). Likely the next iterate's
  target.
- **tid=10 / tid=9 still Ready at exit on CPU5/CPU4 at priority=15**
  (the audio mixer pair). Both at PC=0x824d140c (vs 0x824d1404 at
  baseline: moved by 8 bytes, i.e. two 4-byte PowerPC instructions past).
  The aging bonus lets them yield occasionally; they're no longer pinning
  their CPUs hard.
- **Run termination**: budget cap (500M instructions, `-n 500000000`);
  no crash, no deadlock, no `unblock_on_deadlock` fire.

## Tripstone audit

- **#28 (cross-engine tid stability)**: All tid claims are ours-side
  within this trajectory. The new tids 15-21 are first observed in this
  iterate; no cross-engine tid mapping claimed.
- **#39 (composite progression IS progression)**: Honored. VdSwap=2,
  swap count UP, but draws/render_targets not measured here. Headline
  uses WEDGE-DISSOLVED-NEW-BLOCKER framing and does *not* claim
  "boot complete" or "gameplay reached". The mechanism gate
  (signal.match on 0x12e4) is direct and not a progression-laundering
  proxy.
- **#40 (single-keystone framing)**: Care taken. The headline names
  *both* "wedge dissolved" *and* "new blocker", per the spec's matrix.
  Cascade gates are reported separately from the primary gate. Open
  follow-ups (the new producer-underrun wedge on handle 0x1308) are not
  collapsed into the win.
- **#41 (categorized diff tags)**: N/A this iterate (no diff harness run).
- **#42 (Phase-A blind to blocked-forever)**: Used `exit-thread-state.json`
  to characterize the new wedge set (Phase-A alone would show only the
  signal-match cascade up to the new block point).
- **#43 (no budget-cap framing)**: Budget cap (-n 500000000) reached
  but the trace had structural progression throughout, not a wedge.
  Cascade observation is robust at this budget.

## Confidence

- **HIGH** that the patch is correct and minimal: ~30 substantive LOC,
  0 test regressions, identical event counts across two cold runs.
- **HIGH** that the primary keystone gate passes: `signal.match
  target_handle:0x000012e4 waiter_tids:[5]` is exactly the predicted
  unblock, observed unambiguously in the trace.
- **HIGH** that the cascade is genuine (not just emit-volume noise):
  tid=13 EXITED cleanly is a structural event the baseline never
  achieved in 18 days; 8 new threads spawned that the baseline never
  reached; new handles in the wedge set that didn't exist at baseline.
- **MEDIUM-HIGH** that the new wedge set (handle 0x1308 semaphore
  producer-underrun, several events without signalers) represents the
  next genuine investigation surface. These are downstream of the
  original wedge and likely have their own causal chain.
- **MEDIUM** that gameplay is imminent. VdSwap went from 1 to 2 and
  the wallclock reached 51 s, but draws_count was not measured and the
  game is clearly still inside boot phase B. Several more cascade
  iterations are likely needed.
- **LOW** that any of the existing 25+ iterates' specific wedge
  diagnoses (AUDIT-049, 062, 067, 068, 069) directly apply post-fix.
  The geometry has changed enough that prior root-cause analyses
  need re-validation.

## Next-iterate recommendation

**2.W: investigate the new producer-underrun on handle 0x00001308**
(semaphore count=0/max=2^31-1, waiters tid=[15, 16] both on CPU3 at
PC=0x824ac578). Use the existing `signal.match` / `wake.requested`
event surface (already active) to identify which tids, if any, are
releasing this semaphore. If zero, the next root cause is a missing
producer (AUDIT-069 family); if non-zero but the rate is low, it's a
consume-rate divergence (AUDIT-068 family). ~0-50 LOC.
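
The triage rule for 2.W could look roughly like this. Everything here is an assumption for illustration: `SignalEvent`, `WedgeClass`, both function names, and in particular the `min_release_ratio` threshold, since this report does not define what counts as a "low" rate.

```rust
use std::collections::BTreeMap;

/// Hypothetical view of a signal.match event on some handle.
struct SignalEvent {
    signaler_tid: u32,
    target_handle: u32,
}

#[derive(Debug, PartialEq)]
enum WedgeClass {
    MissingProducer,       // no releases at all (AUDIT-069 family)
    ConsumeRateDivergence, // releases exist but lag the waits (AUDIT-068 family)
    Unclassified,          // releases keep up; look elsewhere
}

/// Counts releases of `handle` per signaling tid.
fn releases_by_tid(events: &[SignalEvent], handle: u32) -> BTreeMap<u32, u64> {
    let mut counts = BTreeMap::new();
    for e in events.iter().filter(|e| e.target_handle == handle) {
        *counts.entry(e.signaler_tid).or_insert(0) += 1;
    }
    counts
}

/// Applies the zero / low-rate split. `min_release_ratio` is a made-up
/// threshold: the fraction of observed waits that releases must reach.
fn classify_wedge(
    releases: &BTreeMap<u32, u64>,
    waits_observed: u64,
    min_release_ratio: f64,
) -> WedgeClass {
    let total: u64 = releases.values().sum();
    if total == 0 {
        WedgeClass::MissingProducer
    } else if (total as f64) < (waits_observed as f64) * min_release_ratio {
        WedgeClass::ConsumeRateDivergence
    } else {
        WedgeClass::Unclassified
    }
}
```

Keeping the per-tid map rather than a bare count answers the "which tids" half of the question at the same time.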

Alternative: **2.X: measure draws/render_targets** to quantify how
close we are to first gameplay frame. ~30-50 LOC instrumentation in
xenia-gpu's `D3D_DrawIndexedPrimitive` path.

**Strong recommend 2.W first**: the wedge is concrete and the tooling
already exists.

## Artifacts

Under `xenia-rs/audit-runs/iterate-2V-scheduler-fairness-fix/`:

- `ours-cold.jsonl` (3.13 GB, 13,003,881 events)
- `ours-cold.stdout.log` (empty, quiet mode)
- `ours-cold.stderr.log` (single emission-notice line)
- `exit-thread-state.json` (15.6 KB; 21 alive + 15 wedge_map entries)
- `ours-cold-run2.{jsonl,stdout.log,stderr.log}` (determinism check:
  identical event count, only 6 events with tiny `guest_cycle`
  drift in a pre-existing spinlock pair)
- `writer-report.md` (this file)

Engine HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` + uncommitted
2.Q signal.match + 2.T wake.requested + this iterate's 2.V scheduler
fairness patch. xenia-canary UNCHANGED.