# Iterate 2.V: Scheduler fairness fix (age-priority anti-starvation)

**Date:** 2026-05-28.
**LOC delta:** engine **~30 substantive added lines** (scheduler.rs only; ~75 LOC including new doc comments). All retained.
**Option:** A (priority aging).
**Tests:** xenia-cpu 300 / xenia-kernel 227 / xenia-app 5 / xenia-path 19 + 30+ smaller suites; full workspace PASS, 0 regressions.

## Headline

**WEDGE-DISSOLVED-NEW-BLOCKER (PROGRESSION OBSERVED).** The 18-day strict-priority starvation on CPU5 is broken. With `pick_runnable` now ranking by *effective* priority `= base + age_bonus(rounds since last pick)`, tid=6 (pri=0) finally runs after tid=10 (pri=15) ages out, and the cascade that follows produces:

- **tid=6 signals handle 0x000012e4 exactly as predicted**, the primary keystone gate. 1 `signal.match` event by `NtSetEvent` on `target_handle:0x000012e4`, `waiter_tids:[5]`. **Was 0 at 2.T baseline.**
- **tid=6 event count 17 → 386** (~23×). Now Blocked on the wedge handles 0x000010b0/0x000010b4 (deadline-bounded), not Ready-stuck.
- **tid=13 EXITED** with code 0 (was the original AUDIT-049 wedge from 10 May 2026, stuck for 18 days).
- **Total events 121,641 → 13,003,881** (107× more events; first time the boot has crossed multi-second wallclock progression in this trace).
- **Alive threads 13 → 21** (8 new threads spawned: 14, 15, 16, 17, 18, 19, 20, 21; 13 and 14 ran to completion and exited).
- **Wallclock last-event 766.86 ms → 51,011 ms** (66× longer trace).

Hard new wedges still exist (15 wedge_map entries vs 10 at baseline), but they are *downstream* of the original wedge; the boot has structurally advanced. The fix is **mechanism-correct and non-regressive**; the next wedges are new territory.
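The ranking rule in the headline can be illustrated with a small standalone sketch. It is not the code in `scheduler.rs`: the constant values are the ones listed in the patch summary, while the slice-based `pick_runnable`, the `ready` and `base_priority` fields, and the `first_round_picked` driver are hypothetical names introduced only for illustration.

```rust
// Values as listed in the patch summary.
const AGING_ROUNDS_PER_BONUS: u64 = 1;
const MAX_AGE_BONUS: i32 = 31;

// Hypothetical, pared-down thread record; the real GuestThread has more fields.
#[derive(Debug, Clone)]
struct GuestThread {
    tid: u32,
    base_priority: i32,
    last_run_round: u64,
    ready: bool,
}

// Effective priority = base + capped age bonus (saturating_sub + min + saturating_add).
fn effective_priority(t: &GuestThread, now_round: u64) -> i32 {
    let age = now_round.saturating_sub(t.last_run_round);
    let bonus = (age / AGING_ROUNDS_PER_BONUS).min(MAX_AGE_BONUS as u64) as i32;
    t.base_priority.saturating_add(bonus)
}

// Pick the ready thread with the highest effective priority.
// Iterator::max_by_key returns the last maximum on ties.
fn pick_runnable(threads: &[GuestThread], now_round: u64) -> Option<usize> {
    threads
        .iter()
        .enumerate()
        .filter(|(_, t)| t.ready)
        .max_by_key(|(_, t)| effective_priority(t, now_round))
        .map(|(i, _)| i)
}

// Hypothetical driver: run one pick per round, stamp the winner's
// last_run_round, and report the first round in which `tid` wins.
fn first_round_picked(threads: &mut [GuestThread], tid: u32, max_rounds: u64) -> Option<u64> {
    for round in 0..max_rounds {
        let i = pick_runnable(threads, round)?;
        threads[i].last_run_round = round;
        if threads[i].tid == tid {
            return Some(round);
        }
    }
    None
}
```

With a pri=0 thread and a pri=15 thread that is always ready, the low-priority thread loses every pick until its accumulated bonus exceeds the 15-point gap, then wins once and has its age reset. At round 0 both ages are zero, so the sketch reduces to strict priority.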
## Option chosen: A (priority aging)

Justification: Option B (quantum-based round-robin to lower priority on N-cycle timeout) requires either (a) violating priority ordering on every expiry, which destabilizes existing tests like `test_two_threads_same_slot_higher_priority_runs_first`, or (b) a separate "starvation counter" that essentially reinvents aging. Option A folds cleanly into the existing `max_by_key` shape, is fully deterministic (counts on `Scheduler::round_count`), and degenerates to the strict-priority rule on round 0, so every existing test continues to pass without modification.

## Patch summary

File: `crates/xenia-cpu/src/scheduler.rs`. ~30 substantive added LOC (plus ~45 LOC of doc comments). Within scope (30-80 target, 150 hard cap).

| change | purpose | LOC |
|---|---|---:|
| `const AGING_ROUNDS_PER_BONUS: u64 = 1;` | one round of starvation = +1 effective priority | 1 |
| `const MAX_AGE_BONUS: i32 = 31;` | cap (≥ any realistic NT priority diff; ≤ i32 safety margin) | 1 |
| `GuestThread::last_run_round: u64` field + init in `default_fields` | per-thread baseline for age math | 2 |
| `fn effective_priority(t, now_round) -> i32` | helper, saturating_sub + min + saturating_add | 6 |
| `HwSlot::pick_runnable(&self, now_round: u64)` | accepts round_count, ranks by `effective_priority` | 4 |
| `Scheduler::begin_slot_visit`: pass round_count, stamp winner's `last_run_round` | activates the fix per-pick | 4 |
| `Scheduler::spawn`: initialize `last_run_round = self.round_count` | prevent fresh threads inheriting giant ages | 1 |
| `Scheduler::install_initial_thread`: same | same | 1 |
| `Scheduler::decrement_quantum`: stamp `last_run_round` on rotation hand-off | keep age math consistent with the in-tier rotation path | 1 |

Doc comments on the new const, field, helper, and `pick_runnable` total ~45 LOC explaining the determinism, scope, and link back to this iterate. The fix is purely additive: no existing field or method is removed.
`HwSlot::pick_runnable`'s signature changed from `(&self)` to `(&self, now_round: u64)`; the only external caller (`Scheduler::begin_slot_visit`) was updated in lockstep.

## Test results

```
cargo build --release  -> OK (1 pre-existing dead_code warning unrelated)
cargo test --release --workspace:
  xenia-cpu     300 passed, 0 failed
  xenia-kernel  227 passed, 0 failed
  xenia-app       5 passed, 0 failed (+ 3 ignored long-runners)
  xenia-path     19 passed, 0 failed
  + ~25 smaller suites, 0 failures total
```

The test that exercises strict priority (`test_two_threads_same_slot_higher_priority_runs_first`) still passes because at `round_count = 0`, every thread has `last_run_round = 0` ⇒ age = 0 ⇒ age_bonus = 0 ⇒ effective_priority == base_priority. The age math only kicks in once `round_count` advances beyond a thread's last pick, i.e. after actual starvation begins.

The quantum-rotation test (`test_quantum_does_not_rotate_without_same_priority_peer`) still passes because it never advances `round_count` (it only calls `decrement_quantum` within one slot visit).

## Determinism check

Two cold runs (`XENIA_CACHE_WIPE=1`, `-n 500000000`) produced **bit-identical event counts: 13,003,881 events each** (`ours-cold.jsonl` / `ours-cold-run2.jsonl`).

Diff of the two JSONL files (after stripping the `host_ns` wallclock noise that's not deterministic in any of our runs): **6 events differ out of 13,003,881, only in the `guest_cycle` field** (5,577,193 vs 5,577,214 on a single `KeAcquireSpinLockAtRaisedIrql` / `KeReleaseSpinLockFromRaisedIrql` pair at idx 105,282-105,287). Kinds, names, ords, tids, and event-idx sequence are identical. This pre-existing tiny spinlock-cycle drift was visible in 2.T as well; it is not introduced by this iterate and does not affect the event-stream shape.

Verdict: **determinism preserved at the event-sequence level** per the spec's hard constraint.
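The comparison described in the determinism check (diff two traces while ignoring `host_ns`) can be sketched as below. This is not the tooling used for the run. It assumes one flat JSON object per line with numeric values for the ignored fields; `mask_numeric_field` and `count_differing_events` are hypothetical helpers, and only the field names `host_ns` and `guest_cycle` come from the report.

```rust
// Replace the numeric value of `key` with 0 so it cannot cause a mismatch.
// Assumes a flat object whose value for `key` ends at the next ',' or '}'.
fn mask_numeric_field(line: &str, key: &str) -> String {
    let needle = format!("\"{}\":", key);
    match line.find(needle.as_str()) {
        None => line.to_string(),
        Some(start) => {
            let value_start = start + needle.len();
            let rest = &line[value_start..];
            let value_len = rest
                .find(|c: char| c == ',' || c == '}')
                .unwrap_or(rest.len());
            format!("{}0{}", &line[..value_start], &rest[value_len..])
        }
    }
}

// Count line positions where the two traces differ after masking the
// ignored fields. A line present in only one trace counts as a difference.
fn count_differing_events(a: &str, b: &str, ignored: &[&str]) -> usize {
    let mask = |line: &str| {
        ignored
            .iter()
            .fold(line.to_string(), |acc, key| mask_numeric_field(&acc, key))
    };
    let (mut la, mut lb) = (a.lines(), b.lines());
    let mut differing = 0;
    loop {
        match (la.next(), lb.next()) {
            (None, None) => return differing,
            (Some(x), Some(y)) => {
                if mask(x) != mask(y) {
                    differing += 1;
                }
            }
            _ => differing += 1,
        }
    }
}
```

Calling it once with only `host_ns` ignored and once with `guest_cycle` ignored as well separates wallclock noise from cycle drift. For multi-gigabyte traces the same logic would need to stream both files line by line (for example through `std::io::BufRead::lines`) rather than hold them in memory.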
## Primary gate results

| gate | predicate | result |
|---|---|---|
| **tid=6 signals handle 0x000012e4** | `signal.match` for `target_handle:0x000012e4` ≥ 1 | **PASS**: 1 event by tid=6 `NtSetEvent`, `waiter_tids:[5]`, at guest_cycle=0/host_ns=844.35ms |
| **tid=6 event count > 105** | tid=6 emits >105 Phase-A events | **PASS**: 386 events (was 17) |
| **tid=6 NOT Ready-stuck on exit** | exit-thread-state shows tid=6 in Blocked/Exited, not Ready | **PASS**: `state:"Blocked"`, WaitAny on handles 0x000010b0 (Event) + 0x000010b4 (Semaphore), `deadline_ns_or_inf:42948072` |

All 3 primary gates pass. The mechanism is confirmed end-to-end: tid=10 ages out → tid=6 picked → tid=6 progresses through prior wait → tid=6 advances past `NtSetEvent` (the missing signal in 2.T) → wakes tid=5 → cascade unfolds.

## Secondary gates (cascade)

| gate | 2.T baseline | 2.V | direction |
|---|---:|---:|---|
| Total events | 121,641 | **13,003,881** | **107× ↑** |
| Last event host_ns | 767 ms | **51,011 ms** | **66× ↑** |
| Alive threads | 13 | **21** | **+8 spawned** |
| Exited threads (clean exit_code=0) | 0 | **2** (tid=13, tid=14) | new |
| Blocked @ PC=0x824ac578 (the AUDIT-049 set) | {1,3,4,5,13} | **{3,4,12,16,18}** | tid=1/5/13 unblocked; new tids appear |
| `signal.match` events | 36 | **75** | **+108%** |
| `wake.requested` events | 36 | **79** | **+119%** |
| Unique signal.match handles | small | **20+** | broader signaling surface |
| VdSwap calls (`import.call` count) | 1 | **2** | **+1** |
| Audio tid=10 events | 1 | **17** | **+16** (modest; aging works but tid=10 stays mostly CPU-bound between yields) |
| tid=6 events | 17 | **386** | **+23×** |
| tid=17 events (new worker) | n/a | **5,471,318** | massive new producer |

The originally-blocked set {1, 3, 4, 5, 13} at PC=0x824ac578 has *changed substantially*. tid=1 is now Ready, tid=5 has advanced to PC=0x824ab214 (a different wait wrapper), tid=13 has exited cleanly.
Three of the original five threads are no longer parked on that PC.

VdSwap reached 2 (vs 1 baseline): small in absolute terms, but a definite gameplay progression marker per tripstone #39. The second swap fires on tid=8 at ~1.22 s wallclock, vs the first on tid=1 at ~494 ms.

## Third-order observations (no claims, just data)

- **New wedge surface (15 entries vs 10)**. The new wedges include several handles (0x14dc, 0x151c, 0x1510, 0x1514, 0x1020, 0x1004, 0x1308) that didn't exist in the baseline trace; they correspond to handles created by the new worker threads (15-21) that only exist post-cascade. Not regressions; they are the next *natural* blocking point now that the original blocker is dissolved.
- **One semaphore wedge with multiple waiters** (handle 0x00001308, `count=0/max=2^31-1`, `waiters_tid:[15, 16]`): classic producer-underrun shape (AUDIT-069 family). Likely the next iterate's target.
- **tid=10 / tid=9 still Ready at exit on CPU5/CPU4 at priority=15** (the audio mixer pair). Both at PC=0x824d140c (vs 0x824d1404 at baseline: moved by 8 bytes, i.e. one instruction past). The aging bonus lets them yield occasionally; they're no longer pinning their CPUs hard.
- **Run termination**: budget cap (500M instructions, `-n 500000000`); no crash, no deadlock, no `unblock_on_deadlock` fire.

## Tripstone audit

- **#28 (cross-engine tid stability)**: All tid claims are ours-side within this trajectory. The new tids 15-21 are first observed in this iterate; no cross-engine tid mapping claimed.
- **#39 (composite progression IS progression)**: Honored. VdSwap=2, swap count UP, but draws/render_targets not measured here. Headline uses WEDGE-DISSOLVED-NEW-BLOCKER framing; it does *not* claim "boot complete" or "gameplay reached". The mechanism gate (signal.match on 0x12e4) is direct and not a progression-laundering proxy.
- **#40 (single-keystone framing)**: Care taken. The headline names *both* "wedge dissolved" *and* "new blocker", per the spec's matrix.
  Cascade gates are reported separately from the primary gate. Open follow-ups (the new producer-underrun wedge on handle 0x1308) are not collapsed into the win.
- **#41 (categorized diff tags)**: N/A this iterate (no diff harness run).
- **#42 (Phase-A blind to blocked-forever)**: Used `exit-thread-state.json` to characterize the new wedge set (Phase-A alone would show only the signal-match cascade up to the new block point).
- **#43 (no budget-cap framing)**: Budget cap (`-n 500000000`) reached but the trace had structural progression throughout, not a wedge. Cascade observation is robust at this budget.

## Confidence

- **HIGH** that the patch is correct and minimal: 30 substantive LOC, 0 test regressions, determinism preserved bit-for-bit on event count.
- **HIGH** that the primary keystone gate passes: `signal.match target_handle:0x000012e4 waiter_tids:[5]` is exactly the predicted unblock, observed unambiguously in the trace.
- **HIGH** that the cascade is genuine (not just emit-volume noise): tid=13 EXITED cleanly is a structural event the baseline never achieved in 18 days; 8 new threads spawned that the baseline never reached; new handles in the wedge set that didn't exist at baseline.
- **MEDIUM-HIGH** that the new wedge set (handle 0x1308 semaphore producer-underrun, several events without signalers) represents the next genuine investigation surface; these are downstream of the original wedge and likely have their own causal chain.
- **MEDIUM** that gameplay is imminent. VdSwap went from 1 to 2 and the wallclock reached 51 s, but draws_count was not measured and the game is clearly still inside boot phase B. Several more cascade iterations likely needed.
- **LOW** that any of the existing 25+ iterates' specific wedge diagnoses (AUDIT-049, 062, 067, 068, 069) directly apply post-fix; the geometry has changed enough that prior root-cause analyses need re-validation.
## Next-iterate recommendation

**2.W: investigate the new producer-underrun on handle 0x00001308** (semaphore count=0/max=2^31-1, waiters tid=[15, 16] both on CPU3 at PC=0x824ac578). Use the existing `signal.match` / `wake.requested` event surface (already active) to identify which tids, if any, are releasing this semaphore. If zero, the next root cause is a missing producer (AUDIT-069 family); if non-zero but the rate is low, it's a consume-rate divergence (AUDIT-068 family). ~0-50 LOC.

Alternative: **2.X: measure draws/render_targets** to quantify how close we are to first gameplay frame. ~30-50 LOC instrumentation in xenia-gpu's `D3D_DrawIndexedPrimitive` path.

**Strong recommend 2.W first**: the wedge is concrete and the tooling already exists.

## Artifacts

Under `xenia-rs/audit-runs/iterate-2V-scheduler-fairness-fix/`:

- `ours-cold.jsonl` (3.13 GB, 13,003,881 events)
- `ours-cold.stdout.log` (empty; quiet mode)
- `ours-cold.stderr.log` (single emission-notice line)
- `exit-thread-state.json` (15.6 KB; 21 alive + 15 wedge_map entries)
- `ours-cold-run2.{jsonl,stdout.log,stderr.log}` (determinism check: bit-identical event count, only 6 events with tiny `guest_cycle` drift in a pre-existing spinlock pair)
- `writer-report.md` (this file)

Engine HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` + uncommitted 2.Q signal.match + 2.T wake.requested + this iterate's 2.V scheduler fairness patch. xenia-canary UNCHANGED.
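The triage rule proposed for 2.W (zero releasers means a missing producer; some releasers at a low rate means consume-rate divergence) can be sketched as follows. Everything here is hypothetical scaffolding: the `SignalMatch` record shape, the `Diagnosis` enum, and in particular the reading of "rate is low" as "fewer releases than wait attempts" are assumptions, not definitions from the trace format or the spec.

```rust
use std::collections::BTreeMap;

// Hypothetical outcome labels mirroring the two audit families named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Diagnosis {
    MissingProducer,       // no tid ever releases the handle (AUDIT-069 family)
    ConsumeRateDivergence, // releases exist but lag behind waits (AUDIT-068 family)
    ProducerKeepingUp,     // releases cover the observed waits
}

// Hypothetical, minimal view of a `signal.match` event.
struct SignalMatch {
    signaler_tid: u32,
    target_handle: u32,
}

// Count releases of `handle` per signaling tid.
fn releases_by_tid(events: &[SignalMatch], handle: u32) -> BTreeMap<u32, usize> {
    let mut counts = BTreeMap::new();
    for e in events.iter().filter(|e| e.target_handle == handle) {
        *counts.entry(e.signaler_tid).or_insert(0) += 1;
    }
    counts
}

// Assumed threshold: the rate is "low" when total releases < wait attempts.
fn diagnose(releases: &BTreeMap<u32, usize>, wait_attempts: usize) -> Diagnosis {
    let total: usize = releases.values().sum();
    if total == 0 {
        Diagnosis::MissingProducer
    } else if total < wait_attempts {
        Diagnosis::ConsumeRateDivergence
    } else {
        Diagnosis::ProducerKeepingUp
    }
}
```

The per-tid map is kept separate from the classification so that the list of releasing tids, which is the actual question posed for 2.W, stays visible even when the threshold assumption turns out to be wrong.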