xenia-rs/audit-runs/iterate-2V-scheduler-fairness-fix/writer-report.md

# Iterate 2.V — Scheduler fairness fix (age-priority anti-starvation)

**Date:** 2026-05-28. **LOC delta:** engine **~30 substantive added lines**
(scheduler.rs only; ~75 LOC including new doc comments). All retained.
**Option:** A (priority aging). **Tests:** xenia-cpu 300 / xenia-kernel 227
/ xenia-app 5 / xenia-path 19 + 30+ smaller suites — full workspace PASS,
0 regressions.

## Headline

**WEDGE-DISSOLVED-NEW-BLOCKER (PROGRESSION OBSERVED).**

The 18-day strict-priority starvation on CPU5 is broken. With `pick_runnable`
now ranking by *effective* priority `= base + age_bonus(rounds since last
pick)`, tid=6 (pri=0) finally runs after tid=10 (pri=15) ages out, and the
cascade that follows produces:

- **tid=6 signals handle 0x000012e4 exactly as predicted** — the primary
  keystone gate. 1 `signal.match` event by `NtSetEvent` on
  `target_handle:0x000012e4`, `waiter_tids:[5]`. **Was 0 at 2.T baseline.**
- **tid=6 event count 17 → 386** (~23×). Now Blocked on the wedge
  handles 0x000010b0/0x000010b4 (deadline-bounded), not Ready-stuck.
- **tid=13 EXITED** with code 0 (was the original AUDIT-049 wedge from
  10 May 2026 — stuck for 18 days).
- **Total events 121,641 → 13,003,881** (107× more events; first time
  the boot has crossed multi-second wallclock progression in this trace).
- **Alive threads 13 → 21** (8 new threads spawned: 14, 15, 16, 17,
  18, 19, 20, 21; 13 and 14 ran to completion and exited).
- **Wallclock last-event 766.86 ms → 51,011 ms** (66× longer trace).

Hard new wedges still exist (15 wedge_map entries vs 10 at baseline), but
they are *downstream* of the original wedge — the boot has structurally
advanced. The fix is **mechanism-correct and non-regressive**; the next
wedges are new territory.

## Option chosen: A (priority aging)

Justification: Option B (quantum-based round-robin that hands the slot to
a lower-priority thread after an N-cycle timeout) requires either
(a) violating priority ordering on every expiry, which destabilizes
existing tests like
`test_two_threads_same_slot_higher_priority_runs_first`, or (b) a
separate "starvation counter" that essentially reinvents aging. Option A
folds cleanly into the existing `max_by_key` shape, is fully
deterministic (age is counted in `Scheduler::round_count` rounds), and
degenerates to the strict-priority rule on round 0, so every existing
test continues to pass without modification.

## Patch summary

File: `crates/xenia-cpu/src/scheduler.rs`. ~30 substantive added LOC
(plus ~45 LOC of doc comments). Within scope (30-80 target, 150 hard
cap).

| change | purpose | LOC |
|---|---|---:|
| `const AGING_ROUNDS_PER_BONUS: u64 = 1;` | one round of starvation = +1 effective priority | 1 |
| `const MAX_AGE_BONUS: i32 = 31;` | cap (≥ any realistic NT priority diff; small enough that `base + bonus` stays far from `i32` overflow) | 1 |
| `GuestThread::last_run_round: u64` field + init in `default_fields` | per-thread baseline for age math | 2 |
| `fn effective_priority(t, now_round) -> i32` | helper, saturating_sub + min + saturating_add | 6 |
| `HwSlot::pick_runnable(&self, now_round: u64)` | accepts round_count, ranks by `effective_priority` | 4 |
| `Scheduler::begin_slot_visit`: pass round_count, stamp winner's `last_run_round` | activates the fix per-pick | 4 |
| `Scheduler::spawn`: initialize `last_run_round = self.round_count` | prevent fresh threads inheriting giant ages | 1 |
| `Scheduler::install_initial_thread`: same | same | 1 |
| `Scheduler::decrement_quantum`: stamp `last_run_round` on rotation hand-off | keep age math consistent with the in-tier rotation path | 1 |

Doc comments on the new const, field, helper, and `pick_runnable` total
~45 LOC explaining the determinism, scope, and link back to this iterate.

The fix is purely additive — no existing field or method is removed.
`HwSlot::pick_runnable`'s signature changed from `(&self)` to
`(&self, now_round: u64)`; the only external caller
(`Scheduler::begin_slot_visit`) was updated in lockstep.
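
As a rough illustration of the aging rule summarized in the table, the following is a minimal, self-contained sketch, not the code in `scheduler.rs`. The two constants and the `saturating_sub` + `min` + `saturating_add` shape come from the table above; `SketchThread`, its `base_priority` field name, the slice-based `pick_runnable`, and the `rounds_until_first_pick` driver are hypothetical stand-ins for `GuestThread`, `HwSlot`, and `Scheduler::begin_slot_visit`.

```rust
// Values taken from the patch summary table.
const AGING_ROUNDS_PER_BONUS: u64 = 1;
const MAX_AGE_BONUS: i32 = 31;

/// Hypothetical stand-in for `GuestThread`; only the fields the age math needs.
#[derive(Debug, Clone)]
struct SketchThread {
    tid: u32,
    base_priority: i32,  // assumed field name
    last_run_round: u64, // round at which this thread was last picked
}

/// Effective priority = base priority + capped age bonus.
fn effective_priority(t: &SketchThread, now_round: u64) -> i32 {
    // saturating_sub guards against a stamp that is ahead of `now_round`.
    let age = now_round.saturating_sub(t.last_run_round);
    let bonus = (age / AGING_ROUNDS_PER_BONUS).min(MAX_AGE_BONUS as u64) as i32;
    t.base_priority.saturating_add(bonus)
}

/// Index of the thread with the highest effective priority, if any.
/// Note: `max_by_key` returns the LAST maximal element on ties; the real
/// scheduler's tie-break may differ.
fn pick_runnable(threads: &[SketchThread], now_round: u64) -> Option<usize> {
    threads
        .iter()
        .enumerate()
        .max_by_key(|(_, t)| effective_priority(t, now_round))
        .map(|(i, _)| i)
}

/// Hypothetical driver: one pick per round, stamping each winner, until
/// `tid` wins for the first time. Returns the round of that first win.
fn rounds_until_first_pick(
    mut threads: Vec<SketchThread>,
    tid: u32,
    max_rounds: u64,
) -> Option<u64> {
    for round in 0..max_rounds {
        let winner = pick_runnable(&threads, round)?;
        if threads[winner].tid == tid {
            return Some(round);
        }
        // Stamping the winner resets its age, so only starved threads accrue bonus.
        threads[winner].last_run_round = round;
    }
    None
}
```

Under these assumptions, a priority-0 thread sharing a slot with an always-runnable priority-15 thread is first picked after roughly 16 rounds (the exact round depends on the tie-break), whereas under strict priority it would never be picked. At round 0 every age is 0, so the ranking is identical to base priority.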

## Test results

```
cargo build --release  -> OK (1 pre-existing dead_code warning unrelated)
cargo test --release --workspace:
  xenia-cpu       300 passed, 0 failed
  xenia-kernel    227 passed, 0 failed
  xenia-app       5 passed, 0 failed (+ 3 ignored long-runners)
  xenia-path      19 passed, 0 failed
  + ~25 smaller suites, 0 failures total
```

The test that exercises strict priority
(`test_two_threads_same_slot_higher_priority_runs_first`) still passes
because at `round_count = 0`, every thread has `last_run_round = 0` ⇒
age = 0 ⇒ age_bonus = 0 ⇒ effective_priority == base_priority. The age
math only kicks in once `round_count` advances beyond a thread's last
pick — i.e. after actual starvation begins.

The quantum-rotation test
(`test_quantum_does_not_rotate_without_same_priority_peer`) still passes
because it never advances `round_count` (it only calls `decrement_quantum`
within one slot visit).

## Determinism check

Two cold runs (`XENIA_CACHE_WIPE=1`, `-n 500000000`) produced **identical
event counts: 13,003,881 events each** (`ours-cold.jsonl` /
`ours-cold-run2.jsonl`).

Diff of the two JSONL files (after stripping the `host_ns` wallclock
noise that's not deterministic in any of our runs): **6 events differ
out of 13,003,881, only in the `guest_cycle` field** (5,577,193 vs
5,577,214 on a single `KeAcquireSpinLockAtRaisedIrql` /
`KeReleaseSpinLockFromRaisedIrql` pair at idx 105,282-105,287). Kinds, names, ords,
tids, and event-idx sequence are identical. This pre-existing tiny
spinlock-cycle drift was visible in 2.T as well; it is not introduced by
this iterate and does not affect the event-stream shape.

Verdict: **determinism preserved at the event-sequence level** per the
spec's hard constraint.
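
The comparison described above (drop `host_ns`, then diff event by event) can be sketched as follows. This is an illustrative std-only approximation, not the tooling used for this report: `strip_numeric_field` and `count_differing_events` are hypothetical names, and the sketch assumes each JSONL line is a flat object in which `host_ns` is a bare number. For a 3 GB trace a real tool would stream both files line by line rather than hold them in memory.

```rust
/// Remove `"key":<number>` from one flat JSON object line.
/// Assumption: the value is a bare number, so it ends at the next ',' or '}'.
/// This is a textual shortcut, not a general JSON parser.
fn strip_numeric_field(line: &str, key: &str) -> String {
    let needle = format!("\"{}\":", key);
    let start = match line.find(&needle) {
        Some(i) => i,
        None => return line.to_string(), // field absent: nothing to strip
    };
    let value_start = start + needle.len();
    let rel_end = line[value_start..]
        .find(|c: char| c == ',' || c == '}')
        .unwrap_or(line.len() - value_start);
    let mut begin = start;
    let mut end = value_start + rel_end;
    if line[end..].starts_with(',') {
        end += 1; // field had a successor: drop its trailing comma
    } else if line[..begin].ends_with(',') {
        begin -= 1; // field was last: drop the preceding comma
    }
    format!("{}{}", &line[..begin], &line[end..])
}

/// Count events that still differ once `ignored_key` is stripped.
/// Returns None when the two traces have different event counts.
fn count_differing_events(a: &str, b: &str, ignored_key: &str) -> Option<usize> {
    let la: Vec<&str> = a.lines().collect();
    let lb: Vec<&str> = b.lines().collect();
    if la.len() != lb.len() {
        return None;
    }
    Some(
        la.iter()
            .zip(lb.iter())
            .filter(|(x, y)| {
                strip_numeric_field(x, ignored_key) != strip_numeric_field(y, ignored_key)
            })
            .count(),
    )
}
```

A result of `Some(0)` would mean the streams are identical apart from the ignored field; a small non-zero count, as reported here, points at the specific events worth inspecting field by field.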

## Primary gate results

| gate | predicate | result |
|---|---|---|
| **tid=6 signals handle 0x000012e4** | `signal.match` for `target_handle:0x000012e4` ≥ 1 | **PASS** — 1 event by tid=6 `NtSetEvent`, `waiter_tids:[5]`, at guest_cycle=0/host_ns=844.35ms |
| **tid=6 event count > 105** | tid=6 emits >105 Phase-A events | **PASS** — 386 events (was 17) |
| **tid=6 NOT Ready-stuck on exit** | exit-thread-state shows tid=6 in Blocked/Exited, not Ready | **PASS** — `state:"Blocked"`, WaitAny on handles 0x000010b0 (Event) + 0x000010b4 (Semaphore), `deadline_ns_or_inf:42948072` |

All 3 primary gates pass. The mechanism is confirmed end-to-end:
tid=10 ages out → tid=6 picked → tid=6 progresses through prior wait
→ tid=6 advances past `NtSetEvent` (the missing signal in 2.T) → wakes
tid=5 → cascade unfolds.

## Secondary gates (cascade)

| gate | 2.T baseline | 2.V | direction |
|---|---:|---:|---|
| Total events | 121,641 | **13,003,881** | **107× ↑** |
| Last event host_ns | 767 ms | **51,011 ms** | **66× ↑** |
| Alive threads | 13 | **21** | **+8 spawned** |
| Exited threads (clean exit_code=0) | 0 | **2** (tid=13, tid=14) | new |
| Blocked @ PC=0x824ac578 (the AUDIT-049 set) | {1,3,4,5,13} | **{3,4,12,16,18}** | tid=1/5/13 unblocked; new tids appear |
| `signal.match` events | 36 | **75** | **+108%** |
| `wake.requested` events | 36 | **79** | **+119%** |
| Unique signal.match handles | small | **20+** | broader signaling surface |
| VdSwap calls (`import.call` count) | 1 | **2** | **+1** |
| Audio tid=10 events | 1 | **17** | **+16** (modest; aging works but tid=10 stays mostly CPU-bound between yields) |
| tid=6 events | 17 | **386** | **~23× ↑** |
| tid=17 events (new worker) | n/a | **5,471,318** | massive new producer |

The originally-blocked set {1, 3, 4, 5, 13} at PC=0x824ac578 has
changed substantially. tid=1 is now Ready, tid=5 has advanced to
PC=0x824ab214 (a different wait wrapper), and tid=13 has exited cleanly,
so three of the original five threads are no longer parked on that PC.
tid=3 and tid=4 remain, joined by tid=12, 16, and 18.

VdSwap reached 2 (vs 1 baseline) — small absolute, but a definite gameplay
progression marker per tripstone #39. The second swap fires on tid=8 at
~1.22 s wallclock, vs the first on tid=1 at ~494 ms.

## Third-order observations (no claims, just data)

- **New wedge surface (15 entries vs 10)**. The new wedges include
  several handles (0x14dc, 0x151c, 0x1510, 0x1514, 0x1020, 0x1004, 0x1308)
  that didn't exist in the baseline trace — they correspond to handles
  created by the new worker threads (15-21) that only exist post-cascade.
  Not regressions; they are the next *natural* blocking point now that
  the original blocker is dissolved.
- **One semaphore wedge with multiple waiters** (handle 0x00001308,
  `count=0/max=2^31-1`, `waiters_tid:[15, 16]`) — classic
  producer-underrun shape (AUDIT-069 family). Likely the next iterate's
  target.
- **tid=10 / tid=9 still Ready at exit on CPU5/CPU4 at priority=15**
  (the audio mixer pair). Both at PC=0x824d140c (vs 0x824d1404 at
  baseline: moved by 8 bytes, i.e. two 4-byte PowerPC instructions
  past). The aging bonus lets them yield occasionally; they're no longer
  pinning their CPUs hard.
- **Run termination**: budget cap (500M instructions, per `-n 500000000`);
  no crash, no deadlock, no `unblock_on_deadlock` fire.

## Tripstone audit

- **#28 (cross-engine tid stability)**: All tid claims are ours-side
  within this trajectory. The new tids 15-21 are first observed in this
  iterate; no cross-engine tid mapping claimed.
- **#39 (composite progression IS progression)**: Honored. VdSwap=2,
  swap count UP, but draws/render_targets not measured here. Headline
  uses WEDGE-DISSOLVED-NEW-BLOCKER framing — does *not* claim
  "boot complete" or "gameplay reached". The mechanism gate
  (signal.match on 0x12e4) is direct and not a progression-laundering
  proxy.
- **#40 (single-keystone framing)**: Care taken. The headline names
  *both* "wedge dissolved" *and* "new blocker", per the spec's matrix.
  Cascade gates are reported separately from the primary gate. Open
  follow-ups (the new producer-underrun wedge on handle 0x1308) are not
  collapsed into the win.
- **#41 (categorized diff tags)**: N/A this iterate (no diff harness run).
- **#42 (Phase-A blind to blocked-forever)**: Used `exit-thread-state.json`
  to characterize the new wedge set (Phase-A alone would show only the
  signal-match cascade up to the new block point).
- **#43 (no budget-cap framing)**: Budget cap (-n 500000000) reached
  but the trace had structural progression throughout, not a wedge.
  Cascade observation is robust at this budget.

## Confidence

- **HIGH** that the patch is correct and minimal: ~30 substantive LOC,
  0 test regressions, identical event counts across two cold runs.
- **HIGH** that the primary keystone gate passes: `signal.match
  target_handle:0x000012e4 waiter_tids:[5]` is exactly the predicted
  unblock — observed unambiguously in the trace.
- **HIGH** that the cascade is genuine (not just emit-volume noise):
  tid=13 EXITED cleanly is a structural event the baseline never
  achieved in 18 days; 8 new threads spawned that the baseline never
  reached; new handles in the wedge set that didn't exist at baseline.
- **MEDIUM-HIGH** that the new wedge set (handle 0x1308 semaphore
  producer-underrun, several events without signalers) represents the
  next genuine investigation surface — these are downstream of the
  original wedge and likely have their own causal chain.
- **MEDIUM** that gameplay is imminent. VdSwap went from 1 to 2 and
  the wallclock reached 51 s, but draws_count was not measured and the
  game is clearly still inside boot phase B. Several more cascade
  iterations likely needed.
- **LOW** that any of the existing 25+ iterates' specific wedge
  diagnoses (AUDIT-049, 062, 067, 068, 069) directly apply post-fix
  — the geometry has changed enough that prior root-cause analyses
  need re-validation.

## Next-iterate recommendation

**2.W — investigate the new producer-underrun on handle 0x00001308**
(semaphore count=0/max=2^31-1, waiters tid=[15, 16] both on CPU3 at
PC=0x824ac578). Use the existing `signal.match` / `wake.requested`
event surface (already active) to identify which tids if any are
releasing this semaphore — if zero, the next root cause is a missing
producer (AUDIT-069 family); if non-zero but rate is low, it's a
consume-rate divergence (AUDIT-068 family). ~0-50 LOC.
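
The triage rule in this recommendation can be sketched as a small classifier. Everything here is a hypothetical illustration: `SignalMatch` is a pared-down view of a `signal.match` event with assumed field names, and treating "rate is low" as "fewer releases than wait attempts" is an assumption made for the sketch, not a threshold defined by this report.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal view of a `signal.match` event.
struct SignalMatch {
    signaler_tid: u32,
    target_handle: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum WedgeClass {
    MissingProducer,       // no releaser at all (AUDIT-069 family)
    ConsumeRateDivergence, // releasers exist but fall behind (AUDIT-068 family)
    Balanced,              // releases keep up with waits
}

/// Count releases of `handle` per signaling tid.
fn releases_by_tid(events: &[SignalMatch], handle: u32) -> BTreeMap<u32, u64> {
    let mut counts = BTreeMap::new();
    for e in events.iter().filter(|e| e.target_handle == handle) {
        *counts.entry(e.signaler_tid).or_insert(0) += 1;
    }
    counts
}

/// Classify a semaphore wedge from its release counts.
/// Assumption: "low rate" means fewer total releases than wait attempts.
fn classify(releases: &BTreeMap<u32, u64>, wait_attempts: u64) -> WedgeClass {
    let total: u64 = releases.values().sum();
    if total == 0 {
        WedgeClass::MissingProducer
    } else if total < wait_attempts {
        WedgeClass::ConsumeRateDivergence
    } else {
        WedgeClass::Balanced
    }
}
```

Keeping the per-tid breakdown, rather than only a total, is deliberate: if a producer does exist, its tid is the next thing to trace.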

Alternative: **2.X — measure draws/render_targets** to quantify how
close we are to first gameplay frame. ~30-50 LOC instrumentation in
xenia-gpu's `D3D_DrawIndexedPrimitive` path.

**Strong recommend 2.W first** — the wedge is concrete and the tooling
already exists.

## Artifacts

Under `xenia-rs/audit-runs/iterate-2V-scheduler-fairness-fix/`:

- `ours-cold.jsonl` (3.13 GB, 13,003,881 events)
- `ours-cold.stdout.log` (empty — quiet mode)
- `ours-cold.stderr.log` (single emission-notice line)
- `exit-thread-state.json` (15.6 KB; 21 alive + 15 wedge_map entries)
- `ours-cold-run2.{jsonl,stdout.log,stderr.log}` (determinism check:
  identical event count, only 6 events with tiny `guest_cycle`
  drift in a pre-existing spinlock pair)
- `writer-report.md` (this file)

Engine HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` + uncommitted
2.Q signal.match + 2.T wake.requested + this iterate's 2.V scheduler
fairness patch. xenia-canary UNCHANGED.