MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00

13 KiB

Iterate 2.V — Scheduler fairness fix (age-priority anti-starvation)

Date: 2026-05-28. LOC delta: engine ~30 substantive added lines (scheduler.rs only; ~75 LOC including new doc comments). All retained. Option: A (priority aging). Tests: xenia-cpu 300 / xenia-kernel 227 / xenia-app 5 / xenia-path 19 + 30+ smaller suites — full workspace PASS, 0 regressions.

Headline

WEDGE-DISSOLVED-NEW-BLOCKER (PROGRESSION OBSERVED).

The 18-day strict-priority starvation on CPU5 is broken. With pick_runnable now ranking by effective priority = base + age_bonus(rounds since last pick), tid=6 (pri=0) finally runs after tid=10 (pri=15) ages out, and the cascade that follows produces:

tid=6 signals handle 0x000012e4 exactly as predicted — the primary keystone gate. 1 signal.match event by NtSetEvent on target_handle:0x000012e4, waiter_tids:[5]. Was 0 at 2.T baseline.
tid=6 event count 17 → 386 (~23×). Now Blocked on the wedge handles 0x000010b0/0x000010b4 (deadline-bounded), not Ready-stuck.
tid=13 EXITED with code 0 (was the original AUDIT-049 wedge from 10 May 2026 — stuck for 18 days).
Total events 121,641 → 13,003,881 (107× more events; first time the boot has crossed multi-second wallclock progression in this trace).
Alive threads 13 → 21 (8 new threads spawned: 14, 15, 16, 17, 18, 19, 20, 21; 13 and 14 ran to completion and exited).
Wallclock last-event 766.86 ms → 51,011 ms (66× longer trace).

Hard new wedges still exist (15 wedge_map entries vs 10 at baseline), but they are downstream of the original wedge — the boot has structurally advanced. The fix is mechanism-correct and non-regressive; the next wedges are new territory.

Option chosen: A (priority aging)

Justification: Option B (quantum-based round-robin to lower priority on N-cycle timeout) requires either (a) violating priority ordering on every expiry, which destabilizes existing tests like test_two_threads_same_slot_higher_priority_runs_first, or (b) a separate "starvation counter" that essentially reinvents aging. Option A folds cleanly into the existing max_by_key shape, is fully deterministic (the age depends only on Scheduler::round_count), and degenerates to the strict-priority rule on round 0, so every existing test continues to pass without modification.
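A minimal sketch of the Option A ranking, for orientation only. The two constants and the `last_run_round` field come from the patch summary; the struct layout, the `ready` flag, and the free-function form of `pick_runnable` are assumptions made for illustration and are not the actual scheduler.rs code.

```rust
// Constants as named in the patch summary.
const AGING_ROUNDS_PER_BONUS: u64 = 1;
const MAX_AGE_BONUS: i32 = 31;

// Hypothetical, trimmed-down thread record (the real GuestThread has more fields).
#[derive(Debug, Clone, PartialEq)]
struct GuestThread {
    tid: u32,
    base_priority: i32,
    last_run_round: u64,
    ready: bool,
}

/// effective = base + min(rounds since last pick / AGING_ROUNDS_PER_BONUS, MAX_AGE_BONUS)
fn effective_priority(t: &GuestThread, now_round: u64) -> i32 {
    // saturating_sub: a thread stamped "in the future" simply has age 0.
    let age = now_round.saturating_sub(t.last_run_round);
    let bonus = (age / AGING_ROUNDS_PER_BONUS).min(MAX_AGE_BONUS as u64) as i32;
    t.base_priority.saturating_add(bonus)
}

/// Same max_by_key shape as a strict-priority pick, with the key swapped
/// from base priority to effective priority.
fn pick_runnable(threads: &[GuestThread], now_round: u64) -> Option<&GuestThread> {
    threads
        .iter()
        .filter(|t| t.ready)
        .max_by_key(|t| effective_priority(t, now_round))
}
```

When every thread has `last_run_round == now_round`, the bonus is 0 for all of them and the pick is identical to strict priority. A pri=0 thread that has waited 16 rounds ranks at 16 and outranks a pri=15 thread that was stamped in the current round.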

Patch summary

File: crates/xenia-cpu/src/scheduler.rs. ~30 substantive added LOC (plus ~45 LOC of doc comments). Within scope (30-80 target, 150 hard cap).

change	purpose	LOC
`const AGING_ROUNDS_PER_BONUS: u64 = 1;`	one round of starvation = +1 effective priority	1
`const MAX_AGE_BONUS: i32 = 31;`	cap (≥ any realistic NT priority diff; ≤ i32 safety margin)	1
`GuestThread::last_run_round: u64` field + init in `default_fields`	per-thread baseline for age math	2
`fn effective_priority(t, now_round) -> i32`	helper, saturating_sub + min + saturating_add	6
`HwSlot::pick_runnable(&self, now_round: u64)`	accepts round_count, ranks by `effective_priority`	4
`Scheduler::begin_slot_visit`: pass round_count, stamp winner's `last_run_round`	activates the fix per-pick	4
`Scheduler::spawn`: initialize `last_run_round = self.round_count`	prevent fresh threads inheriting giant ages	1
`Scheduler::install_initial_thread`: same	same	1
`Scheduler::decrement_quantum`: stamp `last_run_round` on rotation hand-off	keep age math consistent with the in-tier rotation path	1

Doc comments on the new const, field, helper, and pick_runnable total ~45 LOC explaining the determinism, scope, and link back to this iterate.

The fix is purely additive — no existing field or method is removed. HwSlot::pick_runnable's signature changed from (&self) to (&self, now_round: u64); the only external caller (Scheduler::begin_slot_visit) was updated in lockstep.
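The stamping described in the table (winner stamped in `begin_slot_visit`, fresh threads stamped in `spawn`) can be sketched as a toy single-slot simulation. Everything except the names `round_count`, `last_run_round`, `spawn`, `begin_slot_visit`, and the cap of 31 is an assumption: the real scheduler has multiple slots, thread states, and quantum rotation that this sketch leaves out.

```rust
const MAX_AGE_BONUS: u64 = 31; // same cap as the patch; one round = +1 bonus

// Hypothetical minimal thread record; every thread is treated as always Ready.
struct Thread {
    tid: u32,
    base_priority: i32,
    last_run_round: u64,
}

impl Thread {
    fn effective_priority(&self, now_round: u64) -> i32 {
        let age = now_round.saturating_sub(self.last_run_round).min(MAX_AGE_BONUS);
        self.base_priority.saturating_add(age as i32)
    }
}

#[derive(Default)]
struct Scheduler {
    round_count: u64,
    threads: Vec<Thread>, // stands in for one hardware slot
}

impl Scheduler {
    fn spawn(&mut self, tid: u32, base_priority: i32) {
        // Stamp with the current round so a fresh thread starts at age 0
        // instead of inheriting the whole elapsed round count as its age.
        let last_run_round = self.round_count;
        self.threads.push(Thread { tid, base_priority, last_run_round });
    }

    fn begin_slot_visit(&mut self) -> Option<u32> {
        let now = self.round_count;
        let winner = self.threads.iter_mut().max_by_key(|t| t.effective_priority(now))?;
        winner.last_run_round = now; // the winner's age resets on every pick
        Some(winner.tid)
    }

    /// One slot visit followed by a round advance (simulation helper).
    fn run_round(&mut self) -> Option<u32> {
        let picked = self.begin_slot_visit();
        self.round_count += 1;
        picked
    }
}

/// Round number at which `tid` is first picked, if that happens within `max_rounds`.
fn first_round_picked(s: &mut Scheduler, tid: u32, max_rounds: u64) -> Option<u64> {
    (0..max_rounds).find_map(|_| {
        let round = s.round_count;
        (s.run_round() == Some(tid)).then_some(round)
    })
}
```

In this sketch a pri=15 thread that runs every round sits at effective priority 16 (age 1), so a pri=0 peer catches up at round 16. `Iterator::max_by_key` returns the last of several equal maxima, which is why the later-spawned low-priority thread wins that tie here; the real tie-break may differ. After its pick the low-priority thread's age resets and the high-priority thread takes the next round, so the disturbance to priority order is bounded.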

Test results

cargo build --release  -> OK (1 pre-existing dead_code warning unrelated)
cargo test --release --workspace:
  xenia-cpu       300 passed, 0 failed
  xenia-kernel    227 passed, 0 failed
  xenia-app       5 passed, 0 failed (+ 3 ignored long-runners)
  xenia-path      19 passed, 0 failed
  + ~25 smaller suites, 0 failures total

The test that exercises strict priority (test_two_threads_same_slot_higher_priority_runs_first) still passes because at round_count = 0, every thread has last_run_round = 0 ⇒ age = 0 ⇒ age_bonus = 0 ⇒ effective_priority == base_priority. The age math only kicks in once round_count advances beyond a thread's last pick — i.e. after actual starvation begins.

The quantum-rotation test (test_quantum_does_not_rotate_without_same_priority_peer) still passes because it never advances round_count (it only calls decrement_quantum within one slot visit).

Determinism check

Two cold runs (XENIA_CACHE_WIPE=1, -n 500000000) produced bit-identical event counts: 13,003,881 events each (ours-cold.jsonl / ours-cold-run2.jsonl).

Diff of the two JSONL files (after stripping the host_ns wallclock noise that's not deterministic in any of our runs): 6 events differ out of 13,003,881, only in the guest_cycle field (5,577,193 vs 5,577,214 on a single KeAcquireSpinLockAtRaisedIrql / KeReleaseSpinLockFromRaisedIrql pair at idx 105,282-105,287). Kinds, names, ords, tids, and event-idx sequence are identical. This pre-existing tiny spinlock-cycle drift was visible in 2.T as well; it is not introduced by this iterate and does not affect the event-stream shape.
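The comparison above can be approximated with a small stdlib-only helper. This is a sketch under stated assumptions, not the tool that produced the numbers: it assumes each event is one flat JSON object per line, that `host_ns` carries a bare numeric value, and that both traces have already been confirmed to have the same line count. The function names are hypothetical.

```rust
/// Remove `"key":<number>` from a flat single-line JSON object.
/// Assumption: the value is a bare scalar (no nesting, no commas inside it).
fn strip_numeric_field(line: &str, key: &str) -> String {
    let needle = format!("\"{}\":", key);
    let start = match line.find(&needle) {
        Some(s) => s,
        None => return line.to_string(),
    };
    let rest = &line[start + needle.len()..];
    let value_len = rest
        .find(|c: char| c == ',' || c == '}')
        .unwrap_or(rest.len());
    let mut head = &line[..start];
    let mut tail = &rest[value_len..];
    // Drop exactly one adjoining comma so the remaining object stays well-formed.
    if let Some(t) = tail.strip_prefix(',') {
        tail = t;
    } else if let Some(h) = head.strip_suffix(',') {
        head = h;
    }
    format!("{}{}", head, tail)
}

/// Indices of events that still differ once the noisy field is removed.
/// `zip` stops at the shorter input, so compare event counts separately first.
fn differing_indices<'a>(
    a: impl Iterator<Item = &'a str>,
    b: impl Iterator<Item = &'a str>,
    noise_key: &str,
) -> Vec<usize> {
    a.zip(b)
        .enumerate()
        .filter(|(_, (x, y))| {
            strip_numeric_field(x, noise_key) != strip_numeric_field(y, noise_key)
        })
        .map(|(i, _)| i)
        .collect()
}
```

Fed with `str::lines()` from each file, the returned indices correspond to the event-idx values that would need manual inspection, such as the spinlock pair noted above.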

Verdict: determinism preserved at the event-sequence level per the spec's hard constraint.

Primary gate results

gate	predicate	result
tid=6 signals handle 0x000012e4	`signal.match` for `target_handle:0x000012e4` ≥ 1	PASS — 1 event by tid=6 `NtSetEvent`, `waiter_tids:[5]`, at guest_cycle=0/host_ns=844.35ms
tid=6 event count > 105	tid=6 emits >105 Phase-A events	PASS — 386 events (was 17)
tid=6 NOT Ready-stuck on exit	exit-thread-state shows tid=6 in Blocked/Exited, not Ready	PASS — `state:"Blocked"`, WaitAny on handles 0x000010b0 (Event) + 0x000010b4 (Semaphore), `deadline_ns_or_inf:42948072`

All 3 primary gates pass. The mechanism is confirmed end-to-end: tid=10 ages out → tid=6 picked → tid=6 progresses through prior wait → tid=6 advances past NtSetEvent (the missing signal in 2.T) → wakes tid=5 → cascade unfolds.
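The three primary-gate predicates can be written down as a small check. This is an illustrative sketch only: the `Event` and `ThreadState` shapes are hypothetical stand-ins for the JSONL trace and exit-thread-state.json, whose real schemas are not reproduced here.

```rust
// Hypothetical in-memory view of one trace event.
#[derive(Debug, Clone)]
struct Event {
    kind: String,
    tid: u32,
    target_handle: Option<u32>,
}

// Hypothetical view of a thread's state at exit.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ThreadState {
    Ready,
    Blocked,
    Exited,
}

#[derive(Debug, PartialEq)]
struct GateReport {
    signals_keystone: bool, // >= 1 signal.match on the keystone handle by this tid
    event_count_ok: bool,   // strictly more than `min_events` events from this tid
    not_ready_stuck: bool,  // exit state is Blocked or Exited, not Ready
}

impl GateReport {
    fn all_pass(&self) -> bool {
        self.signals_keystone && self.event_count_ok && self.not_ready_stuck
    }
}

fn evaluate_primary_gates(
    events: &[Event],
    exit_state: ThreadState,
    tid: u32,
    keystone: u32,
    min_events: usize,
) -> GateReport {
    let signals = events
        .iter()
        .filter(|e| e.tid == tid)
        .filter(|e| e.kind == "signal.match" && e.target_handle == Some(keystone))
        .count();
    let total = events.iter().filter(|e| e.tid == tid).count();
    GateReport {
        signals_keystone: signals >= 1,
        event_count_ok: total > min_events,
        not_ready_stuck: exit_state != ThreadState::Ready,
    }
}
```

With the values reported in the table (tid=6, handle 0x000012e4, threshold 105), a 386-event trace containing one matching signal and a Blocked exit state satisfies all three predicates, while a 17-event trace with a Ready exit state fails all three.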

Secondary gates (cascade)

gate	2.T baseline	2.V	direction
Total events	121,641	13,003,881	107× ↑
Last event host_ns	767 ms	51,011 ms	66× ↑
Alive threads	13	21	+8 spawned
Exited threads (clean exit_code=0)	0	2 (tid=13, tid=14)	new
Blocked @ PC=0x824ac578 (the AUDIT-049 set)	{1,3,4,5,13}	{3,4,12,16,18}	tid=1/5/13 unblocked; new tids appear
`signal.match` events	36	75	+108%
`wake.requested` events	36	79	+119%
Unique signal.match handles	small	20+	broader signaling surface
VdSwap calls (`import.call` count)	1	2	+1
Audio tid=10 events	1	17	+16 (modest; aging works but tid=10 stays mostly CPU-bound between yields)
tid=6 events	17	386	+23×
tid=17 events (new worker)	n/a	5,471,318	massive new producer

The originally-blocked set {1, 3, 4, 5, 13} at PC=0x824ac578 has changed substantially: tid=1 is now Ready, tid=5 has advanced to PC=0x824ab214 (a different wait wrapper), and tid=13 has exited cleanly. Three of the original five threads are no longer parked on that PC; tid=3 and tid=4 remain, joined by tid=12, 16, and 18.

VdSwap reached 2 (vs 1 baseline) — small absolute, but a definite gameplay progression marker per tripstone #39. The second swap fires on tid=8 at ~1.22 s wallclock, vs the first on tid=1 at ~494 ms.

Third-order observations (no claims, just data)

New wedge surface (15 entries vs 10). The new wedges include several handles (0x14dc, 0x151c, 0x1510, 0x1514, 0x1020, 0x1004, 0x1308) that didn't exist in the baseline trace — they correspond to handles created by the new worker threads (15-21) that only exist post-cascade. Not regressions; they are the next natural blocking point now that the original blocker is dissolved.
One semaphore wedge with multiple waiters (handle 0x00001308, count=0/max=2^31-1, waiters_tid:[15, 16]) — classic producer-underrun shape (AUDIT-069 family). Likely the next iterate's target.
tid=10 / tid=9 still Ready at exit on CPU5/CPU4 at priority=15 (the audio mixer pair). Both at PC=0x824d140c (vs 0x824d1404 at baseline: moved by 8 bytes, i.e. two 4-byte PowerPC instructions past). The aging bonus lets them yield occasionally; they're no longer pinning their CPUs hard.
Run termination: budget cap (500M instructions, -n 500000000); no crash, no deadlock, no unblock_on_deadlock fire.

Tripstone audit

#28 (cross-engine tid stability): All tid claims are ours-side within this trajectory. The new tids 15-21 are first observed in this iterate; no cross-engine tid mapping claimed.
#39 (composite progression IS progression): Honored. VdSwap=2, swap count UP, but draws/render_targets not measured here. Headline uses WEDGE-DISSOLVED-NEW-BLOCKER framing — does not claim "boot complete" or "gameplay reached". The mechanism gate (signal.match on 0x12e4) is direct and not a progression-laundering proxy.
#40 (single-keystone framing): Care taken. The headline names both "wedge dissolved" and "new blocker", per the spec's matrix. Cascade gates are reported separately from the primary gate. Open follow-ups (the new producer-underrun wedge on handle 0x1308) are not collapsed into the win.
#41 (categorized diff tags): N/A this iterate (no diff harness run).
#42 (Phase-A blind to blocked-forever): Used exit-thread-state.json to characterize the new wedge set (Phase-A alone would show only the signal-match cascade up to the new block point).
#43 (no budget-cap framing): Budget cap (-n 500000000) reached but the trace had structural progression throughout, not a wedge. Cascade observation is robust at this budget.

Confidence

HIGH that the patch is correct and minimal: 30 substantive LOC, 0 test regressions, determinism preserved bit-for-bit on event count.
HIGH that the primary keystone gate passes: signal.match target_handle:0x000012e4 waiter_tids:[5] is exactly the predicted unblock — observed unambiguously in the trace.
HIGH that the cascade is genuine (not just emit-volume noise): tid=13 EXITED cleanly is a structural event the baseline never achieved in 18 days; 8 new threads spawned that the baseline never reached; new handles in the wedge set that didn't exist at baseline.
MEDIUM-HIGH that the new wedge set (handle 0x1308 semaphore producer-underrun, several events without signalers) represents the next genuine investigation surface — these are downstream of the original wedge and likely have their own causal chain.
MEDIUM that gameplay is imminent. VdSwap went from 1 to 2 and the wallclock reached 51 s, but draws_count was not measured and the game is clearly still inside boot phase B. Several more cascade iterations likely needed.
LOW that any of the existing 25+ iterates' specific wedge diagnoses (AUDIT-049, 062, 067, 068, 069) directly apply post-fix — the geometry has changed enough that prior root-cause analyses need re-validation.

Next-iterate recommendation

2.W — investigate the new producer-underrun on handle 0x00001308 (semaphore count=0/max=2^31-1, waiters tid=[15, 16] both on CPU3 at PC=0x824ac578). Use the existing signal.match / wake.requested event surface (already active) to identify which tids if any are releasing this semaphore — if zero, the next root cause is a missing producer (AUDIT-069 family); if non-zero but rate is low, it's a consume-rate divergence (AUDIT-068 family). ~0-50 LOC.
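The decision rule proposed for 2.W can be sketched as follows. The event shape, the function names, and especially the "rate is low" threshold (total releases below total waits) are assumptions chosen for illustration; the report does not define a numeric threshold.

```rust
use std::collections::BTreeMap;

// Hypothetical in-memory view of a trace event.
struct Event {
    kind: String,
    tid: u32,
    target_handle: u32,
}

#[derive(Debug, PartialEq)]
enum Diagnosis {
    MissingProducer,       // no tid ever releases the semaphore (AUDIT-069 family)
    ConsumeRateDivergence, // releases exist but fall short of waits (AUDIT-068 family)
    NoUnderrun,            // releases keep up with waits
}

/// Count signal.match events on `handle`, grouped by the signaling tid.
fn releases_by_tid(events: &[Event], handle: u32) -> BTreeMap<u32, u64> {
    let mut out = BTreeMap::new();
    for e in events
        .iter()
        .filter(|e| e.kind == "signal.match" && e.target_handle == handle)
    {
        *out.entry(e.tid).or_insert(0) += 1;
    }
    out
}

/// Assumed threshold: fewer total releases than waits counts as "rate is low".
fn classify(releases: &BTreeMap<u32, u64>, waits: u64) -> Diagnosis {
    let total: u64 = releases.values().sum();
    if total == 0 {
        Diagnosis::MissingProducer
    } else if total < waits {
        Diagnosis::ConsumeRateDivergence
    } else {
        Diagnosis::NoUnderrun
    }
}
```

Calling `releases_by_tid(&events, 0x1308)` on the 2.V trace would answer the first question directly: an empty map points at a missing producer, and a non-empty map names the candidate producer tids to follow up on.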

Alternative: 2.X — measure draws/render_targets to quantify how close we are to first gameplay frame. ~30-50 LOC instrumentation in xenia-gpu's D3D_DrawIndexedPrimitive path.

Strongly recommend 2.W first: the wedge is concrete and the tooling already exists.

Artifacts

Under xenia-rs/audit-runs/iterate-2V-scheduler-fairness-fix/:

ours-cold.jsonl (3.13 GB, 13,003,881 events)
ours-cold.stdout.log (empty — quiet mode)
ours-cold.stderr.log (single emission-notice line)
exit-thread-state.json (15.6 KB; 21 alive + 15 wedge_map entries)
ours-cold-run2.{jsonl,stdout.log,stderr.log} (determinism check — bit-identical event count, only 6 events with tiny guest_cycle drift in a pre-existing spinlock pair)
writer-report.md (this file)

Engine HEAD e6d43a23ac393004d2e5adf2f0395fd0b5e6448b + uncommitted 2.Q signal.match + 2.T wake.requested + this iterate's 2.V scheduler fairness patch. xenia-canary UNCHANGED.
