
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00

7.8 KiB


Stage 0 — Cycle-Quantum Preemption Spike: Result

Date: 2026-05-18
Outcome: NULL RESULT. Option A has zero observable effect; do not implement Option B; proceed to Stage 1.
Engine source: changed (~50 LOC, additive, one file; default mode byte-identical).
Hours spent: ~1 (implementation + sweep).

Headline

Per the scheduler-determinism plan, Stage 0 implemented OrderMode::ScanQuantum { ticks: u32 }, a one-knob variant of the lockstep scheduler that overrides the per-thread QUANTUM_DEFAULT reload value with a user-supplied ticks count. A sweep over ticks ∈ [10, 50, 200, 1000, 5000, 10000], 3 cold runs each, produced byte-identical deterministic fields to default Fixed mode (det-fields MD5 ba5b5e0795ccb32966a49d3b2917a30d across all 18 runs).

The main matched-prefix is unchanged at 104,607 at every quantum tested. Sister chains likewise unchanged: 11 / 32 / 4 / 41 / 16.

The quantum knob is not the lever for the 104,607 cap.
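For reference, this note uses the matched prefix as the number of leading events on which our trace and the reference trace agree before the first mismatch. The Rust sketch below implements that definition. The Event struct and its fields are hypothetical stand-ins for whatever diff_events.py actually compares.

```rust
/// Hypothetical per-event record; the real trace schema is not reproduced here.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Event {
    tid: u32,
    kind: &'static str,
    arg: u64,
}

/// Longest common prefix of two traces: the number of leading events
/// that agree before the first mismatch.
fn matched_prefix(ours: &[Event], reference: &[Event]) -> usize {
    ours.iter()
        .zip(reference)
        .take_while(|(a, b)| a == b)
        .count()
}

fn main() {
    let ev = |tid, kind, arg| Event { tid, kind, arg };
    let reference = vec![ev(1, "create", 1), ev(1, "wait", 1), ev(2, "signal", 1)];
    let ours = vec![ev(1, "create", 1), ev(1, "wait", 1), ev(1, "wait", 2)];
    // The traces agree on the first two events and diverge at index 2.
    println!("matched prefix = {}", matched_prefix(&ours, &reference));
}
```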

Sweep table

All 18 runs produced total_events=121569 and det_fields_md5=ba5b5e0795ccb32966a49d3b2917a30d. Same diff numbers in every case.

ticks	digest stable × 3	main matched (vs C+22 baseline 104,607)	sister 4→11	sister 7→2	sister 12→7	sister 14→9	sister 15→10
10	✓	104,607	11	32	4	41	16
50	✓	104,607	11	32	4	41	16
200	✓	104,607	11	32	4	41	16
1,000	✓	104,607	11	32	4	41	16
5,000	✓	104,607	11	32	4	41	16
10,000	✓	104,607	11	32	4	41	16
50,000 (default Fixed)	✓	104,607	11	32	4	41	16

The default Fixed-mode digest moved from C+22/C+23's archived e1dfcb15… / 23cf4c4c… to ba5b5e07… for this session. The cause is a tail-only guest_cycle drift on 6 events at idx 107,792+ (post-divergence region, no behavioral change). All tid_event_idx values are identical to C+23's archive, and the cap location at 104,607 is unchanged.
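A guest_cycle drift moves the digest while tid_event_idx stays put because the det-fields digest hashes every field except host_ns, according to the det_digest.py description under Sweep artifacts. The minimal sketch below assumes events are lists of (field, value) string pairs. It replaces MD5 with a hand-rolled 64-bit FNV-1a so it needs no external crates, which means its hash values will not match det_digest.py.

```rust
/// 64-bit FNV-1a, standing in for MD5 so the sketch needs no crates.
fn fnv1a(hash: &mut u64, bytes: &[u8]) {
    for &b in bytes {
        *hash ^= u64::from(b);
        *hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
}

/// Digest over every field except the nondeterministic host timestamp.
fn det_digest(events: &[Vec<(&str, &str)>]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
    for ev in events {
        for (key, value) in ev.iter().filter(|(k, _)| *k != "host_ns") {
            fnv1a(&mut h, key.as_bytes());
            fnv1a(&mut h, b"=");
            fnv1a(&mut h, value.as_bytes());
            fnv1a(&mut h, b";");
        }
        fnv1a(&mut h, b"\n"); // event separator
    }
    h
}

fn main() {
    let run_a = vec![vec![("tid_event_idx", "7"), ("guest_cycle", "1000"), ("host_ns", "123")]];
    let run_b = vec![vec![("tid_event_idx", "7"), ("guest_cycle", "1000"), ("host_ns", "456")]];
    let drift = vec![vec![("tid_event_idx", "7"), ("guest_cycle", "1004"), ("host_ns", "123")]];
    // host_ns differs but is filtered out, so these digests match.
    println!("a == b: {}", det_digest(&run_a) == det_digest(&run_b));
    // A guest_cycle change alone is enough to move the digest.
    println!("a == drift: {}", det_digest(&run_a) == det_digest(&drift));
}
```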

Mechanism: why every quantum gives the same trace

Scheduler::decrement_quantum (scheduler.rs:768-804) only rotates by finding a same-priority Ready peer on the same hardware slot. During Sylpheed's 50M-instruction boot trajectory, the main guest thread (tid=1) is alone on slot 0 for almost the entire run; other threads spawn onto other slots (via pick_least_depth_slot in spawn). So decrement_quantum's same-slot scan finds no peer to rotate to, regardless of how often quantum drains.
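The toy model below illustrates that rule; it is not the scheduler.rs code. The Thread and State types and this decrement_quantum signature are assumptions, and the real function carries more bookkeeping. When the quantum drains and no same-slot, same-priority Ready peer exists, the thread is reloaded and keeps the slot, so the tick count never changes which thread runs.

```rust
// Toy model only: these types and this signature are assumptions, not scheduler.rs.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    Running,
    Ready,
    Parked,
}

struct Thread {
    tid: u32,
    slot: usize,
    priority: i32,
    state: State,
    quantum_remaining: u32,
}

/// Charge one tick to `threads[running]`. When the quantum drains, reload it and
/// hand off to a same-slot, same-priority Ready peer if one exists.
/// Returns the tid that runs next.
fn decrement_quantum(threads: &mut [Thread], running: usize, reload: u32) -> u32 {
    let cur = &mut threads[running];
    cur.quantum_remaining = cur.quantum_remaining.saturating_sub(1);
    if cur.quantum_remaining > 0 {
        return cur.tid;
    }
    cur.quantum_remaining = reload;
    let (slot, priority) = (cur.slot, cur.priority);
    let peer = threads
        .iter()
        .position(|t| t.slot == slot && t.priority == priority && t.state == State::Ready);
    match peer {
        Some(p) => {
            threads[running].state = State::Ready;
            threads[p].state = State::Running;
            threads[p].tid
        }
        // No peer on this slot: the same thread keeps running, whatever `reload` is.
        None => threads[running].tid,
    }
}

fn main() {
    for ticks in [10u32, 50, 200, 1_000, 5_000, 10_000] {
        // tid 1 alone on slot 0; tid 2 Ready but on slot 1; tid 3 parked on slot 2.
        let mut threads = vec![
            Thread { tid: 1, slot: 0, priority: 0, state: State::Running, quantum_remaining: ticks },
            Thread { tid: 2, slot: 1, priority: 0, state: State::Ready, quantum_remaining: ticks },
            Thread { tid: 3, slot: 2, priority: 0, state: State::Parked, quantum_remaining: ticks },
        ];
        let rotations = (0..100_000)
            .filter(|_| decrement_quantum(&mut threads, 0, ticks) != 1)
            .count();
        println!("ticks={ticks}: rotations away from tid 1 = {rotations}");
    }
}
```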

round_schedule (scheduler.rs:710-736) does drive cross-slot rotation once per round, but it only moves the cursor onto slots that have Ready work. With the main thread alone on slot 0 and the other threads parked, every round returns slot 0 again. Quantum does not enter that decision.
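The round-level cursor can be modeled the same way, assuming each round advances from the current slot to the next slot with Ready work and wraps around. The signature below is hypothetical. The relevant point is that quantum is not an input.

```rust
/// Hypothetical model: pick the next slot after `cursor` that has Ready work,
/// wrapping around (possibly back to `cursor` itself). None if no slot has work.
fn round_schedule(cursor: usize, has_ready: &[bool]) -> Option<usize> {
    let n = has_ready.len();
    (1..=n)
        .map(|step| (cursor + step) % n)
        .find(|&slot| has_ready[slot])
}

fn main() {
    // Only slot 0 (main thread, tid=1) has Ready work; the other slots are parked.
    let has_ready = [true, false, false];
    let mut cursor = 0;
    for round in 0..5 {
        cursor = round_schedule(cursor, &has_ready).expect("slot 0 is always ready");
        println!("round {round}: slot {cursor}");
    }
}
```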

In short: the cooperative single-host-thread scheduler model has no inter-slot preemption knob that quantum can drive. Both Option A (shrink quantum) and Option B (forced-yield orthogonal to quantum) share the same rotation logic and hit the same wall. Implementing Option B would not change the result.

Verification of stability

Fixed-mode cold runs × 3 produce identical det-fields MD5 ba5b5e07….
All 18 ScanQuantum sweep runs (6 quanta × 3) produce the same digest as Fixed mode.
total_events = 121,569 in every run (matches archived C+23 baseline).
Stage 0 unit tests (8 new) cover: quantum_for match arms (Fixed/Seeded/ScanQuantum); spawn/install_initial_thread quantum-load under ScanQuantum; wake_ref reload under ScanQuantum; decrement_quantum rotation timing under ticks=4; OrderMode::from_env parse of XENIA_SCHED_ORDER=quantum + XENIA_SCHED_QUANTUM=N (and =0 fallback). All pass.
cargo test -p xenia-cpu --lib scheduler: 42 / 42 pass (was 34, +8 new).
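The sketch below shows the env wiring that the from_env tests exercise. It rests on three assumptions. QUANTUM_DEFAULT is 50,000, taken from the sweep table's "default Fixed" row. Seeded mode is left out. An =0 or unparsable value falls back to QUANTUM_DEFAULT ticks, since the note only says an =0 fallback exists. The hypothetical parse helper takes both values as Option<&str> so it can be tested without touching the process environment.

```rust
use std::env;

/// Assumed reload value under Fixed (the sweep table labels 50,000 as "default Fixed").
const QUANTUM_DEFAULT: u32 = 50_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OrderMode {
    Fixed,
    ScanQuantum { ticks: u32 },
    // Seeded and any other modes are omitted from this sketch.
}

impl OrderMode {
    /// Hypothetical helper: parse already-read values so the logic is testable.
    fn parse(order: Option<&str>, quantum: Option<&str>) -> OrderMode {
        match order {
            Some("quantum") => {
                let ticks = quantum
                    .and_then(|s| s.trim().parse::<u32>().ok())
                    .filter(|&t| t > 0) // assumed: 0 or garbage falls back to the default
                    .unwrap_or(QUANTUM_DEFAULT);
                OrderMode::ScanQuantum { ticks }
            }
            _ => OrderMode::Fixed,
        }
    }

    fn from_env() -> OrderMode {
        let order = env::var("XENIA_SCHED_ORDER").ok();
        let quantum = env::var("XENIA_SCHED_QUANTUM").ok();
        OrderMode::parse(order.as_deref(), quantum.as_deref())
    }
}

fn main() {
    println!("{:?}", OrderMode::from_env());
}
```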

Decision

Per the Stage 0 decision tree in plan.md:

If a quantum value advances main prefix ≥ 105,500 AND ours's digest is stable × 3 at that quantum: land it … skip Stages 1-4. Else if some quantum partially helps … proceed to Stage 1. Else (no improvement): proceed to Stage 1 immediately.

The third branch fires: no improvement at any quantum. Land the OrderMode::ScanQuantum variant anyway because it's tested, default-safe, and gives future probes a knob; skip Option B implementation (same mechanism, same wall); proceed to Stage 1 (canary contention emitter) next session.
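The decision tree quoted above can be written as a small function. SweepRow and Decision are hypothetical types. "Partially helps" is assumed to mean any quantum whose main prefix beats the 104,607 baseline, because plan.md's exact criterion is elided in the quote.

```rust
const BASELINE_PREFIX: u32 = 104_607; // C+22 main matched prefix
const LAND_THRESHOLD: u32 = 105_500; // from the plan.md quote

/// One sweep row (hypothetical shape).
struct SweepRow {
    ticks: u32,
    main_prefix: u32,
    stable_x3: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    /// Land the quantum and skip Stages 1-4.
    Land { ticks: u32 },
    /// Some quantum partially helps; proceed to Stage 1 (remaining details elided in the quote).
    PartialThenStage1 { ticks: u32 },
    /// No improvement at any quantum: proceed to Stage 1 immediately.
    Stage1Immediately,
}

fn decide(rows: &[SweepRow]) -> Decision {
    if let Some(r) = rows.iter().find(|r| r.main_prefix >= LAND_THRESHOLD && r.stable_x3) {
        return Decision::Land { ticks: r.ticks };
    }
    // Assumed meaning of "partially helps": any gain over the baseline.
    if let Some(r) = rows.iter().find(|r| r.main_prefix > BASELINE_PREFIX) {
        return Decision::PartialThenStage1 { ticks: r.ticks };
    }
    Decision::Stage1Immediately
}

fn main() {
    // The observed Stage 0 sweep: every quantum stable x3 at exactly the baseline.
    let observed: Vec<SweepRow> = [10u32, 50, 200, 1_000, 5_000, 10_000]
        .iter()
        .map(|&ticks| SweepRow { ticks, main_prefix: BASELINE_PREFIX, stable_x3: true })
        .collect();
    println!("{:?}", decide(&observed)); // the third branch
}
```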

What landed

change	file	LOC	effect
`OrderMode::ScanQuantum { ticks: u32 }` variant + doc	scheduler.rs:231-243	+8	new variant
`from_env` `XENIA_SCHED_ORDER=quantum` arm + `XENIA_SCHED_QUANTUM` parse	scheduler.rs:251-263	+13	env wire
`Scheduler::new` rng-init exhaustive match	scheduler.rs:385-388	+1	compile fix
`Scheduler::quantum_for(order)` helper	scheduler.rs:780-795	+12	reload-value source
Replace 5 `t.quantum_remaining = QUANTUM_DEFAULT` sites	scheduler.rs:797, 865, 887, 1162, 1232	+5 (changed)	reload thread-quantum from `quantum_for`
Spawn + install_initial_thread explicit reload	scheduler.rs:632-634, 681-683	+6	ScanQuantum honored at spawn
8 new unit tests	scheduler.rs:1980-2107	+112	regression cover
Stage 0 sweep harness + digest helper + result.md	this dir	+90	spike artifacts

The default OrderMode::Fixed cold digest is unchanged by construction: under Fixed, the 5 rewritten reload sites still yield QUANTUM_DEFAULT via the _ => QUANTUM_DEFAULT arm in quantum_for. The 3 Fixed cold runs and all 18 sweep runs reproduce the same digest.
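The sketch below shows quantum_for as described. Only ScanQuantum overrides the reload, and every other mode hits the _ => QUANTUM_DEFAULT arm, which is why Fixed cannot move. It is written as a free function rather than Scheduler::quantum_for. The Seeded payload and the 50,000 value for QUANTUM_DEFAULT are assumptions.

```rust
/// Assumed value: the sweep table labels 50,000 as "default Fixed".
const QUANTUM_DEFAULT: u32 = 50_000;

#[allow(dead_code)] // the Seeded payload is not read in this sketch
#[derive(Debug, Clone, Copy)]
enum OrderMode {
    Fixed,
    Seeded { seed: u64 }, // hypothetical payload
    ScanQuantum { ticks: u32 },
}

/// Reload value for a thread's quantum under `order`. Only ScanQuantum
/// overrides it; every other mode falls through to QUANTUM_DEFAULT.
fn quantum_for(order: OrderMode) -> u32 {
    match order {
        OrderMode::ScanQuantum { ticks } => ticks,
        _ => QUANTUM_DEFAULT,
    }
}

fn main() {
    let modes = [
        OrderMode::Fixed,
        OrderMode::Seeded { seed: 7 },
        OrderMode::ScanQuantum { ticks: 200 },
    ];
    for mode in modes {
        // Each former `t.quantum_remaining = QUANTUM_DEFAULT` site becomes
        // `t.quantum_remaining = quantum_for(mode)`.
        println!("{mode:?} -> reload {}", quantum_for(mode));
    }
}
```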

Why land the variant despite the null result

Cheap to keep (zero default-mode behavioral change).
Future probes (different game, different boot phase, M3 --parallel mode) may want quick quantum tuning without re-coding.
The variant + 8 new tests form a permanent piece of scheduler-knob infrastructure that costs nothing.
Reverting is trivial — one enum variant + the helper.

If the user prefers not to land the variant (e.g., "no point in dead knobs"), reverting is a git revert of the single commit; the engine source change is isolated to scheduler.rs.

Phase B image hash

ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18 — UNCHANGED (no Phase B touchpoints in this change).

Reading-error class

No new class earned. Stage 0 confirms the existing reading-error #34 discipline (cold-vs-cold against .iso, XENIA_CACHE_WIPE=1). Method note for posterity: always check whether the knob you're tuning is on the critical path of the system you're trying to nudge. Here, quantum was on the rotation path, but rotation requires a peer; under Sylpheed's boot, peers were not in the right slot at the right time, so the knob was inert.

Sweep artifacts

sweep.sh — driver script (6 quanta × 3 cold runs + diff per quantum)
det_digest.py — det-fields MD5 helper (filters host_ns)
sweep-results.tsv — raw per-run digests (all 18 identical)
diff-q10.txt … diff-q10000.txt — per-quantum diff_events.py reports (all identical)
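The sweep's check reduces to two steps: group the per-run digests by quantum, then test each group for stability (all runs equal) and for agreement with the Fixed-mode digest. The sketch below assumes a hypothetical ticks<TAB>run<TAB>digest layout for sweep-results.tsv, since this note does not describe the real file's columns.

```rust
use std::collections::BTreeMap;

/// Parse `ticks\trun\tdigest` rows (assumed layout), skipping headers and malformed lines.
fn parse_rows(tsv: &str) -> Vec<(u32, String)> {
    tsv.lines()
        .filter_map(|line| {
            let mut cols = line.split('\t');
            let ticks = cols.next()?.trim().parse::<u32>().ok()?;
            let _run = cols.next()?;
            let digest = cols.next()?.trim().to_string();
            Some((ticks, digest))
        })
        .collect()
}

/// For each quantum: (stable across its runs, every run matches the Fixed-mode digest).
fn check(rows: &[(u32, String)], fixed_digest: &str) -> BTreeMap<u32, (bool, bool)> {
    let mut by_ticks: BTreeMap<u32, Vec<&str>> = BTreeMap::new();
    for (ticks, digest) in rows {
        by_ticks.entry(*ticks).or_default().push(digest.as_str());
    }
    by_ticks
        .into_iter()
        .map(|(ticks, ds)| {
            let stable = ds.windows(2).all(|w| w[0] == w[1]);
            let matches_fixed = ds.iter().all(|d| *d == fixed_digest);
            (ticks, (stable, matches_fixed))
        })
        .collect()
}

fn main() {
    let tsv = "ticks\trun\tdigest\n10\t1\td1\n10\t2\td1\n10\t3\td1\n50\t1\td1\n";
    for (ticks, (stable, matches_fixed)) in check(&parse_rows(tsv), "d1") {
        println!("ticks={ticks}: stable={stable} matches_fixed={matches_fixed}");
    }
}
```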

Next session

Stage 1 — canary-side contention emitter. ~100 LOC in xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:596-633 + event_log.{h,cc} + cpu_flags.cc, cvar kernel_emit_contention=false (default off, cvar-OFF byte-identical). Detailed scope in scheduler-determinism-plan/plan.md §Stage 1.
