Ours threading model — Phase C+23 characterization
Re-verifies xenia-rs's threading model in the current tree (HEAD per session start). Source-of-truth files re-read this session:
- xenia-rs/crates/xenia-cpu/src/scheduler.rs (2094 lines)
- xenia-rs/crates/xenia-kernel/src/state.rs (2383 lines)
- xenia-rs/crates/xenia-kernel/src/exports.rs (9370 lines)
- xenia-rs/crates/xenia-kernel/src/contention_manifest.rs (342 lines)
1. Threading abstraction: single host thread, 6 cooperative HW slots
scheduler.rs defines HW_THREAD_COUNT and Scheduler::round_schedule
(line 730). The Scheduler holds 6 HwSlot runqueues, and each runqueue
holds N guest XThreads. No guest thread gets its own host std::thread.
Instead, the single host thread that owns the CPU walks the slots in
rotation_cursor order, picks the highest-priority Ready thread in each
slot, executes a quantum's worth of guest instructions, and moves on.
Compared to canary's 1-host-per-1-guest model, this is cooperative in two senses: only one guest thread runs at a time (no true SMP), and context switches happen only at well-defined emulator boundaries (quantum exhaustion, explicit park, end-of-step).
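A minimal sketch of one scheduling round as described above. Only the slot count (6), the rotation_cursor walk and the highest-priority-Ready pick come from the text. The types GuestThread, Slot and ThreadState, the tie-break (first thread in runqueue order wins) and advancing the cursor by one slot per round are hypothetical simplifications. Executing a quantum is reduced to recording the tid.

```rust
use std::cmp::Reverse;

/// Number of cooperative hardware slots (6 in the text).
const HW_THREAD_COUNT: usize = 6;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ThreadState {
    Ready,
    Blocked,
}

/// Hypothetical stand-in for a guest XThread.
#[derive(Clone, Debug)]
struct GuestThread {
    tid: u32,
    priority: i32,
    state: ThreadState,
}

/// Hypothetical stand-in for one HwSlot runqueue.
#[derive(Default, Debug)]
struct Slot {
    runqueue: Vec<GuestThread>,
}

struct Scheduler {
    slots: [Slot; HW_THREAD_COUNT],
    rotation_cursor: usize,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { slots: Default::default(), rotation_cursor: 0 }
    }

    /// One round on the single host thread: visit every slot starting at
    /// rotation_cursor and "run" the highest-priority Ready thread there.
    /// Returns the tids that ran, in order.
    fn round_schedule(&mut self) -> Vec<u32> {
        let mut ran = Vec::new();
        for i in 0..HW_THREAD_COUNT {
            let slot = &self.slots[(self.rotation_cursor + i) % HW_THREAD_COUNT];
            // min_by_key on Reverse(priority) returns the FIRST thread with
            // the highest priority, so ties go to runqueue order.
            let pick = slot
                .runqueue
                .iter()
                .filter(|t| t.state == ThreadState::Ready)
                .min_by_key(|t| Reverse(t.priority));
            if let Some(t) = pick {
                // The real scheduler executes a quantum of guest code here.
                ran.push(t.tid);
            }
        }
        // Assumption: the cursor advances one slot per round.
        self.rotation_cursor = (self.rotation_cursor + 1) % HW_THREAD_COUNT;
        ran
    }
}
```

Nothing in the sketch spawns a host thread: every guest thread is data in a runqueue, and "running" one is a step taken by the caller.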
2. OrderMode enum (scheduler.rs:232)
```rust
pub enum OrderMode {
    Fixed,                      // default; ours digest e1dfcb15…
    Seeded { seed: u64 },       // pseudo-random shuffle of the round
    ScanQuantum { ticks: u32 }, // Stage 0 spike, landed but null-result
}
```
Selected via the XENIA_SCHED_ORDER env var (from_env at line 244);
defaults to Fixed. A second env var, XENIA_SCHED_QUANTUM, sets the
ScanQuantum reload value.
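A sketch of how the env-var selection could be parsed. The variable names and the Fixed default come from the text. The accepted spellings ("fixed", "seeded:<n>", "scan") and the 1000-tick fallback are hypothetical, not the real from_env grammar. Parsing is split into a pure function so it can be exercised without mutating the process environment.

```rust
use std::env;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OrderMode {
    Fixed,
    Seeded { seed: u64 },
    ScanQuantum { ticks: u32 },
}

/// Hypothetical fallback when XENIA_SCHED_QUANTUM is unset or unparsable.
const DEFAULT_SCAN_TICKS: u32 = 1000;

impl OrderMode {
    /// Pure mapping from the two env-var values to a mode. The spellings
    /// "fixed", "seeded:<u64>" and "scan" are assumptions for this sketch.
    pub fn parse(order: Option<&str>, quantum: Option<&str>) -> OrderMode {
        match order.map(str::trim) {
            Some("scan") => {
                let ticks = quantum
                    .and_then(|q| q.trim().parse::<u32>().ok())
                    .unwrap_or(DEFAULT_SCAN_TICKS);
                OrderMode::ScanQuantum { ticks }
            }
            Some(s) => match s.strip_prefix("seeded:").map(str::parse::<u64>) {
                Some(Ok(seed)) => OrderMode::Seeded { seed },
                // "fixed", unknown values and bad seeds all fall back to Fixed.
                _ => OrderMode::Fixed,
            },
            // Unset: the documented default.
            None => OrderMode::Fixed,
        }
    }

    /// Reads XENIA_SCHED_ORDER and XENIA_SCHED_QUANTUM from the environment.
    pub fn from_env() -> OrderMode {
        let order = env::var("XENIA_SCHED_ORDER").ok();
        let quantum = env::var("XENIA_SCHED_QUANTUM").ok();
        OrderMode::parse(order.as_deref(), quantum.as_deref())
    }
}
```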
The current source has no ContentionReplay variant. Phase D Stage 3
instead landed a manifest consultation inside
rtl_enter_critical_section (exports.rs) rather than a new OrderMode.
In the planner's hindsight, putting it in OrderMode would have been
cleaner; this is documented as a deviation from the original plan.
3. Per-slot quantum + decrement_quantum (scheduler.rs:800)
decrement_quantum decrements the running thread's quantum_remaining.
When it reaches zero, the quantum is reloaded (per quantum_for(order)
at line 793) and the slot's runqueue is scanned for a same-priority
Ready peer to rotate to. If no such peer exists, no rotation happens
and the reload is benign.
Stage 0 (2026-05-18) sweep validated:
- Fixed → ours digest ba5b5e07… (the baseline since Stage 0; the prior baseline was e1dfcb15…, before Stage 0 changed default-mode emission).
- ScanQuantum × [10, 50, 200, 1000, 5000, 10000] → all byte-identical to the Fixed default. Why: tid=1 is alone on slot 0 during boot, so there is no peer to rotate to regardless of quantum. Option B (forced-yield across slots) would face the same constraint and was skipped.
The lesson: rotating within a slot doesn't help; tid=1's monolithic boot region has no other thread on its slot to rotate to.
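The reload-and-rotate rule, and why the Stage 0 sweep was a no-op, can be sketched on a single slot. The Thread and SlotQueue types are hypothetical, the reload field stands in for quantum_for(order), and the peer scan order (the entries after the running one, wrapping around) is an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Ready,
    Running,
}

/// Hypothetical guest thread on one slot.
#[derive(Debug)]
struct Thread {
    tid: u32,
    priority: i32,
    state: State,
    quantum_remaining: u32,
}

/// Hypothetical single-slot runqueue; `running` indexes into `threads`.
struct SlotQueue {
    threads: Vec<Thread>,
    running: usize,
    reload: u32, // stand-in for quantum_for(order)
}

impl SlotQueue {
    /// Charges one unit to the running thread. Returns true if the slot
    /// rotated to a different thread.
    fn decrement_quantum(&mut self) -> bool {
        let cur = self.running;
        let reload = self.reload;
        let t = &mut self.threads[cur];
        t.quantum_remaining = t.quantum_remaining.saturating_sub(1);
        if t.quantum_remaining > 0 {
            return false;
        }
        t.quantum_remaining = reload;
        let prio = t.priority;

        // Scan the other entries (wrapping) for a same-priority Ready peer.
        let n = self.threads.len();
        let peer = (1..n)
            .map(|off| (cur + off) % n)
            .find(|&i| self.threads[i].state == State::Ready && self.threads[i].priority == prio);

        match peer {
            Some(i) => {
                self.threads[cur].state = State::Ready;
                self.threads[i].state = State::Running;
                self.running = i;
                true
            }
            // No peer: the reload is benign and the same thread keeps going.
            None => false,
        }
    }
}
```

With tid=1 alone, every expiry just reloads, so any reload value yields the same schedule; rotation only appears once a same-priority Ready peer shares the slot.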
4. park_current / wake_ref (scheduler.rs:840)
park_current(BlockReason) is the canonical primitive for parking the
currently-running thread. Used by:
- RtlEnterCriticalSection parking on BlockReason::CriticalSection(cs_ptr) (exports.rs ~2927).
- KeWaitForSingleObject parking on BlockReason::WaitSingle(handle).
- Other primitives.
The wake side calls Scheduler::wake_ref(ref), which transitions the
thread from HwState::Blocked to HwState::Ready and re-marks the slot in
the non_empty_runnable mask. The FIFO queue for each blocking object
(cs_waiters[cs_ptr], etc.) lives in kernel-side state (state.rs).
Key property: parking and waking are deterministic for a given input across host runs, because every cross-thread interaction goes through the Scheduler, which has no host-OS dependency.
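A sketch of the park/wake pair and the non_empty_runnable bookkeeping, assuming a hypothetical ThreadRef of (slot, index) and a u8 bitmask with one bit per slot; the real HwSlot/XThread layout is not reproduced. Nothing here touches a host OS primitive, which is the property the paragraph above relies on.

```rust
const HW_THREAD_COUNT: usize = 6;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BlockReason {
    CriticalSection(u32), // cs_ptr
    WaitSingle(u32),      // handle
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HwState {
    Ready,
    Running,
    Blocked,
}

/// Hypothetical thread handle: (slot, index in that slot's runqueue).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ThreadRef {
    slot: usize,
    idx: usize,
}

struct Entry {
    state: HwState,
    reason: Option<BlockReason>,
}

struct Scheduler {
    slots: Vec<Vec<Entry>>,
    /// Bit i set <=> slot i has at least one non-Blocked thread.
    non_empty_runnable: u8,
    current: Option<ThreadRef>,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler {
            slots: (0..HW_THREAD_COUNT).map(|_| Vec::new()).collect(),
            non_empty_runnable: 0,
            current: None,
        }
    }

    fn spawn(&mut self, slot: usize) -> ThreadRef {
        self.slots[slot].push(Entry { state: HwState::Ready, reason: None });
        self.refresh_mask(slot);
        ThreadRef { slot, idx: self.slots[slot].len() - 1 }
    }

    fn run(&mut self, r: ThreadRef) {
        self.slots[r.slot][r.idx].state = HwState::Running;
        self.current = Some(r);
    }

    fn refresh_mask(&mut self, slot: usize) {
        let runnable = self.slots[slot].iter().any(|e| e.state != HwState::Blocked);
        if runnable {
            self.non_empty_runnable |= 1 << slot;
        } else {
            self.non_empty_runnable &= !(1 << slot);
        }
    }

    /// Parks the running thread and returns its handle so the caller can
    /// queue it on the blocking object's FIFO.
    fn park_current(&mut self, reason: BlockReason) -> ThreadRef {
        let r = self.current.take().expect("park_current with no running thread");
        let e = &mut self.slots[r.slot][r.idx];
        e.state = HwState::Blocked;
        e.reason = Some(reason);
        self.refresh_mask(r.slot);
        r
    }

    /// Blocked -> Ready, then re-mark the slot as runnable.
    fn wake_ref(&mut self, r: ThreadRef) {
        let e = &mut self.slots[r.slot][r.idx];
        if e.state == HwState::Blocked {
            e.state = HwState::Ready;
            e.reason = None;
            self.refresh_mask(r.slot);
        }
    }
}
```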
5. rtl_enter_critical_section (exports.rs:2886-2946)
Re-read for Phase C+23 verification. Branches:
- owner == 0 || !owner_is_live → claim uncontended.
- owner == current_tid → recursive bump.
- otherwise → push self onto cs_waiters[cs_ptr], then park_current(BlockReason::CriticalSection(cs_ptr)).
There is no spin loop; the call goes straight to park. This is a
deliberate asymmetry with canary's cs->header.absolute*256 spin,
documented and intentional. Adding a spin to ours would not help: the
only way ours contends is if a peer thread holds the lock at the exact
moment tid=1 reaches the call.
In the boot region around event 104,604, tid=1 is the only runnable thread on slot 0, so no peer is even Ready to take the CS first. Ours therefore always takes the uncontended fast path.
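The three branches, plus a FIFO hand-off on leave, can be sketched as follows. The branch conditions and the cs_waiters FIFO follow the text. CritSec, Kernel, the live-tid list and rtl_leave_critical_section's hand-off policy are hypothetical, and parking is represented by returning EnterOutcome::Parked instead of calling park_current.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical critical-section record; owner tid 0 means free.
#[derive(Default, Debug)]
struct CritSec {
    owner: u32,
    recursion: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum EnterOutcome {
    Claimed,
    Recursive,
    Parked,
}

#[derive(Default)]
struct Kernel {
    sections: HashMap<u32, CritSec>,         // keyed by cs_ptr
    cs_waiters: HashMap<u32, VecDeque<u32>>, // FIFO of waiting tids per cs_ptr
    live: Vec<u32>,                          // tids treated as live
}

impl Kernel {
    fn owner_is_live(&self, tid: u32) -> bool {
        self.live.contains(&tid)
    }

    fn rtl_enter_critical_section(&mut self, cs_ptr: u32, current_tid: u32) -> EnterOutcome {
        let owner = self.sections.entry(cs_ptr).or_default().owner;
        if owner == 0 || !self.owner_is_live(owner) {
            // Uncontended (or stale owner): claim it.
            let cs = self.sections.get_mut(&cs_ptr).unwrap();
            cs.owner = current_tid;
            cs.recursion = 1;
            EnterOutcome::Claimed
        } else if owner == current_tid {
            self.sections.get_mut(&cs_ptr).unwrap().recursion += 1;
            EnterOutcome::Recursive
        } else {
            // No spin: queue FIFO and park (the real code calls park_current here).
            self.cs_waiters.entry(cs_ptr).or_default().push_back(current_tid);
            EnterOutcome::Parked
        }
    }

    /// Hypothetical leave: on the last release, hand ownership to the oldest
    /// waiter and return its tid so the caller can wake_ref it.
    fn rtl_leave_critical_section(&mut self, cs_ptr: u32) -> Option<u32> {
        let cs = self.sections.get_mut(&cs_ptr)?;
        cs.recursion = cs.recursion.saturating_sub(1);
        if cs.recursion > 0 {
            return None;
        }
        let next = self.cs_waiters.get_mut(&cs_ptr).and_then(|q| q.pop_front());
        cs.owner = next.unwrap_or(0);
        cs.recursion = u32::from(next.is_some());
        next
    }
}
```

Because waiters are released strictly in arrival order, the wake order depends only on the order of the enter calls, unlike a host condition-variable broadcast.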
6. Contention manifest loader (contention_manifest.rs)
Phase D Stage 3 landed crates/xenia-kernel/src/contention_manifest.rs
(342 LOC) with consume_at_peek(tid, peek_idx), which translates ours'
per-tid idx back into canary's idx space by subtracting prior
contention.observed emits. Setting the XENIA_CONTENTION_MANIFEST_PATH
env var opts in. Per the Stage 3+4 result, the replay-mode digest
1d7c6b45… is stable × 3 cold runs, but the main matched prefix is still
104,607: the manifest's forced-contention entries fire at the wrong
logical positions because the divergence is upstream of any contention event.
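A sketch of the index translation in consume_at_peek. It assumes a hypothetical manifest shape (per-tid lists of canary-space indices) and that each manifest hit makes ours emit exactly one contention.observed event for that tid; the real file format and bookkeeping in contention_manifest.rs may differ.

```rust
use std::collections::HashMap;

/// Hypothetical manifest shape: per tid, the canary-space event indices at
/// which canary observed a contended critical-section entry.
struct Manifest {
    entries: HashMap<u32, Vec<u64>>,
    /// contention.observed events ours has emitted so far, per tid.
    observed_emits: HashMap<u32, u64>,
}

impl Manifest {
    /// Returns true if this call should be forced to contend.
    fn consume_at_peek(&mut self, tid: u32, peek_idx: u64) -> bool {
        // Subtract ours' prior contention.observed emits to map the per-tid
        // idx back into canary's idx space.
        let prior = self.observed_emits.get(&tid).copied().unwrap_or(0);
        let canary_idx = peek_idx.saturating_sub(prior);

        let Some(list) = self.entries.get_mut(&tid) else { return false };
        let Some(pos) = list.iter().position(|&i| i == canary_idx) else { return false };
        list.remove(pos); // each entry fires at most once

        // Assumption: a forced contention emits exactly one contention.observed.
        *self.observed_emits.entry(tid).or_insert(0) += 1;
        true
    }
}
```

In the test, the peek at ours idx 20 misses: one prior emit shifts it to canary idx 19, and only ours idx 21 lines up with the manifest's entry 20.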
This is a critical input to C+23's recommendation: the Phase D replay infrastructure is built and stable, but it does NOT unblock the 104,607 cap. The actual cap-unblock came from the D-extension diff-tool absorber (band-aid, Phase D 2026-05-18). The structural fix never landed and has no clear next step.
7. Existing determinism guarantees
- Default-mode ours cold digest ba5b5e07… is reproducible × 3 (Stage 0 / Phase D baseline). The prior e1dfcb15… baseline is the C+19-era constant; the Stage 0 emission tweak shifted it without changing logic.
- Phase B image_loaded_sha256 ea8d160e… is unchanged across all 23+ phases.
- All emitted Phase A events are stable on (input, cvars).
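A sketch of the × 3 reproducibility check. The digest algorithm behind ba5b5e07… isn't stated here, so 64-bit FNV-1a stands in for it, and the event log comes from a closure instead of a real cold boot; both are assumptions made for illustration.

```rust
/// 64-bit FNV-1a; a stand-in, since the real digest algorithm isn't stated.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Digests an event log, newline-terminating each event so that
/// ["ab", "c"] and ["a", "bc"] hash different byte strings.
fn digest_events<S: AsRef<str>>(events: &[S]) -> u64 {
    let mut buf = Vec::new();
    for e in events {
        buf.extend_from_slice(e.as_ref().as_bytes());
        buf.push(b'\n');
    }
    fnv1a64(&buf)
}

/// Runs `run` n times (n >= 1) and returns the shared digest, or None if
/// any run differs from the first.
fn reproducible_digest<F: Fn() -> Vec<String>>(run: F, n: usize) -> Option<u64> {
    let first = digest_events(&run());
    (1..n).all(|_| digest_events(&run()) == first).then_some(first)
}
```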
8. Mismatch surfaces with canary
| dimension | canary | ours |
|---|---|---|
| host threads | 1 per XThread | 1 total |
| inter-thread arbiter | host OS | Scheduler |
| RtlEnterCS spin | spin then wait | park immediately |
| Clock | wallclock (rdtsc) | fixed FILETIME 132_500_000_000_000_000 |
| Wait wakeup ordering | pthread_cond_broadcast race | FIFO cs_waiters |
| Yield primitive | host yield | decrement_quantum rotation |
Of these, the clock and the wait wakeup ordering are the two surfaces beyond CS contention where canary→ours divergence could appear. So far Sylpheed exercises them only lightly: 2 KeQuerySystemTime calls and 34 wait.begin events in total.
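For reference, the fixed clock value decodes as follows. A FILETIME counts 100 ns ticks since 1601-01-01 UTC, and 11,644,473,600 s separate that epoch from the Unix epoch, so 132_500_000_000_000_000 is 2020-11-16 11:33:20 UTC. The sketch only decodes the constant; whether ours ever advances it between KeQuerySystemTime calls is not covered here. civil_from_days follows Howard Hinnant's well-known days-to-date algorithm.

```rust
/// The fixed clock from the table: a FILETIME, i.e. 100 ns ticks since
/// 1601-01-01 00:00:00 UTC.
const FIXED_FILETIME: u64 = 132_500_000_000_000_000;
const TICKS_PER_SEC: u64 = 10_000_000;
/// Seconds from 1601-01-01 to the Unix epoch (1970-01-01).
const EPOCH_DIFF_SECS: u64 = 11_644_473_600;

/// FILETIME -> whole Unix seconds; None for instants before 1970.
fn filetime_to_unix_secs(ft: u64) -> Option<u64> {
    (ft / TICKS_PER_SEC).checked_sub(EPOCH_DIFF_SECS)
}

/// Days since 1970-01-01 -> (year, month, day), after Howard Hinnant's
/// civil_from_days algorithm (proleptic Gregorian calendar).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = (z - era * 146_097) as u64; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year from March 1
    let mp = (5 * doy + 2) / 153; // month index with March = 0
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let y = yoe as i64 + era * 400;
    (if m <= 2 { y + 1 } else { y }, m, d)
}
```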
9. Existing scheduler cvars / lockstep modes
There is no lockstep cvar in ours. The closest mode is
OrderMode::Fixed (default), which produces a deterministic schedule
keyed entirely on the spawn/wake sequence. Replay via manifest is
opt-in via XENIA_CONTENTION_MANIFEST_PATH.
10. Implication: ours is the strict side
In any cross-engine deterministic-replay scheme, ours has to bend toward canary, not the other way. Canary's host-OS scheduling cannot be tamed without rewriting it (out of scope; would also invalidate it as the oracle, since the "real" Xbox 360 wasn't deterministic in this sense either). The Phase D plan's H'/H broad landed Stages 1-4 of this bend — the engine infrastructure is built, just not load-bearing for the 104,607 cap.