handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions
--- a/audit-runs/scheduler-determinism-plan/approach-matrix.md
+++ b/audit-runs/scheduler-determinism-plan/approach-matrix.md
@@ -0,0 +1,76 @@
+# Approach Tradeoff Matrix
+
+Each approach is evaluated against the same criteria. The recommended approach is **H'** (manifest replay, scoped to RtlEnterCS), gated on the outcome of the Stage 0 spike.
+
+## Criteria
+
+- **Eng LOC**: estimated engine-source modification (ours + canary).
+- **Tool LOC**: estimated diff-tool / python tooling.
+- **Test LOC**: estimated tests.
+- **Unblocks 104,607?**: probability of advancing the main matched-prefix past the current cap.
+- **Preserves ours digest**: whether `e1dfcb1559f987b35012a7f2dc6d93f5` (Phase A) and `ea8d160e…` (Phase B) remain unchanged in the *default* mode.
+- **Preserves canary default**: whether canary's default-mode (no new cvar) cold-run behavior is byte-identical.
+- **Wine-constraint**: whether the approach requires changing Wine itself (always: NO — out of scope).
+- **Reading-error risk**: which class of reading error this approach risks crossing.
+
+## Matrix
+
+| approach | Eng LOC | Tool LOC | Test LOC | Unblocks 104,607? | Preserves ours digest | Preserves canary default | Reading-error risk | Verdict |
+|---|---|---|---|---|---|---|---|---|
+| **A — cycle-counted clock in canary** | ~200 (`base/clock.cc`) | 0 | ~50 | NO (Sylpheed: 2 KeQuerySystemTime calls) | yes | yes (cvar-gated) | #19 (wrong-target) | **WRONG TARGET** |
+| **B — single-thread cooperative canary** | ~2000-3000 (`xthread.cc`, `threading*.cc`, `processor.cc`) | ~50 | ~300 | YES | yes | NO — fundamentally changes scheduling | #28 (rewrite-without-verify) | **OVERSCOPED** |
+| **C/H — manifest replay, broad (CS + wait)** | ~600-700 | ~200 | ~200 | YES (with risk in wait-side semantics) | yes (default-off) | yes (cvar-off) | #23 (synthetic events) | **2nd choice** |
+| **H' — manifest replay, scoped to RtlEnterCS** | ~450-500 | ~180 | ~150 | YES | yes (default-off) | yes (cvar-off) | #23 (bounded) | **RECOMMENDED** |
+| **D — diff-harness absorption extension** | 0 | ~150 (diff_events.py) | ~50 | PARTIAL (10-100 idx) | yes | yes | #23 (FOLDS REAL GUEST CODE) | **fallback only** |
+| **E — A+D hybrid** | ~200 | ~150 | ~100 | LOW (clock isn't the lever; D hits #23 wall) | yes | yes | #19 + #23 | **band-aid** |
+| **F — make ours preemptive** | ~500 (`scheduler.rs`) | 0 | ~100 | UNKNOWN (no replay anchor) | NO — destabilizes cold digest | n/a | #28 (loses 23 phases of stabilization) | **WRONG DIRECTION** |
+| **Stage 0 spike — cycle-quantum preemption** | ~80 (`scheduler.rs`) | 0 | ~40 | TBD by spike | TBD (default `Fixed` unchanged) | n/a | #19 (premature optimization if not validated) | **GATE** |
+| spin-then-wait fix in ours | ~50 (`exports.rs:2886`) | 0 | ~30 | NO (wrong direction: adding spin makes contention *less* likely on ours's side) | yes | n/a | #28 (verified — would not help 104,607) | **document, defer** |
+
+## Detailed reasoning
+
+### Why H' over C/H (broad)
+
+The broad variant (C/H) covers both `RtlEnterCriticalSection` and `KeWaitForSingleObject`. Phase 1 evidence shows:
+
+- 19,494 RtlEnter calls in Sylpheed's boot
+- 34 wait.begin events total
+
+The CS surface is ~570× larger than the wait surface. Adding wait-side replay buys little. More importantly, wait-side replay has tougher semantics: when canary's KeWaitForSingleObject fires on a TIMER (with a host-wallclock deadline), ours can't replay because ours doesn't have a wallclock to match.
+
+H' defers wait-side replay until evidence shows it's needed (backstop in `plan.md` §Backstop).
+
+### Why H' over B (single-thread canary)
+
+B fundamentally changes the oracle. The oracle's stability across phases is a foundational invariant; modifying its scheduling layer introduces game-compatibility risk that we cannot fully test (only Sylpheed is in scope, but canary supports many titles). LOC is also 4-6× larger.
+
+H' leaves the oracle's behavior unchanged in the default case. The contention emitter (Stage 1) is a passive observer; the manifest captures one canary cold run as canonical and ours replays it. Canary is not asked to be deterministic — it's asked to *report* its non-determinism.
+
+### Why H' over D (diff absorber extension)
+
+The current C+21 absorber is already at the safe limit of reading-error #23. Extending the absorber to fold "post-wait nested Enter/Leave blocks" would hide REAL guest-code execution differences. The canary side's nested-Enter reads mutated memory and modifies state (lock_count, recursion_count) that affects subsequent events. Folding it at the diff layer means downstream divergences are misattributed.
+
+D remains as a *backstop* (plan.md §Backstop item 2) for residual gaps post-Stage-3, with explicit reading-error annotation.
+
+### Why H' over F (make ours preemptive)
+
+23 phases (C+1 through C+23) have stabilized ours's cold digest. Changing the default scheduler to preempt at fixed intervals would invalidate every prior baseline. Even if the new digest is stable, it severs continuity with the existing test infrastructure and audit-run archives.
+
+H' preserves ours's default `OrderMode::Fixed`. The replay mode is opt-in via `--scheduler-replay-manifest PATH`. Default-mode digest is provably unchanged (Stage 3 validation #2).
+
+### Why Stage 0 first
+
+Cost is 1 day, 80 LOC. If a tuned quantum advances the prefix past 104,607 with a stable digest, the manifest work (Stages 1-4, ~450-500 LOC, 3-5 sessions) is unnecessary. Even if Stage 0 doesn't fully unblock, the data informs the manifest design (e.g., "quantum=200 advances prefix by 800 events but stalls at 105,407" tells us the next divergence is a different class).
+
+Skipping Stage 0 is *strictly dominated* by running it first: the spike costs ~80 LOC and about a day, while skipping it risks ~500 LOC of unnecessary manifest work.
+
+### Why NOT spin-then-wait fix in ours
+
+The 104,607 divergence is canary contending, ours NOT contending. Adding spin to ours would make ours's RtlEnterCS try harder to acquire without parking — which makes contention *less* likely on the ours side, the OPPOSITE of what we need. Documenting the spin asymmetry is valuable for future divergences in the opposite direction (where ours spuriously contends and canary doesn't), but it's not the lever for 104,607.
+
+## Open tradeoffs (decisions deferred to user / Stage 0 outcome)
+
+- **Stage 0 alone might suffice**: if quantum=N produces a stable digest matching canary's behavior at 104,607, the plan collapses to a single 80-LOC change. Stage 0 decision tree is in plan.md.
+- **Sister chain regression budget**: -5 per sister. If exceeded post-Stage-3, scope manifest to tid=6 only initially, then iterate.
+- **Wait-side replay (broad H)**: deferred unless sister chains (esp tid=12→7 timeout class) need it. Backstop only.
+- **Approach D extension as final band-aid**: documented in backstop with explicit #23 annotation. Only land if Stages 0-4 leave residual divergence with no other path forward.
--- a/audit-runs/scheduler-determinism-plan/canary-variance.md
+++ b/audit-runs/scheduler-determinism-plan/canary-variance.md
@@ -0,0 +1,69 @@
+# Canary Variance Characterization — Reading-Error #32
+
+## Source data
+
+Re-analysis of the C+22 archived jitter jsonls + ours-cold.jsonl from `xenia-rs/audit-runs/phase-c22-rtl-enter-leave-control-flow/`. No fresh runs done in this session — the C+22 samples (4 canary cold runs + 1 ours cold run) are sufficient to characterize.
+
+## Files inspected
+
+- `canary-cold-trunc.jsonl` (494 MB, truncated to ~250k tid=6 events) — fresh c22
+- `ours-cold.jsonl` (28 MB, 121,569 events)
+- Archived: jitter-1, jitter-2, jitter-3 (referenced in C+22 memory + `investigation.md`)
+- `cold-vs-cold-result.md` — variance table
+
+## Variance summary at tid=6 idx 104,604..104,620
+
+Pattern of `import.call` events (E = RtlEnterCriticalSection, L = RtlLeaveCriticalSection):
+
+| sample | observed pattern | wait.begin slow-path? | notes |
+|---|---|---|---|
+| C+21 archived (jitter-2 equivalent) | E E L L | no | fast-path acquire, fast-path nested-acquire, two releases |
+| canary jitter-1 | E **wait.begin** E L L | yes (between first E's call and return) | slow-path on the OUTER acquire |
+| canary jitter-2 | E E L L | no | same as C+21 |
+| canary jitter-3 | E E L L (shifted by +3 indices upstream) | no | upstream tid=6 events have different ordering |
+| fresh c22 | E **wait.begin** E L L | yes | same shape as jitter-1 |
+| **ours cold** | **E L NtClose** | no | NO nested acquire; releases and proceeds to close |
+
+## Key observations
+
+1. **Canary 5/5 samples** have the second (nested) `E` regardless of whether the outer acquire took the slow path. The nested-Enter is canary-structural, not jitter.
+2. **wait.begin presence varies**: 2 of 5 canary samples emit it, 3 of 5 don't. The C+21 floating absorber correctly masks both cases via the shared-global SID `75ae880ec432eb36`.
+3. **Ours-cold takes a different control-flow path**: no second E, no nested cleanup, proceeds straight to NtClose. This is `RtlLeaveCriticalSection` followed by `NtClose` on the Event handle that the CS was protecting.
+4. The C+21 floating-absorb engages correctly in all canary samples (`floating_create (c/o) = 1/0`, `floating_wait (c/o)` varies 0-3/0). Matched-prefix is invariant at 104,607 across all canary cold samples after absorption.
+
+## The structural divergence
+
+After the C+21 absorber runs, the next event index on each side is:
+
+- **Canary**: `import.call RtlEnterCriticalSection` (the nested second E at canary idx 104,610, post-absorption-aligned to ours idx 104,607).
+- **Ours**: `import.call RtlLeaveCriticalSection` (the simple release at ours idx 104,607).
+
+These are different guest control-flow paths. Both are correct executions of the SAME guest code under different scheduling assumptions:
+
+- **Canary path**: tid=6 blocked on the dispatcher Event while another guest thread acquired the CS, mutated protected state (queue ptr / refcount / signaled flag), released, transferred the CS to tid=6. tid=6 woke, post-acquire branch reads MUTATED state, takes nested-cleanup path.
+- **Ours path**: tid=1 (mapped from canary tid=6) was running monolithically under the cooperative scheduler. No other thread ran during what would have been the wait window. Post-acquire branch reads PRE-WAIT state (unchanged), takes simple-release path.
+
+## Variance taxonomy
+
+| variance dimension | observable | absorbable by current diff tool? | root cause |
+|---|---|---|---|
+| Whether wait.begin event fires | yes (event present/absent) | YES (C+21 absorber, shared-global SID) | host-OS scheduler decided contention/no-contention timing |
+| Index offset in upstream events | yes (idx shifts ±3 across samples) | partial (C+21 absorbs ≤1 floating per side) | upstream contention propagates index drift |
+| Whether nested Enter/Leave block fires | yes (E E L L vs E L) | NO (would cross reading-error #23) | post-wait state mutation by another thread; real guest control-flow |
+| First-toucher tid for shared dispatcher | yes (varies tid=9, others) | YES (C+18 shared-global SID scheduling-invariant) | host-OS scheduler decided first-thread-touches-dispatcher |
+| handle.create raw_handle_id | yes (differs across runs) | YES (SKIP_PAYLOAD_FIELDS) | canary stashes handle-table slot; ours uses dispatcher VA |
+| KeQuerySystemTime returned value | yes (wallclock vs fixed) | partial (already-known void-export pattern from C+1) | canary wallclock vs ours fixed FILETIME |
+
+## What this means for the plan
+
+The C+21 absorber handles the *observation-side* jitter (the wait.begin event itself; the upstream index drift) up to the boundary of reading-error #23. Past 104,607, the variance becomes *state-side*: canary's tid=6 reads mutated protected state, ours's tid=1 doesn't. No event-level absorption can hide a different sequence of guest-code-executed instructions.
+
+This is why the plan recommends approach H' (manifest replay): make ours produce the same state-side outcome (mutated CS state after a real wait) so that ours's tid=1 takes the same nested-cleanup path canary's tid=6 takes. The absorber stays unchanged; ours's events become structurally identical to canary's.
+
+## Fresh re-runs not performed
+
+This session is plan-only — no fresh `wine xenia_canary --mute=true` cold runs. The C+22 jitter-1/2/3 + c21 + c22 samples are sufficient to characterize variance for plan-design purposes. Fresh re-runs will happen during Stage 0 spike and Stage 1 implementation per the validation criteria in `plan.md`.
+
+## Reading-error #32 status
+
+**MITIGATED** at the diff-tool layer for shared-global SIDs (C+18) and wait.begin (C+21). Residual variance at 104,607 is OUT of #32 scope — it's state-mutation timing, addressed by the plan's Stage 3 forced-contention replay.
--- a/audit-runs/scheduler-determinism-plan/investigation.md
+++ b/audit-runs/scheduler-determinism-plan/investigation.md
@@ -0,0 +1,206 @@
+# Investigation Notes — Scheduler-Determinism Plan (2026-05-18)
+
+Source citations and probe results from the Phase-1 investigation. All claims here are verified against source or runtime data; speculation is flagged.
+
+## 1. Canary threading & scheduling model
+
+**Verdict**: 1-host-thread-per-XThread; scheduling delegated to host OS (Wine on Linux). No internal scheduler.
+
+- Each guest `XThread` owns a host `xe::threading::Thread` (`xenia-canary/src/xenia/kernel/xthread.h:476`).
+- POSIX backend: pthread per XThread (`xenia-canary/src/xenia/base/threading_posix.cc`).
+- TLS bridge: `thread_local XThread* current_xthread_tls_` (`xthread.cc:105`). `XThread::TryGetCurrentThread()` returns null when called outside a guest thread (C+15-α robustness fix for the boot-time emitter).
+- Tid assignment: `thread_id_(++next_xthread_id_)` in ctor (`xthread.cc:62`).
+- KPCR per XThread, allocated at `pcr_address_` (`xthread.h:506`); contains scheduler-like state mirroring real Xenon KPRCB.
+- `CheckQuantumAndDecay()` (`xthread.h:437`) fires roughly every 20 ms via `KernelState`'s timer. It simulates Xenon priority decay but does NOT preempt; it runs on whichever host thread the host OS schedules.
+
+**No internal scheduler.** No `lockstep`, `deterministic`, `replay` cvar (grep confirmed across `xenia-canary/src/xenia/`).
+
+## 2. Canary clock infrastructure
+
+**Verdict**: wallclock-driven (rdtsc or platform API). Optional scaling, no full deterministic mode.
+
+- Canonical class `xe::Clock` (`base/clock.h:30`).
+- `Clock::QueryHostTickCount()` (`base/clock.cc:128`): rdtsc on x64 if `clock_source_raw=true`, else platform API.
+- `Clock::QueryGuestSystemTime()` (`clock.h:82`): host time adjusted by `guest_time_scalar_`.
+- `KeQuerySystemTime_entry` (`xboxkrnl_threading.cc:459`): declared `void`, writes via OUT pointer; reads `Clock::QueryGuestSystemTime()`. (C+1 verified parity with ours's void-export framing.)
+- `KeWaitForSingleObject_entry` (`xboxkrnl_threading.cc:1003`): reads `*timeout_ptr` as i64×100 → ns (C+23 verified ours computes the same value).
+- Cvars: `clock_no_scaling` (`base/clock.cc:24`), `clock_source_raw` (`base/clock.cc:28`). Neither makes the clock deterministic across Wine runs — wallclock drift is irreducible.
+
+## 3. Canary wait primitives
+
+**Verdict**: `xe::threading::Wait` → `pthread_cond_timedwait` (POSIX) / `WaitForMultipleObjects` (Win32).
+
+- `xeKeWaitForSingleObject()` (`xboxkrnl_threading.cc:969`) → `XObject::Wait()` → `xe::threading::Wait()` → host primitive.
+- Whether contention happens is purely host-OS-scheduler-driven. Reading-error #32 from C+20 documents this: 3 fresh canary cold runs at tid=6 idx 104,606 showed different patterns (no wait.begin / wait.begin contended / offset-shifted).
+
+## 4. Canary RtlEnterCriticalSection — spin-then-wait (DISCOVERED)
+
+[xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:596-633](../../../xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc) — `RtlEnterCriticalSection_entry`:
+
+```c
+uint32_t spin_count = cs->header.absolute * 256;  // game-supplied spin count
+if (cs->owning_thread == cur_thread) { recursion++; return; }
+while (spin_count--) {
+  if (xe::atomic_cas(-1, 0, &cs->lock_count)) { cs->owning_thread = cur_thread; cs->recursion_count = 1; return; }  // acquired via spin; skips the slow path
+}
+if (xe::atomic_inc(&cs->lock_count) != 0) {
+  xeKeWaitForSingleObject(...);  // slow path
+}
+cs->owning_thread = cur_thread; cs->recursion_count = 1;
+```
+
+**Implication**: under low contention, spin succeeds and no `wait.begin` is emitted. Under high contention, spin fails and `wait.begin` fires. Whether spin succeeds depends on host-OS timing — non-deterministic across Wine runs.
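A toy model of the spin-then-wait shape above may help show why the event stream varies between runs. This is only a sketch under stated assumptions: `release_step` (the spin iteration at which the current owner happens to release the lock) is a hypothetical stand-in for host-OS timing, and `EnterResult` and `enter_critical_section` are illustrative names, not canary code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnterResult:
    path: str         # "spin" if acquired inside the spin loop, "slow" otherwise
    wait_begin: bool  # True if the slow path would have to wait (emit wait.begin)


def enter_critical_section(release_step: Optional[int], spin_count: int) -> EnterResult:
    """Toy model of a spin-then-wait acquire.

    release_step: spin iteration at which the lock becomes free
                  (0 = already free, None = not released during this call).
    spin_count:   number of spin attempts before falling through.
    """
    for step in range(spin_count):
        # Stand-in for the compare-and-swap: it succeeds once the lock is free.
        if release_step is not None and step >= release_step:
            return EnterResult("spin", False)
    # Spin budget exhausted (or zero): the increment only leads to a wait
    # if the lock is still held at this point.
    free_now = release_step is not None and release_step <= spin_count
    return EnterResult("slow", not free_now)


low_contention = enter_critical_section(release_step=100, spin_count=256)
high_contention = enter_critical_section(release_step=1000, spin_count=256)
```

In this model the same guest call produces `wait_begin=False` or `wait_begin=True` purely as a function of `release_step`, which is the quantity the host scheduler controls.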
+
+## 5. Ours threading & scheduling
+
+**Verdict**: single host thread; 6 cooperative HW slots; deterministic by construction.
+
+- `xenia-rs/crates/xenia-cpu/src/scheduler.rs`:
+  - `OrderMode { Fixed, Seeded { seed } }` (lines 230-258).
+  - `round_schedule()` (lines 710-740): returns slot-id vector; advances `rotation_cursor` by 1.
+  - `park_current(BlockReason)` (line 808).
+  - `wake_ref(ThreadRef)` (line 831).
+- M3 optional `--parallel` mode (6 workers + coordinator, 7-party phaser) exists but is not default.
+
+**Determinism foundation**: 23 phases of stabilization are invested in the `e1dfcb15…` cold digest, which reproduces identically across 3 cold runs.
+
+## 6. Ours RtlEnterCriticalSection — NO spin
+
+[xenia-rs/crates/xenia-kernel/src/exports.rs:2886-2946](../../../xenia-rs/crates/xenia-kernel/src/exports.rs) — `rtl_enter_critical_section`:
+
+```rust
+let owner = mem.read_u32(cs_ptr + CS_OFFS_OWNING_THREAD);
+let owner_is_live = owner != 0 && state.scheduler.find_by_tid(owner).is_some();
+if owner == 0 || !owner_is_live {
+  /* claim immediately — write owning_thread, lock_count=0, recursion=1 */
+  return;
+}
+if owner == current_tid { /* recursive lock — increment counts */ return; }
+// Truly contended against a live peer — park IMMEDIATELY (no spin).
+state.cs_waiters.entry(cs_ptr).or_default().push(current_ref);
+state.scheduler.park_current(BlockReason::CriticalSection(cs_ptr));
+```
+
+**Asymmetry summary**: canary spins ~256×N times before parking; ours parks immediately. Under the cooperative scheduler, ours's tid=1 runs monolithically until it parks — no other thread has a chance to acquire the CS first. Hence at 104,607, the CS is free when tid=1 tries, while in canary it was held by another thread that got scheduled in between.
+
+## 7. Ours clock infrastructure
+
+**Verdict**: fixed FILETIME constant. No wallclock dependency in the hot path.
+
+- `KeQuerySystemTime` returns `132_500_000_000_000_000` (~2021) via OUT-ptr (`exports.rs:628`).
+- `KeQueryInterruptTime` returns `0x0000_0001_0000_0000` (`exports.rs:504`).
+- `event_log.rs` uses `Instant::now()` for the observability `host_ns` field — non-deterministic but not consumed by the matched-prefix metric.
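As a quick check of the "~2021" annotation on the fixed `KeQuerySystemTime` value, the constant can be decoded as a Windows FILETIME (100 ns ticks since 1601-01-01 UTC). The helper name `filetime_to_datetime` is illustrative, not part of either engine.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME tick count to an aware UTC datetime."""
    # 10 ticks of 100 ns make one microsecond.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


fixed_time = filetime_to_datetime(132_500_000_000_000_000)
print(fixed_time.isoformat())
```

The value decodes to 2020-11-16 11:33:20 UTC, which is consistent with the approximate year given in the notes.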
+
+## 8. Sylpheed workload profile (probe)
+
+Ran on `xenia-rs/audit-runs/phase-c22-rtl-enter-leave-control-flow/ours-cold.jsonl` (121,569 events):
+
+| event | count | notes |
+|---|---|---|
+| RtlEnterCriticalSection (kernel.call) | 19,494 | ≈80% of all kernel.calls |
+| RtlLeaveCriticalSection (kernel.call) | 19,492 | matches Enter (off-by-2 from boot edge) |
+| NtClose | 160 | |
+| NtCreateEvent | 103 | |
+| NtReleaseSemaphore | 99 | |
+| NtQueryInformationFile | 93 | |
+| NtWaitForMultipleObjectsEx | 92 | |
+| KeWaitForSingleObject | 5 | |
+| KeWaitForMultipleObjects | 1 | |
+| **KeQuerySystemTime** | **2** | clock-light workload |
+| KeQueryPerformanceFrequency | 6 | |
+| KeQueryPerformanceCounter | 0 | |
+| KeQueryInterruptTime | 0 | |
+| KeDelayExecutionThread | 0 | |
+| NtYieldExecution | 0 | |
+| wait.begin events (all kinds) | 34 | most with `timeout_ns=-1` (indefinite) |
+
+**Implications**:
+- Sylpheed is CS-dominated. Stage-1 emitter on RtlEnterCS captures the dominant signal.
+- Sylpheed barely touches the clock. Approach A (cycle clock in canary) addresses ≈2 events out of 121,569. Wrong target.
+- Wait surface is small (34 events). Wait-side replay is low-value; scope to CS only.
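The ratios behind these implications can be recomputed from the table; the sketch below uses only the counts listed above, and the variable names are illustrative.

```python
# Counts copied from the probe table (ours-cold.jsonl, 121,569 events).
TOTAL_EVENTS = 121_569
rtl_enter = 19_494
rtl_leave = 19_492
wait_begin = 34
ke_query_system_time = 2

# How much larger the critical-section surface is than the wait surface.
cs_to_wait_ratio = rtl_enter / wait_begin

# Share of all events that touch the system clock.
clock_share = ke_query_system_time / TOTAL_EVENTS

# Enter/Leave imbalance attributed to the boot edge.
enter_leave_gap = rtl_enter - rtl_leave

print(f"CS/wait ratio: {cs_to_wait_ratio:.0f}x")
print(f"clock share:   {clock_share:.4%}")
print(f"Enter - Leave: {enter_leave_gap}")
```

The CS-to-wait ratio comes out at roughly 573, in line with the "~570×" figure used in the approach matrix.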
+
+## 9. The 104,607 divergence (re-verified)
+
+From C+22 memory + jitter jsonl re-analysis:
+
+| sample | tid=6 events 104,604..104,615 (import.call only) |
+|---|---|
+| c21 archived | E E L L |
+| canary jitter-1 | E (wait.begin slow path) E L L |
+| canary jitter-2 | E E L L |
+| canary jitter-3 | (shifted) E E L L |
+| fresh c22 | E (wait.begin slow path) E L L |
+
+All canary samples have the EXTRA nested RtlEnterCriticalSection (second `E` before the final `L L`). Ours never does — it goes `E L NtClose`. Structural divergence post-absorber-engagement.
+
+Shared dispatcher: canary's wait.begin `handles_semantic_ids=['75ae880ec432eb36']` — this is the CS embedded Event dispatcher, lazy-wrapped by `XObject::GetNativeObject`. Same SID computed via C+18 shared-global recipe in both engines.
+
+## 10. Cvar inventory (canary side)
+
+Grep across `xenia-canary/src/xenia/` for `DEFINE_bool|DEFINE_int|DEFINE_uint|DEFINE_string`:
+
+- `clock_no_scaling` (`base/clock.cc:24`)
+- `clock_source_raw` (`base/clock.cc:28`)
+- `ignore_thread_priorities` (`kernel/xthread.cc:30`)
+- `ignore_thread_affinities` (`kernel/xthread.cc:33`)
+- `stack_size_multiplier_hack` (`kernel/xthread.cc:37`)
+- `main_xthread_stack_size_multiplier_hack` (`kernel/xthread.cc:39`)
+- `phase_a_event_log_path` (`cpu/cpu_flags.cc:84`) — Phase A trace gate
+- `phase_a_event_log_mem_writes` (`cpu/cpu_flags.cc:88`) — reserved, not wired
+- `phase_b_snapshot_dir` (`cpu/cpu_flags.cc:94`) — Phase B image snapshot
+- `phase_b_snapshot_and_exit` (`cpu/cpu_flags.cc:100`)
+
+No `lockstep`, `deterministic`, `replay`, `single_thread`, `cooperative` cvars exist. **No built-in deterministic mode.**
+
+## 11. Diff-tool absorber state (post-C+21)
+
+`xenia-rs/tools/diff-events/diff_events.py` (767 LOC):
+
+- `collect_shared_global_sids()`: pre-pass union of (a) recipe-matching SIDs (C+18) and (b) cross-tid usage heuristic — any SID used by handle.create OR wait.begin on ≥2 distinct tids.
+- `is_shared_global_wait_begin()`: classifies a wait.begin as floating if any handle_sid is in the shared-global set.
+- `diff_one_tid()`: floating-absorbs `handle.create` (C+18) and `wait.begin` (C+21) on kind mismatches.
+- `SKIP_PAYLOAD_FIELDS_BY_KIND`: skips engine-local fields per kind.
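The cross-tid usage heuristic described above could look roughly like the sketch below. It is not the code in `diff_events.py`: the function names and the `kind`, `tid`, and `semantic_id` field names are assumptions for illustration; only `handles_semantic_ids` appears in the notes.

```python
from collections import defaultdict


def collect_cross_tid_sids(events, min_tids=2):
    """Return SIDs used by handle.create or wait.begin on >= min_tids distinct tids."""
    tids_by_sid = defaultdict(set)
    for ev in events:
        kind = ev.get("kind")
        if kind == "handle.create":
            # Assumed field name for the created handle's semantic id.
            sids = [ev["semantic_id"]] if "semantic_id" in ev else []
        elif kind == "wait.begin":
            sids = ev.get("handles_semantic_ids", [])
        else:
            continue
        for sid in sids:
            tids_by_sid[sid].add(ev["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= min_tids}


def looks_like_floating_wait(ev, shared_sids):
    """A wait.begin is treated as floating if any of its SIDs is shared-global."""
    return ev.get("kind") == "wait.begin" and any(
        sid in shared_sids for sid in ev.get("handles_semantic_ids", [])
    )


sample_events = [
    {"tid": 9, "kind": "handle.create", "semantic_id": "75ae880ec432eb36"},
    {"tid": 6, "kind": "wait.begin", "handles_semantic_ids": ["75ae880ec432eb36"]},
    {"tid": 6, "kind": "wait.begin", "handles_semantic_ids": ["local-only"]},
]
shared = collect_cross_tid_sids(sample_events)
```

With the sample above, only the SID seen on both tid=9 and tid=6 lands in `shared`, so the first `wait.begin` is floating and the second is not.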
+
+**Reading-error #23 boundary**: absorbing the post-wait Enter/Leave block (canary's extra `E` then `L` at 104,610-104,615) would be folding real guest behavior, not transient observation. The plan's Stage 3 instead makes ours produce the same observation by forcing ours into the same contended state.
+
+## 12. Tid-chain mapping (stable per memory baseline)
+
+| canary | ours |
+|---|---|
+| 6 | 1 |
+| 4 | 11 |
+| 7 | 2 |
+| 12 | 7 |
+| 14 | 9 |
+| 15 | 10 |
+
+This is a *display* convention for cross-engine alignment in diff reports. In the wire format, each engine emits its native tid. The manifest in Stage 2-3 keys on the source-side native tid — no translation needed since each side consumes events it produced.
+
+## 13. Methodology rules in force
+
+- **Reading-error #28** (verify source first): applied — read both engines' RtlEnterCS implementations before designing.
+- **Reading-error #32** (canary non-deterministic in contention regions): characterized — 3 jitter samples documented.
+- **Reading-error #33** (canary cache lives in binary-dir under wine): not relevant here.
+- **Reading-error #34** (use `.iso` not loose `.xex`): apply in all validation runs.
+- **Cold-vs-cold protocol**: canary `--mute=true`, ours `XENIA_CACHE_WIPE=1`.
+- **Stop hook rename**: rename background binaries before any backgrounded run (e.g. `xrs-verify-stage0`, `xrs-replay`).
+
+## 14. Confidence calibration
+
+| claim | source-verified | probe-verified | confidence |
+|---|---|---|---|
+| Canary spins, ours doesn't | yes (xboxkrnl_rtl.cc:613 + exports.rs:2927) | n/a (static) | high |
+| Sylpheed clock-light | n/a | yes (kernel.call counts) | high |
+| 104,607 divergence is structural | yes (C+22 mech) | yes (5 canary samples consistent) | high |
+| C+18 shared-global SID is cross-engine identical | yes (event_log.rs + event_log.cc) | implicit (matched in diff reports) | high |
+| Canary has no deterministic mode cvar | yes (grep) | n/a | high |
+| Stage-0 quantum spike may unblock | no (untested) | no | medium |
+| Stage-3 manifest replay unblocks | no | no | medium-high (mechanism sound, integration risk) |
+| Sister chain regression ≤5 acceptable | n/a | n/a | open question for user |
+
+## Open unknowns (deferred to implementation)
+
+1. The exact `cs_ptr` of the contended CS at canary tid=6 idx 104,608 is not directly emitted by the current schema (the `wait.begin` payload carries SID but not the raw pointer). Stage 1's `cs_ptr` field plugs this gap.
+2. Does Sylpheed initialize the contended CS with `RtlInitializeCriticalSectionAndSpinCount(spin_count > 0)` or just `RtlInitializeCriticalSection` (zero spin)? Affects whether canary's spin path can succeed at this site. Probe by reading the cs's `header.absolute` field during a canary run.
+3. The dispatcher Event's first-toucher tid differs across cold runs (canary tid=9 in one, others in others). Is this stable enough across cold runs of the SAME canary binary to be a reliable replay anchor? Stage 1 round-trip validation will reveal.
+4. Does the M3 `--parallel` mode in ours reproduce the same divergence pattern? Untested. Out of scope for this plan but worth a future probe.
--- a/audit-runs/scheduler-determinism-plan/plan.md
+++ b/audit-runs/scheduler-determinism-plan/plan.md
@@ -0,0 +1,288 @@
+# Plan: Unblock the 104,607 Scheduler-Determinism Cap
+
+## Context
+
+The Phase A matched-prefix between xenia-canary (oracle, C++) and xenia-rs ("ours", Rust) is structurally capped at **104,607 events** on the main chain (canary tid=6 → ours tid=1). C+20 and C+22 escalated the divergence at idx 104,607 as **class (A) scheduler-determinism** — not a fixable bug in either engine, but a fundamental mismatch in scheduling philosophy.
+
+**The two engines are correct independently; their scheduling models cannot agree:**
+
+| dimension | canary (oracle) | ours |
+|---|---|---|
+| host-thread mapping | 1 host thread (`xe::threading::Thread`) per XThread (`xthread.cc:62/358/476`) | single host thread, 6 cooperative HW slots (`scheduler.rs:230-258`) |
+| who picks next runnable | host OS (Wine on Linux) — non-deterministic | `round_schedule` over `OrderMode::Fixed` rotation cursor (`scheduler.rs:710-740`) — deterministic |
+| RtlEnterCriticalSection on contention | spins `cs->header.absolute*256` times then `xeKeWaitForSingleObject` (`xboxkrnl_rtl.cc:596-633`) | parks immediately via `BlockReason::CriticalSection` (`exports.rs:2927-2945`) — **no spin** |
+| clock | wallclock-driven (`Clock::QueryHostSystemTime`, optional rdtsc) | fixed FILETIME `132_500_000_000_000_000` |
+| determinism cvars | `clock_no_scaling`, `clock_source_raw`, `ignore_thread_priorities`, `ignore_thread_affinities` — none enable lockstep | already lockstep by default; `XENIA_SCHED_ORDER=random` opt-out |
+
+The 104,607 divergence is the symptom: canary's tid=6 contends → blocks on shared dispatcher Event `sid=75ae880ec432eb36` → another guest thread mutates protected state during the wait → post-acquire reads mutated value → nested-cleanup branch (`E E L L`). Ours's tid=1 runs monolithically, no other thread gets the CS first → fast-path acquire → reads pre-wait value → simple-release branch (`E L NtClose`). The C+18/C+21 floating absorbers already mask the observation-side jitter (the `wait.begin` event itself); the post-wait *control-flow* divergence is real guest code, not absorbable without crossing reading-error #23.
+
+**Sylpheed workload profile** (probed from `ours-cold.jsonl`, 121,569 events):
+- 19,494 RtlEnterCriticalSection + 19,492 RtlLeaveCriticalSection calls (≈80% of all kernel.calls)
+- only 2 KeQuerySystemTime, 0 KeQueryPerformanceCounter, 0 KeDelayExecutionThread, 0 NtYieldExecution
+- 34 wait.begin events total, most with `timeout_ns=-1` (indefinite)
+- 6 chains in total: main capped at 104,607, the 5 sister chains capped at 11/32/4/41/16
+
+**Intended outcome**: advance the main matched-prefix by ≥1,000 events (target ≥106,000) without destabilizing ours's cold digest `e1dfcb1559f987b35012a7f2dc6d93f5` and without modifying canary's default behavior.
+
+## Recommended approach: Stage 0 spike → Targeted Contention-Replay Manifest
+
+**Stage 0** first — a cheap (≈80 LOC, 1 day) cycle-quantum preemption sweep to test whether *scheduling shape alone* unblocks the cap. If a tuned quantum advances main prefix past 104,607 with stable digest across 3 cold runs, the manifest work may be unnecessary.
+
+**Stages 1–4 (gated on Stage 0 outcome)** — a **contention manifest**: canary emits a new event kind `contention.observed` on every `RtlEnterCriticalSection` with `{site_sid, cs_ptr, contended}`. A Python tool distills the trace into a per-(tid, tid_event_idx) manifest. Ours's `rtl_enter_critical_section`, in a new `OrderMode::ContentionReplay` mode, consults the manifest before its fast-path check; when an entry says contended, it parks via the *existing* `BlockReason::CriticalSection` machinery and lets the actual owner's `RtlLeaveCriticalSection` (also already wired) hand back the lock through the existing wake path. Stage 5 is a per-CS fallback kludge.
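The distillation step might be sketched as follows. This is a minimal illustration, not the planned tool: it assumes one JSON object per trace line with `tid` and `kind` fields, and it assumes `tid_event_idx` is the ordinal of the event within its tid's stream, counting every event kind. The `site_sid`, `cs_ptr`, and `contended` payload fields come from the plan; `distill_manifest` is a hypothetical name.

```python
import json
from collections import defaultdict


def distill_manifest(jsonl_lines):
    """Map (tid, tid_event_idx) -> contention record for contended entries only."""
    next_idx = defaultdict(int)  # per-tid running event ordinal
    manifest = {}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        tid = ev["tid"]
        idx = next_idx[tid]
        next_idx[tid] += 1
        if ev.get("kind") == "contention.observed" and ev.get("contended"):
            manifest[(tid, idx)] = {
                "site_sid": ev["site_sid"],
                "cs_ptr": ev["cs_ptr"],
            }
    return manifest


sample_trace = [
    '{"tid": 6, "kind": "import.call"}',
    '{"tid": 6, "kind": "contention.observed", "site_sid": "s1", "cs_ptr": 4096, "contended": true}',
    '{"tid": 9, "kind": "contention.observed", "site_sid": "s2", "cs_ptr": 8192, "contended": false}',
]
manifest = distill_manifest(sample_trace)
```

Keeping only contended entries keeps the manifest small, since an absent key can be read as "take the normal fast path". Whether the new event kind itself should count toward the per-tid ordinal is a design question this sketch does not settle.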
+
+**Why this approach over the alternatives:**
+
+| approach | LOC | unblocks 104,607? | preserves ours digest | preserves canary default | verdict |
+|---|---|---|---|---|---|
+| A — cycle clock in canary | ~200 in `base/clock.cc` | NO — workload is clock-light | n/a | yes | wrong target |
+| B — single-thread cooperative canary | ~2000-3000 in `kernel/xthread.cc`, `base/threading*.cc`, `cpu/processor.cc` | yes | yes | NO — destabilizes oracle | overscoped, breaks oracle |
+| C/H — contention manifest replay (broad: CS + wait) | ~600-700 | yes | yes (default-off) | yes (cvar-off) | **second choice** |
+| **H' — manifest replay, scoped to RtlEnterCS** | **~450-500** | **yes** | **yes** | **yes** | **recommended** |
+| D — diff-harness absorption extension | ~200 in `diff_events.py` | partially — hits #23 wall, buys 10-100 idx | n/a | n/a | fallback only |
+| E — A+D hybrid | ~400 | LOW | n/a | n/a | tactical band-aid |
+| F — make ours preemptive | ~500 in `scheduler.rs` | maybe | NO — breaks `e1dfcb15…` | n/a | wrong direction |
+| **cycle-quantum spike** | **~80 in `scheduler.rs`** | **TBD by spike** | **TBD by spike** | **n/a** | **Stage 0 gate** |
+| spin-then-wait CS fix in ours | ~50 in `exports.rs:2886` | NO (canary contends/ours doesn't — adding spin to ours makes contention *less* likely, wrong direction) | yes | n/a | log finding, defer |
+
+The fundamental insight is that **scoping the replay to RtlEnterCriticalSection only** sidesteps four traps in a broader design: (1) no tid translation needed since canary and ours each consume their own native-tid events; (2) no mid-instruction forced-yield primitive needed since the import dispatch is already a scheduling boundary; (3) no "mutating thread" field needed since the current owner does the mutation and the existing wake path handles it; (4) wait-side replay is deferred since only 34 wait.begin events exist in the whole boot.
+
+## Stages
+
+### Stage 0 — Cycle-Quantum Preemption Spike (~80 LOC, 1 day) **[GATE]**
+
+**Goal**: cheap signal on whether scheduling shape alone is sufficient.
+
+| Change | File | LOC |
+|---|---|---|
+| Add `OrderMode::ScanQuantum { ticks: u32 }` variant; in `round_schedule` or the step loop, force `decrement_quantum` on every Nth step | [scheduler.rs:230-258](xenia-rs/crates/xenia-cpu/src/scheduler.rs#L230-L258) and [scheduler.rs:710-740](xenia-rs/crates/xenia-cpu/src/scheduler.rs#L710-L740) | ~30 |
+| Wire `XENIA_SCHED_QUANTUM=<u32>` env var → `OrderMode::ScanQuantum` | same | ~10 |
+| Sweep harness: bash script running cold-vs-cold at quanta `[10, 50, 200, 1000, 5000]` | new under `xenia-rs/audit-runs/stage0-quantum-sweep/` | ~40 (script + notes) |
+
+**Validation**:
+- Cold-vs-cold per quantum (`XENIA_CACHE_WIPE=1`, `.iso` path, canary `--mute=true`).
+- Record matched-prefix per quantum value in a sweep table.
+- Verify ours's digest stable × 3 cold runs at each candidate quantum.
+
+**Decision tree**:
+- *If* a quantum value advances main prefix ≥ 105,500 AND ours's digest is stable × 3 at that quantum: land it behind a non-default `OrderMode` (keep `Fixed` as default so `e1dfcb15…` is preserved). Skip Stages 1-4. Document.
+- *Else if* some quantum partially helps (105,000-105,500) but digest is unstable: keep the variant available as a probe but proceed to Stage 1.
+- *Else* (no improvement): proceed to Stage 1 immediately.
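+
+The decision tree above can be written down as a small classifier over sweep rows. This is a minimal Python sketch, not part of the planned harness: `SweepResult`, `decide`, and `decide_sweep` are hypothetical names, and any combination the tree does not spell out (for example a prefix ≥ 105,500 with an unstable digest) is assumed to fall through to "proceed to Stage 1".
+
+```python
+from dataclasses import dataclass
+
+LAND_THRESHOLD = 105_500  # "advances main prefix >= 105,500"
+PROBE_FLOOR = 105_000     # lower bound of the "partially helps" band
+
+
+@dataclass(frozen=True)
+class SweepResult:
+    quantum: int          # value passed via XENIA_SCHED_QUANTUM
+    matched_prefix: int   # main matched-prefix against the canary baseline
+    digest_stable: bool   # ours's digest identical across 3 cold runs
+
+
+def decide(result: SweepResult) -> str:
+    """Classify one sweep row as 'land', 'probe', or 'stage1'."""
+    if result.matched_prefix >= LAND_THRESHOLD and result.digest_stable:
+        return "land"
+    in_band = PROBE_FLOOR <= result.matched_prefix <= LAND_THRESHOLD
+    if in_band and not result.digest_stable:
+        return "probe"  # keep the variant as a probe, still go to Stage 1
+    # Assumption: every case the tree does not name counts as "no improvement".
+    return "stage1"
+
+
+def decide_sweep(results) -> str:
+    """Best outcome over all quanta: 'land' beats 'probe' beats 'stage1'."""
+    rank = {"stage1": 0, "probe": 1, "land": 2}
+    return max((decide(r) for r in results), key=rank.__getitem__, default="stage1")
+```
+
+Feeding one `SweepResult` per quantum from the sweep table into `decide_sweep` yields the single land/proceed decision that the acceptance criteria ask for.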
+
+**Rollback**: trivial — revert the variant; default `OrderMode::Fixed` is unchanged.
+
+### Stage 1 — Canary-Side Contention Emitter (~100 LOC, cvar-OFF byte-identical)
+
+**Goal**: produce ground truth that "tid X contended on cs Y at its kernel-call ordinal N."
+
+| File | Edit | LOC |
+|---|---|---|
+| [xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:596-633](xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc#L596-L633) (`RtlEnterCriticalSection_entry`) | Emit `contention.observed` with `contended=false` on spin-loop success (`atomic_cas` hits) and `contended=true` when control falls through to `xeKeWaitForSingleObject` | ~40 |
+| `xenia-canary/src/xenia/kernel/util/event_log.{h,cc}` | New `EmitContentionObserved(site_sid, cs_ptr, contended)`; cvar `kernel_emit_contention=false` default | ~30 |
+| `xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md` | New §"contention.observed (v1.4 — Phase D+0)" | ~10 |
+| `xenia-canary/src/xenia/kernel/util/event_log_test.cc` | Round-trip test | ~20 |
+
+**Schema (minimum)**:
+```
+kind: "contention.observed"
+tid: <guest tid of caller>
+tid_event_idx: <per-tid ordinal>
+payload: { "cs_ptr": <u32 hex>, "site_sid": <16-hex>, "contended": <bool> }
+```
+`site_sid` is the **C+18 shared-global recipe** `semantic_id_shared_global(cs_ptr, KernelObjectType::CriticalSection)` — both engines compute the same SID for the same CS pointer, so it's a valid cross-engine lookup key.
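+
+To make the minimum schema concrete, here is a hedged Python sketch of a consumer-side check for one jsonl line. It assumes that `kind`, `tid`, and `tid_event_idx` sit at the top level next to `payload`, that `cs_ptr` is serialized as a `0x`-prefixed lowercase hex string (as in the Stage 2 manifest example), and that `site_sid` is lowercase hex; `parse_contention_event` is a hypothetical helper, not an existing function in the diff tool.
+
+```python
+import json
+import re
+
+SID_RE = re.compile(r"^[0-9a-f]{16}$")     # 16 hex digits (lowercase assumed)
+PTR_RE = re.compile(r"^0x[0-9a-f]{1,8}$")  # u32 as hex string ("0x" prefix assumed)
+
+
+def parse_contention_event(line: str):
+    """Parse one jsonl line; return the event, or None for other kinds."""
+    ev = json.loads(line)
+    if ev.get("kind") != "contention.observed":
+        return None
+    if not isinstance(ev["tid"], int) or not isinstance(ev["tid_event_idx"], int):
+        raise ValueError("tid and tid_event_idx must be integers")
+    payload = ev["payload"]
+    if not SID_RE.match(payload["site_sid"]):
+        raise ValueError("site_sid must be 16 hex digits")
+    if not PTR_RE.match(payload["cs_ptr"]):
+        raise ValueError("cs_ptr must be a u32 hex string")
+    if not isinstance(payload["contended"], bool):
+        raise ValueError("contended must be a bool")
+    return ev
+```
+
+Rejecting malformed events early keeps a bad emitter build from silently producing an empty or skewed manifest downstream.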
+
+**Validation**:
+- Enable cvar, cold-run canary, verify ≥1 `contended=true` event near canary's tid=6 `tid_event_idx` ≈ 104,605.
+- Verify cold digest unchanged when `kernel_emit_contention=false` (default) — byte-identical to pre-Stage-1.
+
+**Rollback**: cvar OFF by default; revert the 4 files.
+
+### Stage 2 — Manifest Builder (~150 LOC, pure Python)
+
+**Goal**: distill canary jsonl into a replay-ready manifest.
+
+| File | LOC |
+|---|---|
+| `xenia-rs/tools/diff-events/build_contention_manifest.py` (new) | ~120 |
+| `xenia-rs/tools/diff-events/test_build_manifest.py` (new) | ~30 |
+
+**Manifest schema** (`contention_manifest.json`):
+```json
+{
+  "version": 1,
+  "source_canary_digest": "<sha256 of canary jsonl>",
+  "entries": [
+    { "tid": 6, "tid_event_idx": 104605, "site_sid": "75ae880ec432eb36",
+      "cs_ptr": "0x82abc000", "contended": true }
+  ]
+}
+```
+
+Builder reads canary jsonl, filters `kind == "contention.observed"`, keeps `contended=true` (Phase 1 evidence suggests <100 entries across the whole boot given the wait-light profile), sorts by `(tid, tid_event_idx)`. Diff tool already keys events by `(tid, tid_event_idx)`; this matches.
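+
+The builder steps just described (filter, keep `contended=true`, sort) fit in a few lines. The following is a minimal sketch under stated assumptions rather than the planned `build_contention_manifest.py`: it assumes the Stage 1 event layout with `tid` and `tid_event_idx` at the top level, takes the jsonl as an in-memory string, and computes `source_canary_digest` as the SHA-256 of that text. `build_manifest` is a hypothetical name.
+
+```python
+import hashlib
+import json
+
+
+def build_manifest(jsonl_text: str) -> dict:
+    """Distill canary jsonl text into a replay manifest dict."""
+    entries = []
+    for line in jsonl_text.splitlines():
+        if not line.strip():
+            continue  # tolerate blank lines
+        ev = json.loads(line)
+        if ev.get("kind") != "contention.observed":
+            continue
+        payload = ev["payload"]
+        if not payload["contended"]:
+            continue  # only contended entries drive replay
+        entries.append({
+            "tid": ev["tid"],
+            "tid_event_idx": ev["tid_event_idx"],
+            "site_sid": payload["site_sid"],
+            "cs_ptr": payload["cs_ptr"],
+            "contended": True,
+        })
+    # Same key the diff tool uses for events.
+    entries.sort(key=lambda e: (e["tid"], e["tid_event_idx"]))
+    digest = hashlib.sha256(jsonl_text.encode("utf-8")).hexdigest()
+    return {"version": 1, "source_canary_digest": digest, "entries": entries}
+```
+
+Serializing the result with `json.dumps(build_manifest(text), indent=2)` produces a file in the shape of the manifest schema shown above.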
+
+**Validation**: round-trip — build from canary cold jsonl, count `contended=true` entries, eyeball-diff against C+22 jitter samples.
+
+### Stage 3 — Ours Replay Mode (~200 LOC + ~50 LOC tests)
+
+**Goal**: ours's `rtl_enter_critical_section` consults the manifest *before* the fast-path check; forces park if the manifest says contended.
+
+| File | Edit | LOC |
+|---|---|---|
+| [xenia-rs/crates/xenia-cpu/src/scheduler.rs:230-258](xenia-rs/crates/xenia-cpu/src/scheduler.rs#L230-L258) | New `OrderMode::ContentionReplay { manifest_path }`; `Scheduler` carries `Option<Arc<ContentionManifest>>` | ~40 |
+| `xenia-rs/crates/xenia-kernel/src/contention_manifest.rs` (new) | Loader, hashmap keyed on `(tid, tid_event_idx)`, `consume(tid, idx) -> Option<Entry>` | ~80 |
+| [xenia-rs/crates/xenia-kernel/src/exports.rs:2886-2946](xenia-rs/crates/xenia-kernel/src/exports.rs#L2886-L2946) (`rtl_enter_critical_section`) | After computing `current_tid`, peek `tid_event_idx`; if manifest says contended at `(tid, idx)`: (a) verify `site_sid` matches recomputed shared-global SID for `cs_ptr`, (b) check the CS in guest memory actually has a live non-self owner — if not, skip with a log warning (state-divergence not schedule-divergence), (c) emit a synthetic `wait.begin` (C+21 absorber will handle it), (d) push self onto `cs_waiters[cs_ptr]`, (e) call `park_current(BlockReason::CriticalSection(cs_ptr))`. The existing wake path at lines 2972-2980 already hands us the lock when the owner releases. | ~50 |
+| `xenia-rs/crates/xenia-cpu/src/main.rs` or equivalent CLI module | `--scheduler-replay-manifest PATH` flag | ~20 |
+| `xenia-rs/crates/xenia-kernel/src/contention_manifest.rs` | Replay-mode unit tests | ~50 |
+
+**Critical subtlety**: only force a park when the CS in guest memory actually has a live different-tid owner at the replay point. If the CS is free, the mismatch is a state-divergence (mutation timing), not a schedule-divergence, so replay must skip and log; parking on a CS that no one will release would deadlock. This is the explicit skip-branch (b) in the table above.
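+
+The replay check can be modeled outside the engine to pin down the branch order before writing the Rust. This Python sketch is only a model of steps (a) and (b) plus the `consume` lookup; the names `ContentionManifest`, `Action`, and `replay_decision` are illustrative, `owner_tid=None` stands for a free CS, and treating a `site_sid` mismatch as "skip and log" is an assumption, since the plan only says the SID is verified.
+
+```python
+from enum import Enum
+
+
+class Action(Enum):
+    FAST_PATH = "fast_path"  # no manifest entry: run the normal enter path
+    PARK = "park"            # force park via BlockReason::CriticalSection
+    SKIP = "skip"            # entry present but unsafe to replay: log, fall through
+
+
+class ContentionManifest:
+    """Entries keyed on (tid, tid_event_idx); each is consumed at most once."""
+
+    def __init__(self, entries):
+        self._by_key = {(e["tid"], e["tid_event_idx"]): e for e in entries}
+
+    def consume(self, tid, idx):
+        return self._by_key.pop((tid, idx), None)
+
+
+def replay_decision(manifest, tid, idx, recomputed_sid, owner_tid):
+    """Decide what rtl_enter_critical_section should do at (tid, idx)."""
+    entry = manifest.consume(tid, idx)
+    if entry is None or not entry["contended"]:
+        return Action.FAST_PATH
+    if entry["site_sid"] != recomputed_sid:
+        return Action.SKIP  # (a) SID mismatch; skipping here is an assumption
+    if owner_tid is None or owner_tid == tid:
+        return Action.SKIP  # (b) CS free or self-owned: parking would deadlock
+    return Action.PARK
+```
+
+Because `consume` removes the entry, a second call at the same `(tid, idx)` falls back to the fast path, which keeps a retried enter from parking twice.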
+
+**Validation**:
+1. Cold-vs-cold matched-prefix advances past 104,607 (target ≥106,000, the next major divergence boundary).
+2. Ours's digest, when `--scheduler-replay-manifest` is NOT passed, byte-identical to pre-Stage-3 `e1dfcb1559f987b35012a7f2dc6d93f5`.
+3. With manifest passed, replay-mode digest stable × 3 cold runs (a NEW digest, archived).
+4. Sister chains tid=4→11/7→2/12→7/14→9/15→10 regress by at most 5 events each.
+5. Phase B `image_loaded_sha256 ea8d160e…` unchanged.
+
+**Rollback criteria**:
+- If prefix doesn't advance past 104,607: diagnose via `RUST_LOG=trace` on the replay-consume path; verify SID match against canary's emitted SID for the contended cs_ptr; check whether the CS was free at the replay point (the (b) skip-branch may be firing).
+- If digest unstable with replay: forced-park is non-deterministic. Inspect `cs_waiters[cs_ptr]` ordering, `wake_ref` selection at scheduler.rs (queue.remove(0) — FIFO, should be deterministic). Possible culprit: `find_by_tid` at exports.rs:2903 traverses HW slots in `rotation_cursor` order — pin or verify.
+- If sister chains regress >5: forced contention on tid=6 is changing other chains' progression. Initially keep manifest tid-scoped to tid=6 only; broaden iteratively if needed.
+
+### Stage 4 — Diff Tool Hookup (~30 LOC + tests)
+
+**Goal**: register the new event kind.
+
+| File | LOC |
+|---|---|
+| [xenia-rs/tools/diff-events/diff_events.py:201-216](xenia-rs/tools/diff-events/diff_events.py#L201-L216) | Add `ENGINE_LOCAL_KINDS = {"contention.observed"}` set; advance per-tid pointer past these events without comparison | ~20 |
+| `xenia-rs/tools/diff-events/test_diff_events.py` | Test: `contention.observed` doesn't affect matched-prefix | ~10 |
+
+**Rationale for engine-local**: ours doesn't emit `contention.observed` (it consumes the manifest instead). Marking the kind engine-local keeps the matched-prefix definition unchanged and reversible. We can promote to a matched event later if both engines start emitting it.
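+
+A toy model shows why engine-local handling leaves the matched-prefix untouched. This sketch is not the logic in `diff_events.py`: it assumes one tid's event stream per call and compares events as plain dicts of kind and payload, ignoring the C+18/C+21 absorb rules; `matched_prefix` is a hypothetical helper.
+
+```python
+ENGINE_LOCAL_KINDS = {"contention.observed"}
+
+
+def matched_prefix(canary, ours):
+    """Count leading events that agree once engine-local kinds are skipped."""
+    a = [e for e in canary if e["kind"] not in ENGINE_LOCAL_KINDS]
+    b = [e for e in ours if e["kind"] not in ENGINE_LOCAL_KINDS]
+    n = 0
+    for x, y in zip(a, b):
+        if x != y:
+            break  # first real divergence ends the prefix
+        n += 1
+    return n
+```
+
+Interleaving any number of `contention.observed` events into the canary stream returns the same count as the stream without them, which is the property the new Stage 4 test is meant to confirm.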
+
+**Validation**: existing 30 diff tests pass; new test confirms engine-local handling.
+
+### Stage 5 — Per-CS Fallback (deferred, ~30 LOC if needed)
+
+If Stage 3 succeeds on the 104,607 cap but a similar divergence appears soon after on a CS the manifest doesn't cover (e.g., because canary's emitter missed it), hardcode the SID `75ae880ec432eb36` (or whichever) to always force a one-round yield. Document as a known kludge. Don't generalize.
+
+## Critical files
+
+- [xenia-rs/crates/xenia-kernel/src/exports.rs:2886-2980](xenia-rs/crates/xenia-kernel/src/exports.rs#L2886-L2980) — `rtl_enter_critical_section`, `rtl_leave_critical_section`
+- [xenia-rs/crates/xenia-cpu/src/scheduler.rs:230-258](xenia-rs/crates/xenia-cpu/src/scheduler.rs#L230-L258) — `OrderMode`
+- [xenia-rs/crates/xenia-cpu/src/scheduler.rs:710-740](xenia-rs/crates/xenia-cpu/src/scheduler.rs#L710-L740) — `round_schedule`
+- [xenia-rs/crates/xenia-cpu/src/scheduler.rs:808-852](xenia-rs/crates/xenia-cpu/src/scheduler.rs#L808-L852) — `park_current`, `wake_ref`
+- [xenia-rs/crates/xenia-kernel/src/event_log.rs](xenia-rs/crates/xenia-kernel/src/event_log.rs) — `TID_COUNTERS`, `next_tid_idx`, `peek_tid_idx`, C+18 shared-global SID recipe
+- [xenia-rs/tools/diff-events/diff_events.py](xenia-rs/tools/diff-events/diff_events.py) — `SKIP_PAYLOAD_FIELDS_BY_KIND`, `SHARED_GLOBAL_SID_MARKER`, C+18/C+21 absorb logic
+- [xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:596-633](xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc#L596-L633) — `RtlEnterCriticalSection_entry` (read before Stage 1 edit)
+- `xenia-canary/src/xenia/kernel/util/event_log.{h,cc}` — Phase A emitter (add Stage 1 helper here)
+- `xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md` — extend with v1.4 §"contention.observed"
+
+## Reused utilities
+
+- `semantic_id_shared_global(pointer, object_type)` (C+18 recipe) — both engines compute identical SIDs for the same CS pointer. Use as the cross-engine lookup key in the manifest.
+- `Scheduler::park_current(BlockReason::CriticalSection(cs_ptr))` (scheduler.rs:808) — existing primitive; Stage 3 reuses without modification.
+- `Scheduler::wake_ref(r)` and the `cs_waiters` queue (exports.rs:2972-2980) — existing wake/transfer machinery handles the post-park resume without any new code.
+- C+21 `wait.begin` floating absorber (diff_events.py) — Stage 3's synthetic `wait.begin` emission is automatically absorbed.
+- C+18 shared-global SID emission (event_log.rs / event_log.cc) — Stage 1's `site_sid` field uses this directly.
+
+## Verification end-to-end
+
+After Stage 0:
+```bash
+cd "/home/fabi/RE - Project Sylpheed"
+for q in 10 50 200 1000 5000; do
+  XENIA_CACHE_WIPE=1 XENIA_SCHED_QUANTUM=$q \
+    xenia-rs/target/release/xrs-verify-stage0 "Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso" \
+    --phase-a-event-log /tmp/ours-q$q.jsonl
+done
+# compare each /tmp/ours-q*.jsonl digest and matched-prefix vs canary baseline
+```
+
+After Stages 1-4:
+```bash
+# capture canary contention trace (one-time)
+wine xenia-canary/build-cross/bin/Windows/Debug/xenia_canary.exe \
+  --mute=true --kernel_emit_contention=true \
+  --phase_a_event_log_path=/tmp/canary-contention.jsonl \
+  "Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso"
+
+# build manifest
+python3 xenia-rs/tools/diff-events/build_contention_manifest.py \
+  /tmp/canary-contention.jsonl > /tmp/contention.json
+
+# replay in ours
+XENIA_CACHE_WIPE=1 xenia-rs/target/release/xrs-replay \
+  --scheduler-replay-manifest /tmp/contention.json \
+  --phase-a-event-log /tmp/ours-replay.jsonl \
+  "Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso"
+
+# diff
+python3 xenia-rs/tools/diff-events/diff_events.py \
+  --canary /tmp/canary-contention.jsonl \
+  --ours /tmp/ours-replay.jsonl
+# expect main matched-prefix ≥ 106,000
+```
+
+Test suites:
+```bash
+# run from the repo root; each subshell leaves the working directory unchanged
+(cd xenia-rs && cargo test --release)
+# expect: kernel 204→≥210, workspace 291→≥305
+(cd xenia-canary && build-cross/bin/Linux/Debug/xenia-kernel-test)
+# expect: pass with kernel_emit_contention round-trip test
+python3 xenia-rs/tools/diff-events/test_diff_events.py
+# expect: 30→≥34 pass
+python3 xenia-rs/tools/diff-events/test_build_manifest.py
+# expect: pass
+```
+
+## Risk register
+
+| risk | likelihood | mitigation |
+|---|---|---|
+| Forced contention on tid=6 changes ordinal progression on later tids, breaking sister chain matches | high | Stage 3 validation #4 (-5 budget). If exceeded: keep manifest tid-scoped to tid=6 only initially; broaden iteratively per sister. |
+| Synthetic park on a CS that's free in ours's memory → deadlock | medium | Stage 3 explicit (b) skip-branch + warn. Replay only parks when guest-memory shows a live different-tid owner. |
+| Canary's spin loop usually succeeds → very few `contended=true` events to drive replay | medium | Stage 1 round-trip validation confirms ≥1 entry near 104,605. If empty: canary's spin budget is large enough that Sylpheed's CS contention never falls through to the wait; also emit on spin-loop entry, with `contended=spin_count_exhausted`. |
+| Ours digest destabilizes under replay (forced park orderings non-deterministic) | medium | Inspect `cs_waiters` FIFO + `find_by_tid` HW-slot traversal order. Pin both to deterministic order if needed. |
+| Reading-error #34 (loose `.xex` vs `.iso`) | known, covered | All validation uses `.iso` path. |
+| Reading-error #28 (verify source before writing) | active | Stage 1 requires reading `xboxkrnl_rtl.cc::RtlEnterCriticalSection_entry` end-to-end first; Stage 3 likewise for `exports.rs:2886-2946`. Already done in planning. |
+| Manifest grows huge on other games | low | Per-game tool; Sylpheed wait-light. Document scope in plan-doc / memory. |
+| Phase B `image_loaded_sha256 ea8d160e…` regression | low | Image load is Phase B; CS replay touches Phase A only. Verify in every cold run. |
+| Game-compat: real Sylpheed depends on wallclock pacing | low | Workload profile: 2 KeQuerySystemTime calls. Clock not in scope. |
+| The spin-then-wait asymmetry is itself a divergence | known, deferred | Don't add spin to ours under this plan — it would make ours's contention *less* likely, which is the wrong direction for 104,607. Log finding; defer to a separate phase. |
+
+## Backstop
+
+If Stage 3 lands but matched-prefix advance is <500 events (i.e., we get past 104,607 but quickly hit a similar wall):
+1. **Stage 5** per-CS hardcoded yield for the next blocking SID.
+2. **Approach D extension** — extend the diff-tool absorber to fold "post-wait nested Enter/Leave blocks" matched against a known pattern (one extra `E`-then-`L` cycle on the canary side with the same outer CS). ~150 LOC in `diff_events.py`. This crosses reading-error #23 in spirit but with a narrow heuristic; tag explicitly as a band-aid in schema v1.5.
+3. **Broaden manifest** to wait.begin (the deferred "H broad" variant): ~150 LOC. Only worth it if sister chains tid=12→7 or tid=14→9 are stuck on wait timing.
+
+## Acceptance criteria
+
+- Stage 0 spike completes with a decision (land / proceed).
+- Stages 1-4 (if needed) land in 3-5 sessions total.
+- Main matched-prefix ≥ 106,000 (≥1,393 events past current cap).
+- Ours's default-mode cold digest unchanged: `e1dfcb1559f987b35012a7f2dc6d93f5` × 3.
+- Phase B `image_loaded_sha256` unchanged: `ea8d160e…`.
+- Canary default-mode (`kernel_emit_contention=false`) cold digest unchanged.
+- Replay-mode digest stable × 3 cold runs (new value, archived).
+- Sister chain regression ≤5 events per sister.
+- Test suite: kernel 204→≥210, workspace 291→≥305, diff-tool 30→≥34.
+- Memory entry + schema-v1.md v1.4 §"contention.observed" + audit-run dir.
+
+## Out of scope
+
+- GPU/audio determinism (separate subaudits).
+- Wine-level changes (unmodifiable host primitive).
+- Modifying ours's default scheduler (`OrderMode::Fixed` stays the default; replay is opt-in).
+- Modifying canary's default scheduling (single-thread cooperative canary deferred indefinitely; too risky for the oracle).
+- Modifying ours's spin behavior on RtlEnterCS (logged, deferred — wrong direction for 104,607).
+- Adding wait.begin replay to manifest (deferred unless sisters need it; backstop only).
+- Clock determinism in canary (workload doesn't need it).
+
+## Open questions for the user
+
+1. **Stage 0 as gate or parallel?** Recommended as gate (1 day, cheap, may answer the whole question). Parallel risks duplicating effort.
+2. **Sister-chain regression budget**: -5 events per sister acceptable? Past escalations (C+17 tid=15→10 was -14, treated as D-NEW-3) suggest budget is tight.
+3. **Canary cvar default**: `kernel_emit_contention=false` (off by default) confirmed?
+4. **Reading-error class**: should this plan reserve class #35 for "scheduling-philosophy divergences", or extend #30 ("scheduling determinism")?
+5. **Spin-then-wait scope**: separate mini-phase or document-and-defer? Recommended defer.