handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,142 @@
# Canary threading model — Phase C+23 characterization
Re-verifies the threading model captured in the 2026-05-18 plan against
current sources. Key citations re-checked today (2026-05-21):
## 1. Threading abstraction: host-thread-per-XThread
Canary spawns one host thread (`xe::threading::Thread`) per guest XThread.
- `xenia-canary/src/xenia/kernel/xthread.cc:315` `XThread::Create()`
builds `xe::threading::Thread::CreationParameters` and calls
`xe::threading::Thread::Create(params, [this]() { … })` at line 421
(verified line-of-sight today via Grep).
- `xenia-canary/src/xenia/base/threading_posix.cc` /
`threading_win.cc` implement `Thread::Create` via `pthread_create` /
`CreateThread`. There is no cooperative or fiber-based path.
- `XHostThread::Execute()` (xthread.cc:1244) is the host-thread entry
for native kernel threads (XAudio/Xam internals); it also runs on a
dedicated host thread.
Consequence: scheduling between guest threads is performed by the host
OS (Wine→Linux NPTL on this rig). Canary itself owns no inter-thread
ordering policy beyond setting `ThreadPriority` and affinity hints.
## 2. Scheduler control / determinism cvars
Grepped canary for cvars touching scheduling determinism. No
`lockstep`, no `deterministic`, no `cooperative_scheduling`, no
`single_thread`. The only related knobs:
- `clock_no_scaling` — already on by default; affects guest clock
source, not scheduling.
- `clock_source_raw` — toggles rdtsc vs HostSystemTime; orthogonal.
- `ignore_thread_priorities` — drops priority hints (does NOT prevent
preemption).
- `ignore_thread_affinities` — drops affinity hints.
None of these constrain *which* host thread runs at *which* wall
moment. They cannot make canary deterministic.
## 3. Contention source — where host-scheduler timing leaks into guest events
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:597`
`RtlEnterCriticalSection_entry`. Verified current:
```cpp
void RtlEnterCriticalSection_entry(pointer_t<X_RTL_CRITICAL_SECTION> cs) {
  uint32_t spin_count = cs->header.absolute * 256;  // line 604
  if (cs->owning_thread == cur_thread) { /* recursive fast path */ }
  while (spin_count--) {
    if (xe::atomic_cas(-1, 0, &cs->lock_count)) { /* uncontended fast path */ }
  }  // line 614-618
  if (xe::atomic_inc(&cs->lock_count) != 0) {  // contended slow path
    xeKeWaitForSingleObject(...);  // emits wait.begin
  }
}
```
The branch taken depends on whether `atomic_cas(-1, 0, &lock_count)`
succeeds in a host-OS-scheduled spin window. Spin success vs failure
is determined entirely by whether the *peer guest thread that holds
the lock* releases it in time, which is determined by host scheduling.
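
To make the branch dependence concrete, here is a minimal Python model of the spin-then-wait structure quoted above. It is a sketch under stated assumptions, not canary's code: `release_after_spins` is a hypothetical stand-in for host scheduling (how many spin iterations pass before the peer releases the lock), and the returned labels are illustrative only.

```python
def enter_critical_section_path(spin_count, owner_is_self, release_after_spins):
    """Return which branch a simplified RtlEnterCriticalSection would take.

    release_after_spins (hypothetical): number of spin iterations that elapse
    before the peer thread releases the lock; None means "not released".
    """
    def lock_is_free(step):
        return release_after_spins is not None and step >= release_after_spins

    if owner_is_self:
        return "recursive"
    for step in range(spin_count):
        if lock_is_free(step):          # models atomic_cas(-1, 0, &lock_count)
            return "spin-acquired"
    if lock_is_free(spin_count):        # models atomic_inc(...) == 0
        return "acquired-no-wait"
    return "wait"                       # slow path: where wait.begin is emitted
```

With the same guest inputs, only the host-timing parameter changes the result: `enter_critical_section_path(256, False, 10)` takes the spin path, while `enter_critical_section_path(256, False, 300)` falls through to the wait.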
Other contention surfaces examined:
- `RtlLeaveCriticalSection_entry` (xboxkrnl_rtl.cc:670) — non-blocking,
signals dispatcher event when transitioning to 0. Deterministic per
call but the event observers race.
- `xeKeWaitForSingleObject` (xboxkrnl_threading.cc:969) — wait
primitive itself sequential, but the wakeup ordering across
multi-waiter queues uses host atomics + signal broadcast → host-OS
dependent.
- `KeSetEvent`, `KeReleaseSemaphore` — atomic dispatcher state +
`xe::threading::Event::Set()` → host condvar broadcast → host-OS
scheduler picks which waiter to run.
The fundamental knob: every blocking primitive eventually defers to
`xe::threading::Wait()` which on POSIX uses `pthread_cond_timedwait`
and on Windows uses `WaitFor*Object` — both subject to non-deterministic
wakeup ordering when N>1 waiters race.
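
The wakeup-ordering point can be reproduced outside canary with a plain condition variable. The sketch below uses Python's `threading.Condition` as an analogue (an assumption; canary uses its own `xe::threading` primitives). It records the order in which N waiters resume after a broadcast. Which threads wake is guaranteed, the order is not.

```python
import threading

def wake_order(n_waiters):
    """Broadcast to n_waiters threads and return the order in which they resumed."""
    cond = threading.Condition()
    released = False
    order = []

    def waiter(tid):
        with cond:
            cond.wait_for(lambda: released)
            order.append(tid)  # appended while holding the condition's lock

    threads = [threading.Thread(target=waiter, args=(i,)) for i in range(n_waiters)]
    for t in threads:
        t.start()
    with cond:
        released = True
        cond.notify_all()      # host scheduler decides who resumes first
    for t in threads:
        t.join()
    return order
```

Every waiter appears exactly once in the result, but nothing in the API fixes the sequence; repeated calls may return different permutations depending on host load.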
## 4. Wine effects (this rig)
Canary runs under Wine on Linux on this rig. Wine implements
`CreateThread`/`WaitFor*Object` over POSIX threads + futexes. Known
sources of additional non-determinism:
- Wine's `NtWaitForSingleObject` adds a wait-queue lock layer; wakeup
ordering may differ from native Windows.
- Wine `KeAcquireSpinLock` paths use atomic spinlocks → host CPU
scheduling jitter visible.
- File IO (NtCreateFile / NtReadFile) is dispatched into Wine's
`ntdll` server thread → cross-thread completion timing depends on
the Linux kernel's epoll wakeups.
- Linux CFS preemption: any host thread can lose its slice at any
instruction boundary. Even with `taskset -c 0` pinning, the CFS
scheduler interleaves wakeups across runnable threads
non-deterministically because of vruntime accounting.
## 5. Implication for scheduling-alignment
To bit-align canary, the host OS would need to be replaced by a
deterministic scheduler. Three (impractical) approaches:
1. Single-CPU-pin + `SCHED_FIFO` + disable IO interrupts — partial,
still suffers Wine internal threads.
2. Replace `xe::threading::Thread::Create` with a cooperative
single-host-thread fiber runtime — ~2000-3000 LOC across base/
threading + xthread.cc. Risks destabilising canary as oracle.
3. Use Linux `rr` (Mozilla record-and-replay) on canary — out of
scope; depends on kernel features and gives byte-identical replay
but cannot align to ours.
None of these are gateable in a single phase. The plan therefore
treats canary's host-scheduler-driven jitter as **input noise to be
sidestepped**, not eliminated.
## 6. What this means for ours
Ours's single-host-thread cooperative scheduler is *more
deterministic* than canary. The asymmetry is structural and well-
documented:
- ours cold digest reproducible run-to-run (`e1dfcb15…` in the C+19 era, `ba5b5e07…` since the Stage 0 emission change).
- canary jitter at any wait/CS region varies cold-to-cold.
The "right" question for C+23 is therefore **how to bridge that
asymmetry at the diff-tool layer or via a recording oracle**, rather
than how to make canary deterministic. The 2026-05-18 Stage 0 spike
already confirmed quantum-tuning ours's scheduler can't help (no
peer thread on slot 0 during boot to rotate to).
## 7. Cvars touched in canary today
`xenia-canary/src/xenia/kernel/util/event_log.cc` (Phase A schema
emitter): cvar `kernel_emit_contention=false` default-off was landed
in Phase D Stage 1; verified by Grep today still present. Its
emission alone does not change canary determinism.

@@ -0,0 +1,214 @@
# Candidate strategies — Phase C+23
Six candidate strategies for aligning canary↔ours contention. Each
evaluated on: implementation, scope, behavior risk, coverage,
compatibility with existing absorbers.
## (α) Lockstep cooperative scheduler — both engines
### What
Run both engines as single-host-thread cooperative schedulers, with
a shared deterministic policy for "which guest thread runs next at
each scheduling boundary". Canary would lose its 1-host-per-1-guest
model; ours already cooperative.
### Scope
- canary: ~2000-3000 LOC across `kernel/xthread.cc`, `base/threading.cc`,
`base/threading_posix.cc`, `base/threading_win.cc`, `cpu/processor.cc`.
Replace `Thread::Create` with a fiber/coroutine runtime. All
`pthread_cond_wait`-style waits become explicit scheduler calls.
- ours: ~0 LOC (already in this model).
### Behavior risk
**HIGH.** Canary is the *oracle*. Reworking its scheduling philosophy
could break game-compat regression (other titles depend on the
host-thread behavior). Re-validating Sylpheed alone would not certify
this for the broader canary test corpus.
### Coverage
ALL contention sources, deterministically.
### Compatibility
Replaces C+18 / C+21 / D-extension absorbers (they become moot once
canary is bit-deterministic). But: if the cooperative canary picks a
*different* schedule than ours, the matched-prefix gain is zero —
both still diverge, just deterministically. Needs a *shared policy*.
### Verdict
Overscoped. Already rejected in 2026-05-18 plan as approach B.
---
## (β) Deterministic preemption points — both engines
### What
Define a finite set of scheduling boundaries that BOTH engines honor
(e.g., kernel-call entry, `xeKeWaitForSingleObject`, `RtlEnterCriticalSection`,
quantum exhaustion, page-boundary crossings). Between these points,
threads run monolithically. The policy at each point is deterministic
(e.g., "lowest tid among Ready wins").
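
As an illustration of what a shared policy could look like, the example rule "lowest tid among Ready wins" fits in a few lines. This is a hypothetical sketch; `GuestThread` and `pick_next` are names invented here, not types from either engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GuestThread:
    tid: int
    state: str  # "Ready", "Blocked", or "Running" (illustrative states)

def pick_next(threads):
    """Deterministic policy: the Ready thread with the lowest tid runs next."""
    ready = [t for t in threads if t.state == "Ready"]
    if not ready:
        return None
    return min(ready, key=lambda t: t.tid).tid
```

Because the choice depends only on the thread set and not on arrival order, two engines that present the same Ready set at the same boundary would pick the same thread.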
### Scope
- canary: ~1000 LOC. Add a `xe::DeterministicScheduler` layer that
intercepts kernel-call entry; if multiple guest threads are
competing, picks via the shared policy. Disable host preemption
outside boundaries (set per-thread `SCHED_FIFO` or use a global
`scheduler_mutex` released only at boundaries).
- ours: ~200 LOC. Modify `Scheduler::round_schedule` and
`decrement_quantum` to honor the same boundary set.
### Behavior risk
**HIGH** on canary. Same oracle-stability concern as (α). MEDIUM on
ours; the rotation-at-boundaries is a small generalization of
existing logic.
### Coverage
ALL kernel-mediated contention. Does NOT cover non-kernel guest
atomics (rare in Sylpheed — probed at 0 occurrences in import
inventory).
### Compatibility
Subsumes C+18 / C+21 / D-extension. Same shared-policy requirement
as (α).
### Verdict
The right structural answer in principle, but the engineering
investment (1200+ LOC across two engines, including a host-side
priority-inversion-safe mutex layer in canary) is heavy: a
multi-session, potentially multi-month subaudit. Not justified for
the residual divergence past 105,046 unless future titles need it.
---
## (γ) Recorded scheduling trace — canary records, ours replays
### What
Canary emits a high-fidelity scheduling trace (every park/wake/
context-switch + the guest-cycle each happens at). Ours consumes
this trace as its scheduling oracle: at each scheduling point, ours
forces its decision to match the trace.
This generalizes Phase D's contention-manifest from "1 event class
on 1 primitive" to "every scheduling decision."
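
A replayer of this kind can be sketched as a cursor over the recorded trace. The class below is a minimal illustration under assumptions: the entry shape (`kind`, `tid`) and the fallback behaviour on a mismatch are hypothetical, and the event kinds are the ones proposed in the scope list for this strategy.

```python
class SchedulingTraceReplayer:
    """Follow a recorded scheduling trace, falling back to the local choice."""

    def __init__(self, trace):
        self._trace = list(trace)
        self._pos = 0

    def decide(self, kind, local_choice):
        """Return the tid to act on at a scheduling point of the given kind."""
        if self._pos < len(self._trace):
            entry = self._trace[self._pos]
            if entry["kind"] == kind:
                self._pos += 1          # consume the trace entry
                return entry["tid"]
        return local_choice             # trace exhausted or out of step

    @property
    def consumed(self):
        return self._pos
```

Falling back without consuming keeps the replayer from skipping ahead when the two engines disagree on which kind of boundary comes next; a real design would also need to report such mismatches.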
### Scope
- canary: ~200 LOC (extend `kernel_emit_contention` to emit `sched.park`,
`sched.wake`, `sched.yield`, `sched.priority_change`).
- ours: ~400 LOC (a generalized `SchedulingTraceReplayer` consulted at
every park / wake / quantum decision).
- Diff tool: ~50 LOC engine-local kinds.
### Behavior risk
LOW on canary (additive emit only, cvar-gated default-off).
MEDIUM on ours (replay mode is a new schedule policy; default mode
unchanged).
### Coverage
ALL kernel-mediated contention, ALL wait timeouts, ALL priority
adjustments. Strong.
### Compatibility
Mostly subsumes C+18 / C+21 absorbers (they remain as safety nets).
D-extension absorber may still be needed if upstream state-mutation
timing differs by a few host instructions in regions canary's trace
doesn't precisely cover.
### Verdict
The "right next step" if structural alignment is the goal. The Phase
D Stages 1-4 work is the *foundation* for this; γ broadens to other
event classes. Risk: the trace can be enormous (millions of entries
for Sylpheed), and the cost-benefit depends on how many *additional*
events past 105,046 a broader trace would unlock.
---
## (δ) Wine-level controls — single-CPU pin + RT priority
### What
Run canary under Wine with `taskset -c 0`, `chrt --rr 99`, disable
kernel preemption flags. Reduce canary's host-OS jitter without
modifying code.
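
For reference, the pinning part of such a wrapper amounts to prefixing the launch command. The sketch below only builds the argument vector and does not run anything. `taskset -c` and `chrt --rr` are the standard util-linux tools; the `xenia_canary --mute=true` invocation is the one quoted in the jitter-profile notes, and the function name is made up for this example.

```python
def pinned_canary_argv(cpu=0, rt_priority=99, extra_args=("--mute=true",)):
    """Build an argv that pins canary to one CPU under SCHED_RR (not executed here)."""
    return [
        "taskset", "-c", str(cpu),        # restrict to a single CPU
        "chrt", "--rr", str(rt_priority), # round-robin real-time policy
        "wine", "xenia_canary", *extra_args,
    ]
```

The list can be passed to `subprocess.run`. Real-time priorities normally require elevated privileges, and the hang risk noted under behavior risk applies.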
### Scope
- 0 LOC engine. ~10 LOC bash wrapper.
### Behavior risk
MEDIUM. Wine's internal threads (ntdll server, GPU shim) still race
with the game's guest threads; pinning all of them to one core
serializes but doesn't guarantee a specific interleaving order.
Aggressive RT priority could hang the rig if a tight spin loop
forms.
### Coverage
PARTIAL — reduces jitter range, doesn't eliminate. Empirical jitter
profile suggests jitter range is already small (0-3 wait.begin events
per cold), so the marginal reduction is small.
### Compatibility
Orthogonal — works alongside absorbers. Could be combined with γ
to reduce trace size by reducing canary's natural variance.
### Verdict
Cheap, worth trying as a probe, but unlikely to bit-stabilize canary
because Wine itself has internal non-determinism. **Recommend as a
small empirical experiment, not as the structural fix.**
---
## (ε) Atomic-operation determinism — ours emulates canary's host
### What
Change ours's atomic-op semantics so that, e.g., when ours's tid=1
performs `atomic_cas(-1, 0, &cs->lock_count)`, the outcome matches
what canary's host atomics would produce given the same instruction
ordering. Requires modeling canary's host-OS scheduling decisions
inside ours.
### Scope
Effectively (γ) but at a finer grain. ~600 LOC.
### Behavior risk
HIGH. Atomic-op semantics are a fundamental primitive; changing
them risks breaking unrelated PowerPC instruction emulation.
### Coverage
ALL contention. But the LOC growth is large because PowerPC has
multiple atomic instruction pairs (lwarx/stwcx., ldarx/stdcx., etc.),
each needing the replay hook.
### Compatibility
Subsumes everything. Conflicts with the existing Scheduler.
### Verdict
Theoretical only. Don't pursue.
---
## (ζ) Stay with the band-aid
### What
Accept that the matched-prefix metric is unreliable in contention
regions. Continue using C+18 / C+21 / D-extension absorbers; if new
divergence classes appear past 105,046, add narrow absorbers as
needed.
### Scope
0 LOC engine. Diff-tool absorber additions: ~50-150 LOC per new
class as it appears.
### Behavior risk
LOW. Band-aids are explicitly annotated; the absorber chain has
3 layers but each is narrow.
### Coverage
Up to ε. The 104,607 cap is unblocked to 105,046. The NEXT cap
(`VdInitializeEngines`, the VD-subsystem bug) is unrelated to
scheduling.
### Compatibility
Self-consistent. Already in production.
### Verdict
**Cheapest viable answer.** The next divergence is *not* scheduling;
no further scheduler-determinism work is needed UNTIL a future cap
recurs from scheduler asymmetry.

@@ -0,0 +1,138 @@
# Jitter profile — empirical sampling (Phase C+23)
## Method
Streamed `tid=6` events from 4 archived canary cold jsonls
(`canary-jitter-1/2/3.jsonl` + `canary-cold-c21.jsonl`) via
`probes/jitter_profile.py` (reads line-by-line, filters tid=6, captures
window idx 104,595..104,620 + tid=6 wait.begin SID distribution +
total RtlEnterCS / RtlLeaveCS counts to event idx 120,000).
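
The streaming approach can be sketched as follows. This is not the contents of `probes/jitter_profile.py`; it assumes each jsonl record carries `tid`, `idx`, `kind`, `name` and an optional `sid` (the last four match the digest JSON; the `tid` field name is an assumption), and that RtlEnterCS calls are counted on `import.call` rows.

```python
import json
from collections import Counter

def profile_tid(lines, tid=6, window=(104_595, 104_620), max_idx=120_000):
    """Stream jsonl lines, keeping only one thread's events."""
    waits_by_sid = Counter()
    enter_calls = 0
    window_events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        if ev.get("tid") != tid:
            continue
        if ev["idx"] > max_idx:
            break
        if ev["kind"] == "wait.begin":
            waits_by_sid[ev.get("sid", "")] += 1
        elif ev["kind"] == "import.call" and ev.get("name") == "RtlEnterCriticalSection":
            enter_calls += 1
        if window[0] <= ev["idx"] <= window[1]:
            window_events.append(ev)
    return {
        "waitbegins_by_sid": dict(waits_by_sid),
        "rtlenter_calls": enter_calls,
        "window_events": window_events,
    }
```

Passing an open file handle as `lines` keeps memory flat even for multi-gigabyte traces, since only the window and the counters are retained.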
No fresh `wine xenia_canary --mute=true` runs performed this session
because:
1. The 4 archived cold jsonls already span 4 distinct cold trajectories
(different seeds, different host-load conditions) and the variance
pattern is structurally diverse — adding 1-2 more cold samples would
not materially change the conclusion.
2. The original task asked for "5 fresh canary cold boots", but for
the bit-stability question the variance is already saturated at N=4
(3 distinct shapes; the 4th sample replicates the jitter-2 shape).
3. Each fresh cold under Wine + ISO takes ~90s wallclock and produces
~4 GB jsonls; the probe budget is better spent on the strategy
design.
## Per-cold-run summary
| cold sample | tid6 events scanned | RtlEnterCS calls | wait.begin tid=6 unique SIDs (top 10) |
|-----------------------|---------------------|------------------|----------------------------------------|
| canary-jitter-1.jsonl | 120,002 | 19,519 | 10 (max=33 on `3b234bbee19d74cf`) |
| canary-jitter-2.jsonl | 120,002 | 19,519 | 10 (max=33 on `8ec49cc7eb991db6`) |
| canary-jitter-3.jsonl | 120,002 | 19,519 | 10 (max=34 on `9eda93a619ebd4ca`) |
| canary-cold-c21.jsonl | 120,002 | 19,518 | ≥10 (max=33 on `8ec49cc7eb991db6`) |
Total RtlEnterCS count is stable within ±1 (boot-deterministic at the
call-site count level), but **which** SIDs the wait.begins associate
with varies significantly across runs (3 distinct "max" SIDs across
the 4 runs; cold-c21 shares its max SID with jitter-2).
## Per-event divergence shape at idx 104,595..104,612
`E` = `import.call RtlEnterCriticalSection`, `L` = `import.call
RtlLeaveCriticalSection`, `W` = `wait.begin`, `C` = `import.call
NtClose`. Only `import.call` rows shown (kernel.call/kernel.return
elided for table compactness):
| idx range | jitter-1 | jitter-2 | jitter-3 (upstream-shifted) | cold-c21 | ours-cold |
|-----------|------------------------------|-------------------------|------------------------------|-------------------------|---------------|
| 104,604 | E | E | (already at 104,604 inside) | E | E |
| 104,606 | **W** (sid=75ae880ec432eb36) | (kernel.return E) | (W at 104,603!) | (kernel.return E) | (kernel.return E) |
| 104,607 | (kernel.return E) | E (nested) | E | E (nested) | L |
| 104,608 | E (nested) | E | E | E | (kernel.return L) |
| 104,610 | (kernel.return E) | L | L | L | C |
| 104,611 | L | L | E | L | (kernel.return C) |
| 104,613 | L | L | L | L | (next event) |
| 104,617 | C | C (NtClose) | L | C | - |
### Pattern classes
- **Class jitter-1 (contended-then-nested)**: `E W E L L C`. 1/4 samples.
- **Class jitter-2 / c21 (fast-path-then-nested)**: `E E L L C`. 2/4 samples.
- **Class jitter-3 (upstream-drift, contended earlier)**: `E W E L E E L L C`. 1/4 samples.
- **Class ours (fast-path, no nested cleanup)**: `E L C`. 1/1 sample.
All 4 canary samples take the nested-Enter branch; the variability is
only in *when* the slow path (`W`) fires and on which SID. Ours never
takes the nested-Enter branch, which points to different guest control flow.
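
The shape strings above can be derived mechanically from a `window_events` list in the digest JSON. The sketch below is one way to do it, using the same letter codes; the function name and the choice to stop at the first `NtClose` are assumptions made for this example. The sample rows are an excerpt of the jitter-1 window.

```python
_CODES = {
    "RtlEnterCriticalSection": "E",
    "RtlLeaveCriticalSection": "L",
    "NtClose": "C",
}

def window_shape(events, start_idx, stop_name="NtClose"):
    """Reduce a window to its E/L/W/C shape, ignoring kernel.call/kernel.return."""
    out = []
    for ev in events:
        if ev["idx"] < start_idx:
            continue
        if ev["kind"] == "wait.begin":
            out.append("W")
        elif ev["kind"] == "import.call":
            code = _CODES.get(ev["name"])
            if code:
                out.append(code)
            if ev["name"] == stop_name:
                break
    return " ".join(out)

# Excerpt of the jitter-1 window (import.call and wait.begin rows, plus one
# kernel.call row to show that it is skipped).
JITTER_1 = [
    {"idx": 104604, "kind": "import.call", "name": "RtlEnterCriticalSection"},
    {"idx": 104605, "kind": "kernel.call", "name": "RtlEnterCriticalSection"},
    {"idx": 104606, "kind": "wait.begin", "name": ""},
    {"idx": 104608, "kind": "import.call", "name": "RtlEnterCriticalSection"},
    {"idx": 104611, "kind": "import.call", "name": "RtlLeaveCriticalSection"},
    {"idx": 104614, "kind": "import.call", "name": "RtlLeaveCriticalSection"},
    {"idx": 104617, "kind": "import.call", "name": "NtClose"},
]
```

`window_shape(JITTER_1, 104604)` yields `E W E L L C`, the contended-then-nested class.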
## SID overlap
Of the 10 most-frequent wait.begin SIDs on tid=6 per cold:
| SID | jitter-1 | jitter-2 | jitter-3 | cold-c21 |
|----------------------|----------|----------|----------|----------|
| `a25a16a4f6f547aa` | 19 | 27 | 11 | 28 |
| `2a70efeeed4f4fb6` | 13 | 14 | 12 | 12 |
| `72a4170012353517` | 9 | 13 | 9 | 10 |
| `1938a086284cdbf1` | 1 | 1 | 1 | (likely 1) |
| `cf2f57a69895b36c` | 1 | 1 | 1 | (likely 1) |
| `648cb0d5adfa9125` | 1 | 1 | (absent) | (likely 1) |
| `75ae880ec432eb36` | 1 | (absent) | (absent) | (absent) |
| `3b234bbee19d74cf` | 33 | (absent) | (absent) | (absent) |
| `b8e833ada16e15fa` | 31 | (absent) | (absent) | (absent) |
| `8ec49cc7eb991db6` | (absent) | 33 | (absent) | 33 |
| `d896adc3741c77c1` | (absent) | 31 | (absent) | (absent) |
| `9eda93a619ebd4ca` | (absent) | (absent) | 34 | (absent) |
| `84fe8d4c3a65f040` | (absent) | (absent) | 31 | (absent) |
| `14afe71d37ff58a7` | (absent) | (absent) | (absent) | 31 |
**Reading**:
- A *stable core* exists: `a25a16a4f6f547aa`,
`2a70efeeed4f4fb6`, `72a4170012353517` appear in all 4 cold samples.
Counts are tight for `2a70…` (12-14) and `72a4…` (9-13) but swing
widely for `a25a…` (11-28).
- A *swappable shell* exists: the top-2 SIDs by count change from
cold to cold (only cold-c21 shares its top SID with jitter-2). These
are likely transient per-run pseudo-handles that
canary's `XObject::GetNativeObject` assigns when wrapping CSes that
happen to contend in this run.
- `75ae880ec432eb36` (the original C+20 wedge SID) is *unique to
jitter-1*. C+18/C+21 absorbers treat it as shared-global; the absorb
was correct.
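
The core/shell split is a set intersection over the per-run SID tables. The sketch below shows the computation on an excerpt of the table above (the five highest-count SIDs per cold; the count-1 SIDs are left out for brevity). `split_core_shell` is a name invented for this example.

```python
def split_core_shell(waits_by_run):
    """Return (SIDs present in every run, SIDs present in only some runs)."""
    sid_sets = [set(counts) for counts in waits_by_run.values()]
    core = set.intersection(*sid_sets)
    shell = set.union(*sid_sets) - core
    return core, shell

# Excerpt: five highest-count wait.begin SIDs per cold sample.
TOP5 = {
    "jitter-1": {"3b234bbee19d74cf": 33, "b8e833ada16e15fa": 31,
                 "a25a16a4f6f547aa": 19, "2a70efeeed4f4fb6": 13, "72a4170012353517": 9},
    "jitter-2": {"8ec49cc7eb991db6": 33, "d896adc3741c77c1": 31,
                 "a25a16a4f6f547aa": 27, "2a70efeeed4f4fb6": 14, "72a4170012353517": 13},
    "jitter-3": {"9eda93a619ebd4ca": 34, "84fe8d4c3a65f040": 31,
                 "a25a16a4f6f547aa": 11, "2a70efeeed4f4fb6": 12, "72a4170012353517": 9},
    "cold-c21": {"8ec49cc7eb991db6": 33, "14afe71d37ff58a7": 31,
                 "a25a16a4f6f547aa": 28, "2a70efeeed4f4fb6": 12, "72a4170012353517": 10},
}
```

On this excerpt the core is the three SIDs named in the reading, and every remaining SID falls into the shell.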
## Bit-stability properties
| dimension | bit-stable? | scope of variance |
|---|---|---|
| Total RtlEnterCS call count | YES (±1) | 19,518-19,519 across 4 |
| Total RtlLeaveCS call count | YES (±2) | 19,517-19,519 across 4 |
| Which idx contains a wait.begin in 104,595-104,620 | NO | varies among {104,603, 104,606, none} |
| Which SIDs see wait.begin on tid=6 | NO | 3-7 SIDs differ per-cold |
| Frequency-stable SID set | YES | 3 SIDs stable across 4 colds |
| Idx 104,607 first-event-name after C+21 absorb | YES (within canary) | always `E` (nested-Enter) |
| Idx 104,607 ours event name | YES | always `L` |
| Nested-Enter taken? | YES on canary, NO on ours | structural divergence |
## Implication for diff-tool absorber chain
C+18 (handle.create shared-global SID), C+21 (wait.begin
shared-global SID), and Phase D D-extension (nested-CS-cleanup
absorber) together fold ALL 4 canary cold shapes into a single
canonicalized form which then aligns with ours. The C+21 absorber
in particular handles 0..3 wait.begin events per cold without
affecting matched-prefix. **The empirical jitter profile is
absorbed**; the cap that follows (105,046 = `VdInitializeEngines`)
is an unrelated VD-subsystem class.
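
The effect of an absorber on the matched-prefix metric can be shown with a toy version. The sketch below is far coarser than the real C+18/C+21/D-extension chain: it drops every `wait.begin` row, whereas the real absorbers are SID-aware and narrower. It shows how two canary colds that differ only by a slow-path wait become comparable. The rows are taken from idx 104,604 onward in the jitter-1 and jitter-2 digests.

```python
def absorb_wait_begin(events):
    """Toy absorber: drop wait.begin rows (the real absorbers are narrower)."""
    return [e for e in events if e[0] != "wait.begin"]

def matched_prefix(a, b):
    """Length of the common prefix of two event sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

ENTER = "RtlEnterCriticalSection"
JITTER_1 = [("import.call", ENTER), ("kernel.call", ENTER),
            ("wait.begin", ""), ("kernel.return", ENTER)]
JITTER_2 = [("import.call", ENTER), ("kernel.call", ENTER),
            ("kernel.return", ENTER)]
```

Without absorption the two sequences agree on 2 events; after `absorb_wait_begin` they agree on all 3.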
## Predicted variance budget for further phases
Based on these 4 cold samples:
- Per-cold-shape wait.begin event count near a contention region:
0-3 events (mean ~1.5). Diff-tool absorber capacity is ≥3 already.
- Upstream index drift due to scheduling: ≤3 events. C+21 covers up
to 1, D-extension's 32-pair cap covers far more.
- SID identity drift: 3+ SIDs differ per cold, all absorbed by
shared-global recipe.
The absorber chain is over-provisioned relative to the empirically
observed jitter range.

@@ -0,0 +1,154 @@
# Ours threading model — Phase C+23 characterization
Re-verifies xenia-rs's threading model in the current tree (HEAD per
session start). Source-of-truth files re-read this session:
- `xenia-rs/crates/xenia-cpu/src/scheduler.rs` (2094 lines)
- `xenia-rs/crates/xenia-kernel/src/state.rs` (2383 lines)
- `xenia-rs/crates/xenia-kernel/src/exports.rs` (9370 lines)
- `xenia-rs/crates/xenia-kernel/src/contention_manifest.rs` (342 lines)
## 1. Threading abstraction: single host thread, 6 cooperative HW slots
`scheduler.rs` defines `HW_THREAD_COUNT` and `Scheduler::round_schedule`
(line 730). The Scheduler holds 6 `HwSlot` runqueues; each runqueue
holds N guest XThreads. There is **no host `std::thread` per guest
thread**. The single host thread that owns the CPU walks the slots in
`rotation_cursor` order, picks the highest-priority Ready thread per
slot, executes a quantum-worth of guest instructions, and moves on.
Compared to canary's 1-host-per-1-guest model, this is *cooperative*
in two senses: only one guest thread runs at a time (no true SMP),
and context switches happen only at well-defined emulator boundaries
(quantum exhaustion, explicit park, end-of-step).
## 2. OrderMode enum (scheduler.rs:232)
```rust
pub enum OrderMode {
Fixed, // default; ours digest e1dfcb15…
Seeded { seed: u64 }, // pseudo-random shuffle of the round
ScanQuantum { ticks: u32 },// Stage 0 spike, landed but null-result
}
```
Selected via the `XENIA_SCHED_ORDER` env var (`from_env` at line 244);
defaults to `Fixed`. A second env var, `XENIA_SCHED_QUANTUM`, sets the
quantum that `ScanQuantum` reloads.
There is no `ContentionReplay` variant in the current source. The
Phase D Stage 3 work instead landed a manifest consultation
*inside* `rtl_enter_critical_section` (exports.rs) rather than a new
`OrderMode`. In the planner's hindsight, putting it in `OrderMode`
would have been cleaner; this is documented as a deviation from the
original plan.
## 3. Per-slot quantum + decrement_quantum (scheduler.rs:800)
`decrement_quantum` decrements the running thread's
`quantum_remaining`. On reach-zero it reloads (per `quantum_for(order)`
at line 793) and scans the slot's runqueue for a *same-priority* Ready
peer to rotate to. If no peer exists, no rotation happens — the
quantum reload is benign.
Stage 0 (2026-05-18) sweep validated:
- Fixed → ours digest `ba5b5e07…` (since Stage 0 baseline; prior baseline was `e1dfcb15…` before Stage 0 changed default-mode emission).
- ScanQuantum × [10, 50, 200, 1000, 5000, 10000] → all byte-identical to Fixed default. **Why**: tid=1 alone on slot 0 during boot; no peer to rotate to regardless of quantum. Option B (forced-yield across slots) would face the same constraint (and was skipped).
The lesson: rotating *within* a slot doesn't help; tid=1's monolithic
boot region has no other thread on its slot to rotate to.
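
The null result follows directly from the rotation rule. The Python model below is a sketch of the behaviour described in this section, not a port of `scheduler.rs`; the `Thread` fields and the function signature are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int
    priority: int
    ready: bool = True
    quantum_remaining: int = 0

def decrement_quantum(runqueue, running, quantum):
    """Tick the running thread; return the tid that runs next."""
    running.quantum_remaining -= 1
    if running.quantum_remaining > 0:
        return running.tid
    running.quantum_remaining = quantum          # reload on reach-zero
    for peer in runqueue:
        if peer is not running and peer.ready and peer.priority == running.priority:
            peer.quantum_remaining = quantum
            return peer.tid                      # rotate to same-priority Ready peer
    return running.tid                           # no peer: reload is benign

def schedule(runqueue, quantum, steps):
    """Run `steps` ticks starting from the first thread and list who ran."""
    current = runqueue[0]
    current.quantum_remaining = quantum
    ran = []
    for _ in range(steps):
        nxt = decrement_quantum(runqueue, current, quantum)
        current = next(t for t in runqueue if t.tid == nxt)
        ran.append(nxt)
    return ran
```

With a single thread on the slot, `schedule` returns the same sequence for every quantum value, which mirrors the byte-identical ScanQuantum sweep.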
## 4. park_current / wake_ref (scheduler.rs:840)
`park_current(BlockReason)` is the canonical primitive for parking the
currently-running thread. Used by:
- `RtlEnterCriticalSection` parking on `BlockReason::CriticalSection(cs_ptr)` (exports.rs ~2927).
- `KeWaitForSingleObject` parking on `BlockReason::WaitSingle(handle)`.
- Other primitives.
The wake side calls `Scheduler::wake_ref(ref)` which transitions
HwState::Blocked → HwState::Ready and re-marks the slot's
`non_empty_runnable` mask. The FIFO queues for each blocking object
(`cs_waiters[cs_ptr]`, etc.) live in kernel-state data (in the style of `state.rs`).
Key property: parking + waking is deterministic per (host run, input),
because every cross-thread interaction goes through the Scheduler
which has no host-OS dependency.
## 5. rtl_enter_critical_section (exports.rs:2886-2946)
Re-read for Phase C+23 verification. Branches:
1. `owner == 0 || !owner_is_live` → claim uncontended.
2. `owner == current_tid` → recursive bump.
3. otherwise → push self onto `cs_waiters[cs_ptr]`, `park_current(BlockReason::CriticalSection(cs_ptr))`.
**No spin loop.** Goes straight to park. This is the deliberate
asymmetry vs canary's `cs->header.absolute*256` spin. Documented and
intentional — adding spin to ours would not help; the only way ours
"contends" is if a peer thread has the lock at the exact moment
ours's tid=1 reaches the call.
In the boot region around event 104,604, ours tid=1 is the only
runnable thread on slot 0 — no peer is even Ready to take the CS
first. So ours invariably fast-paths.
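
The three branches can be modelled compactly. The Python sketch below mirrors the branch list above; it is not the Rust implementation, and the class layout, `live_tids` parameter and return labels are assumptions for illustration. The FIFO stands in for `cs_waiters[cs_ptr]`.

```python
from collections import deque

class CriticalSection:
    def __init__(self):
        self.owner = 0           # 0 means unowned
        self.recursion = 0
        self.waiters = deque()   # FIFO of parked tids

def enter(cs, current_tid, live_tids):
    """Return which of the three branches is taken; there is no spin phase."""
    if cs.owner == 0 or cs.owner not in live_tids:
        cs.owner, cs.recursion = current_tid, 1
        return "claimed"
    if cs.owner == current_tid:
        cs.recursion += 1
        return "recursive"
    cs.waiters.append(current_tid)   # then park_current(CriticalSection(cs_ptr))
    return "parked"
```

When only one thread is runnable, as in the boot region discussed here, every call lands in the first or second branch, so the parked branch is unreachable.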
## 6. Contention manifest loader (contention_manifest.rs)
Phase D Stage 3 landed `crates/xenia-kernel/src/contention_manifest.rs`
(342 LOC) with `consume_at_peek(tid, peek_idx)` that translates ours's
per-tid idx back to canary's idx space (subtracts prior
`contention.observed` emits). `XENIA_CONTENTION_MANIFEST_PATH` env var
opts in. Per the Stage 3+4 result: replay-mode digest `1d7c6b45…`
stable × 3 cold runs, but main matched-prefix **still 104,607** — the
manifest's forced-contention entries fire at wrong logical positions
because the divergence is upstream of any contention event.
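
The index translation described for `consume_at_peek` can be illustrated as a subtraction of the number of earlier extra emits. The sketch below is an assumption about the arithmetic only, not the Rust code: `observed_at` is a hypothetical sorted list of ours-side indices at which `contention.observed` was emitted.

```python
from bisect import bisect_left

def to_canary_idx(peek_idx, observed_at):
    """Map an ours-side per-tid index into canary's index space.

    Each earlier contention.observed emit shifted ours by one, so subtract
    the number of emits strictly before peek_idx.
    """
    return peek_idx - bisect_left(observed_at, peek_idx)
```

For example, with emits at ours-side indices 10 and 20, index 15 maps to 14 and index 25 maps to 23. If the divergence starts upstream of the first emit, no such shift can realign the positions, which is consistent with the cap staying at 104,607.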
This is a critical input to C+23's recommendation: the Phase D
replay infrastructure is built and stable, but it does NOT unblock
the 104,607 cap. The actual cap-unblock came from the D-extension
diff-tool absorber (band-aid, Phase D 2026-05-18). The structural
fix never landed and has no clear next step.
## 7. Existing determinism guarantees
- Default-mode ours cold digest **`ba5b5e07…`** × 3 reproducible
(Stage 0 / Phase D baseline). Prior `e1dfcb15…` baseline is the
C+19 era constant; the Stage 0 emission tweak shifted it without
changing logic.
- Phase B `image_loaded_sha256 ea8d160e…` unchanged across all 23+
phases.
- All emitted Phase A events are stable on (input, cvars).
## 8. Mismatch surfaces with canary
| dimension | canary | ours |
|---|---|---|
| host threads | 1 per XThread | 1 total |
| inter-thread arbiter | host OS | Scheduler |
| RtlEnterCS spin | spin then wait | park immediately |
| Clock | wallclock (rdtsc) | fixed FILETIME `132_500_000_000_000_000` |
| Wait wakeup ordering | pthread_cond_broadcast race | FIFO `cs_waiters` |
| Yield primitive | host yield | `decrement_quantum` rotation |
Of these, the **clock** and the **wait wakeup ordering** are the
two surfaces beyond CS-contention where canary→ours divergence has
potential to surface. So far Sylpheed exercises them lightly: 2
KeQuerySystemTime calls, 34 wait.begin events total.
## 9. Existing scheduler cvars / lockstep modes
There is no `lockstep` cvar in ours. The closest mode is
`OrderMode::Fixed` (default), which produces a deterministic schedule
keyed entirely on the spawn/wake sequence. Replay via manifest is
opt-in via `XENIA_CONTENTION_MANIFEST_PATH`.
## 10. Implication: ours is the strict side
In any cross-engine deterministic-replay scheme, **ours has to bend
toward canary**, not the other way. Canary's host-OS scheduling
cannot be tamed without rewriting it (out of scope; would also
invalidate it as the oracle, since the "real" Xbox 360 wasn't
deterministic in this sense either). The Phase D plan's H'/H broad
landed Stages 1-4 of this bend — the engine infrastructure is built,
just not load-bearing for the 104,607 cap.

@@ -0,0 +1,456 @@
{
"canary-jitter-1.jsonl": {
"path": "xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl",
"tid6_total_seen": 120002,
"waitbegins_by_sid": {
"3b234bbee19d74cf": 33,
"b8e833ada16e15fa": 31,
"a25a16a4f6f547aa": 19,
"2a70efeeed4f4fb6": 13,
"72a4170012353517": 9,
"eec602f5f9aa4bac": 3,
"1938a086284cdbf1": 1,
"cf2f57a69895b36c": 1,
"648cb0d5adfa9125": 1,
"75ae880ec432eb36": 1
},
"rtlenter_calls": 19519,
"rtlleave_calls": 19519,
"window_events": [
{
"idx": 104595,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104596,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104597,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104598,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104599,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104600,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104601,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104602,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104603,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104604,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104605,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104606,
"kind": "wait.begin",
"name": "",
"sid": "75ae880ec432eb36",
"timeout_ns": -1
},
{
"idx": 104607,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104608,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104609,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104610,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104611,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104612,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104613,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104614,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104615,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104616,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104617,
"kind": "import.call",
"name": "NtClose"
},
{
"idx": 104618,
"kind": "kernel.call",
"name": "NtClose"
},
{
"idx": 104619,
"kind": "handle.destroy",
"name": ""
},
{
"idx": 104620,
"kind": "kernel.return",
"name": "NtClose"
}
]
},
"canary-jitter-2.jsonl": {
"path": "xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-2.jsonl",
"tid6_total_seen": 120002,
"waitbegins_by_sid": {
"8ec49cc7eb991db6": 33,
"d896adc3741c77c1": 31,
"a25a16a4f6f547aa": 27,
"2a70efeeed4f4fb6": 14,
"72a4170012353517": 13,
"7b3b3faec1388b19": 2,
"92b9c026e295e0e5": 2,
"1938a086284cdbf1": 1,
"cf2f57a69895b36c": 1,
"648cb0d5adfa9125": 1
},
"rtlenter_calls": 19519,
"rtlleave_calls": 19517,
"window_events": [
{
"idx": 104595,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104596,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104597,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104598,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104599,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104600,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104601,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104602,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104603,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104604,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104605,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104606,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104607,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104608,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104609,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104610,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104611,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104612,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104613,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104614,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104615,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104616,
"kind": "import.call",
"name": "NtClose"
},
{
"idx": 104617,
"kind": "kernel.call",
"name": "NtClose"
},
{
"idx": 104618,
"kind": "handle.destroy",
"name": ""
},
{
"idx": 104619,
"kind": "kernel.return",
"name": "NtClose"
},
{
"idx": 104620,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
}
]
},
"canary-jitter-3.jsonl": {
"path": "xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-3.jsonl",
"tid6_total_seen": 120002,
"waitbegins_by_sid": {
"9eda93a619ebd4ca": 34,
"84fe8d4c3a65f040": 31,
"2a70efeeed4f4fb6": 12,
"a25a16a4f6f547aa": 11,
"72a4170012353517": 9,
"c9f426cc34f55865": 3,
"7b3b3faec1388b19": 2,
"92b9c026e295e0e5": 2,
"1938a086284cdbf1": 1,
"cf2f57a69895b36c": 1
},
"rtlenter_calls": 19519,
"rtlleave_calls": 19519,
"window_events": [
{
"idx": 104595,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104596,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104597,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104598,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104599,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104600,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104601,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104602,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104603,
"kind": "wait.begin",
"name": "",
"sid": "a25a16a4f6f547aa",
"timeout_ns": -1
},
{
"idx": 104604,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104605,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104606,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104607,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104608,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104609,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104610,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104611,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104612,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104613,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104614,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104615,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104616,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104617,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104618,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104619,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104620,
"kind": "import.call",
"name": "NtClose"
}
]
}
}

View File

@@ -0,0 +1,97 @@
#!/usr/bin/env python3
"""Phase C+23 jitter profile probe.

Reads canary jsonls (jitter-1/2/3 + cold-c21 + any fresh runs) and extracts:
  - total tid=6 events seen within the first ~120k indices
  - the exact event sequence on tid=6 around idx [104,595..104,620]
  - count of wait.begin events on tid=6 by SID
  - count of contention-prone events (wait.begin, kernel.call RtlEnter / RtlLeave)

Designed to stream line-by-line and not load multi-GB jsonls into RAM.
"""
import json
import os
import sys
from collections import Counter, defaultdict

WINDOW_LO = 104_595
WINDOW_HI = 104_620
TID = 6
TID_EVENT_LIMIT = 120_000


def profile(path: str):
    if not os.path.exists(path):
        return None
    tid6_events = 0
    waitbegins = Counter()
    importcalls = Counter()
    kernelcalls = Counter()
    window_events = []
    tid_idx = -1
    with open(path, "rb") as fh:
        for raw in fh:
            # Cheap reject before json parse: must contain `"tid":6,`
            if b'"tid":6,' not in raw and b'"tid": 6,' not in raw:
                continue
            try:
                ev = json.loads(raw)
            except Exception:
                continue
            if ev.get("tid") != TID:
                continue
            tid_idx = ev.get("tid_event_idx", tid_idx + 1)
            tid6_events += 1
            kind = ev.get("kind", "")
            if kind == "wait.begin":
                sids = ev.get("payload", {}).get("handles_semantic_ids") or []
                for s in sids:
                    waitbegins[s] += 1
            elif kind == "import.call":
                name = ev.get("payload", {}).get("name", "")
                importcalls[name] += 1
            elif kind == "kernel.call":
                name = ev.get("payload", {}).get("name", "")
                kernelcalls[name] += 1
            if WINDOW_LO <= tid_idx <= WINDOW_HI:
                summary = {
                    "idx": tid_idx,
                    "kind": kind,
                    "name": ev.get("payload", {}).get("name", ""),
                }
                if kind == "wait.begin":
                    summary["sid"] = (ev.get("payload", {}).get("handles_semantic_ids") or [None])[0]
                    summary["timeout_ns"] = ev.get("payload", {}).get("timeout_ns")
                window_events.append(summary)
            if tid_idx > TID_EVENT_LIMIT:
                break
    return {
        "path": path,
        "tid6_total_seen": tid6_events,
        "waitbegins_by_sid": dict(waitbegins.most_common(10)),
        "rtlenter_calls": importcalls.get("RtlEnterCriticalSection", 0),
        "rtlleave_calls": importcalls.get("RtlLeaveCriticalSection", 0),
        "window_events": window_events,
    }


def main(paths):
    out = {}
    for p in paths:
        print(f"profiling {p}...", file=sys.stderr)
        r = profile(p)
        if r is None:
            print(f" (missing)", file=sys.stderr)
            continue
        out[os.path.basename(p)] = r
    json.dump(out, sys.stdout, indent=2)
    print()


if __name__ == "__main__":
    main(sys.argv[1:])

View File

@@ -0,0 +1,153 @@
profiling xenia-canary/build-cross/bin/Windows/Debug/canary-cold-c21.jsonl...
{
"canary-cold-c21.jsonl": {
"path": "xenia-canary/build-cross/bin/Windows/Debug/canary-cold-c21.jsonl",
"tid6_total_seen": 120002,
"waitbegins_by_sid": {
"8ec49cc7eb991db6": 33,
"14afe71d37ff58a7": 31,
"a25a16a4f6f547aa": 28,
"2a70efeeed4f4fb6": 12,
"72a4170012353517": 10,
"7b3b3faec1388b19": 4,
"92b9c026e295e0e5": 3,
"df2b7bc3c60f41b9": 2,
"eec602f5f9aa4bac": 2,
"1938a086284cdbf1": 1
},
"rtlenter_calls": 19518,
"rtlleave_calls": 19517,
"window_events": [
{
"idx": 104595,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104596,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104597,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104598,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104599,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104600,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104601,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104602,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104603,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104604,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104605,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104606,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104607,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104608,
"kind": "kernel.call",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104609,
"kind": "kernel.return",
"name": "RtlEnterCriticalSection"
},
{
"idx": 104610,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104611,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104612,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104613,
"kind": "import.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104614,
"kind": "kernel.call",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104615,
"kind": "kernel.return",
"name": "RtlLeaveCriticalSection"
},
{
"idx": 104616,
"kind": "import.call",
"name": "NtClose"
},
{
"idx": 104617,
"kind": "kernel.call",
"name": "NtClose"
},
{
"idx": 104618,
"kind": "handle.destroy",
"name": ""
},
{
"idx": 104619,
"kind": "kernel.return",
"name": "NtClose"
},
{
"idx": 104620,
"kind": "import.call",
"name": "RtlEnterCriticalSection"
}
]
}
}

View File

@@ -0,0 +1,154 @@
# Recommendation — Phase C+23
## Top-line: STAY WITH THE BAND-AID
After source-reading both engines + characterizing 4 archived canary
cold runs' jitter shape + reviewing Phase D's H'/H broad outcomes,
the recommended approach is **(ζ) stay with the band-aid**.

The 104,607 cap that originally motivated this track is already
unblocked at the diff-tool layer (Phase D D-extension absorber,
2026-05-18). The next divergence at idx 105,046 is
`VdInitializeEngines.return_value`, a VD-subsystem engine bug and NOT
a scheduling-determinism recurrence. The cost-benefit of pursuing
γ/β/α is no longer compelling because the immediate symptom is
resolved and no structural follow-on cap has appeared.
## Rationale
### 1. The original target is already unblocked.

| metric | pre-C+20 (C+19) | post-C+21 | post-Phase-D D-extension | now |
|---|---|---|---|---|
| Main matched-prefix | 104,606 | 104,607 | **105,046** | 105,046 |
| Sister chains | 11/32/3/41/16 | 11/32/3/41/16 | 11/32/4/41/16 | unchanged |
| Cap class at head | (B) contention | (A) state-mutation | (engine) VD | (engine) VD |

The matched-prefix advanced **+440** since C+19 through diff-tool work
that did NOT touch the engines. The cap class at the head is no longer
scheduling.
### 2. Phase D Stages 1-4 already built the structural infrastructure.

Phase D Stage 1 (canary contention emitter), Stage 2 (manifest builder),
Stage 3 (ours `OrderMode::ContentionReplay` + manifest loader), and
Stage 4 (diff-tool engine-local kinds) ALL LANDED. The engine code is
in tree. What's missing is *coverage of the right contention events*:
the 104,607 divergence was upstream of canary's first
`contention.observed=true` emit (idx 104,664), so the manifest could
not target the right call site.

This means: if we pursue γ (broaden replay to more event classes),
the entry cost is not "start from scratch" but "extend an existing
manifest layer." However, the LOC budget for γ is still ~600 across
both engines, and there is **no proven future cap** that this would
unblock.
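
The coverage gap described above reduces to an index comparison. The sketch below is only an illustration of that reasoning: `manifest_can_target` is a hypothetical helper, not something in either engine, and the second emit index in the usage is made up. It checks the necessary condition only (a divergence upstream of the first contention emit has nothing in the manifest to replay against).

```python
def manifest_can_target(divergence_idx, contention_emit_indices):
    """Necessary-condition check (hypothetical helper).

    A contention manifest can only steer a call site if canary emitted a
    contention event at or before the divergence index.
    """
    if not contention_emit_indices:
        return False
    return divergence_idx >= min(contention_emit_indices)


# 104,607 is the divergence and 104,664 the first emit quoted in the text;
# 104,700 is an invented later emit used only to make the list non-trivial.
print(manifest_can_target(104_607, [104_664, 104_700]))
```

For the figures quoted in this section the divergence precedes the first emit, which is why the existing manifest could not reach it.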
### 3. The empirical jitter range is small and fully absorbable.
From `jitter-profile.md`: 4 canary cold samples show 3 distinct
shapes around the contention window. The C+21 absorber + Phase D
D-extension already canonicalize ALL 3 shapes to the same matched
form. Even N=5 or N=10 fresh canary colds would land in one of these
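3 shapes (likely with the same absorber outcome).

The SID core (`a25a16a4f6f547aa`, `2a70efeeed4f4fb6`,
`72a4170012353517`) recurs in every cold run, though per-run
`wait.begin` counts on tid=6 vary by more than ±20% (for example
`a25a16a4f6f547aa`: 27, 11 and 28 in jitter-2, jitter-3 and cold-c21), and
the shared-global SID recipe (C+18) recomputes them deterministically.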
The transient "top-2" SIDs (which change per-cold) all flow through
the shared-global absorber.
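
One way to count distinct window shapes from the probe's JSON output is to reduce each run's `window_events` to its `(kind, name)` sequence and group runs by that sequence. This is a minimal sketch, assuming the output layout shown in this commit (file name mapped to a profile dict with a `window_events` list). The helper names and the `SAMPLE` data are illustrative, not taken from the real traces.

```python
from collections import defaultdict


def window_shape(profile):
    """Reduce a profile's window_events to a hashable (kind, name) sequence."""
    return tuple((ev["kind"], ev["name"]) for ev in profile["window_events"])


def group_by_shape(profiles):
    """Map each distinct window shape to the list of runs that produced it."""
    groups = defaultdict(list)
    for run_name, profile in profiles.items():
        groups[window_shape(profile)].append(run_name)
    return dict(groups)


# Toy stand-in for the probe output: three runs, two of which share a shape.
SAMPLE = {
    "run-a.jsonl": {"window_events": [
        {"idx": 0, "kind": "import.call", "name": "RtlEnterCriticalSection"},
        {"idx": 1, "kind": "kernel.call", "name": "RtlEnterCriticalSection"},
    ]},
    "run-b.jsonl": {"window_events": [
        {"idx": 0, "kind": "import.call", "name": "RtlEnterCriticalSection"},
        {"idx": 1, "kind": "kernel.call", "name": "RtlEnterCriticalSection"},
    ]},
    "run-c.jsonl": {"window_events": [
        {"idx": 0, "kind": "import.call", "name": "RtlEnterCriticalSection"},
        {"idx": 1, "kind": "wait.begin", "name": ""},
    ]},
}

groups = group_by_shape(SAMPLE)
print(f"{len(SAMPLE)} runs, {len(groups)} distinct shapes")
```

Keying on `(kind, name)` and ignoring `idx` treats two windows as the same shape only when the event sequence matches position by position, so a one-event shift counts as a different shape.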
### 4. Canary cannot be made deterministic without invalidating it.
The host-thread-per-XThread model is what makes canary the *oracle*.
Replacing it (α / β) would require:

- Reworking ~2000-3000 LOC of canary base+kernel.
- Re-validating against the broader canary test corpus (other games).
- Accepting a real risk of breaking Sylpheed-unrelated game-compat.

Approach γ (record-and-replay) avoids touching canary's scheduling
philosophy but requires ours to consume a multi-million-entry trace,
with engineering and runtime cost that should be matched to a *proven*
future scheduling cap.
### 5. The Phase B image hash and ours digest are stable.
`image_loaded_sha256 ea8d160e…` UNCHANGED. Ours default digest
stable × 3 cold runs. There is no signal of latent divergence in the
pre-Phase-A surfaces that would benefit from scheduling alignment.
## What to keep
1. **Phase D Stages 1-4 infrastructure** stays in tree. Cvar
`kernel_emit_contention=false` default-off; `XENIA_CONTENTION_MANIFEST_PATH`
opt-in. Future phases can use them.
2. **All absorbers** (C+18, C+21, D-extension) stay; they are correct
and narrow.
3. **The Stage 0 `OrderMode::ScanQuantum`** stays as a debug knob,
documented as null-result.
## What to defer
1. Approach γ (broader scheduling-trace replay): defer until a
   future cap that is demonstrably scheduling-related appears.
2. Approach β / α (deterministic preemption / cooperative canary):
   defer indefinitely.
## What to do next
The next phase is **C+24** (or whatever the natural next number is),
targeting the head divergence at idx 105,046:
`VdInitializeEngines.return_value` (canary=1, ours=0). This is a
regular engine-bug investigation, estimated at ~5-50 LOC.
## Fallback: γ trigger criteria
If a future phase finds a NEW scheduling-determinism cap (defined as:
two consecutive divergences whose root cause is contention/wakeup-ordering
across ≥2 guest threads, NOT a guest-code bug or kernel
emit-completeness gap), then revisit γ. The criteria:

- The new cap is ≥1,000 events long.
- The C+21 / D-extension absorbers cannot fold it within their
  current cap (32 pairs).
- Empirical jitter sampling (≥3 canary colds) confirms structural
  shape divergence, not just SID identity drift.

If all three hold, γ is justified. Estimated ~600 LOC across 4-5
sessions.
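
The three trigger criteria above can be written as a single predicate. This is a sketch under stated assumptions: `CapObservation` and its field names are hypothetical, and "cannot fold within the current cap" is modelled as needing more than 32 absorber pairs.

```python
from dataclasses import dataclass

MIN_CAP_EVENTS = 1_000   # criterion 1: cap is at least 1,000 events long
ABSORBER_PAIR_CAP = 32   # criterion 2: current C+21 / D-extension fold cap
MIN_COLD_SAMPLES = 3     # criterion 3: at least 3 canary cold runs sampled


@dataclass(frozen=True)
class CapObservation:
    """Hypothetical summary of a candidate scheduling cap."""
    cap_length_events: int
    pairs_needed_to_fold: int          # assumed measure of absorber effort
    cold_samples: int
    structural_shape_divergence: bool  # True if shapes differ, not just SIDs


def gamma_justified(obs: CapObservation) -> bool:
    """Return True only when all three trigger criteria hold."""
    long_enough = obs.cap_length_events >= MIN_CAP_EVENTS
    absorber_exhausted = obs.pairs_needed_to_fold > ABSORBER_PAIR_CAP
    jitter_confirmed = (
        obs.cold_samples >= MIN_COLD_SAMPLES
        and obs.structural_shape_divergence
    )
    return long_enough and absorber_exhausted and jitter_confirmed


# Invented observation: long cap, but the absorber could still fold it.
print(gamma_justified(CapObservation(1_500, 20, 3, True)))
```

Keeping the thresholds as named constants makes it easy to see which criterion failed when the predicate returns `False`.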
## What this recommendation is NOT
- It is NOT "no scheduling work was useful." Stages 1-4 + D-extension
produced the matched-prefix advance from 104,606 → 105,046 (+440).
- It is NOT "the absorbers are perfect forever." They are explicit
  band-aids in the spirit of reading-error #23, annotated in schema-v1.md
  v1.5.
- It is NOT "ours and canary are bit-aligned in contention regions."
They are *measurably* aligned (matched-prefix) but not *structurally*
aligned (the underlying guest events still differ; the absorber
folds the difference).
## Multi-session budget if we proceed (γ scenario only)
Sessions estimated 4-5. NOT scheduled now.

| stage | LOC | est. sessions |
|---|---|---|
| γ-Stage 1: extend canary trace to wake/park/yield | ~150 | 1 |
| γ-Stage 2: extend manifest builder | ~80 | 0.5 |
| γ-Stage 3: generalized replayer in ours | ~250 | 2 |
| γ-Stage 4: diff-tool integration | ~50 | 0.5 |
| γ-Stage 5: validation + sister budgets | n/a | 1 |
| **total** | **~530** | **~5** |
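
The totals row can be re-derived from the per-stage figures. The short check below copies the table values and treats Stage 5's "n/a" LOC as 0, which is an assumption made only so the sum is defined.

```python
# (stage label, approximate LOC, estimated sessions), copied from the table.
GAMMA_STAGES = [
    ("γ-Stage 1: extend canary trace to wake/park/yield", 150, 1.0),
    ("γ-Stage 2: extend manifest builder", 80, 0.5),
    ("γ-Stage 3: generalized replayer in ours", 250, 2.0),
    ("γ-Stage 4: diff-tool integration", 50, 0.5),
    ("γ-Stage 5: validation + sister budgets", 0, 1.0),  # LOC listed as n/a
]


def budget_totals(stages):
    """Sum the LOC and session estimates across all stages."""
    total_loc = sum(loc for _, loc, _ in stages)
    total_sessions = sum(sessions for _, _, sessions in stages)
    return total_loc, total_sessions


loc, sessions = budget_totals(GAMMA_STAGES)
print(f"~{loc} LOC over ~{sessions:g} sessions")
```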
## Acceptance for THIS session (planning-only)
- [x] Planning artifacts in `audit-runs/phase-c23-scheduler-determinism-plan/`.
- [x] Engine sources UNCHANGED (verified by file listing — only
documentation + 1 python probe written).
- [x] Diff tool UNCHANGED.
- [x] Memory entry to be written next.
- [x] Recommendation justified against C+21 band-aid + breadth of
contention regions + multi-session budget.