# Canary threading model: Phase C+23 characterization

Re-verifies the threading model captured in the 2026-05-18 plan against current sources. Key citations re-checked today (2026-05-21):

## 1. Threading abstraction: host-thread-per-XThread

Canary spawns one host OS thread (an `xe::threading::Thread`) per guest XThread.

- `xenia-canary/src/xenia/kernel/xthread.cc:315` `XThread::Create()` builds `xe::threading::Thread::CreationParameters` and calls `xe::threading::Thread::Create(params, [this]() { … })` at line 421 (verified line-of-sight today via Grep).
- `xenia-canary/src/xenia/base/threading_posix.cc` / `threading_win.cc` implement `Thread::Create` via `pthread_create` / `CreateThread`. There is no cooperative or fiber-based path.
- `XHostThread::Execute()` (xthread.cc:1244) is the host-thread entry for native kernel threads (XAudio/Xam internals); it also runs on a dedicated host thread.

Consequence: scheduling between guest threads is performed by the host OS (Wine→Linux NPTL on this rig). Canary itself owns no inter-thread ordering policy beyond setting `ThreadPriority` and affinity hints.

## 2. Scheduler control / determinism cvars

Grepped canary for cvars touching scheduling determinism. No `lockstep`, no `deterministic`, no `cooperative_scheduling`, no `single_thread`. The only related knobs:

- `clock_no_scaling`: already on by default; affects guest clock source, not scheduling.
- `clock_source_raw`: toggles rdtsc vs HostSystemTime; orthogonal.
- `ignore_thread_priorities`: drops priority hints (does NOT prevent preemption).
- `ignore_thread_affinities`: drops affinity hints.

None of these constrain *which* host thread runs at *which* wall moment. They cannot make canary deterministic.

## 3. Contention source: where host-scheduler timing leaks into guest events

`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:597` `RtlEnterCriticalSection_entry`.
Verified current:

```cpp
void RtlEnterCriticalSection_entry(pointer_t cs) {
  …
  uint32_t spin_count = cs->header.absolute * 256;  // line 604
  if (cs->owning_thread == cur_thread) { /* recursive fast path */ }
  while (spin_count--) {
    if (xe::atomic_cas(-1, 0, &cs->lock_count)) { /* uncontended fast path */ }
  }  // line 614-618
  if (xe::atomic_inc(&cs->lock_count) != 0) {  // contended slow path
    xeKeWaitForSingleObject(...);  // emits wait.begin
  }
}
```

The branch taken depends on whether `atomic_cas(-1, 0, &lock_count)` succeeds in a host-OS-scheduled spin window. Spin success vs failure is determined entirely by whether the *peer guest thread that holds the lock* releases it in time, which is determined by host scheduling.

Other contention surfaces examined:

- `RtlLeaveCriticalSection_entry` (xboxkrnl_rtl.cc:670): non-blocking, signals dispatcher event when transitioning to 0. Deterministic per call, but the event observers race.
- `xeKeWaitForSingleObject` (xboxkrnl_threading.cc:969): the wait primitive itself is sequential, but the wakeup ordering across multi-waiter queues uses host atomics + signal broadcast → host-OS dependent.
- `KeSetEvent`, `KeReleaseSemaphore`: atomic dispatcher state + `xe::threading::Event::Set()` → host condvar broadcast → host-OS scheduler picks which waiter to run.

The fundamental knob: every blocking primitive eventually defers to `xe::threading::Wait()`, which on POSIX uses `pthread_cond_timedwait` and on Windows uses `WaitFor*Object`. Both are subject to non-deterministic wakeup ordering when N>1 waiters race.

## 4. Wine effects (this rig)

Canary runs under Wine on Linux on this rig. Wine implements `CreateThread`/`WaitFor*Object` over POSIX threads + futexes. Known sources of additional non-determinism:

- Wine's `NtWaitForSingleObject` adds a wait-queue lock layer; wakeup ordering may differ from native Windows.
- Wine `KeAcquireSpinLock` paths use atomic spinlocks → host CPU scheduling jitter visible.
- File IO (NtCreateFile / NtReadFile) is dispatched into Wine's `ntdll` server thread → cross-thread completion timing depends on the Linux kernel's epoll wakeups.
- Linux CFS preemption: any host thread can lose its slice at any instruction boundary. Even with `taskset -c 0` pinning, the CFS scheduler interleaves wakeups across runnable threads non-deterministically because of vruntime accounting.

## 5. Implication for scheduling-alignment

To bit-align canary, the host OS would need to be replaced by a deterministic scheduler. Three (impractical) approaches:

1. Single-CPU-pin + `SCHED_FIFO` + disable IO interrupts: partial; still suffers from Wine internal threads.
2. Replace `xe::threading::Thread::Create` with a cooperative single-host-thread fiber runtime: ~2000-3000 LOC across base/threading + xthread.cc. Risks destabilising canary as oracle.
3. Use Linux `rr` (Mozilla record-and-replay) on canary: out of scope; depends on kernel features and gives byte-identical replay but cannot align to ours.

None of these are gateable in a single phase. The plan therefore treats canary's host-scheduler-driven jitter as **input noise to be sidestepped**, not eliminated.

## 6. What this means for ours

Ours's single-host-thread cooperative scheduler is *more deterministic* than canary. The asymmetry is structural and well-documented:

- ours digest `e1dfcb15…` reproducible across 23+ phases.
- canary jitter at any wait/CS region varies cold-to-cold.

The "right" question for C+23 is therefore **how to bridge that asymmetry at the diff-tool layer or via a recording oracle**, rather than how to make canary deterministic. The 2026-05-18 Stage 0 spike already confirmed that quantum-tuning ours's scheduler can't help (no peer thread on slot 0 during boot to rotate to).

## 7. Cvars touched in canary today

`xenia-canary/src/xenia/kernel/util/event_log.cc` (Phase A schema emitter): the default-off cvar `kernel_emit_contention=false` landed in Phase D Stage 1; verified via Grep today that it is still present. Its emission alone does not change canary determinism.
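
As a rough illustration of what "bridging at the diff-tool layer" (section 6) could mean, the sketch below compares two event logs after dropping contention events and discarding cross-thread interleaving, so that only each thread's own event order has to match. Everything here is hypothetical: the `Event` struct, `IsContentionKind`, `ProjectPerThread`, and `EquivalentModuloScheduling` are illustrative names, not the Phase A schema or any existing tool, and treating every kind prefixed with `wait.` as contention noise is an assumption (only `wait.begin` appears in the sources cited above).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical event record; field names are illustrative only and do not
// reflect the real Phase A schema.
struct Event {
  uint32_t thread_id;
  std::string kind;  // e.g. "wait.begin"
};

// Assumption: any kind starting with "wait." is scheduler-dependent noise.
bool IsContentionKind(const std::string& kind) {
  return kind.rfind("wait.", 0) == 0;  // true only for a prefix match
}

// Per-thread view of a log: thread id -> ordered non-contention event kinds.
using Projection = std::map<uint32_t, std::vector<std::string>>;

// Drops contention events and forgets how threads were interleaved,
// keeping only the order of events within each thread.
Projection ProjectPerThread(const std::vector<Event>& log) {
  Projection out;
  for (const Event& e : log) {
    if (IsContentionKind(e.kind)) continue;
    out[e.thread_id].push_back(e.kind);
  }
  return out;
}

// Two logs are treated as equivalent when their per-thread projections match.
bool EquivalentModuloScheduling(const std::vector<Event>& a,
                                const std::vector<Event>& b) {
  return ProjectPerThread(a) == ProjectPerThread(b);
}
```

With this projection, a canary log that interleaves threads differently and contains an extra `wait.begin` still compares equal to an ours log with the same per-thread sequences. The trade-off is that the comparison deliberately ignores cross-thread ordering, so it cannot detect divergences that only show up as a changed interleaving.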