xenia-rs/audit-runs/phase-c23-scheduler-determinism-plan/canary-threading-model.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Canary threading model — Phase C+23 characterization

Re-verifies the threading model captured in the 2026-05-18 plan against current sources. Key citations were re-checked today (2026-05-21) and are grouped by topic in the numbered sections that follow.

1. Threading abstraction: host-thread-per-XThread

Canary spawns one host OS thread per guest XThread, through its own xe::threading::Thread wrapper rather than std::thread.

  • xenia-canary/src/xenia/kernel/xthread.cc:315 XThread::Create() builds xe::threading::Thread::CreationParameters and calls xe::threading::Thread::Create(params, [this]() { … }) at line 421 (verified line-of-sight today via Grep).
  • xenia-canary/src/xenia/base/threading_posix.cc / threading_win.cc implement Thread::Create via pthread_create / CreateThread. There is no cooperative or fiber-based path.
  • XHostThread::Execute() (xthread.cc:1244) is the host-thread entry for native kernel threads (XAudio/Xam internals); it also runs on a dedicated host thread.

Consequence: scheduling between guest threads is performed by the host OS (Wine→Linux NPTL on this rig). Canary itself owns no inter-thread ordering policy beyond setting ThreadPriority and affinity hints.
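A minimal sketch of the consequence above, not canary code: std::thread stands in for xe::threading::Thread, and RunOneHostThreadPerGuest is a hypothetical name. Each "guest" only records its own id, so the returned order is whatever the host OS scheduler produced on that run.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a guest thread: it only records its own id.
// Returns the order in which the host OS happened to run the threads.
std::vector<uint32_t> RunOneHostThreadPerGuest(uint32_t guest_count) {
  assert(guest_count > 0);
  std::mutex log_mutex;
  std::vector<uint32_t> run_order;
  std::vector<std::thread> hosts;
  hosts.reserve(guest_count);
  for (uint32_t id = 0; id < guest_count; ++id) {
    // One host thread per guest thread; no user-space ordering policy.
    hosts.emplace_back([id, &log_mutex, &run_order] {
      std::lock_guard<std::mutex> guard(log_mutex);
      run_order.push_back(id);
    });
  }
  for (auto& host : hosts) host.join();
  return run_order;  // Every id appears once; the order is up to the host.
}
```

Only the set of ids is stable across runs. Any check that depends on the sequence would be testing the host scheduler, not the guest.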

2. Scheduler control / determinism cvars

Grepped canary for cvars touching scheduling determinism. No hits for lockstep, deterministic, cooperative_scheduling, or single_thread. The only related knobs:

  • clock_no_scaling — already on by default; affects guest clock source, not scheduling.
  • clock_source_raw — toggles rdtsc vs HostSystemTime; orthogonal.
  • ignore_thread_priorities — drops priority hints (does NOT prevent preemption).
  • ignore_thread_affinities — drops affinity hints.

None of these constrain which host thread runs at which wall-clock moment, so they cannot make canary deterministic.

3. Contention source — where host-scheduler timing leaks into guest events

xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:597 RtlEnterCriticalSection_entry. Verified current:

void RtlEnterCriticalSection_entry(pointer_t<X_RTL_CRITICAL_SECTION> cs) {
  
  uint32_t spin_count = cs->header.absolute * 256;   // line 604

  if (cs->owning_thread == cur_thread) { /* recursive fast path */ }

  while (spin_count--) {
    if (xe::atomic_cas(-1, 0, &cs->lock_count)) { /* uncontended fast path */ }
  }                                                  // line 614-618

  if (xe::atomic_inc(&cs->lock_count) != 0) {        // contended slow path
    xeKeWaitForSingleObject(...);                    // emits wait.begin
  }
}

The branch taken depends on whether atomic_cas(-1, 0, &lock_count) succeeds in a host-OS-scheduled spin window. Spin success vs failure is determined entirely by whether the peer guest thread that holds the lock releases it in time, which is determined by host scheduling.
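The three paths can be modelled in isolation. The sketch below is a simplified, hypothetical model (ModelCriticalSection, EnterPath and ClassifyEnter are not canary names), using std::atomic in place of xe::atomic_cas / xe::atomic_inc. It reports which path a caller would take instead of blocking, so the caller of the model plays the role of the peer thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the excerpt; not canary's struct.
struct ModelCriticalSection {
  std::atomic<int32_t> lock_count{-1};  // -1 means unowned, as in the excerpt
  uint32_t owning_thread = 0;           // 0 means "no owner" in this model
  int32_t recursion_count = 0;
};

enum class EnterPath { kRecursive, kUncontended, kContended };

// Returns the path taken. The contended path only registers the waiter and
// reports itself; the real code would block in a kernel wait at that point.
EnterPath ClassifyEnter(ModelCriticalSection& cs, uint32_t cur_thread,
                        uint32_t spin_count) {
  assert(cur_thread != 0);
  if (cs.owning_thread == cur_thread) {  // recursive fast path
    cs.lock_count.fetch_add(1);
    ++cs.recursion_count;
    return EnterPath::kRecursive;
  }
  while (spin_count--) {  // spin window: try to swap -1 -> 0
    int32_t expected = -1;
    if (cs.lock_count.compare_exchange_strong(expected, 0)) {
      cs.owning_thread = cur_thread;
      cs.recursion_count = 1;
      return EnterPath::kUncontended;
    }
  }
  if (cs.lock_count.fetch_add(1) + 1 != 0) {
    return EnterPath::kContended;  // lock still held: would emit wait.begin
  }
  cs.owning_thread = cur_thread;  // released between the spin and the increment
  cs.recursion_count = 1;
  return EnterPath::kUncontended;
}
```

In this model the result for a second thread flips from kContended to kUncontended purely on whether the owner has reset lock_count to -1 before the spin loop ends. That is the timing dependency described above.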

Other contention surfaces examined:

  • RtlLeaveCriticalSection_entry (xboxkrnl_rtl.cc:670) — non-blocking, signals dispatcher event when transitioning to 0. Deterministic per call but the event observers race.
  • xeKeWaitForSingleObject (xboxkrnl_threading.cc:969) — wait primitive itself sequential, but the wakeup ordering across multi-waiter queues uses host atomics + signal broadcast → host-OS dependent.
  • KeSetEvent, KeReleaseSemaphore — atomic dispatcher state + xe::threading::Event::Set() → host condvar broadcast → host-OS scheduler picks which waiter to run.

The fundamental knob: every blocking primitive eventually defers to xe::threading::Wait() which on POSIX uses pthread_cond_timedwait and on Windows uses WaitFor*Object — both subject to non-deterministic wakeup ordering when N>1 waiters race.
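The N>1 waiter race can be reproduced outside canary. This is a hedged sketch using std::condition_variable as a stand-in for the host primitive behind xe::threading::Wait(); BroadcastWakeOrder is a hypothetical helper, not part of either codebase.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: N waiters block on one condition variable, a single
// broadcast releases them, and each records its id as it wakes.
std::vector<std::size_t> BroadcastWakeOrder(std::size_t waiter_count) {
  assert(waiter_count > 0);
  std::mutex m;
  std::condition_variable cv;
  bool signaled = false;
  std::vector<std::size_t> wake_order;
  std::vector<std::thread> waiters;
  waiters.reserve(waiter_count);
  for (std::size_t id = 0; id < waiter_count; ++id) {
    waiters.emplace_back([&, id] {
      std::unique_lock<std::mutex> lock(m);
      // The predicate guards against spurious wakeups; a waiter that starts
      // after the flag is set passes straight through.
      cv.wait(lock, [&] { return signaled; });
      wake_order.push_back(id);  // m is held here, so the push is safe
    });
  }
  {
    std::lock_guard<std::mutex> lock(m);
    signaled = true;
  }
  cv.notify_all();  // the host scheduler decides who reacquires m first
  for (auto& t : waiters) t.join();
  return wake_order;
}
```

Every waiter wakes exactly once, but nothing in the program fixes the sequence. Repeated runs may return different permutations of the same ids.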

4. Wine effects (this rig)

Canary runs under Wine on Linux on this rig. Wine implements CreateThread/WaitFor*Object over POSIX threads + futexes. Known sources of additional non-determinism:

  • Wine's NtWaitForSingleObject adds a wait-queue lock layer; wakeup ordering may differ from native Windows.
  • Wine KeAcquireSpinLock paths use atomic spinlocks → host CPU scheduling jitter visible.
  • File IO (NtCreateFile / NtReadFile) is dispatched into Wine's ntdll server thread → cross-thread completion timing depends on the Linux kernel's epoll wakeups.
  • Linux CFS preemption: any host thread can lose its slice at any instruction boundary. Even with taskset -c 0 pinning, the CFS scheduler interleaves wakeups across runnable threads non-deterministically because of vruntime accounting.

5. Implication for scheduling-alignment

To bit-align canary, the host OS would need to be replaced by a deterministic scheduler. Three (impractical) approaches:

  1. Single-CPU-pin + SCHED_FIFO + disable IO interrupts — partial, still suffers Wine internal threads.
  2. Replace xe::threading::Thread::Create with a cooperative single-host-thread fiber runtime — ~2000-3000 LOC across base/ threading + xthread.cc. Risks destabilising canary as oracle.
  3. Use Linux rr (Mozilla record-and-replay) on canary — out of scope; depends on kernel features and gives byte-identical replay but cannot align to ours.

None of these are gateable in a single phase. The plan therefore treats canary's host-scheduler-driven jitter as input noise to be sidestepped, not eliminated.

6. What this means for ours

Ours's single-host-thread cooperative scheduler is more deterministic than canary's host-scheduled model. The asymmetry is structural and well-documented:

  • ours digest e1dfcb15… reproducible across 23+ phases.
  • canary jitter at any wait/CS region varies cold-to-cold.

The "right" question for C+23 is therefore how to bridge that asymmetry at the diff-tool layer or via a recording oracle, rather than how to make canary deterministic. The 2026-05-18 Stage 0 spike already confirmed that tuning the quantum of ours's scheduler cannot help (there is no peer thread on slot 0 during boot to rotate to).
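For contrast with canary's model, a toy sketch of why a single-host-thread cooperative scheduler yields a reproducible trace. This is not ours's actual scheduler: GuestStep, RunCooperative and MakeCountingTask are hypothetical names, and plain round-robin is an assumed policy chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical guest task: each call runs one step and returns true while
// more work remains.
using GuestStep = std::function<bool(std::vector<std::string>& trace)>;

// Runs all tasks round-robin on the calling host thread. With no host
// preemption between tasks, the trace depends only on the inputs.
std::vector<std::string> RunCooperative(std::vector<GuestStep> tasks) {
  std::vector<std::string> trace;
  std::vector<bool> done(tasks.size(), false);
  std::size_t remaining = tasks.size();
  while (remaining > 0) {
    for (std::size_t i = 0; i < tasks.size(); ++i) {
      if (done[i]) continue;
      if (!tasks[i](trace)) {  // task reports that it has finished
        done[i] = true;
        --remaining;
      }
    }
  }
  return trace;
}

// Builds a task that logs "<name>:<step>" once per step, for `steps` steps.
GuestStep MakeCountingTask(std::string name, int steps) {
  assert(steps > 0);
  int next = 0;
  return [name = std::move(name), steps, next](
             std::vector<std::string>& trace) mutable {
    trace.push_back(name + ":" + std::to_string(next));
    ++next;
    return next < steps;
  };
}
```

Two runs with the same task list produce identical traces, which is the property a stable digest relies on. The host-thread-per-XThread model has no equivalent guarantee.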

7. Cvars touched in canary today

xenia-canary/src/xenia/kernel/util/event_log.cc (Phase A schema emitter): the cvar kernel_emit_contention (default false) landed in Phase D Stage 1 and is still present (verified by Grep today). Emitting contention events does not by itself change canary's determinism.