# Canary threading model — Phase C+23 characterization

Re-verifies the threading model captured in the 2026-05-18 plan against
current sources. Key citations re-checked today (2026-05-21):

## 1. Threading abstraction: host-thread-per-XThread

Canary spawns one host OS thread per guest XThread.

- `xenia-canary/src/xenia/kernel/xthread.cc:315` `XThread::Create()`
  builds `xe::threading::Thread::CreationParameters` and calls
  `xe::threading::Thread::Create(params, [this]() { … })` at line 421
  (verified line-of-sight today via Grep).
- `xenia-canary/src/xenia/base/threading_posix.cc` /
  `threading_win.cc` implement `Thread::Create` via `pthread_create` /
  `CreateThread`. There is no cooperative or fiber-based path.
- `XHostThread::Execute()` (xthread.cc:1244) is the host-thread entry
  for native kernel threads (XAudio/Xam internals); it also runs on a
  dedicated host thread.

Consequence: scheduling between guest threads is performed by the host
OS (Wine→Linux NPTL on this rig). Canary itself owns no inter-thread
ordering policy beyond setting `ThreadPriority` and affinity hints.

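As a rough illustration of the host-thread-per-guest-thread model described above, the sketch below gives every guest thread its own host thread and counts the distinct host thread ids that were observed. `GuestThread` and `CountDistinctHostThreads` are hypothetical names, and `std::thread` is used as a portable stand-in for `xe::threading::Thread`; none of this is canary's code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <set>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a guest thread object (not canary's XThread).
class GuestThread {
 public:
  explicit GuestThread(uint32_t guest_id) : guest_id_(guest_id) {}

  // Spawns a dedicated host thread that runs `entry` for this guest thread.
  void Create(std::function<void(uint32_t)> entry) {
    assert(!host_.joinable());
    host_ = std::thread(
        [id = guest_id_, entry = std::move(entry)]() { entry(id); });
  }

  void Join() {
    if (host_.joinable()) host_.join();
  }

 private:
  uint32_t guest_id_;
  std::thread host_;
};

// Starts `guest_count` guest threads and returns how many distinct host
// thread ids they ran on. Every thread stays alive until all have checked
// in, so the host cannot recycle an id during the measurement.
std::size_t CountDistinctHostThreads(std::size_t guest_count) {
  std::mutex mu;
  std::condition_variable all_checked_in;
  std::set<std::thread::id> host_ids;

  std::vector<std::unique_ptr<GuestThread>> guests;
  for (std::size_t i = 0; i < guest_count; ++i) {
    guests.push_back(std::make_unique<GuestThread>(static_cast<uint32_t>(i)));
    guests.back()->Create([&](uint32_t /*guest_id*/) {
      std::unique_lock<std::mutex> lock(mu);
      host_ids.insert(std::this_thread::get_id());
      all_checked_in.notify_all();
      all_checked_in.wait(lock,
                          [&] { return host_ids.size() == guest_count; });
    });
  }
  for (auto& guest : guests) guest->Join();
  return host_ids.size();
}
```

Calling `CountDistinctHostThreads(4)` yields 4 under this model: the number of host threads equals the number of guest threads, and the order in which they check in is left to the host scheduler.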
## 2. Scheduler control / determinism cvars

Grepped canary for cvars touching scheduling determinism. There is no
`lockstep`, `deterministic`, `cooperative_scheduling`, or
`single_thread` cvar. The only related knobs:

- `clock_no_scaling`: already on by default; affects the guest clock
  source, not scheduling.
- `clock_source_raw`: toggles rdtsc vs HostSystemTime; orthogonal.
- `ignore_thread_priorities`: drops priority hints (does NOT prevent
  preemption).
- `ignore_thread_affinities`: drops affinity hints.

None of these constrain *which* host thread runs at *which* wall-clock
moment. They cannot make canary deterministic.

## 3. Contention source — where host-scheduler timing leaks into guest events

`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:597`
`RtlEnterCriticalSection_entry`. Verified current:

```cpp
void RtlEnterCriticalSection_entry(pointer_t<X_RTL_CRITICAL_SECTION> cs) {
  …
  uint32_t spin_count = cs->header.absolute * 256;  // line 604

  if (cs->owning_thread == cur_thread) { /* recursive fast path */ }

  while (spin_count--) {
    if (xe::atomic_cas(-1, 0, &cs->lock_count)) { /* uncontended fast path */ }
  }  // line 614-618

  if (xe::atomic_inc(&cs->lock_count) != 0) {  // contended slow path
    xeKeWaitForSingleObject(...);              // emits wait.begin
  }
}
```

Which branch is taken depends on whether `atomic_cas(-1, 0, &lock_count)`
succeeds within the spin window, and that window runs on a
host-OS-scheduled thread. Spin success vs failure is determined entirely
by whether the *peer guest thread that holds the lock* releases it in
time, which in turn is determined by host scheduling.

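The dependence on release timing can be reproduced with a much smaller lock. The sketch below is a simplified spin-then-block lock, not canary's `X_RTL_CRITICAL_SECTION`; `SpinThenBlockLock`, `EnterPath`, `UncontendedPath`, and `LateReleasePath` are hypothetical names introduced for illustration. It reports which path an `Enter()` call took, and the helper forces the slow path by releasing only after the peer has used up its spin window.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

enum class EnterPath { kSpin, kBlocked };

// Simplified spin-then-block lock. It mirrors the two outcomes of the
// excerpt above but does NOT reproduce canary's lock_count encoding.
class SpinThenBlockLock {
 public:
  explicit SpinThenBlockLock(uint32_t spin_count) : spin_count_(spin_count) {}

  EnterPath Enter() {
    for (uint32_t i = 0; i < spin_count_; ++i) {
      if (TryAcquire()) return EnterPath::kSpin;  // won inside the spin window
    }
    std::unique_lock<std::mutex> lock(mu_);
    blocked_.fetch_add(1);
    released_.wait(lock, [this] { return TryAcquire(); });
    return EnterPath::kBlocked;  // had to park on the host primitive
  }

  void Leave() {
    {
      // Store under the mutex so a parking waiter cannot miss the release.
      std::lock_guard<std::mutex> lock(mu_);
      held_.store(false, std::memory_order_release);
    }
    released_.notify_one();
  }

  int blocked_count() const { return blocked_.load(); }

 private:
  bool TryAcquire() {
    bool expected = false;
    return held_.compare_exchange_strong(expected, true,
                                         std::memory_order_acquire);
  }

  const uint32_t spin_count_;
  std::atomic<bool> held_{false};
  std::atomic<int> blocked_{0};
  std::mutex mu_;
  std::condition_variable released_;
};

// No holder: the first compare-exchange succeeds.
EnterPath UncontendedPath() {
  SpinThenBlockLock lock(256);
  EnterPath path = lock.Enter();
  lock.Leave();
  return path;
}

// The holder keeps the lock until the peer has exhausted its spin window,
// which forces the slow path. Releasing earlier could let the spin win.
EnterPath LateReleasePath() {
  SpinThenBlockLock lock(256);
  lock.Enter();  // the calling thread is the holder
  EnterPath peer_path = EnterPath::kSpin;
  std::thread peer([&] {
    peer_path = lock.Enter();
    lock.Leave();
  });
  while (lock.blocked_count() == 0) std::this_thread::yield();
  lock.Leave();
  peer.join();
  return peer_path;
}
```

`LateReleasePath()` is deterministic only because the holder explicitly waits for the peer to block. Remove that wait and the peer's path depends on when the host scheduler lets the holder run, which is the situation the guest code is in.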
Other contention surfaces examined:

- `RtlLeaveCriticalSection_entry` (xboxkrnl_rtl.cc:670): non-blocking;
  signals the dispatcher event when transitioning to 0. Deterministic per
  call, but the event observers race.
- `xeKeWaitForSingleObject` (xboxkrnl_threading.cc:969): the wait
  primitive itself is sequential, but the wakeup ordering across
  multi-waiter queues uses host atomics + signal broadcast → host-OS
  dependent.
- `KeSetEvent`, `KeReleaseSemaphore`: atomic dispatcher state +
  `xe::threading::Event::Set()` → host condvar broadcast → host-OS
  scheduler picks which waiter to run.

The underlying cause: every blocking primitive eventually defers to
`xe::threading::Wait()`, which on POSIX uses `pthread_cond_timedwait`
and on Windows uses `WaitFor*Object`. Both are subject to
non-deterministic wakeup ordering when N>1 waiters race.

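The multi-waiter race can be observed directly with a standard condition variable. The sketch below parks several threads on one `std::condition_variable`, broadcasts once, and records the order in which the waiters resume. `RecordWakeOrder` and `IsPermutationOfIds` are hypothetical helper names, and `std::condition_variable` stands in for the host primitives named above.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Parks `waiter_count` threads on one condition variable, broadcasts once,
// and returns the order in which the waiters resumed.
std::vector<int> RecordWakeOrder(int waiter_count) {
  std::mutex mu;
  std::condition_variable parked_cv;  // waiters -> main: "I am parked"
  std::condition_variable signal_cv;  // main -> waiters: broadcast
  int parked = 0;
  bool signaled = false;
  std::vector<int> wake_order;

  std::vector<std::thread> waiters;
  for (int id = 0; id < waiter_count; ++id) {
    waiters.emplace_back([&, id] {
      std::unique_lock<std::mutex> lock(mu);
      ++parked;
      parked_cv.notify_one();
      signal_cv.wait(lock, [&] { return signaled; });
      wake_order.push_back(id);  // `mu` is held again at this point
    });
  }

  {
    std::unique_lock<std::mutex> lock(mu);
    parked_cv.wait(lock, [&] { return parked == waiter_count; });
    signaled = true;
  }
  signal_cv.notify_all();

  for (auto& waiter : waiters) waiter.join();
  return wake_order;
}

// True if `order` contains each id in [0, n) exactly once.
bool IsPermutationOfIds(std::vector<int> order, int n) {
  if (static_cast<int>(order.size()) != n) return false;
  std::sort(order.begin(), order.end());
  for (int i = 0; i < n; ++i) {
    if (order[i] != i) return false;
  }
  return true;
}
```

The only property that holds on every run is that each waiter appears exactly once. The specific permutation is chosen by the host scheduler and may differ between runs, which is why no particular order is asserted here.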
## 4. Wine effects (this rig)

Canary runs under Wine on Linux on this rig. Wine implements
`CreateThread`/`WaitFor*Object` over POSIX threads + futexes. Known
sources of additional non-determinism:

- Wine's `NtWaitForSingleObject` adds a wait-queue lock layer; wakeup
  ordering may differ from native Windows.
- Wine `KeAcquireSpinLock` paths use atomic spinlocks → host CPU
  scheduling jitter visible.
- File IO (NtCreateFile / NtReadFile) is dispatched into Wine's
  `ntdll` server thread → cross-thread completion timing depends on
  the Linux kernel's epoll wakeups.
- Linux CFS preemption: any host thread can lose its slice at any
  instruction boundary. Even with `taskset -c 0` pinning, the CFS
  scheduler interleaves wakeups across runnable threads
  non-deterministically because of vruntime accounting.

## 5. Implication for scheduling-alignment

To bit-align canary, the host OS would need to be replaced by a
deterministic scheduler. Three (impractical) approaches:

1. Single-CPU-pin + `SCHED_FIFO` + disable IO interrupts: partial;
   still suffers from Wine internal threads.
2. Replace `xe::threading::Thread::Create` with a cooperative
   single-host-thread fiber runtime: ~2000-3000 LOC across
   base/threading + xthread.cc. Risks destabilising canary as oracle.
3. Use Linux `rr` (Mozilla record-and-replay) on canary: out of
   scope; depends on kernel features and gives byte-identical replay
   but cannot align to ours.

None of these are gateable in a single phase. The plan therefore
treats canary's host-scheduler-driven jitter as **input noise to be
sidestepped**, not eliminated.

## 6. What this means for ours

The single-host-thread cooperative scheduler in ours is *more
deterministic* than canary. The asymmetry is structural and
well-documented:

- ours digest `e1dfcb15…` reproducible across 23+ phases.
- canary jitter at any wait/CS region varies cold-to-cold.

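Why a single-host-thread cooperative scheduler yields a reproducible digest can be shown with a toy round-robin loop. This is a minimal sketch, not the scheduler in ours: `CoopTask`, `MakeCountingTask`, `RunCooperative`, and `DigestLog` are hypothetical names, the event strings are made up, and 64-bit FNV-1a is an assumed stand-in for whatever hash produces the real digest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using EventLog = std::vector<std::string>;

// Hypothetical guest task: each call to `step` runs one quantum, may append
// events, and returns false once the task has finished.
struct CoopTask {
  std::function<bool(EventLog&)> step;
};

// Builds a task that emits `quanta` events tagged with its id (quanta >= 1).
CoopTask MakeCountingTask(uint32_t id, int quanta) {
  assert(quanta >= 1);
  return CoopTask{[id, remaining = quanta](EventLog& log) mutable {
    log.push_back("t" + std::to_string(id) + ".q" + std::to_string(remaining));
    return --remaining > 0;
  }};
}

// Round-robin over the tasks on the calling host thread only. No host
// scheduler decision can influence the resulting event order.
EventLog RunCooperative(std::vector<CoopTask> tasks) {
  EventLog log;
  std::vector<bool> done(tasks.size(), false);
  std::size_t remaining = tasks.size();
  while (remaining > 0) {
    for (std::size_t i = 0; i < tasks.size(); ++i) {
      if (done[i]) continue;
      if (!tasks[i].step(log)) {
        done[i] = true;
        --remaining;
      }
    }
  }
  return log;
}

// 64-bit FNV-1a over the log, with a separator byte after each event.
uint64_t DigestLog(const EventLog& log) {
  uint64_t hash = 14695981039346656037ULL;
  auto mix = [&hash](unsigned char byte) {
    hash ^= byte;
    hash *= 1099511628211ULL;
  };
  for (const std::string& event : log) {
    for (unsigned char c : event) mix(c);
    mix('\n');
  }
  return hash;
}
```

Running `RunCooperative({MakeCountingTask(0, 2), MakeCountingTask(1, 3)})` twice produces the same five-event log and therefore the same digest, because the interleaving is a pure function of the task list.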
The "right" question for C+23 is therefore **how to bridge that
asymmetry at the diff-tool layer or via a recording oracle**, rather
than how to make canary deterministic. The 2026-05-18 Stage 0 spike
already confirmed that quantum-tuning the scheduler in ours can't help
(there is no peer thread on slot 0 during boot to rotate to).

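One possible reading of "bridging at the diff-tool layer" is to compare the two event logs per thread instead of by global order, so that host-driven interleaving differences do not register as divergence. The sketch below assumes this reading; the `Event` record, `ProjectPerThread`, and `SamePerThreadOrder` are hypothetical and do not reflect the actual event schema or diff tool.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical event record; the real schema is not shown in this note.
struct Event {
  uint32_t thread_id;
  std::string kind;
};

using PerThread = std::map<uint32_t, std::vector<std::string>>;

// Groups events by thread while preserving each thread's own order.
PerThread ProjectPerThread(const std::vector<Event>& log) {
  PerThread out;
  for (const Event& e : log) out[e.thread_id].push_back(e.kind);
  return out;
}

// Two logs match if every thread saw the same sequence of events,
// regardless of how the threads were interleaved globally.
bool SamePerThreadOrder(const std::vector<Event>& a,
                        const std::vector<Event>& b) {
  return ProjectPerThread(a) == ProjectPerThread(b);
}
```

Under this comparison, a log where thread 1's event lands before or after thread 0's events is treated as equal, while a reordering within a single thread is still reported. It deliberately ignores cross-thread causality, so it is a coarse filter and not a full alignment.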
## 7. Cvars touched in canary today

`xenia-canary/src/xenia/kernel/util/event_log.cc` (Phase A schema
emitter): the cvar `kernel_emit_contention` (default `false`) landed in
Phase D Stage 1; verified by Grep today that it is still present.
Enabling the emission alone does not change canary determinism.