handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,142 @@
# Canary threading model — Phase C+23 characterization

Re-verifies the threading model captured in the 2026-05-18 plan against
current sources. Key citations re-checked today (2026-05-21):

## 1. Threading abstraction: host-thread-per-XThread

Canary spawns one host OS thread (`xe::threading::Thread`) per guest
XThread.

- `xenia-canary/src/xenia/kernel/xthread.cc:315` `XThread::Create()`
  builds `xe::threading::Thread::CreationParameters` and calls
  `xe::threading::Thread::Create(params, [this]() { … })` at line 421
  (verified line-of-sight today via Grep).
- `xenia-canary/src/xenia/base/threading_posix.cc` /
  `threading_win.cc` implement `Thread::Create` via `pthread_create` /
  `CreateThread`. There is no cooperative or fiber-based path.
- `XHostThread::Execute()` (xthread.cc:1244) is the host-thread entry
  for native kernel threads (XAudio/Xam internals); it also runs on a
  dedicated host thread.

Consequence: scheduling between guest threads is performed by the host
OS (Wine→Linux NPTL on this rig). Canary itself owns no inter-thread
ordering policy beyond setting `ThreadPriority` and affinity hints.

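
The one-host-thread-per-guest-thread shape can be illustrated with a minimal sketch. It uses `std::thread` as a stand-in for canary's `xe::threading::Thread`, and `GuestThread` and `RunAll` are hypothetical names, not canary APIs. Each guest thread gets its own OS thread, and nothing in the program decides the interleaving.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a guest XThread: just an id and an entry point.
struct GuestThread {
  int id;
  std::function<void(int)> entry;
};

// Spawns one host thread per guest thread and joins them all.
// Returns the number of host threads created.
std::size_t RunAll(const std::vector<GuestThread>& guests) {
  std::vector<std::thread> hosts;
  hosts.reserve(guests.size());
  for (const GuestThread& g : guests) {
    // The host OS, not this code, decides when each thread actually runs.
    hosts.emplace_back([&g]() { g.entry(g.id); });
  }
  for (std::thread& t : hosts) t.join();
  return hosts.size();
}
```

The only ordering guarantee in this sketch is the final `join`; everything before it is left to the host scheduler, which is the property the citations above establish for canary.
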
## 2. Scheduler control / determinism cvars

Grepped canary for cvars touching scheduling determinism. No
`lockstep`, no `deterministic`, no `cooperative_scheduling`, no
`single_thread`. The only related knobs:

- `clock_no_scaling`: already on by default; affects guest clock
  source, not scheduling.
- `clock_source_raw`: toggles rdtsc vs HostSystemTime; orthogonal.
- `ignore_thread_priorities`: drops priority hints (does NOT prevent
  preemption).
- `ignore_thread_affinities`: drops affinity hints.

None of these constrain *which* host thread runs at *which* wall-clock
moment. They cannot make canary deterministic.

## 3. Contention source — where host-scheduler timing leaks into guest events

`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:597`
`RtlEnterCriticalSection_entry`. Verified current:

```cpp
void RtlEnterCriticalSection_entry(pointer_t<X_RTL_CRITICAL_SECTION> cs) {
  …
  uint32_t spin_count = cs->header.absolute * 256;  // line 604

  if (cs->owning_thread == cur_thread) { /* recursive fast path */ }

  while (spin_count--) {
    if (xe::atomic_cas(-1, 0, &cs->lock_count)) { /* uncontended fast path */ }
  }  // line 614-618

  if (xe::atomic_inc(&cs->lock_count) != 0) {  // contended slow path
    xeKeWaitForSingleObject(...);  // emits wait.begin
  }
}
```

The branch taken depends on whether `atomic_cas(-1, 0, &lock_count)`
succeeds in a host-OS-scheduled spin window. Spin success vs failure
is determined entirely by whether the *peer guest thread that holds
the lock* releases it in time, which is determined by host scheduling.

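
To make the dependence on release timing concrete, here is a single-threaded model of the spin-then-wait shape above. It is a sketch under stated assumptions, not canary code: `EnterModel`, `Path`, and the `release_at_iteration` parameter are hypothetical, and the peer's release time (which the host scheduler decides in reality) is passed in as a plain number. As in the excerpt, `lock_count == -1` is taken to mean the section is free.

```cpp
#include <atomic>
#include <cstdint>

enum class Path { kSpinAcquired, kBlockingWait };

// Models one attempt to enter a critical section currently held by a peer.
// release_at_iteration: the spin iteration at which the peer frees the lock.
// In canary this is decided by host scheduling; here it is an input.
Path EnterModel(uint32_t spin_count, uint32_t release_at_iteration) {
  std::atomic<int32_t> lock_count{0};  // 0: held by the peer, no waiters
  for (uint32_t i = 0; i < spin_count; ++i) {
    if (i == release_at_iteration) lock_count.store(-1);  // peer releases
    int32_t expected = -1;
    // Similar in spirit to atomic_cas(-1, 0, &lock_count).
    if (lock_count.compare_exchange_strong(expected, 0)) {
      return Path::kSpinAcquired;  // no wait.begin would be emitted
    }
  }
  // Spin window expired: the real code increments lock_count and blocks.
  return Path::kBlockingWait;
}
```

With identical guest inputs, `EnterModel(256, 10)` takes the spin path and `EnterModel(256, 256)` takes the blocking path. Only the release timing differs, which is why the emitted event stream can vary between runs.
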
Other contention surfaces examined:

- `RtlLeaveCriticalSection_entry` (xboxkrnl_rtl.cc:670): non-blocking,
  signals the dispatcher event when transitioning to 0. Deterministic
  per call, but the event observers race.
- `xeKeWaitForSingleObject` (xboxkrnl_threading.cc:969): the wait
  primitive itself is sequential, but the wakeup ordering across
  multi-waiter queues uses host atomics + signal broadcast → host-OS
  dependent.
- `KeSetEvent`, `KeReleaseSemaphore`: atomic dispatcher state +
  `xe::threading::Event::Set()` → host condvar broadcast → host-OS
  scheduler picks which waiter to run.

The root cause: every blocking primitive eventually defers to
`xe::threading::Wait()`, which on POSIX uses `pthread_cond_timedwait`
and on Windows uses `WaitFor*Object`. Both are subject to
non-deterministic wakeup ordering when N>1 waiters race.

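
The multi-waiter race can be reproduced outside canary with standard C++ primitives. The sketch below uses `std::condition_variable` as an analogue of the condvar path described above; `WakeOrder` is a hypothetical helper, not a canary function. It records the order in which waiters get to run after a single broadcast.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Starts `n` waiters on one condition variable, signals once, and returns
// the order in which the waiters ran after observing the signal.
std::vector<int> WakeOrder(int n) {
  std::mutex m;
  std::condition_variable cv;
  bool signaled = false;
  std::vector<int> order;

  std::vector<std::thread> waiters;
  for (int id = 0; id < n; ++id) {
    waiters.emplace_back([&, id]() {
      std::unique_lock<std::mutex> lock(m);
      cv.wait(lock, [&]() { return signaled; });
      order.push_back(id);  // recorded while holding the mutex
    });
  }
  {
    std::lock_guard<std::mutex> lock(m);
    signaled = true;
  }
  cv.notify_all();  // analogous to an event Set() broadcast
  for (std::thread& t : waiters) t.join();
  return order;  // some permutation of 0..n-1; which one is up to the host
}
```

Every call returns all `n` ids, but the permutation is not specified by the program. Two calls with the same `n` may legitimately return different orders.
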
## 4. Wine effects (this rig)

Canary runs under Wine on Linux on this rig. Wine implements
`CreateThread`/`WaitFor*Object` over POSIX threads + futexes. Known
sources of additional non-determinism:

- Wine's `NtWaitForSingleObject` adds a wait-queue lock layer; wakeup
  ordering may differ from native Windows.
- Wine `KeAcquireSpinLock` paths use atomic spinlocks → host CPU
  scheduling jitter is visible.
- File IO (NtCreateFile / NtReadFile) is dispatched into Wine's
  `ntdll` server thread → cross-thread completion timing depends on
  the Linux kernel's epoll wakeups.
- Linux CFS preemption: any host thread can lose its slice at any
  instruction boundary. Even with `taskset -c 0` pinning, CFS
  interleaves wakeups across runnable threads non-deterministically
  because of vruntime accounting.

## 5. Implication for scheduling-alignment

To bit-align canary, host-OS scheduling would need to be replaced by a
deterministic scheduler. Three (impractical) approaches:

1. Single-CPU-pin + `SCHED_FIFO` + disable IO interrupts: partial,
   still suffers from Wine internal threads.
2. Replace `xe::threading::Thread::Create` with a cooperative
   single-host-thread fiber runtime: ~2000-3000 LOC across
   base/threading + xthread.cc. Risks destabilising canary as oracle.
3. Use Linux `rr` (Mozilla record-and-replay) on canary: out of
   scope; depends on kernel features and gives byte-identical replay
   but cannot align to ours.

None of these is gateable in a single phase. The plan therefore
treats canary's host-scheduler-driven jitter as **input noise to be
sidestepped**, not eliminated.

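
For contrast with the host-scheduled model, the idea behind approach 2 can be sketched as a round-robin loop on a single host thread. This is a toy under stated assumptions: `Task`, `RunCooperative`, and `MakeCountingTask` are hypothetical names, and each task is modelled as a step function instead of a real fiber with its own stack.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cooperative task: step() runs until its next yield point
// and returns false once the task has finished.
struct Task {
  std::string name;
  std::function<bool()> step;
};

// Runs all tasks on the calling thread in fixed round-robin order and
// returns the task names in the order their steps executed.
std::vector<std::string> RunCooperative(std::vector<Task> tasks) {
  std::vector<std::string> trace;
  std::vector<bool> done(tasks.size(), false);
  std::size_t remaining = tasks.size();
  while (remaining > 0) {
    for (std::size_t i = 0; i < tasks.size(); ++i) {
      if (done[i]) continue;
      trace.push_back(tasks[i].name);
      if (!tasks[i].step()) {
        done[i] = true;
        --remaining;
      }
    }
  }
  return trace;
}

// Builds a task that runs `steps` steps (at least one) before finishing.
Task MakeCountingTask(std::string name, int steps) {
  int left = steps;
  return Task{std::move(name), [left]() mutable { return --left > 0; }};
}
```

Because the loop alone picks the next task, the trace is a pure function of the task list. The same inputs give the same interleaving on every run, which is the property the host-thread-per-XThread model cannot offer.
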
## 6. What this means for ours

The single-host-thread cooperative scheduler in ours is *more
deterministic* than canary. The asymmetry is structural and
well-documented:

- ours digest `e1dfcb15…` reproducible across 23+ phases.
- canary jitter at any wait/CS region varies cold-to-cold.

The "right" question for C+23 is therefore **how to bridge that
asymmetry at the diff-tool layer or via a recording oracle**, rather
than how to make canary deterministic. The 2026-05-18 Stage 0 spike
already confirmed that quantum-tuning the scheduler in ours can't help
(no peer thread on slot 0 during boot to rotate to).

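
One possible reading of "bridging at the diff-tool layer" is to compare each thread's own event order and ignore the cross-thread interleaving. The sketch below illustrates that idea only; `Event`, `ProjectByThread`, and `SamePerThreadOrder` are hypothetical, and the record layout is an assumption, not the actual event-log schema.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical event record; the real event-log schema is not shown here.
struct Event {
  int thread_id;
  std::string kind;
};

// Splits a global event stream into one ordered sequence per thread.
std::map<int, std::vector<std::string>> ProjectByThread(
    const std::vector<Event>& events) {
  std::map<int, std::vector<std::string>> per_thread;
  for (const Event& e : events) per_thread[e.thread_id].push_back(e.kind);
  return per_thread;
}

// True when both streams agree on every thread's own event order, even if
// the cross-thread interleaving differs.
bool SamePerThreadOrder(const std::vector<Event>& a,
                        const std::vector<Event>& b) {
  return ProjectByThread(a) == ProjectByThread(b);
}
```

A projection like this would absorb pure interleaving jitter. It would not, on its own, absorb cases where contention changes which events a thread emits, such as a `wait.begin` that appears only on the contended slow path.
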
## 7. Cvars touched in canary today

`xenia-canary/src/xenia/kernel/util/event_log.cc` (Phase A schema
emitter): the cvar `kernel_emit_contention` (default `false`) landed
in Phase D Stage 1; Grep today confirms it is still present. Its
emission alone does not change canary determinism.