handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,142 @@
# Canary threading model — Phase C+23 characterization
Re-verifies the threading model captured in the 2026-05-18 plan against
current sources. Key citations were re-checked today (2026-05-21).
## 1. Threading abstraction: host-thread-per-XThread
Canary spawns one host OS thread (an `xe::threading::Thread`) per guest XThread.
- `xenia-canary/src/xenia/kernel/xthread.cc:315` `XThread::Create()`
builds `xe::threading::Thread::CreationParameters` and calls
`xe::threading::Thread::Create(params, [this]() { … })` at line 421
(verified line-of-sight today via Grep).
- `xenia-canary/src/xenia/base/threading_posix.cc` /
`threading_win.cc` implement `Thread::Create` via `pthread_create` /
`CreateThread`. There is no cooperative or fiber-based path.
- `XHostThread::Execute()` (xthread.cc:1244) is the host-thread entry
for native kernel threads (XAudio/Xam internals); it also runs on a
dedicated host thread.
Consequence: scheduling between guest threads is performed by the host
OS (Wine→Linux NPTL on this rig). Canary itself owns no inter-thread
ordering policy beyond setting `ThreadPriority` and affinity hints.
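
A minimal sketch of the thread-per-guest-thread shape described above,
using `std::thread` as a stand-in for `xe::threading::Thread`.
`GuestThread` and `RunOnePerHostThread` are hypothetical names for
illustration only, not canary code. The point is that nothing in the
spawning code decides interleaving; that is left to the host OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a guest thread; NOT canary's XThread.
struct GuestThread {
  uint32_t id;
  std::function<void(uint32_t)> entry;
};

// Spawns one host thread per guest thread, then joins them all.
// The order in which the entries actually run is chosen by the host
// OS scheduler, not by this function.
std::vector<uint32_t> RunOnePerHostThread(
    const std::vector<GuestThread>& guests) {
  std::vector<uint32_t> done(guests.size(), 0);
  std::vector<std::thread> hosts;
  hosts.reserve(guests.size());
  for (std::size_t i = 0; i < guests.size(); ++i) {
    assert(static_cast<bool>(guests[i].entry));  // entry must be callable
    hosts.emplace_back([&guests, &done, i] {
      guests[i].entry(guests[i].id);
      done[i] = 1;  // each host thread writes only its own slot
    });
  }
  for (auto& h : hosts) h.join();
  return done;
}
```

Every slot in the returned vector is 1 once all host threads have been
joined; the relative order in which the entries ran is not recorded and
is not reproducible.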
## 2. Scheduler control / determinism cvars
Grepped canary for cvars touching scheduling determinism. No
`lockstep`, no `deterministic`, no `cooperative_scheduling`, no
`single_thread`. The only related knobs:
- `clock_no_scaling` — already on by default; affects guest clock
source, not scheduling.
- `clock_source_raw` — toggles rdtsc vs HostSystemTime; orthogonal.
- `ignore_thread_priorities` — drops priority hints (does NOT prevent
preemption).
- `ignore_thread_affinities` — drops affinity hints.
None of these constrain *which* host thread runs at *which* wall-clock
moment. They cannot make canary deterministic.
## 3. Contention source — where host-scheduler timing leaks into guest events
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:597`
`RtlEnterCriticalSection_entry`. Verified current:
```cpp
// Condensed excerpt: `cur_thread` setup and the fast-path bodies are elided.
void RtlEnterCriticalSection_entry(pointer_t<X_RTL_CRITICAL_SECTION> cs) {
  uint32_t spin_count = cs->header.absolute * 256;  // line 604
  if (cs->owning_thread == cur_thread) { /* recursive fast path */ }
  while (spin_count--) {  // lines 614-618
    if (xe::atomic_cas(-1, 0, &cs->lock_count)) { /* uncontended fast path */ }
  }
  if (xe::atomic_inc(&cs->lock_count) != 0) {  // contended slow path
    xeKeWaitForSingleObject(...);  // emits wait.begin
  }
}
```
The branch taken depends on whether `atomic_cas(-1, 0, &lock_count)`
succeeds in a host-OS-scheduled spin window. Spin success vs failure
is determined entirely by whether the *peer guest thread that holds
the lock* releases it in time, which is determined by host scheduling.
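
The spin-then-block decision can be modelled in isolation. The sketch
below is a standalone approximation using `std::atomic`, not canary
code: `EnterPath` and `ClassifyEnter` are hypothetical names, the
`-1 == free` convention is taken from the `atomic_cas(-1, 0, …)` call in
the excerpt, and the recursive-owner check and the actual wait are
omitted. It only reports which path a caller would take for a given
lock state; in canary that state is whatever the peer thread has
managed to do by the time the host scheduler runs the spinner.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Which path a caller would take through a spin-then-block acquire.
enum class EnterPath { kSpinAcquired, kIncAcquired, kWouldBlock };

// Models the acquire decision only (assumption: -1 means "free").
// No waiting is performed; kWouldBlock marks the point where the
// real code would enter a kernel wait.
EnterPath ClassifyEnter(std::atomic<int32_t>& lock_count,
                        uint32_t spin_count) {
  assert(lock_count.load() >= -1);
  while (spin_count--) {
    int32_t expected = -1;
    // Succeeds only if the lock is free at this exact moment.
    if (lock_count.compare_exchange_strong(expected, 0)) {
      return EnterPath::kSpinAcquired;
    }
  }
  // fetch_add returns the previous value, so +1 gives the new value.
  if (lock_count.fetch_add(1) + 1 != 0) {
    return EnterPath::kWouldBlock;  // someone else still holds it
  }
  return EnterPath::kIncAcquired;  // released just after the spin ended
}
```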
Other contention surfaces examined:
- `RtlLeaveCriticalSection_entry` (xboxkrnl_rtl.cc:670) — non-blocking,
signals dispatcher event when transitioning to 0. Deterministic per
call but the event observers race.
- `xeKeWaitForSingleObject` (xboxkrnl_threading.cc:969) — the wait
primitive itself is sequential, but the wakeup ordering across
multi-waiter queues uses host atomics + signal broadcast → host-OS
dependent.
- `KeSetEvent`, `KeReleaseSemaphore` — atomic dispatcher state +
`xe::threading::Event::Set()` → host condvar broadcast → host-OS
scheduler picks which waiter to run.
The fundamental issue: every blocking primitive eventually defers to
`xe::threading::Wait()`, which on POSIX uses `pthread_cond_timedwait`
and on Windows uses `WaitFor*Object`. Both are subject to
non-deterministic wakeup ordering when N>1 waiters race.
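
The multi-waiter race can be reproduced outside canary with standard
C++ primitives. The sketch below is an illustration under that
assumption, not canary's wait path: `BroadcastWakeOrder` is a
hypothetical helper that parks N threads on a
`std::condition_variable`, broadcasts once, and records the order in
which the waiters get to run.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Parks `waiters` threads on one condition variable, broadcasts once,
// and returns the order in which they resumed. The set of ids is
// fixed; the order is whatever the host scheduler produced.
std::vector<int> BroadcastWakeOrder(int waiters) {
  assert(waiters > 0);
  std::mutex m;
  std::condition_variable ready_cv;   // waiters -> signaller
  std::condition_variable signal_cv;  // signaller -> waiters
  bool signaled = false;
  int ready = 0;
  std::vector<int> order;
  std::vector<std::thread> threads;
  for (int i = 0; i < waiters; ++i) {
    threads.emplace_back([&, i] {
      std::unique_lock<std::mutex> lock(m);
      ++ready;
      ready_cv.notify_one();
      signal_cv.wait(lock, [&] { return signaled; });
      order.push_back(i);  // mutex is held again after wait returns
    });
  }
  {
    std::unique_lock<std::mutex> lock(m);
    ready_cv.wait(lock, [&] { return ready == waiters; });
    signaled = true;
  }
  signal_cv.notify_all();  // broadcast: every waiter becomes runnable
  for (auto& t : threads) t.join();
  return order;
}
```

Only the membership of the returned vector is guaranteed (each id
exactly once). Repeated calls may return different permutations, which
is the property that leaks into guest-visible event ordering.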
## 4. Wine effects (this rig)
Canary runs under Wine on Linux on this rig. Wine implements
`CreateThread`/`WaitFor*Object` over POSIX threads + futexes. Known
sources of additional non-determinism:
- Wine's `NtWaitForSingleObject` adds a wait-queue lock layer; wakeup
ordering may differ from native Windows.
- Wine `KeAcquireSpinLock` paths use atomic spinlocks → host CPU
scheduling jitter visible.
- File IO (NtCreateFile / NtReadFile) is dispatched into Wine's
`ntdll` server thread → cross-thread completion timing depends on
the Linux kernel's epoll wakeups.
- Linux CFS preemption: any host thread can lose its slice at any
instruction boundary. Even with `taskset -c 0` pinning, the CFS
scheduler interleaves wakeups across runnable threads
non-deterministically because of vruntime accounting.
## 5. Implication for scheduling-alignment
To bit-align canary, the host OS scheduler would need to be replaced
by a deterministic one. Three (impractical) approaches:
1. Single-CPU-pin + `SCHED_FIFO` + disable IO interrupts: partial,
still suffers from Wine-internal threads.
2. Replace `xe::threading::Thread::Create` with a cooperative
single-host-thread fiber runtime — ~2000-3000 LOC across base/
threading + xthread.cc. Risks destabilising canary as oracle.
3. Use Linux `rr` (Mozilla record-and-replay) on canary — out of
scope; depends on kernel features and gives byte-identical replay
but cannot align to ours.
None of these are gateable in a single phase. The plan therefore
treats canary's host-scheduler-driven jitter as **input noise to be
sidestepped**, not eliminated.
## 6. What this means for ours
Ours's single-host-thread cooperative scheduler is *more
deterministic* than canary. The asymmetry is structural and
well-documented:
- ours digest `e1dfcb15…` reproducible across 23+ phases.
- canary jitter at any wait/CS region varies cold-to-cold.
The "right" question for C+23 is therefore **how to bridge that
asymmetry at the diff-tool layer or via a recording oracle**, rather
than how to make canary deterministic. The 2026-05-18 Stage 0 spike
already confirmed quantum-tuning ours's scheduler can't help (no
peer thread on slot 0 during boot to rotate to).
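
The structural asymmetry in this section can be shown with a toy
model. The sketch below is not ours's scheduler: `CoopTask` and
`RunRoundRobin` are hypothetical names, and a "step" stands in for one
quantum of guest work. It only illustrates why a single-host-thread
cooperative loop yields the same trace on every run, with no host
scheduler involved in the ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cooperative task: runs `steps_left` quanta, then stops.
struct CoopTask {
  uint32_t id;
  uint32_t steps_left;
};

// Runs all tasks round-robin on the calling thread, one step per turn.
// Returns the sequence of task ids in execution order. For a given
// input the sequence is always the same.
std::vector<uint32_t> RunRoundRobin(std::vector<CoopTask> tasks) {
  std::size_t total = 0;
  for (const auto& t : tasks) total += t.steps_left;
  std::vector<uint32_t> trace;
  trace.reserve(total);
  bool progressed = true;
  while (progressed) {
    progressed = false;
    for (auto& t : tasks) {
      if (t.steps_left == 0) continue;  // finished tasks are skipped
      --t.steps_left;
      trace.push_back(t.id);
      progressed = true;
    }
  }
  assert(trace.size() == total);
  return trace;
}
```

For tasks `{0, 2}`, `{1, 1}`, `{2, 3}` the trace is `0, 1, 2, 0, 2, 2`
on every run. A thread-per-task version of the same workload has no
such fixed trace.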
## 7. Cvars touched in canary today
`xenia-canary/src/xenia/kernel/util/event_log.cc` (Phase A schema
emitter): the cvar `kernel_emit_contention` (default `false`) landed
in Phase D Stage 1 and was verified by Grep today as still present.
Its emission alone does not change canary determinism.