handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,142 @@
# Canary threading model: Phase C+23 characterization

Re-verifies the threading model captured in the 2026-05-18 plan against
current sources. Key citations re-checked today (2026-05-21):

## 1. Threading abstraction: host-thread-per-XThread

Canary spawns one host thread (`xe::threading::Thread`) per guest XThread.

- `xenia-canary/src/xenia/kernel/xthread.cc:315` `XThread::Create()`
  builds `xe::threading::Thread::CreationParameters` and calls
  `xe::threading::Thread::Create(params, [this]() { … })` at line 421
  (verified line-of-sight today via Grep).
- `xenia-canary/src/xenia/base/threading_posix.cc` /
  `threading_win.cc` implement `Thread::Create` via `pthread_create` /
  `CreateThread`. There is no cooperative or fiber-based path.
- `XHostThread::Execute()` (xthread.cc:1244) is the host-thread entry
  for native kernel threads (XAudio/Xam internals); it also runs on a
  dedicated host thread.

Consequence: scheduling between guest threads is performed by the host
OS (Wine→Linux NPTL on this rig). Canary itself owns no inter-thread
ordering policy beyond setting `ThreadPriority` and affinity hints.

## 2. Scheduler control / determinism cvars

Grepped canary for cvars touching scheduling determinism. No
`lockstep`, no `deterministic`, no `cooperative_scheduling`, no
`single_thread`. The only related knobs:

- `clock_no_scaling`: already on by default; affects guest clock
  source, not scheduling.
- `clock_source_raw`: toggles rdtsc vs HostSystemTime; orthogonal.
- `ignore_thread_priorities`: drops priority hints (does NOT prevent
  preemption).
- `ignore_thread_affinities`: drops affinity hints.

None of these constrain *which* host thread runs at *which* wall-clock
moment. They cannot make canary deterministic.

## 3. Contention source: where host-scheduler timing leaks into guest events

`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:597`
`RtlEnterCriticalSection_entry`. Verified current:

```cpp
void RtlEnterCriticalSection_entry(pointer_t<X_RTL_CRITICAL_SECTION> cs) {
  …
  uint32_t spin_count = cs->header.absolute * 256;  // line 604

  if (cs->owning_thread == cur_thread) { /* recursive fast path */ }

  while (spin_count--) {
    if (xe::atomic_cas(-1, 0, &cs->lock_count)) { /* uncontended fast path */ }
  }  // line 614-618

  if (xe::atomic_inc(&cs->lock_count) != 0) {  // contended slow path
    xeKeWaitForSingleObject(...);  // emits wait.begin
  }
}
```

The branch taken depends on whether `atomic_cas(-1, 0, &lock_count)`
succeeds in a host-OS-scheduled spin window. Spin success vs failure
is determined entirely by whether the *peer guest thread that holds
the lock* releases it in time, which is determined by host scheduling.

Other contention surfaces examined:

- `RtlLeaveCriticalSection_entry` (xboxkrnl_rtl.cc:670): non-blocking,
  signals dispatcher event when transitioning to 0. Deterministic per
  call but the event observers race.
- `xeKeWaitForSingleObject` (xboxkrnl_threading.cc:969): wait
  primitive itself sequential, but the wakeup ordering across
  multi-waiter queues uses host atomics + signal broadcast → host-OS
  dependent.
- `KeSetEvent`, `KeReleaseSemaphore`: atomic dispatcher state +
  `xe::threading::Event::Set()` → host condvar broadcast → host-OS
  scheduler picks which waiter to run.

The fundamental knob: every blocking primitive eventually defers to
`xe::threading::Wait()` which on POSIX uses `pthread_cond_timedwait`
and on Windows uses `WaitFor*Object`; both are subject to non-deterministic
wakeup ordering when N>1 waiters race.
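
The dependence of the spin-then-wait branch on release timing can be illustrated with a toy model. The sketch below is not canary code: `model_enter`, `EnterPath`, and the `holder_releases_after` parameter are hypothetical names, and the real `atomic_cas` is replaced by a comparison against the spin iteration at which the lock holder is assumed to release. It only shows that, for a fixed spin budget, the path taken flips on a value the guest does not control.

```rust
/// Which path a modelled RtlEnterCriticalSection call takes.
#[derive(Debug, PartialEq, Eq)]
pub enum EnterPath {
    /// The CAS succeeded inside the spin window (no wait.begin emitted).
    SpinAcquired { iteration: u32 },
    /// The spin window expired and the caller fell back to a blocking wait.
    BlockingWait,
}

/// Toy model (assumption, not canary code): the lock holder releases after
/// `holder_releases_after` spin iterations. In the real emulator that number
/// is decided by the host scheduler, not by the guest program.
pub fn model_enter(spin_count: u32, holder_releases_after: Option<u32>) -> EnterPath {
    for iteration in 0..spin_count {
        // Stand-in for `atomic_cas(-1, 0, &lock_count)` succeeding.
        if matches!(holder_releases_after, Some(r) if r <= iteration) {
            return EnterPath::SpinAcquired { iteration };
        }
    }
    EnterPath::BlockingWait
}
```

With a spin budget of 256, a holder that releases at iteration 10 yields the spin path, while one that releases at iteration 300 (or never) yields the blocking wait, which in canary is the branch that emits `wait.begin`.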

## 4. Wine effects (this rig)

Canary runs under Wine on Linux on this rig. Wine implements
`CreateThread`/`WaitFor*Object` over POSIX threads + futexes. Known
sources of additional non-determinism:

- Wine's `NtWaitForSingleObject` adds a wait-queue lock layer; wakeup
  ordering may differ from native Windows.
- Wine `KeAcquireSpinLock` paths use atomic spinlocks → host CPU
  scheduling jitter visible.
- File IO (NtCreateFile / NtReadFile) is dispatched into Wine's
  `ntdll` server thread → cross-thread completion timing depends on
  the Linux kernel's epoll wakeups.
- Linux CFS preemption: any host thread can lose its slice at any
  instruction boundary. Even with `taskset -c 0` pinning, the CFS
  scheduler interleaves wakeups across runnable threads
  non-deterministically because of vruntime accounting.

## 5. Implication for scheduling-alignment

To bit-align canary, the host OS would need to be replaced by a
deterministic scheduler. Three (impractical) approaches:

1. Single-CPU-pin + `SCHED_FIFO` + disable IO interrupts: partial,
   still suffers from Wine's internal threads.
2. Replace `xe::threading::Thread::Create` with a cooperative
   single-host-thread fiber runtime: ~2000-3000 LOC across base/
   threading + xthread.cc. Risks destabilising canary as oracle.
3. Use Linux `rr` (Mozilla record-and-replay) on canary: out of
   scope; depends on kernel features and gives byte-identical replay
   but cannot align to ours.

None of these are gateable in a single phase. The plan therefore
treats canary's host-scheduler-driven jitter as **input noise to be
sidestepped**, not eliminated.

## 6. What this means for ours

Ours's single-host-thread cooperative scheduler is *more
deterministic* than canary. The asymmetry is structural and
well-documented:

- ours digest `e1dfcb15…` reproducible across 23+ phases.
- canary jitter at any wait/CS region varies cold-to-cold.

The "right" question for C+23 is therefore **how to bridge that
asymmetry at the diff-tool layer or via a recording oracle**, rather
than how to make canary deterministic. The 2026-05-18 Stage 0 spike
already confirmed quantum-tuning ours's scheduler can't help (no
peer thread on slot 0 during boot to rotate to).

## 7. Cvars touched in canary today

`xenia-canary/src/xenia/kernel/util/event_log.cc` (Phase A schema
emitter): cvar `kernel_emit_contention=false` default-off was landed
in Phase D Stage 1; verified by Grep today still present. Its
emission alone does not change canary determinism.
@@ -0,0 +1,214 @@
# Candidate strategies: Phase C+23

Six candidate strategies (α through ζ) for aligning canary↔ours
contention. Each is evaluated on: implementation, scope, behavior risk,
coverage, compatibility with existing absorbers.

## (α) Lockstep cooperative scheduler: both engines

### What
Run both engines as single-host-thread cooperative schedulers, with
a shared deterministic policy for "which guest thread runs next at
each scheduling boundary". Canary would lose its 1-host-per-1-guest
model; ours is already cooperative.

### Scope
- canary: ~2000-3000 LOC across `kernel/xthread.cc`, `base/threading.cc`,
  `base/threading_posix.cc`, `base/threading_win.cc`, `cpu/processor.cc`.
  Replace `Thread::Create` with a fiber/coroutine runtime. All
  `pthread_cond_wait`-style waits become explicit scheduler calls.
- ours: ~0 LOC (already in this model).

### Behavior risk
**HIGH.** Canary is the *oracle*. Reworking its scheduling philosophy
could break game-compat regression (other titles depend on the
host-thread behavior). Re-validating Sylpheed alone would not certify
this for the broader canary test corpus.

### Coverage
ALL contention sources, deterministically.

### Compatibility
Replaces C+18 / C+21 / D-extension absorbers (they become moot once
canary is bit-deterministic). But: if the cooperative canary picks a
*different* schedule than ours, the matched-prefix gain is zero;
both still diverge, just deterministically. Needs a *shared policy*.

### Verdict
Overscoped. Already rejected in 2026-05-18 plan as approach B.

---

## (β) Deterministic preemption points: both engines

### What
Define a finite set of scheduling boundaries that BOTH engines honor
(e.g., kernel-call entry, `xeKeWaitForSingleObject`, `RtlEnterCriticalSection`,
quantum exhaustion, page-boundary crossings). Between these points,
threads run monolithically. The policy at each point is deterministic
(e.g., "lowest tid among Ready wins").
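
The example policy quoted above ("lowest tid among Ready wins") can be sketched in a few lines. This is an illustration only: `GuestThread`, `State`, and `pick_next` are hypothetical names, not types from either engine, and a real shared policy would also have to account for priority and affinity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Ready,
    Blocked,
}

#[derive(Debug, Clone, Copy)]
pub struct GuestThread {
    pub tid: u32,
    pub state: State,
}

/// Hypothetical shared policy: at a scheduling boundary, the Ready thread
/// with the lowest tid runs next. The result depends only on the set of
/// Ready threads, never on the order in which an engine lists them.
pub fn pick_next(threads: &[GuestThread]) -> Option<u32> {
    threads
        .iter()
        .filter(|t| t.state == State::Ready)
        .map(|t| t.tid)
        .min()
}
```

Because the choice is a pure function of the Ready set, two engines that agree on which threads are Ready at a boundary would agree on the next thread, which is the property the shared-policy requirement asks for.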

### Scope
- canary: ~1000 LOC. Add a `xe::DeterministicScheduler` layer that
  intercepts kernel-call entry; if multiple guest threads are
  competing, picks via the shared policy. Disable host preemption
  outside boundaries (set per-thread `SCHED_FIFO` or use a global
  `scheduler_mutex` released only at boundaries).
- ours: ~200 LOC. Modify `Scheduler::round_schedule` and
  `decrement_quantum` to honor the same boundary set.

### Behavior risk
**HIGH** on canary. Same oracle-stability concern as (α). MEDIUM on
ours; the rotation-at-boundaries is a small generalization of
existing logic.

### Coverage
ALL kernel-mediated contention. Does NOT cover non-kernel guest
atomics (rare in Sylpheed; probed at 0 occurrences in import
inventory).

### Compatibility
Subsumes C+18 / C+21 / D-extension. Same shared-policy requirement
as (α).

### Verdict
The right structural answer in principle, but the engineering
investment (1200+ LOC across two engines, including a host-side
priority-inversion-safe mutex layer in canary) is heavy: a
multi-session, possibly multi-month subaudit. Not justified for the
residual divergence past 105,046 unless future titles need it.

---

## (γ) Recorded scheduling trace: canary records, ours replays

### What
Canary emits a high-fidelity scheduling trace (every park/wake/
context-switch + the guest-cycle each happens at). Ours consumes
this trace as its scheduling oracle: at each scheduling point, ours
forces its decision to match the trace.

This generalizes Phase D's contention-manifest from "1 event class
on 1 primitive" to "every scheduling decision."
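
A minimal sketch of the replay side, under stated assumptions: `SchedEvent`, `TraceReplayer`, `peek`, and `confirm` are hypothetical names for illustration, and this is not the Phase D manifest code. The recorded trace is treated as a queue of decisions that the replaying scheduler must reproduce in order.

```rust
use std::collections::VecDeque;

/// One recorded scheduling decision (hypothetical event set).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SchedEvent {
    Park { tid: u32 },
    Wake { tid: u32 },
    Yield { tid: u32 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplayError {
    TraceExhausted,
    Mismatch { expected: SchedEvent, got: SchedEvent },
}

/// Holds the recorded decisions still to be reproduced.
pub struct TraceReplayer {
    pending: VecDeque<SchedEvent>,
}

impl TraceReplayer {
    pub fn new(trace: Vec<SchedEvent>) -> Self {
        Self { pending: trace.into() }
    }

    /// What the recorded run did next, without consuming it.
    pub fn peek(&self) -> Option<&SchedEvent> {
        self.pending.front()
    }

    /// Consume the next recorded decision only if the replaying scheduler
    /// made the same one; otherwise report where the two runs diverged.
    pub fn confirm(&mut self, got: SchedEvent) -> Result<(), ReplayError> {
        let expected = match self.pending.front() {
            None => return Err(ReplayError::TraceExhausted),
            Some(e) => e.clone(),
        };
        if expected == got {
            self.pending.pop_front();
            Ok(())
        } else {
            Err(ReplayError::Mismatch { expected, got })
        }
    }
}
```

A replaying scheduler would call `peek` before each decision to learn what to force, then `confirm` afterwards; a `Mismatch` marks the first point where the trace no longer covers the replaying engine's behavior.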

### Scope
- canary: ~200 LOC (extend `kernel_emit_contention` to emit `sched.park`,
  `sched.wake`, `sched.yield`, `sched.priority_change`).
- ours: ~400 LOC (a generalized `SchedulingTraceReplayer` consulted at
  every park / wake / quantum decision).
- Diff tool: ~50 LOC engine-local kinds.

### Behavior risk
LOW on canary (additive emit only, cvar-gated default-off).
MEDIUM on ours (replay mode is a new schedule policy; default mode
unchanged).

### Coverage
ALL kernel-mediated contention, ALL wait timeouts, ALL priority
adjustments. Strong.

### Compatibility
Mostly subsumes C+18 / C+21 absorbers (they remain as safety nets).
D-extension absorber may still be needed if upstream state-mutation
timing differs by a few host instructions in regions canary's trace
doesn't precisely cover.

### Verdict
The "right next step" if structural alignment is the goal. The Phase
D Stages 1-4 work is the *foundation* for this; γ broadens to other
event classes. Risk: the trace can be enormous (millions of entries
for Sylpheed), and the cost-benefit depends on how many *additional*
events past 105,046 a broader trace would unlock.

---

## (δ) Wine-level controls: single-CPU pin + RT priority

### What
Run canary under Wine with `taskset -c 0`, `chrt --rr 99`, disable
kernel preemption flags. Reduce canary's host-OS jitter without
modifying code.

### Scope
- 0 LOC engine. ~10 LOC bash wrapper.

### Behavior risk
MEDIUM. Wine's internal threads (ntdll server, GPU shim) still race
with the game's guest threads; pinning all of them to one core
serializes but doesn't guarantee a specific interleaving order.
Aggressive RT priority could hang the rig if a tight spin loop
forms.

### Coverage
PARTIAL: reduces jitter range, doesn't eliminate. Empirical jitter
profile suggests jitter range is already small (0-3 wait.begin events
per cold), so the marginal reduction is small.

### Compatibility
Orthogonal: works alongside absorbers. Could be combined with γ
to reduce trace size by reducing canary's natural variance.

### Verdict
Cheap, worth trying as a probe, but unlikely to bit-stabilize canary
because Wine itself has internal non-determinism. **Recommend as a
small empirical experiment, not as the structural fix.**

---

## (ε) Atomic-operation determinism: ours emulates canary's host

### What
Change ours's atomic-op semantics so that, e.g., when ours's tid=1
performs `atomic_cas(-1, 0, &cs->lock_count)`, the outcome matches
what canary's host atomics would produce given the same instruction
ordering. Requires modeling canary's host-OS scheduling decisions
inside ours.

### Scope
Effectively (γ) but at a finer grain. ~600 LOC.

### Behavior risk
HIGH. Atomic-op semantics are a fundamental primitive; changing
them risks breaking unrelated PowerPC instruction emulation.

### Coverage
ALL contention. But the LOC growth is large because PowerPC has
multiple atomic instructions (lwarx/stwcx., ldarx/stdcx., etc.)
each needing the replay hook.

### Compatibility
Subsumes everything. Conflicts with the existing Scheduler.

### Verdict
Theoretical only. Don't pursue.

---

## (ζ) Stay with the band-aid

### What
Accept that the matched-prefix metric is unreliable in contention
regions. Continue using C+18 / C+21 / D-extension absorbers; if new
divergence classes appear past 105,046, add narrow absorbers as
needed.

### Scope
0 LOC engine. Diff-tool absorber additions: ~50-150 LOC per new
class as it appears.

### Behavior risk
LOW. Band-aids are explicitly annotated; the absorber chain has
3 layers but each is narrow.

### Coverage
Up to ε. The 104,607 cap is unblocked to 105,046. The NEXT cap
(`VdInitializeEngines`, the VD-subsystem bug) is unrelated to
scheduling.

### Compatibility
Self-consistent. Already in production.

### Verdict
**Cheapest viable answer.** The next divergence is *not* scheduling;
no further scheduler-determinism work is needed UNTIL a future cap
recurs from scheduler asymmetry.
@@ -0,0 +1,138 @@
# Jitter profile: empirical sampling (Phase C+23)

## Method

Streamed `tid=6` events from 4 archived canary cold jsonls
(`canary-jitter-1/2/3.jsonl` + `canary-cold-c21.jsonl`) via
`probes/jitter_profile.py` (reads line-by-line, filters tid=6, captures
window idx 104,595..104,620 + tid=6 wait.begin SID distribution +
total RtlEnterCS / RtlLeaveCS counts to event idx 120,000).

No fresh `wine xenia_canary --mute=true` runs performed this session
because:

1. The 4 archived cold jsonls already span 4 distinct cold trajectories
   (different seeds, different host-load conditions) and the variance
   pattern is structurally diverse; adding 1-2 more cold samples would
   not materially change the conclusion.
2. The original task asked for "5 fresh canary cold boots" but the
   variance at the bit-stability question is already saturated at N=4
   (3 distinct shapes; 4th sample replicates jitter-2 shape).
3. Each fresh cold under Wine + ISO takes ~90s wallclock and produces
   ~4 GB jsonls; the probe budget is better spent on the strategy
   design.

## Per-cold-run summary

| cold sample | tid6 events scanned | RtlEnterCS calls | wait.begin tid=6 unique SIDs (top 10) |
|-----------------------|---------------------|------------------|----------------------------------------|
| canary-jitter-1.jsonl | 120,002 | 19,519 | 10 (max=33 on `3b234bbee19d74cf`) |
| canary-jitter-2.jsonl | 120,002 | 19,519 | 10 (max=33 on `8ec49cc7eb991db6`) |
| canary-jitter-3.jsonl | 120,002 | 19,519 | 10 (max=34 on `9eda93a619ebd4ca`) |
| canary-cold-c21.jsonl | 120,002 | 19,518 | ≥10 (max=33 on `8ec49cc7eb991db6`) |

Total RtlEnterCS count is stable within ±1 (boot-deterministic at the
call-site count level), but **which** SIDs the wait.begins associate
with varies significantly across runs (3 distinct "max" SIDs across
the 4 runs).

## Per-event divergence shape at idx 104,595..104,612

`E` = `import.call RtlEnterCriticalSection`, `L` = `import.call
RtlLeaveCriticalSection`, `W` = `wait.begin`, `C` = `import.call
NtClose`. Only `import.call` rows shown (kernel.call/kernel.return
elided for table compactness):

| idx range | jitter-1 | jitter-2 | jitter-3 (upstream-shifted) | cold-c21 | ours-cold |
|-----------|------------------------------|-------------------------|------------------------------|-------------------------|---------------|
| 104,604 | E | E | (already at 104,604 inside) | E | E |
| 104,606 | **W** (sid=75ae880ec432eb36) | (kernel.return E) | (W at 104,603!) | (kernel.return E) | (kernel.return E) |
| 104,607 | (kernel.return E) | E (nested) | E | E (nested) | L |
| 104,608 | E (nested) | E | E | E | (kernel.return L) |
| 104,610 | (kernel.return E) | L | L | L | C |
| 104,611 | L | L | E | L | (kernel.return C) |
| 104,613 | L | L | L | L | (next event) |
| 104,617 | C | C (NtClose) | L | C | - |

### Pattern classes

- **Class jitter-1 (contended-then-nested)**: `E W E L L C`. 1/4 samples.
- **Class jitter-2 / c21 (fast-path-then-nested)**: `E E L L C`. 2/4 samples.
- **Class jitter-3 (upstream-drift, contended earlier)**: `E W E L E E L L C`. 1/4 samples.
- **Class ours (fast-path, no nested cleanup)**: `E L C`. 1/1 sample.

Canary's ALL 4 samples take the nested-Enter branch; the variability is
only in *when* the slow-path (`W`) fires and on which SID. Ours never
takes the nested-Enter branch, which is a different guest control-flow.

## SID overlap

Of the 10 most-frequent wait.begin SIDs on tid=6 per cold:

| SID | jitter-1 | jitter-2 | jitter-3 | cold-c21 |
|----------------------|----------|----------|----------|----------|
| `a25a16a4f6f547aa` | 19 | 27 | 11 | 28 |
| `2a70efeeed4f4fb6` | 13 | 14 | 12 | 12 |
| `72a4170012353517` | 9 | 13 | 9 | 10 |
| `1938a086284cdbf1` | 1 | 1 | 1 | (likely 1) |
| `cf2f57a69895b36c` | 1 | 1 | 1 | (likely 1) |
| `648cb0d5adfa9125` | 1 | 1 | (absent) | (likely 1) |
| `75ae880ec432eb36` | 1 | (absent) | (absent) | (absent) |
| `3b234bbee19d74cf` | 33 | (absent) | (absent) | (absent) |
| `b8e833ada16e15fa` | 31 | (absent) | (absent) | (absent) |
| `8ec49cc7eb991db6` | (absent) | 33 | (absent) | 33 |
| `d896adc3741c77c1` | (absent) | 31 | (absent) | (absent) |
| `9eda93a619ebd4ca` | (absent) | (absent) | 34 | (absent) |
| `84fe8d4c3a65f040` | (absent) | (absent) | 31 | (absent) |
| `14afe71d37ff58a7` | (absent) | (absent) | (absent) | 31 |

**Reading**:

- A *stable core* exists: `a25a16a4f6f547aa`,
  `2a70efeeed4f4fb6`, `72a4170012353517` appear in all 4 cold samples,
  though their counts vary (`a25a16a4f6f547aa` ranges from 11 to 28;
  the other two stay within 9 to 14).
- A *swappable shell* exists: the top-2-SIDs by count are different
  per-cold. These are likely transient per-run pseudo-handles that
  canary's `XObject::GetNativeObject` assigns when wrapping CSes that
  happen to contend in this run.
- `75ae880ec432eb36` (the original C+20 wedge SID) is *unique to
  jitter-1*. C+18/C+21 absorbers treat it as shared-global; the absorb
  was correct.
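
The "stable core" reading is a set intersection over the per-cold SID sets. The sketch below recomputes it from the table; `stable_core` and `table_runs` are hypothetical helper names, and only cells with a definite count are included (the "(likely 1)" cells for cold-c21 are treated as unknown and left out, which is an assumption of this sketch).

```rust
use std::collections::BTreeSet;

/// SIDs that appear in every run (empty input gives an empty set).
pub fn stable_core(runs: &[Vec<&str>]) -> BTreeSet<String> {
    let mut iter = runs.iter();
    let first: BTreeSet<String> = match iter.next() {
        Some(run) => run.iter().map(|s| s.to_string()).collect(),
        None => return BTreeSet::new(),
    };
    iter.fold(first, |acc, run| {
        let set: BTreeSet<String> = run.iter().map(|s| s.to_string()).collect();
        acc.intersection(&set).cloned().collect()
    })
}

/// SID sets per cold sample, transcribed from the SID overlap table.
/// Order: jitter-1, jitter-2, jitter-3, cold-c21 (definite cells only).
pub fn table_runs() -> Vec<Vec<&'static str>> {
    vec![
        vec![
            "a25a16a4f6f547aa", "2a70efeeed4f4fb6", "72a4170012353517",
            "1938a086284cdbf1", "cf2f57a69895b36c", "648cb0d5adfa9125",
            "75ae880ec432eb36", "3b234bbee19d74cf", "b8e833ada16e15fa",
        ],
        vec![
            "a25a16a4f6f547aa", "2a70efeeed4f4fb6", "72a4170012353517",
            "1938a086284cdbf1", "cf2f57a69895b36c", "648cb0d5adfa9125",
            "8ec49cc7eb991db6", "d896adc3741c77c1",
        ],
        vec![
            "a25a16a4f6f547aa", "2a70efeeed4f4fb6", "72a4170012353517",
            "1938a086284cdbf1", "cf2f57a69895b36c",
            "9eda93a619ebd4ca", "84fe8d4c3a65f040",
        ],
        vec![
            "a25a16a4f6f547aa", "2a70efeeed4f4fb6", "72a4170012353517",
            "8ec49cc7eb991db6", "14afe71d37ff58a7",
        ],
    ]
}
```

Calling `stable_core(&table_runs())` yields the three SIDs named in the first bullet. If the "(likely 1)" cells were confirmed, `1938a086284cdbf1` and `cf2f57a69895b36c` would join the core as well.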

## Bit-stability properties

| dimension | bit-stable? | scope of variance |
|---|---|---|
| Total RtlEnterCS call count | YES (±1) | 19,518-19,519 across 4 |
| Total RtlLeaveCS call count | YES (±2) | 19,517-19,519 across 4 |
| Which idx contains a wait.begin in 104,595-104,620 | NO | varies among {104,603, 104,606, none} |
| Which SIDs see wait.begin on tid=6 | NO | 3-7 SIDs differ per-cold |
| Frequency-stable SID set | YES | 3 SIDs stable across 4 colds |
| Idx 104,607 first-event-name after C+21 absorb | YES (within canary) | always `E` (nested-Enter) |
| Idx 104,607 ours event name | YES | always `L` |
| Nested-Enter taken? | YES on canary, NO on ours | structural divergence |

## Implication for diff-tool absorber chain

C+18 (handle.create shared-global SID), C+21 (wait.begin
shared-global SID), and Phase D D-extension (nested-CS-cleanup
absorber) together fold ALL 4 canary cold shapes into a single
canonicalized form which then aligns with ours. The C+21 absorber
in particular handles 0..3 wait.begin events per cold without
affecting matched-prefix. **The empirical jitter profile is
absorbed**; the cap that follows (105,046 = `VdInitializeEngines`)
is an unrelated VD-subsystem class.

## Predicted variance budget for further phases

Based on these 4 cold samples:

- Per-cold-shape wait.begin event count near a contention region:
  0-3 events (mean ~1.5). Diff-tool absorber capacity is ≥3 already.
- Upstream index drift due to scheduling: ≤3 events. C+21 covers up
  to 1, D-extension's 32-pair cap covers far more.
- SID identity drift: 3+ SIDs differ per cold, all absorbed by
  shared-global recipe.

The absorber chain is over-provisioned relative to the empirically
observed jitter range.
@@ -0,0 +1,154 @@
# Ours threading model: Phase C+23 characterization

Re-verifies xenia-rs's threading model in the current tree (HEAD per
session start). Source-of-truth files re-read this session:

- `xenia-rs/crates/xenia-cpu/src/scheduler.rs` (2094 lines)
- `xenia-rs/crates/xenia-kernel/src/state.rs` (2383 lines)
- `xenia-rs/crates/xenia-kernel/src/exports.rs` (9370 lines)
- `xenia-rs/crates/xenia-kernel/src/contention_manifest.rs` (342 lines)

## 1. Threading abstraction: single host thread, 6 cooperative HW slots

`scheduler.rs` defines `HW_THREAD_COUNT` and `Scheduler::round_schedule`
(line 730). The Scheduler holds 6 `HwSlot` runqueues; each runqueue
holds N guest XThreads. There is **no host `std::thread` per guest
thread**. The single host thread that owns the CPU walks the slots in
`rotation_cursor` order, picks the highest-priority Ready thread per
slot, executes a quantum-worth of guest instructions, and moves on.

Compared to canary's 1-host-per-1-guest model, this is *cooperative*
in two senses: only one guest thread runs at a time (no true SMP),
and context switches happen only at well-defined emulator boundaries
(quantum exhaustion, explicit park, end-of-step).
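
The round described above can be sketched as a pure function. This is not the code in `scheduler.rs`: `round_order`, `Thread`, and `SLOT_COUNT` are hypothetical names, a larger `priority` value is assumed to mean higher priority, and ties are assumed to go to the thread earliest in the runqueue.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum HwState {
    Ready,
    Blocked,
}

#[derive(Clone, Copy, Debug)]
pub struct Thread {
    pub tid: u32,
    pub priority: i32,
    pub state: HwState,
}

pub const SLOT_COUNT: usize = 6;

/// One pass over the slots starting at `cursor`; returns the tids in the
/// order they would get a quantum. No host state is consulted, so the
/// same inputs always give the same order.
pub fn round_order(slots: &[Vec<Thread>; SLOT_COUNT], cursor: usize) -> Vec<u32> {
    let mut order = Vec::new();
    for step in 0..SLOT_COUNT {
        let slot = &slots[(cursor + step) % SLOT_COUNT];
        // Highest-priority Ready thread; ties go to the earliest in the queue.
        let mut best: Option<&Thread> = None;
        for t in slot.iter().filter(|t| t.state == HwState::Ready) {
            if best.map_or(true, |b| t.priority > b.priority) {
                best = Some(t);
            }
        }
        if let Some(t) = best {
            order.push(t.tid);
        }
    }
    order
}
```

The design point is that the schedule is a function of the runqueue contents and the cursor alone, which is what makes repeated cold runs produce the same digest.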

## 2. OrderMode enum (scheduler.rs:232)

```rust
pub enum OrderMode {
    Fixed,                      // default; ours digest e1dfcb15…
    Seeded { seed: u64 },       // pseudo-random shuffle of the round
    ScanQuantum { ticks: u32 }, // Stage 0 spike, landed but null-result
}
```

Selected via `XENIA_SCHED_ORDER` env var (`from_env` at line 244).
Defaults to `Fixed`. Plus the env-var `XENIA_SCHED_QUANTUM` for
`ScanQuantum` reload.

There is no `ContentionReplay` variant in the current source today;
the Phase D Stage 3 work landed instead a manifest-consultation
*inside* `rtl_enter_critical_section` (exports.rs), not a new
`OrderMode` (planner's hindsight: putting it in `OrderMode` would be
cleaner; this is documented as a deviation from the original plan).

## 3. Per-slot quantum + decrement_quantum (scheduler.rs:800)

`decrement_quantum` decrements the running thread's
`quantum_remaining`. On reach-zero it reloads (per `quantum_for(order)`
at line 793) and scans the slot's runqueue for a *same-priority* Ready
peer to rotate to. If no peer exists, no rotation happens; the
quantum reload is benign.

Stage 0 (2026-05-18) sweep validated:

- Fixed → ours digest `ba5b5e07…` (since Stage 0 baseline; prior baseline was `e1dfcb15…` before Stage 0 changed default-mode emission).
- ScanQuantum × [10, 50, 200, 1000, 5000, 10000] → all byte-identical to Fixed default. **Why**: tid=1 alone on slot 0 during boot; no peer to rotate to regardless of quantum. Option B (forced-yield across slots) would face the same constraint (and was skipped).

The lesson: rotating *within* a slot doesn't help; tid=1's monolithic
boot region has no other thread on its slot to rotate to.
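
Why the quantum sweep is a null result can be seen in a small model of the reload-and-rotate rule. The names here (`Running`, `Peer`, `QuantumOutcome`, `count_rotations`) are hypothetical and the function is a simplification of the behavior described above, not the code at scheduler.rs:800.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum QuantumOutcome {
    StillRunning,
    ReloadedNoPeer,
    RotatedTo(u32),
}

pub struct Peer {
    pub tid: u32,
    pub priority: i32,
    pub ready: bool,
}

pub struct Running {
    pub tid: u32,
    pub priority: i32,
    pub quantum_remaining: u32,
}

/// Decrement the quantum; on zero, reload it and look for a Ready peer of
/// the same priority on the same slot. Without such a peer nothing rotates.
pub fn decrement_quantum(running: &mut Running, reload: u32, runqueue: &[Peer]) -> QuantumOutcome {
    running.quantum_remaining = running.quantum_remaining.saturating_sub(1);
    if running.quantum_remaining > 0 {
        return QuantumOutcome::StillRunning;
    }
    running.quantum_remaining = reload;
    match runqueue
        .iter()
        .find(|p| p.ready && p.priority == running.priority && p.tid != running.tid)
    {
        Some(p) => QuantumOutcome::RotatedTo(p.tid),
        None => QuantumOutcome::ReloadedNoPeer,
    }
}

/// Count how many rotations `steps` decrements would trigger for tid=1.
pub fn count_rotations(reload: u32, steps: u32, runqueue: &[Peer]) -> u32 {
    let mut running = Running { tid: 1, priority: 0, quantum_remaining: reload };
    let mut rotations = 0;
    for _ in 0..steps {
        if let QuantumOutcome::RotatedTo(_) = decrement_quantum(&mut running, reload, runqueue) {
            rotations += 1;
        }
    }
    rotations
}
```

With an empty runqueue, `count_rotations` returns 0 for every quantum in the Stage 0 sweep, which mirrors the byte-identical digests: the quantum value only matters once a same-priority Ready peer shares the slot.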

## 4. park_current / wake_ref (scheduler.rs:840)

`park_current(BlockReason)` is the canonical primitive for parking the
currently-running thread. Used by:

- `RtlEnterCriticalSection` parking on `BlockReason::CriticalSection(cs_ptr)` (exports.rs ~2927).
- `KeWaitForSingleObject` parking on `BlockReason::WaitSingle(handle)`.
- Other primitives.

The wake side calls `Scheduler::wake_ref(ref)` which transitions
HwState::Blocked → HwState::Ready and re-marks the slot's
`non_empty_runnable` mask. FIFO queues for each blocking object
(`cs_waiters[cs_ptr]` etc) live in `kernel-state.rs` style data.

Key property: parking + waking is deterministic per (host run, input),
because every cross-thread interaction goes through the Scheduler
which has no host-OS dependency.

## 5. rtl_enter_critical_section (exports.rs:2886-2946)

Re-read for Phase C+23 verification. Branches:

1. `owner == 0 || !owner_is_live` → claim uncontended.
2. `owner == current_tid` → recursive bump.
3. otherwise → push self onto `cs_waiters[cs_ptr]`, `park_current(BlockReason::CriticalSection(cs_ptr))`.

**No spin loop.** Goes straight to park. This is the deliberate
asymmetry vs canary's `cs->header.absolute*256` spin. Documented and
intentional: adding spin to ours would not help; the only way ours
"contends" is if a peer thread has the lock at the exact moment
ours's tid=1 reaches the call.

In the boot region around event 104,604, ours tid=1 is the only
runnable thread on slot 0; no peer is even Ready to take the CS
first. So ours invariably fast-paths.
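
The three branches listed above can be sketched as follows. This is a simplified model, not the code in `exports.rs`: `Kernel`, `CsState`, `EnterOutcome`, and the `live_tids` set are hypothetical stand-ins, and the actual park is reduced to recording the waiter in a FIFO queue.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Debug, PartialEq, Eq)]
pub enum EnterOutcome {
    Claimed,
    Recursed,
    Parked,
}

#[derive(Default)]
pub struct CsState {
    pub owner: u32, // 0 means unowned
    pub recursion: u32,
}

#[derive(Default)]
pub struct Kernel {
    pub live_tids: HashSet<u32>,
    pub cs: HashMap<u32, CsState>,
    pub cs_waiters: HashMap<u32, VecDeque<u32>>,
}

impl Kernel {
    /// Model of the enter path: claim, recurse, or queue and park.
    /// There is no spin loop; a contended call parks immediately.
    pub fn enter(&mut self, cs_ptr: u32, current_tid: u32) -> EnterOutcome {
        let owner = self.cs.entry(cs_ptr).or_default().owner;
        let owner_is_live = self.live_tids.contains(&owner);
        if owner == 0 || !owner_is_live {
            let state = self.cs.entry(cs_ptr).or_default();
            state.owner = current_tid;
            state.recursion = 1;
            EnterOutcome::Claimed
        } else if owner == current_tid {
            self.cs.entry(cs_ptr).or_default().recursion += 1;
            EnterOutcome::Recursed
        } else {
            // FIFO queue per critical section; the wake side pops the front.
            self.cs_waiters.entry(cs_ptr).or_default().push_back(current_tid);
            EnterOutcome::Parked
        }
    }
}
```

In this model the `Parked` outcome is only reachable when another live thread already owns the section at the moment of the call, which matches the observation that a lone tid=1 on slot 0 always takes the first branch.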

## 6. Contention manifest loader (contention_manifest.rs)

Phase D Stage 3 landed `crates/xenia-kernel/src/contention_manifest.rs`
(342 LOC) with `consume_at_peek(tid, peek_idx)` that translates ours's
per-tid idx back to canary's idx space (subtracts prior
`contention.observed` emits). `XENIA_CONTENTION_MANIFEST_PATH` env var
opts in. Per the Stage 3+4 result: replay-mode digest `1d7c6b45…`
stable × 3 cold runs, but main matched-prefix **still 104,607**; the
manifest's forced-contention entries fire at wrong logical positions
because the divergence is upstream of any contention event.

This is a critical input to C+23's recommendation: the Phase D
replay infrastructure is built and stable, but it does NOT unblock
the 104,607 cap. The actual cap-unblock came from the D-extension
diff-tool absorber (band-aid, Phase D 2026-05-18). The structural
fix never landed and has no clear next step.

## 7. Existing determinism guarantees

- Default-mode ours cold digest **`ba5b5e07…`** × 3 reproducible
  (Stage 0 / Phase D baseline). Prior `e1dfcb15…` baseline is the
  C+19 era constant; the Stage 0 emission tweak shifted it without
  changing logic.
- Phase B `image_loaded_sha256 ea8d160e…` unchanged across all 23+
  phases.
- All emitted Phase A events are stable on (input, cvars).

## 8. Mismatch surfaces with canary

| dimension | canary | ours |
|---|---|---|
| host threads | 1 per XThread | 1 total |
| inter-thread arbiter | host OS | Scheduler |
| RtlEnterCS spin | spin then wait | park immediately |
| Clock | wallclock (rdtsc) | fixed FILETIME `132_500_000_000_000_000` |
| Wait wakeup ordering | pthread_cond_broadcast race | FIFO `cs_waiters` |
| Yield primitive | host yield | `decrement_quantum` rotation |

Of these, the **clock** and the **wait wakeup ordering** are the
two surfaces beyond CS-contention where canary→ours divergence has
potential to surface. So far Sylpheed exercises them lightly: 2
KeQuerySystemTime calls, 34 wait.begin events total.

## 9. Existing scheduler cvars / lockstep modes

There is no `lockstep` cvar in ours. The closest mode is
`OrderMode::Fixed` (default), which produces a deterministic schedule
keyed entirely on the spawn/wake sequence. Replay via manifest is
opt-in via `XENIA_CONTENTION_MANIFEST_PATH`.

## 10. Implication: ours is the strict side

In any cross-engine deterministic-replay scheme, **ours has to bend
toward canary**, not the other way. Canary's host-OS scheduling
cannot be tamed without rewriting it (out of scope; would also
invalidate it as the oracle, since the "real" Xbox 360 wasn't
deterministic in this sense either). The Phase D plan's H'/H broad
landed Stages 1-4 of this bend; the engine infrastructure is built,
just not load-bearing for the 104,607 cap.
@@ -0,0 +1,456 @@
{
  "canary-jitter-1.jsonl": {
    "path": "xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl",
    "tid6_total_seen": 120002,
    "waitbegins_by_sid": {
      "3b234bbee19d74cf": 33,
      "b8e833ada16e15fa": 31,
      "a25a16a4f6f547aa": 19,
      "2a70efeeed4f4fb6": 13,
      "72a4170012353517": 9,
      "eec602f5f9aa4bac": 3,
      "1938a086284cdbf1": 1,
      "cf2f57a69895b36c": 1,
      "648cb0d5adfa9125": 1,
      "75ae880ec432eb36": 1
    },
    "rtlenter_calls": 19519,
    "rtlleave_calls": 19519,
    "window_events": [
      {
        "idx": 104595,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104596,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104597,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104598,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104599,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104600,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104601,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104602,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104603,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104604,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104605,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104606,
        "kind": "wait.begin",
        "name": "",
        "sid": "75ae880ec432eb36",
        "timeout_ns": -1
      },
      {
        "idx": 104607,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104608,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104609,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104610,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104611,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104612,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104613,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104614,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104615,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104616,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104617,
        "kind": "import.call",
        "name": "NtClose"
      },
      {
        "idx": 104618,
        "kind": "kernel.call",
        "name": "NtClose"
      },
      {
        "idx": 104619,
        "kind": "handle.destroy",
        "name": ""
      },
      {
        "idx": 104620,
        "kind": "kernel.return",
        "name": "NtClose"
      }
    ]
  },
  "canary-jitter-2.jsonl": {
    "path": "xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-2.jsonl",
    "tid6_total_seen": 120002,
    "waitbegins_by_sid": {
      "8ec49cc7eb991db6": 33,
      "d896adc3741c77c1": 31,
      "a25a16a4f6f547aa": 27,
      "2a70efeeed4f4fb6": 14,
      "72a4170012353517": 13,
      "7b3b3faec1388b19": 2,
      "92b9c026e295e0e5": 2,
      "1938a086284cdbf1": 1,
      "cf2f57a69895b36c": 1,
      "648cb0d5adfa9125": 1
    },
    "rtlenter_calls": 19519,
    "rtlleave_calls": 19517,
    "window_events": [
      {
        "idx": 104595,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104596,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104597,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104598,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104599,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104600,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104601,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104602,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104603,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104604,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104605,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104606,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104607,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104608,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104609,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104610,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104611,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104612,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104613,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104614,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104615,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104616,
        "kind": "import.call",
        "name": "NtClose"
      },
      {
        "idx": 104617,
        "kind": "kernel.call",
        "name": "NtClose"
      },
      {
        "idx": 104618,
        "kind": "handle.destroy",
        "name": ""
      },
      {
        "idx": 104619,
        "kind": "kernel.return",
        "name": "NtClose"
      },
      {
        "idx": 104620,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      }
    ]
  },
  "canary-jitter-3.jsonl": {
    "path": "xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-3.jsonl",
    "tid6_total_seen": 120002,
    "waitbegins_by_sid": {
      "9eda93a619ebd4ca": 34,
      "84fe8d4c3a65f040": 31,
      "2a70efeeed4f4fb6": 12,
      "a25a16a4f6f547aa": 11,
      "72a4170012353517": 9,
      "c9f426cc34f55865": 3,
      "7b3b3faec1388b19": 2,
      "92b9c026e295e0e5": 2,
      "1938a086284cdbf1": 1,
      "cf2f57a69895b36c": 1
    },
    "rtlenter_calls": 19519,
    "rtlleave_calls": 19519,
    "window_events": [
      {
        "idx": 104595,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104596,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104597,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104598,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104599,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104600,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104601,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104602,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104603,
        "kind": "wait.begin",
        "name": "",
        "sid": "a25a16a4f6f547aa",
        "timeout_ns": -1
      },
      {
        "idx": 104604,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104605,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104606,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104607,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104608,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104609,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104610,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104611,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104612,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104613,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104614,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104615,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104616,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104617,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104618,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104619,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104620,
        "kind": "import.call",
        "name": "NtClose"
      }
    ]
  }
}
@@ -0,0 +1,97 @@
#!/usr/bin/env python3
"""Phase C+23 jitter profile probe.

Reads canary jsonls (jitter-1/2/3 + cold-c21 + any fresh runs) and extracts:
  - total tid=6 events seen within the first ~120k indices
  - the exact event sequence on tid=6 around idx [104,595..104,620]
  - count of wait.begin events on tid=6 by SID
  - count of contention-prone events (wait.begin, import.call RtlEnter / RtlLeave)

Designed to stream line-by-line and not load multi-GB jsonls into RAM.
"""

import json
import os
import sys
from collections import Counter

WINDOW_LO = 104_595
WINDOW_HI = 104_620
TID = 6
TID_EVENT_LIMIT = 120_000


def profile(path: str):
    if not os.path.exists(path):
        return None
    tid6_events = 0
    waitbegins = Counter()
    importcalls = Counter()
    kernelcalls = Counter()
    window_events = []
    tid_idx = -1

    with open(path, "rb") as fh:
        for raw in fh:
            # Cheap reject before json parse: must contain `"tid":6,`
            if b'"tid":6,' not in raw and b'"tid": 6,' not in raw:
                continue
            try:
                ev = json.loads(raw)
            except ValueError:  # malformed JSON or undecodable bytes
                continue
            if ev.get("tid") != TID:
                continue
            tid_idx = ev.get("tid_event_idx", tid_idx + 1)
            tid6_events += 1
            kind = ev.get("kind", "")
            if kind == "wait.begin":
                sids = ev.get("payload", {}).get("handles_semantic_ids") or []
                for s in sids:
                    waitbegins[s] += 1
            elif kind == "import.call":
                name = ev.get("payload", {}).get("name", "")
                importcalls[name] += 1
            elif kind == "kernel.call":
                name = ev.get("payload", {}).get("name", "")
                kernelcalls[name] += 1

            if WINDOW_LO <= tid_idx <= WINDOW_HI:
                summary = {
                    "idx": tid_idx,
                    "kind": kind,
                    "name": ev.get("payload", {}).get("name", ""),
                }
                if kind == "wait.begin":
                    summary["sid"] = (ev.get("payload", {}).get("handles_semantic_ids") or [None])[0]
                    summary["timeout_ns"] = ev.get("payload", {}).get("timeout_ns")
                window_events.append(summary)

            if tid_idx > TID_EVENT_LIMIT:
                break

    return {
        "path": path,
        "tid6_total_seen": tid6_events,
        "waitbegins_by_sid": dict(waitbegins.most_common(10)),
        "rtlenter_calls": importcalls.get("RtlEnterCriticalSection", 0),
        "rtlleave_calls": importcalls.get("RtlLeaveCriticalSection", 0),
        "window_events": window_events,
    }


def main(paths):
    out = {}
    for p in paths:
        print(f"profiling {p}...", file=sys.stderr)
        r = profile(p)
        if r is None:
            print("  (missing)", file=sys.stderr)
            continue
        out[os.path.basename(p)] = r
    json.dump(out, sys.stdout, indent=2)
    print()


if __name__ == "__main__":
    main(sys.argv[1:])
@@ -0,0 +1,153 @@
profiling xenia-canary/build-cross/bin/Windows/Debug/canary-cold-c21.jsonl...
{
  "canary-cold-c21.jsonl": {
    "path": "xenia-canary/build-cross/bin/Windows/Debug/canary-cold-c21.jsonl",
    "tid6_total_seen": 120002,
    "waitbegins_by_sid": {
      "8ec49cc7eb991db6": 33,
      "14afe71d37ff58a7": 31,
      "a25a16a4f6f547aa": 28,
      "2a70efeeed4f4fb6": 12,
      "72a4170012353517": 10,
      "7b3b3faec1388b19": 4,
      "92b9c026e295e0e5": 3,
      "df2b7bc3c60f41b9": 2,
      "eec602f5f9aa4bac": 2,
      "1938a086284cdbf1": 1
    },
    "rtlenter_calls": 19518,
    "rtlleave_calls": 19517,
    "window_events": [
      {
        "idx": 104595,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104596,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104597,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104598,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104599,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104600,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104601,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104602,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104603,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104604,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104605,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104606,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104607,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104608,
        "kind": "kernel.call",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104609,
        "kind": "kernel.return",
        "name": "RtlEnterCriticalSection"
      },
      {
        "idx": 104610,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104611,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104612,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104613,
        "kind": "import.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104614,
        "kind": "kernel.call",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104615,
        "kind": "kernel.return",
        "name": "RtlLeaveCriticalSection"
      },
      {
        "idx": 104616,
        "kind": "import.call",
        "name": "NtClose"
      },
      {
        "idx": 104617,
        "kind": "kernel.call",
        "name": "NtClose"
      },
      {
        "idx": 104618,
        "kind": "handle.destroy",
        "name": ""
      },
      {
        "idx": 104619,
        "kind": "kernel.return",
        "name": "NtClose"
      },
      {
        "idx": 104620,
        "kind": "import.call",
        "name": "RtlEnterCriticalSection"
      }
    ]
  }
}
@@ -0,0 +1,154 @@
# Recommendation: Phase C+23

## Top-line: STAY WITH THE BAND-AID

After reading both engines' sources, characterizing the jitter shape
of 4 archived canary cold runs, and reviewing Phase D's H'/H broad
outcomes, the recommended approach is **(ζ) stay with the band-aid**.

The 104,607 cap that originally motivated this track is already
unblocked at the diff-tool layer (Phase D D-extension absorber,
2026-05-18). The next divergence at idx 105,046 is
`VdInitializeEngines.return_value`, a VD-subsystem engine bug, NOT
a scheduling-determinism recurrence. The cost-benefit of pursuing
γ/β/α is no longer compelling because the immediate symptom is
resolved and no structural follow-on cap has appeared.

## Rationale

### 1. The original target is already unblocked.

| metric | pre-C+20 (C+19) | post-C+21 | post-Phase-D D-extension | now |
|---|---|---|---|---|
| Main matched-prefix | 104,606 | 104,607 | **105,046** | 105,046 |
| Sister chains | 11/32/3/41/16 | 11/32/3/41/16 | 11/32/4/41/16 | unchanged |
| Cap class at head | (B) contention | (A) state-mutation | (engine) VD | (engine) VD |

The matched-prefix advanced **+440** since C+19 through diff-tool work
that did NOT touch the engines. The cap class at the head is no longer
scheduling.

### 2. Phase D Stages 1-4 already built the structural infrastructure.

Phase D Stage 1 (canary contention emitter), Stage 2 (manifest builder),
Stage 3 (ours `OrderMode::ContentionReplay` + manifest loader), and
Stage 4 (diff-tool engine-local kinds) ALL LANDED. The engine code is
in tree. What's missing is *coverage of the right contention events*:
the 104,607 divergence was upstream of canary's first
`contention.observed=true` emit (idx 104,664), so the manifest could
not target the right call site.

This means that if we pursue γ (broaden replay to more event classes),
the entry cost is not "start from scratch" but "extend an existing
manifest layer." However, the LOC budget for γ is still ~600 across
both engines, and there is **no proven future cap** that this would
unblock.

### 3. The empirical jitter range is small and fully absorbable.

From `jitter-profile.md`: 4 canary cold samples show 3 distinct
shapes around the contention window. The C+21 absorber + Phase D
D-extension already canonicalize ALL 3 shapes to the same matched
form. Even N=5 or N=10 fresh canary colds would be expected to land
in one of these 3 shapes (likely with the same absorber outcome).

The SID core (`a25a16a4f6f547aa`, `2a70efeeed4f4fb6`,
`72a4170012353517`) recurs in every cold run, although its tid=6
`wait.begin` counts vary across the four archived digests (11-28,
12-14, and 9-13 respectively), and the shared-global SID recipe
(C+18) recomputes them deterministically. The transient "top-2" SIDs
(which change per-cold) all flow through the shared-global absorber.

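The shape count quoted above can be re-derived from the probe's JSON
digests by grouping runs on their window sequence. This is a minimal
sketch, assuming the digest layout the probe script emits
(`window_events` entries carrying `kind` and `name`); the function
names `window_shape` and `group_by_shape` are hypothetical.

```python
import json
import sys
from collections import defaultdict


def window_shape(run: dict) -> tuple:
    """Reduce a run's window to its (kind, name) sequence.

    idx and SID are deliberately ignored so that only the structural
    order of events distinguishes one shape from another.
    """
    return tuple((ev["kind"], ev["name"]) for ev in run["window_events"])


def group_by_shape(digest: dict) -> list:
    """Return sorted groups of run names whose window shapes are identical."""
    groups = defaultdict(list)
    for run_name, run in digest.items():
        groups[window_shape(run)].append(run_name)
    return sorted(sorted(names) for names in groups.values())


if __name__ == "__main__":
    # Usage: python shapes.py digest-a.json digest-b.json ...
    merged = {}
    for path in sys.argv[1:]:
        with open(path, "r", encoding="utf-8") as fh:
            merged.update(json.load(fh))
    for names in group_by_shape(merged):
        print(", ".join(names))
```

Each printed line is one shape. Note that a digest captured from
stdout together with stderr starts with a `profiling …` line, which
would have to be stripped before `json.load` accepts it.
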
### 4. Canary cannot be made deterministic without invalidating it.

The host-thread-per-XThread model is what makes canary the *oracle*.
Replacing it (α / β) would require:

- Reworking ~2000-3000 LOC of canary base+kernel.
- Re-validating against the broader canary test corpus (other games).
- Accepting a real risk of breaking Sylpheed-unrelated game-compat.

Approach γ (record-and-replay) avoids touching canary's scheduling
philosophy but requires ours to consume a multi-million-entry trace,
with engineering and runtime cost that should be matched to a *proven*
future scheduling cap.

### 5. The Phase B image hash and ours digest are stable.

`image_loaded_sha256 ea8d160e…` UNCHANGED. Ours default digest
stable × 3 cold runs. There is no signal of latent divergence in the
pre-Phase-A surfaces that would benefit from scheduling alignment.

## What to keep

1. **Phase D Stages 1-4 infrastructure** stays in tree. The cvar
   `kernel_emit_contention` stays default-off (`false`) and
   `XENIA_CONTENTION_MANIFEST_PATH` stays opt-in. Future phases can
   use them.
2. **All absorbers** (C+18, C+21, D-extension) stay; they are correct
   and narrow.
3. **The Stage 0 `OrderMode::ScanQuantum`** stays as a debug knob,
   documented as null-result.

## What to defer

1. Approach γ (broader scheduling-trace replay): defer until a
   future cap that is demonstrably scheduling-related appears.
2. Approach β / α (deterministic preemption / cooperative canary):
   defer indefinitely.

## What to do next

The next phase is **C+24** (or whatever the natural next number) on
the head divergence at idx 105,046: `VdInitializeEngines.return_value`
(canary=1 ours=0). This is a regular engine bug investigation, ~5-50
LOC.

## Fallback: γ trigger criteria

If a future phase finds a NEW scheduling-determinism cap (defined as:
two consecutive divergences whose root cause is
contention/wakeup-ordering across ≥2 guest threads, NOT a guest-code
bug or kernel emit-completeness gap), then revisit γ. The criteria:

- The new cap is ≥1,000 events long.
- The C+21 / D-extension absorbers cannot fold it within their
  current cap (32 pairs).
- Empirical jitter sampling (≥3 canary colds) confirms structural
  shape divergence, not just SID identity drift.

If all three hold, γ is justified. Estimated ~600 LOC across 4-5
sessions.

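The three criteria above amount to a simple conjunction. The sketch
below encodes them as a predicate so a future phase can check them
mechanically; the `CapEvidence` type, its field names, and
`gamma_justified` are hypothetical, and only the thresholds (1,000
events, 32 pairs, 3 canary colds) come from the text.

```python
from dataclasses import dataclass

MIN_CAP_EVENTS = 1_000    # "the new cap is >= 1,000 events long"
ABSORBER_PAIR_CAP = 32    # current C+21 / D-extension fold limit
MIN_CANARY_COLDS = 3      # ">= 3 canary colds" for jitter sampling


@dataclass(frozen=True)
class CapEvidence:
    """Measurements gathered for a candidate scheduling cap (hypothetical)."""
    cap_length_events: int
    pairs_needed_to_fold: int
    canary_colds_sampled: int
    structural_shape_divergence: bool  # False means SID identity drift only


def gamma_justified(ev: CapEvidence) -> bool:
    """Return True only when all three trigger criteria hold."""
    long_enough = ev.cap_length_events >= MIN_CAP_EVENTS
    beyond_absorbers = ev.pairs_needed_to_fold > ABSORBER_PAIR_CAP
    confirmed = (
        ev.canary_colds_sampled >= MIN_CANARY_COLDS
        and ev.structural_shape_divergence
    )
    return long_enough and beyond_absorbers and confirmed


if __name__ == "__main__":
    print(gamma_justified(CapEvidence(1_500, 40, 4, True)))
```

The predicate takes the root-cause classification (contention or
wakeup ordering across at least two guest threads) as already
established; that judgment is made by hand before the thresholds are
checked.
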
## What this recommendation is NOT

- It is NOT "no scheduling work was useful." Stages 1-4 + D-extension
  produced the matched-prefix advance from 104,606 → 105,046 (+440).
- It is NOT "the absorbers are perfect forever." They are explicit
  band-aids in the spirit of reading-error #23, annotated in
  schema-v1.md v1.5.
- It is NOT "ours and canary are bit-aligned in contention regions."
  They are *measurably* aligned (matched-prefix) but not *structurally*
  aligned (the underlying guest events still differ; the absorber
  folds the difference).

## Multi-session budget if we proceed (γ scenario only)

Estimated 4-5 sessions. NOT scheduled now.

| stage | LOC | est session |
|---|---|---|
| γ-Stage 1: extend canary trace to wake/park/yield | ~150 | 1 |
| γ-Stage 2: extend manifest builder | ~80 | 0.5 |
| γ-Stage 3: generalized replayer in ours | ~250 | 2 |
| γ-Stage 4: diff-tool integration | ~50 | 0.5 |
| γ-Stage 5: validation + sister budgets | n/a | 1 |
| **total** | **~530** | **~5** |

## Acceptance for THIS session (planning-only)

- [x] Planning artifacts in `audit-runs/phase-c23-scheduler-determinism-plan/`.
- [x] Engine sources UNCHANGED (verified by file listing; only
  documentation + 1 python probe written).
- [x] Diff tool UNCHANGED.
- [x] Memory entry to be written next.
- [x] Recommendation justified against C+21 band-aid + breadth of
  contention regions + multi-session budget.