handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,154 @@
# Ours threading model: Phase C+23 characterization

Re-verifies xenia-rs's threading model against the current tree (HEAD
as of session start). Source-of-truth files re-read this session:

- `xenia-rs/crates/xenia-cpu/src/scheduler.rs` (2094 lines)
- `xenia-rs/crates/xenia-kernel/src/state.rs` (2383 lines)
- `xenia-rs/crates/xenia-kernel/src/exports.rs` (9370 lines)
- `xenia-rs/crates/xenia-kernel/src/contention_manifest.rs` (342 lines)

## 1. Threading abstraction: single host thread, 6 cooperative HW slots

`scheduler.rs` defines `HW_THREAD_COUNT` and `Scheduler::round_schedule`
(line 730). The Scheduler holds 6 `HwSlot` runqueues; each runqueue
holds N guest XThreads. There is **no host `std::thread` per guest
thread**. The single host thread that owns the CPU walks the slots in
`rotation_cursor` order, picks the highest-priority Ready thread per
slot, executes a quantum's worth of guest instructions, and moves on.

Compared to canary's one-host-thread-per-guest-thread model, this is
*cooperative* in two senses: only one guest thread runs at a time (no
true SMP), and context switches happen only at well-defined emulator
boundaries (quantum exhaustion, explicit park, end-of-step).

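A minimal sketch of the round described above, under a simplified data model. Only `HW_THREAD_COUNT`, `HwSlot`, `rotation_cursor`, and `round_schedule` are names taken from these notes; `GuestThread`, `State`, the `runqueue` field, the "larger number wins" priority convention, and the cursor advancing by one slot per round are illustrative assumptions, not the real `scheduler.rs` definitions.

```rust
// Hypothetical model of the single-host-thread round; not the real scheduler.rs.
const HW_THREAD_COUNT: usize = 6;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    Ready,
    Blocked,
}

#[derive(Clone, Debug)]
struct GuestThread {
    tid: u32,
    priority: i32, // assumption: larger value means higher priority
    state: State,
}

#[derive(Default)]
struct HwSlot {
    runqueue: Vec<GuestThread>,
}

struct Scheduler {
    slots: Vec<HwSlot>,
    rotation_cursor: usize,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler {
            slots: (0..HW_THREAD_COUNT).map(|_| HwSlot::default()).collect(),
            rotation_cursor: 0,
        }
    }

    /// One round: visit every slot once, starting at `rotation_cursor`, and
    /// return the tid picked on each slot that has a Ready thread.
    fn round_schedule(&mut self) -> Vec<u32> {
        let mut picked = Vec::new();
        for i in 0..HW_THREAD_COUNT {
            let slot = &self.slots[(self.rotation_cursor + i) % HW_THREAD_COUNT];
            // Highest-priority Ready thread; ties go to the earliest in the queue.
            let mut best: Option<&GuestThread> = None;
            for t in slot.runqueue.iter().filter(|t| t.state == State::Ready) {
                if best.map_or(true, |b| t.priority > b.priority) {
                    best = Some(t);
                }
            }
            if let Some(t) = best {
                // The emulator would execute one quantum of guest code here,
                // on this same host thread, before moving to the next slot.
                picked.push(t.tid);
            }
        }
        // Assumption: the starting slot advances by one each round.
        self.rotation_cursor = (self.rotation_cursor + 1) % HW_THREAD_COUNT;
        picked
    }
}
```

With two Ready threads on slot 0 and one on slot 2, a round starting at cursor 0 picks the higher-priority slot-0 thread and then the slot-2 thread. Nothing runs concurrently, which is the "no true SMP" property noted above.
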
## 2. OrderMode enum (scheduler.rs:232)

```rust
pub enum OrderMode {
    Fixed,                      // default; ours digest e1dfcb15…
    Seeded { seed: u64 },       // pseudo-random shuffle of the round
    ScanQuantum { ticks: u32 }, // Stage 0 spike, landed but null-result
}
```

Selected via the `XENIA_SCHED_ORDER` env var (`from_env` at line 244).
Defaults to `Fixed`. A second env var, `XENIA_SCHED_QUANTUM`, sets the
`ScanQuantum` reload value.

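The env-var selection could look roughly like the sketch below. The accepted spellings (`fixed`, `seeded:<n>`, `scan_quantum`), the `parse_order` helper, and `DEFAULT_SCAN_TICKS` are hypothetical; the notes only state that `from_env` reads `XENIA_SCHED_ORDER`, that `XENIA_SCHED_QUANTUM` feeds `ScanQuantum`, and that the default is `Fixed`.

```rust
use std::env;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrderMode {
    Fixed,
    Seeded { seed: u64 },
    ScanQuantum { ticks: u32 },
}

// Hypothetical fallback; the real default quantum is not stated in these notes.
const DEFAULT_SCAN_TICKS: u32 = 1000;

/// Pure parser so the mapping can be exercised without touching the process
/// environment. The value syntax accepted here is an assumption.
pub fn parse_order(order: Option<&str>, quantum: Option<&str>) -> OrderMode {
    match order.map(str::trim) {
        Some("scan_quantum") => OrderMode::ScanQuantum {
            ticks: quantum
                .and_then(|q| q.trim().parse::<u32>().ok())
                .unwrap_or(DEFAULT_SCAN_TICKS),
        },
        Some(s) if s.starts_with("seeded:") => {
            match s["seeded:".len()..].parse::<u64>() {
                Ok(seed) => OrderMode::Seeded { seed },
                // Malformed seed: fall back to the default mode.
                Err(_) => OrderMode::Fixed,
            }
        }
        // Unset, "fixed", or anything unrecognised selects the default.
        _ => OrderMode::Fixed,
    }
}

/// Reads both env vars named in the notes and delegates to `parse_order`.
pub fn from_env() -> OrderMode {
    let order = env::var("XENIA_SCHED_ORDER").ok();
    let quantum = env::var("XENIA_SCHED_QUANTUM").ok();
    parse_order(order.as_deref(), quantum.as_deref())
}
```

Splitting the parser from the env read keeps the default-to-`Fixed` behaviour testable without mutating process-wide state.
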
There is no `ContentionReplay` variant in the current source. The
Phase D Stage 3 work instead landed a manifest consultation *inside*
`rtl_enter_critical_section` (exports.rs) rather than a new
`OrderMode`. In the planner's hindsight, putting it in `OrderMode`
would have been cleaner; this is documented as a deviation from the
original plan.

## 3. Per-slot quantum + decrement_quantum (scheduler.rs:800)

`decrement_quantum` decrements the running thread's
`quantum_remaining`. When it reaches zero, it reloads (per
`quantum_for(order)` at line 793) and scans the slot's runqueue for a
*same-priority* Ready peer to rotate to. If no peer exists, no rotation
happens and the quantum reload is benign.

Stage 0 (2026-05-18) sweep validated:

- Fixed → ours digest `ba5b5e07…` (the baseline since Stage 0; the prior baseline was `e1dfcb15…`, before Stage 0 changed default-mode emission).
- ScanQuantum × [10, 50, 200, 1000, 5000, 10000] → all byte-identical to the Fixed default. **Why**: tid=1 is alone on slot 0 during boot, so there is no peer to rotate to regardless of quantum. Option B (forced yield across slots) would face the same constraint and was skipped.

The lesson: rotating *within* a slot doesn't help; tid=1's monolithic
boot region has no other thread on its slot to rotate to.

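The reload-and-rotate rule from this section can be sketched as follows. `Slot`, `Thread`, the `running` index, and passing the reload value as a parameter are hypothetical simplifications; the real function consults `quantum_for(order)` and the scheduler's own structures.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    Ready,
    Blocked,
}

#[derive(Debug)]
struct Thread {
    tid: u32,
    priority: i32,
    state: State,
    quantum_remaining: u32,
}

struct Slot {
    runqueue: Vec<Thread>,
    running: usize, // index of the currently running thread (assumed field)
}

impl Slot {
    /// Charges one tick to the running thread. Returns the tid rotated to, or
    /// `None` when the quantum is not yet exhausted or when no same-priority
    /// Ready peer exists on this slot.
    fn decrement_quantum(&mut self, reload: u32) -> Option<u32> {
        let cur = self.running;
        let t = &mut self.runqueue[cur];
        t.quantum_remaining = t.quantum_remaining.saturating_sub(1);
        if t.quantum_remaining > 0 {
            return None;
        }
        // The reload happens whether or not a rotation follows.
        t.quantum_remaining = reload;
        let prio = t.priority;
        let n = self.runqueue.len();
        // Scan the rest of the runqueue, wrapping around, for a Ready peer
        // at exactly the same priority.
        for step in 1..n {
            let idx = (cur + step) % n;
            let peer = &self.runqueue[idx];
            if peer.state == State::Ready && peer.priority == prio {
                let tid = peer.tid;
                self.running = idx;
                return Some(tid);
            }
        }
        None
    }
}
```

A slot holding a single thread always returns `None` no matter what reload value is used, which mirrors why the ScanQuantum sweep was byte-identical to Fixed while tid=1 was alone on slot 0.
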
## 4. park_current / wake_ref (scheduler.rs:840)

`park_current(BlockReason)` is the canonical primitive for parking the
currently running thread. Used by:

- `RtlEnterCriticalSection` parking on `BlockReason::CriticalSection(cs_ptr)` (exports.rs ~2927).
- `KeWaitForSingleObject` parking on `BlockReason::WaitSingle(handle)`.
- Other blocking primitives.

The wake side calls `Scheduler::wake_ref(ref)`, which transitions
`HwState::Blocked` → `HwState::Ready` and re-marks the slot's
`non_empty_runnable` mask. The FIFO queues for each blocking object
(`cs_waiters[cs_ptr]` etc.) live in kernel-state data (`state.rs`
style).

Key property: parking and waking are deterministic per (host run,
input), because every cross-thread interaction goes through the
Scheduler, which has no host-OS dependency.

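A minimal sketch of the park/wake state transition and mask upkeep, assuming a toy layout. `ThreadRef`, the `slots` and `current` fields, `refresh_mask`, and `HwState::Blocked` carrying its `BlockReason` are illustrative assumptions; only the names `park_current`, `wake_ref`, `BlockReason`, `HwState`, and `non_empty_runnable` come from the notes.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum BlockReason {
    CriticalSection(u32), // guest cs_ptr
    WaitSingle(u32),      // guest handle
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum HwState {
    Ready,
    Blocked(BlockReason),
}

/// Hypothetical handle: (slot index, position in that slot's thread list).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ThreadRef {
    slot: usize,
    index: usize,
}

struct Scheduler {
    slots: Vec<Vec<HwState>>, // per-slot thread states
    non_empty_runnable: u8,   // bit i set => slot i has at least one Ready thread
    current: ThreadRef,
}

impl Scheduler {
    /// Recomputes the mask bit for one slot from its thread states.
    fn refresh_mask(&mut self, slot: usize) {
        let any_ready = self.slots[slot].iter().any(|s| *s == HwState::Ready);
        if any_ready {
            self.non_empty_runnable |= 1u8 << slot;
        } else {
            self.non_empty_runnable &= !(1u8 << slot);
        }
    }

    /// Parks the running thread and returns a handle for the later wake.
    fn park_current(&mut self, reason: BlockReason) -> ThreadRef {
        let r = self.current;
        self.slots[r.slot][r.index] = HwState::Blocked(reason);
        self.refresh_mask(r.slot);
        r
    }

    /// Blocked -> Ready; a thread that is already Ready is left untouched.
    fn wake_ref(&mut self, r: ThreadRef) {
        if let HwState::Blocked(_) = self.slots[r.slot][r.index] {
            self.slots[r.slot][r.index] = HwState::Ready;
        }
        self.refresh_mask(r.slot);
    }
}
```

Both transitions are plain data updates driven by the caller, with no host synchronisation primitive involved, which is the basis for the determinism property stated above.
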
## 5. rtl_enter_critical_section (exports.rs:2886-2946)

Re-read for Phase C+23 verification. Branches:

1. `owner == 0 || !owner_is_live` → claim uncontended.
2. `owner == current_tid` → recursive bump.
3. otherwise → push self onto `cs_waiters[cs_ptr]`, `park_current(BlockReason::CriticalSection(cs_ptr))`.

**No spin loop.** It goes straight to park. This is the deliberate
asymmetry vs canary's `cs->header.absolute*256` spin. It is documented
and intentional: adding a spin to ours would not help, because the only
way ours "contends" is if a peer thread holds the lock at the exact
moment ours' tid=1 reaches the call.

In the boot region around event 104,604, ours' tid=1 is the only
runnable thread on slot 0, so no peer is even Ready to take the CS
first. Ours therefore invariably takes the fast path.

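The three branches can be sketched as below. The `Kernel` struct, its `sections` and `live_tids` maps, the `recursion` counter, and the `EnterOutcome` return value are hypothetical stand-ins for the real kernel state; the actual export operates on guest memory and calls `park_current` instead of returning an outcome.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Debug, PartialEq, Eq)]
enum EnterOutcome {
    Claimed,  // branch 1: unowned, or the owner is no longer live
    Recursed, // branch 2: the owner re-enters
    Parked,   // branch 3: queued FIFO and parked, with no spin first
}

#[derive(Default)]
struct CriticalSection {
    owner: u32, // 0 means unowned
    recursion: u32,
}

#[derive(Default)]
struct Kernel {
    sections: HashMap<u32, CriticalSection>, // keyed by guest cs_ptr
    cs_waiters: HashMap<u32, VecDeque<u32>>, // FIFO of parked tids per cs_ptr
    live_tids: HashSet<u32>,
}

impl Kernel {
    fn rtl_enter_critical_section(&mut self, cs_ptr: u32, current_tid: u32) -> EnterOutcome {
        let cs = self.sections.entry(cs_ptr).or_default();
        let owner_is_live = self.live_tids.contains(&cs.owner);
        if cs.owner == 0 || !owner_is_live {
            cs.owner = current_tid;
            cs.recursion = 1;
            EnterOutcome::Claimed
        } else if cs.owner == current_tid {
            cs.recursion += 1;
            EnterOutcome::Recursed
        } else {
            self.cs_waiters
                .entry(cs_ptr)
                .or_default()
                .push_back(current_tid);
            // The real export would call
            // park_current(BlockReason::CriticalSection(cs_ptr)) here.
            EnterOutcome::Parked
        }
    }
}
```

Because waiters are appended to a `VecDeque`, the wake order is the arrival order, in contrast to the broadcast race on the canary side.
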
## 6. Contention manifest loader (contention_manifest.rs)

Phase D Stage 3 landed `crates/xenia-kernel/src/contention_manifest.rs`
(342 LOC) with `consume_at_peek(tid, peek_idx)`, which translates ours'
per-tid idx back to canary's idx space (it subtracts prior
`contention.observed` emits). The `XENIA_CONTENTION_MANIFEST_PATH` env
var opts in. Per the Stage 3+4 result, the replay-mode digest
`1d7c6b45…` is stable × 3 cold runs, but the main matched-prefix is
**still 104,607**: the manifest's forced-contention entries fire at the
wrong logical positions because the divergence is upstream of any
contention event.

This is a critical input to C+23's recommendation: the Phase D
replay infrastructure is built and stable, but it does NOT unblock
the 104,607 cap. The actual cap-unblock came from the D-extension
diff-tool absorber (band-aid, Phase D 2026-05-18). The structural
fix never landed and has no clear next step.

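One way the index translation could work is sketched below. It assumes that each forced contention makes ours emit exactly one extra `contention.observed` event that canary's index space does not count, and that manifest entries are one-shot. The `ContentionManifest` fields and the `bool` return value are hypothetical; only `consume_at_peek(tid, peek_idx)` and the subtraction of prior emits come from the notes.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
struct ContentionManifest {
    /// (tid, canary-space idx) pairs where canary observed contention.
    forced: HashSet<(u32, u64)>,
    /// How many `contention.observed` events ours has emitted so far, per tid.
    observed_emits: HashMap<u32, u64>,
}

impl ContentionManifest {
    /// Returns true when the manifest forces contention at this point.
    fn consume_at_peek(&mut self, tid: u32, peek_idx: u64) -> bool {
        let emitted = self.observed_emits.get(&tid).copied().unwrap_or(0);
        // Translate ours' per-tid idx into canary's idx space by removing the
        // extra events ours has already inserted into its own stream.
        let canary_idx = peek_idx.saturating_sub(emitted);
        if self.forced.remove(&(tid, canary_idx)) {
            // Forcing contention emits one more event, shifting later indices.
            *self.observed_emits.entry(tid).or_insert(0) += 1;
            true
        } else {
            false
        }
    }
}
```

In this model an entry recorded at canary idx 20 fires at ours idx 21 once a single earlier entry has been consumed. The translation only lines up while the two event streams otherwise agree, which is consistent with entries misfiring when the divergence sits upstream.
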
## 7. Existing determinism guarantees

- Default-mode ours cold digest **`ba5b5e07…`** × 3 reproducible
  (Stage 0 / Phase D baseline). The prior `e1dfcb15…` baseline is the
  C+19-era constant; the Stage 0 emission tweak shifted it without
  changing logic.
- Phase B `image_loaded_sha256 ea8d160e…` unchanged across all 23+
  phases.
- All emitted Phase A events are stable for a given (input, cvars) pair.

## 8. Mismatch surfaces with canary

| dimension | canary | ours |
|---|---|---|
| host threads | 1 per XThread | 1 total |
| inter-thread arbiter | host OS | Scheduler |
| RtlEnterCS spin | spin then wait | park immediately |
| clock | wallclock (rdtsc) | fixed FILETIME `132_500_000_000_000_000` |
| wait wakeup ordering | pthread_cond_broadcast race | FIFO `cs_waiters` |
| yield primitive | host yield | `decrement_quantum` rotation |

Of these, the **clock** and the **wait wakeup ordering** are the
two surfaces beyond CS contention where canary→ours divergence could
surface. So far Sylpheed exercises them lightly: 2
`KeQuerySystemTime` calls and 34 `wait.begin` events in total.

## 9. Existing scheduler cvars / lockstep modes

There is no `lockstep` cvar in ours. The closest mode is
`OrderMode::Fixed` (default), which produces a deterministic schedule
keyed entirely on the spawn/wake sequence. Manifest replay is opt-in
through `XENIA_CONTENTION_MANIFEST_PATH`.

## 10. Implication: ours is the strict side

In any cross-engine deterministic-replay scheme, **ours has to bend
toward canary**, not the other way around. Canary's host-OS scheduling
cannot be tamed without rewriting it (out of scope; it would also
invalidate canary as the oracle, since the "real" Xbox 360 wasn't
deterministic in this sense either). The Phase D plan's H'/H broad
landed Stages 1-4 of this bend: the engine infrastructure is built,
just not load-bearing for the 104,607 cap.