handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,154 @@
# Ours threading model — Phase C+23 characterization
Re-verifies xenia-rs's threading model in the current tree (HEAD as of
session start). Source-of-truth files re-read this session:
- `xenia-rs/crates/xenia-cpu/src/scheduler.rs` (2094 lines)
- `xenia-rs/crates/xenia-kernel/src/state.rs` (2383 lines)
- `xenia-rs/crates/xenia-kernel/src/exports.rs` (9370 lines)
- `xenia-rs/crates/xenia-kernel/src/contention_manifest.rs` (342 lines)
## 1. Threading abstraction: single host thread, 6 cooperative HW slots
`scheduler.rs` defines `HW_THREAD_COUNT` and `Scheduler::round_schedule`
(line 730). The Scheduler holds 6 `HwSlot` runqueues; each runqueue
holds N guest XThreads. There is **no host `std::thread` per guest
thread**. The single host thread that owns the CPU walks the slots in
`rotation_cursor` order, picks the highest-priority Ready thread per
slot, executes a quantum-worth of guest instructions, and moves on.
Compared to canary's 1-host-per-1-guest model, this is *cooperative*
in two senses: only one guest thread runs at a time (no true SMP),
and context switches happen only at well-defined emulator boundaries
(quantum exhaustion, explicit park, end-of-step).
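A minimal sketch of the round walk described above, not the real `scheduler.rs` code. The struct layout, the "larger value = higher priority" convention, and the one-step cursor advance per round are all assumptions made for illustration.

```rust
// Illustrative only: names mirror the prose, not the real definitions.
const HW_THREAD_COUNT: usize = 6;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum HwState {
    Ready,
    Blocked,
}

#[derive(Clone, Debug)]
struct GuestThread {
    tid: u32,
    priority: i32, // assumption: larger value = higher priority
    state: HwState,
}

#[derive(Default)]
struct Scheduler {
    slots: [Vec<GuestThread>; HW_THREAD_COUNT],
    rotation_cursor: usize,
}

impl Scheduler {
    /// One round on the single host thread: visit every slot once, starting
    /// at `rotation_cursor`, and return the tid picked on each non-idle slot
    /// in execution order. Running the quantum itself is elided.
    fn round_schedule(&mut self) -> Vec<u32> {
        let mut order = Vec::new();
        for step in 0..HW_THREAD_COUNT {
            let slot = (self.rotation_cursor + step) % HW_THREAD_COUNT;
            // Highest-priority Ready thread on this slot, if any.
            // (`max_by_key` returns the last maximum on ties.)
            let pick = self.slots[slot]
                .iter()
                .filter(|t| t.state == HwState::Ready)
                .max_by_key(|t| t.priority);
            if let Some(t) = pick {
                order.push(t.tid);
            }
        }
        // Assumption: the cursor advances by one slot per round.
        self.rotation_cursor = (self.rotation_cursor + 1) % HW_THREAD_COUNT;
        order
    }
}
```

Because the returned order depends only on the slot contents and the cursor, two runs with the same spawn/wake history produce the same sequence, which is the property the rest of this note relies on.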
## 2. OrderMode enum (scheduler.rs:232)
```rust
pub enum OrderMode {
    Fixed,                      // default; ours digest e1dfcb15… (pre-Stage 0 value; ba5b5e07… since)
    Seeded { seed: u64 },       // pseudo-random shuffle of the round
    ScanQuantum { ticks: u32 }, // Stage 0 spike, landed but null-result
}
```
The mode is selected via the `XENIA_SCHED_ORDER` env var (`from_env` at
line 244) and defaults to `Fixed`. A second env var,
`XENIA_SCHED_QUANTUM`, sets the quantum reload value for `ScanQuantum`.
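A sketch of how env-var selection of this kind could look. The value syntax (`seeded:<u64>`, `scan_quantum`), the fallback tick count, and the helper names `parse_order` / `order_from_env` are hypothetical; the real `from_env` may accept a different format.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum OrderMode {
    Fixed,
    Seeded { seed: u64 },
    ScanQuantum { ticks: u32 },
}

/// HYPOTHETICAL value syntax: "seeded:<u64>" or "scan_quantum".
/// Anything else, including an unset variable, falls back to `Fixed`.
pub fn parse_order(order: Option<&str>, quantum: Option<&str>) -> OrderMode {
    const DEFAULT_TICKS: u32 = 1000; // assumed fallback, not from the source
    match order.map(str::trim) {
        Some("scan_quantum") => {
            let ticks = quantum
                .and_then(|q| q.trim().parse::<u32>().ok())
                .unwrap_or(DEFAULT_TICKS);
            OrderMode::ScanQuantum { ticks }
        }
        Some(s) => match s.strip_prefix("seeded:").and_then(|n| n.parse::<u64>().ok()) {
            Some(seed) => OrderMode::Seeded { seed },
            None => OrderMode::Fixed,
        },
        None => OrderMode::Fixed,
    }
}

/// Reads the two env vars named in the text and delegates to `parse_order`.
pub fn order_from_env() -> OrderMode {
    let order = std::env::var("XENIA_SCHED_ORDER").ok();
    let quantum = std::env::var("XENIA_SCHED_QUANTUM").ok();
    parse_order(order.as_deref(), quantum.as_deref())
}
```

Keeping the parser separate from the env lookup makes the default-to-`Fixed` behavior testable without mutating process state.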
The current source has no `ContentionReplay` variant. The Phase D
Stage 3 work instead landed a manifest consultation *inside*
`rtl_enter_critical_section` (exports.rs) rather than as a new
`OrderMode`. In the planner's hindsight, putting it in `OrderMode` would
have been cleaner; this is documented as a deviation from the original plan.
## 3. Per-slot quantum + decrement_quantum (scheduler.rs:800)
`decrement_quantum` decrements the running thread's
`quantum_remaining`. On reach-zero it reloads (per `quantum_for(order)`
at line 793) and scans the slot's runqueue for a *same-priority* Ready
peer to rotate to. If no peer exists, no rotation happens — the
quantum reload is benign.
Stage 0 (2026-05-18) sweep validated:
- Fixed → ours digest `ba5b5e07…` (since Stage 0 baseline; prior baseline was `e1dfcb15…` before Stage 0 changed default-mode emission).
- ScanQuantum × [10, 50, 200, 1000, 5000, 10000] → all byte-identical to Fixed default. **Why**: tid=1 alone on slot 0 during boot; no peer to rotate to regardless of quantum. Option B (forced-yield across slots) would face the same constraint (and was skipped).
The lesson: rotating *within* a slot doesn't help; tid=1's monolithic
boot region has no other thread on its slot to rotate to.
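The null result can be reproduced on a toy model. This is a sketch under assumptions: the `Slot` and `Thread` layouts, the round-robin peer scan order, and the `trace` helper are invented for illustration; only the "reload, then rotate to a same-priority Ready peer if one exists" rule comes from the description above.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    Ready,
    Running,
    Blocked,
}

#[derive(Debug)]
struct Thread {
    tid: u32,
    priority: i32,
    state: State,
    quantum_remaining: u32,
}

struct Slot {
    threads: Vec<Thread>,
    running: usize, // index of the currently running thread
}

impl Slot {
    /// Charges one tick to the running thread. On reach-zero the quantum is
    /// reloaded and the slot rotates to a same-priority Ready peer, if any.
    /// Returns true when a rotation happened.
    fn decrement_quantum(&mut self, reload: u32) -> bool {
        let cur = self.running;
        let left = self.threads[cur].quantum_remaining.saturating_sub(1);
        self.threads[cur].quantum_remaining = left;
        if left > 0 {
            return false;
        }
        self.threads[cur].quantum_remaining = reload;
        let prio = self.threads[cur].priority;
        let n = self.threads.len();
        // Assumed scan order: round-robin, starting just after the current thread.
        let peer = (1..n)
            .map(|off| (cur + off) % n)
            .find(|&i| self.threads[i].state == State::Ready && self.threads[i].priority == prio);
        match peer {
            Some(i) => {
                self.threads[cur].state = State::Ready;
                self.threads[i].state = State::Running;
                self.running = i;
                true
            }
            None => false, // no peer: the reload is benign
        }
    }
}

/// Records which tid executed on each of `ticks` ticks.
fn trace(slot: &mut Slot, reload: u32, ticks: usize) -> Vec<u32> {
    (0..ticks)
        .map(|_| {
            let tid = slot.threads[slot.running].tid;
            slot.decrement_quantum(reload);
            tid
        })
        .collect()
}

/// A slot holding only tid=1, as in the boot region.
fn lone_slot(quantum: u32) -> Slot {
    let t = Thread { tid: 1, priority: 0, state: State::Running, quantum_remaining: quantum };
    Slot { threads: vec![t], running: 0 }
}
```

With `lone_slot`, the trace is all tid=1 for every quantum value in the sweep; the quantum only starts to matter once a second same-priority Ready thread shares the slot.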
## 4. park_current / wake_ref (scheduler.rs:840)
`park_current(BlockReason)` is the canonical primitive for parking the
currently-running thread. Used by:
- `RtlEnterCriticalSection` parking on `BlockReason::CriticalSection(cs_ptr)` (exports.rs ~2927).
- `KeWaitForSingleObject` parking on `BlockReason::WaitSingle(handle)`.
- Other primitives.
The wake side calls `Scheduler::wake_ref(ref)`, which transitions
`HwState::Blocked` → `HwState::Ready` and re-marks the slot's
`non_empty_runnable` mask. The FIFO queues for each blocking object
(`cs_waiters[cs_ptr]` etc.) are kept as `kernel-state.rs`-style data.
Key property: parking + waking is deterministic per (host run, input),
because every cross-thread interaction goes through the Scheduler
which has no host-OS dependency.
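The park/wake pairing can be sketched as follows. The `ThreadRef` shape, the `u8` mask width, and the `KernelState` helper methods are assumptions for illustration; only the Blocked → Ready transition, the mask re-mark, and the FIFO `cs_waiters` queue come from the text.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum BlockReason {
    CriticalSection(u32), // guest pointer to the CS
    WaitSingle(u32),      // guest handle
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum HwState {
    Ready,
    Running,
    Blocked(BlockReason),
}

/// Hypothetical thread reference: (slot, position in that slot's runqueue).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ThreadRef {
    slot: usize,
    index: usize,
}

struct Scheduler {
    slots: Vec<Vec<HwState>>,
    non_empty_runnable: u8, // bit i set => slot i has a non-blocked thread
    current: ThreadRef,
}

impl Scheduler {
    /// Blocks the current thread and clears the slot's mask bit if nothing
    /// runnable is left on that slot.
    fn park_current(&mut self, reason: BlockReason) -> ThreadRef {
        let r = self.current;
        self.slots[r.slot][r.index] = HwState::Blocked(reason);
        let runnable = self.slots[r.slot]
            .iter()
            .any(|s| !matches!(s, HwState::Blocked(_)));
        if !runnable {
            self.non_empty_runnable &= !(1u8 << r.slot);
        }
        r
    }

    /// Blocked -> Ready, and re-mark the slot as runnable.
    fn wake_ref(&mut self, r: ThreadRef) {
        if matches!(self.slots[r.slot][r.index], HwState::Blocked(_)) {
            self.slots[r.slot][r.index] = HwState::Ready;
            self.non_empty_runnable |= 1u8 << r.slot;
        }
    }
}

#[derive(Default)]
struct KernelState {
    cs_waiters: HashMap<u32, VecDeque<ThreadRef>>,
}

impl KernelState {
    /// Parks the current thread on `cs_ptr` and appends it to the FIFO queue.
    fn park_on_cs(&mut self, sched: &mut Scheduler, cs_ptr: u32) -> ThreadRef {
        let r = sched.park_current(BlockReason::CriticalSection(cs_ptr));
        self.cs_waiters.entry(cs_ptr).or_default().push_back(r);
        r
    }

    /// Wakes the longest-waiting thread on `cs_ptr`, if any.
    fn wake_next(&mut self, sched: &mut Scheduler, cs_ptr: u32) -> Option<ThreadRef> {
        let r = self.cs_waiters.get_mut(&cs_ptr)?.pop_front()?;
        sched.wake_ref(r);
        Some(r)
    }
}
```

Wake order here is strictly arrival order, so it depends only on the sequence of park calls and never on host timing.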
## 5. rtl_enter_critical_section (exports.rs:2886-2946)
Re-read for Phase C+23 verification. Branches:
1. `owner == 0 || !owner_is_live` → claim uncontended.
2. `owner == current_tid` → recursive bump.
3. otherwise → push self onto `cs_waiters[cs_ptr]`, `park_current(BlockReason::CriticalSection(cs_ptr))`.
**No spin loop.** The call goes straight to park. This asymmetry with
canary's `cs->header.absolute*256` spin is deliberate and documented.
Adding a spin to ours would not help: ours only "contends" if a peer
thread holds the lock at the exact moment tid=1 reaches the call.
In the boot region around event 104,604, ours tid=1 is the only
runnable thread on slot 0 — no peer is even Ready to take the CS
first. So ours invariably fast-paths.
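The three branches can be written as a small decision function. This is a sketch, not the code in exports.rs: the `CriticalSection` struct, the `EnterOutcome` enum, and passing liveness as a closure are assumptions; the branch conditions and the absence of a spin are taken from the list above.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq, Eq)]
enum EnterOutcome {
    Claimed,
    Recursed { depth: u32 },
    Parked,
}

/// Simplified stand-in for the guest CS plus its waiter queue.
#[derive(Debug, Default)]
struct CriticalSection {
    owner: u32, // 0 = unowned
    recursion: u32,
    waiters: VecDeque<u32>, // FIFO of parked tids
}

fn enter_critical_section(
    cs: &mut CriticalSection,
    current_tid: u32,
    owner_is_live: impl Fn(u32) -> bool,
) -> EnterOutcome {
    if cs.owner == 0 || !owner_is_live(cs.owner) {
        // Branch 1: unowned, or the owner is gone: claim uncontended.
        cs.owner = current_tid;
        cs.recursion = 1;
        EnterOutcome::Claimed
    } else if cs.owner == current_tid {
        // Branch 2: recursive bump.
        cs.recursion += 1;
        EnterOutcome::Recursed { depth: cs.recursion }
    } else {
        // Branch 3: no spin, straight to the wait queue. The caller would
        // then park on BlockReason::CriticalSection(cs_ptr).
        cs.waiters.push_back(current_tid);
        EnterOutcome::Parked
    }
}
```

With tid=1 as the only caller, every call lands in branch 1 or 2, which matches the fast-path behavior seen in the boot region.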
## 6. Contention manifest loader (contention_manifest.rs)
Phase D Stage 3 landed `crates/xenia-kernel/src/contention_manifest.rs`
(342 LOC) with `consume_at_peek(tid, peek_idx)` that translates ours's
per-tid idx back to canary's idx space (subtracts prior
`contention.observed` emits). `XENIA_CONTENTION_MANIFEST_PATH` env var
opts in. Per the Stage 3+4 result: replay-mode digest `1d7c6b45…`
stable × 3 cold runs, but main matched-prefix **still 104,607** — the
manifest's forced-contention entries fire at wrong logical positions
because the divergence is upstream of any contention event.
This is a critical input to C+23's recommendation: the Phase D
replay infrastructure is built and stable, but it does NOT unblock
the 104,607 cap. The actual cap-unblock came from the D-extension
diff-tool absorber (band-aid, Phase D 2026-05-18). The structural
fix never landed and has no clear next step.
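The index translation in `consume_at_peek` can be illustrated with a toy manifest. The `Manifest` layout and the `to_canary_idx` helper are hypothetical; the only rule taken from the text is that ours's per-tid index is mapped to canary's space by subtracting the `contention.observed` emits that came before it.

```rust
use std::collections::{BTreeSet, HashMap};

/// `emits` holds the ours-side per-tid indices at which `contention.observed`
/// was already emitted, in ascending order with no duplicates.
fn to_canary_idx(ours_idx: u64, emits: &[u64]) -> u64 {
    let prior = emits.partition_point(|&e| e < ours_idx) as u64;
    ours_idx - prior
}

#[derive(Default)]
struct Manifest {
    /// (tid, canary-space idx) pairs where contention should be forced.
    forced: BTreeSet<(u32, u64)>,
    /// Per tid: ours-side indices of earlier `contention.observed` emits.
    emitted: HashMap<u32, Vec<u64>>,
}

impl Manifest {
    /// Returns true (and consumes the entry) if the manifest asks for
    /// contention at this thread's current position.
    fn consume_at_peek(&mut self, tid: u32, peek_idx: u64) -> bool {
        let emits = self.emitted.entry(tid).or_default();
        let canary_idx = to_canary_idx(peek_idx, emits.as_slice());
        if self.forced.remove(&(tid, canary_idx)) {
            // The forced contention adds one ours-side event at this index,
            // shifting every later ours index by one relative to canary.
            emits.push(peek_idx);
            true
        } else {
            false
        }
    }
}
```

The translation only lines up while the two traces agree up to the lookup point. Once ours has diverged upstream, the translated index points at a different logical event, which is consistent with the entries firing at the wrong positions.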
## 7. Existing determinism guarantees
- Default-mode ours cold digest **`ba5b5e07…`** × 3 reproducible
(Stage 0 / Phase D baseline). Prior `e1dfcb15…` baseline is the
C+19 era constant; the Stage 0 emission tweak shifted it without
changing logic.
- Phase B `image_loaded_sha256 ea8d160e…` unchanged across all 23+
phases.
- All emitted Phase A events are stable on (input, cvars).
## 8. Mismatch surfaces with canary
| dimension | canary | ours |
|---|---|---|
| host threads | 1 per XThread | 1 total |
| inter-thread arbiter | host OS | Scheduler |
| RtlEnterCS spin | spin then wait | park immediately |
| clock | wallclock (rdtsc) | fixed FILETIME `132_500_000_000_000_000` |
| wait wakeup ordering | pthread_cond_broadcast race | FIFO `cs_waiters` |
| yield primitive | host yield | `decrement_quantum` rotation |
Of these, the **clock** and the **wait wakeup ordering** are the
two surfaces beyond CS-contention where canary→ours divergence has
potential to surface. So far Sylpheed exercises them lightly: 2
KeQuerySystemTime calls, 34 wait.begin events total.
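For orientation, the fixed clock value can be converted to Unix time with the standard FILETIME arithmetic (100 ns ticks since 1601-01-01). The constant and function names below are illustrative and are not taken from the xenia-rs source.

```rust
/// The fixed clock value from the table above.
const FIXED_FILETIME: u64 = 132_500_000_000_000_000;
/// FILETIME counts 100 ns ticks, so 10 million per second.
const TICKS_PER_SECOND: u64 = 10_000_000;
/// Seconds between 1601-01-01 and 1970-01-01 (both UTC).
const EPOCH_DELTA_SECS: u64 = 11_644_473_600;

/// Returns whole Unix seconds, or None for instants before 1970.
fn filetime_to_unix_secs(ft: u64) -> Option<u64> {
    (ft / TICKS_PER_SECOND).checked_sub(EPOCH_DELTA_SECS)
}
```

`filetime_to_unix_secs(FIXED_FILETIME)` gives `Some(1_605_526_400)`, i.e. 2020-11-16 11:33:20 UTC. Every `KeQuerySystemTime` call in ours therefore reports that same instant, while canary reports whatever the host wallclock says.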
## 9. Existing scheduler cvars / lockstep modes
There is no `lockstep` cvar in ours. The closest mode is
`OrderMode::Fixed` (default), which produces a deterministic schedule
keyed entirely on the spawn/wake sequence. Replay via manifest is
opt-in via `XENIA_CONTENTION_MANIFEST_PATH`.
## 10. Implication: ours is the strict side
In any cross-engine deterministic-replay scheme, **ours has to bend
toward canary**, not the other way. Canary's host-OS scheduling
cannot be tamed without rewriting it (out of scope; would also
invalidate it as the oracle, since the "real" Xbox 360 wasn't
deterministic in this sense either). The Phase D plan's H'/H broad
landed Stages 1-4 of this bend — the engine infrastructure is built,
just not load-bearing for the 104,607 cap.