# Phase C+24: post-VdSwap KeAcquireSpinLockAtRaisedIrql divergence

**Date:** 2026-05-26
**Mode:** READ-only investigation. NO engine change, NO diff-tool change, NO test change.
**Status:** ESCALATED (scheduler-determinism deferred class).

## TL;DR

The post-C+23 first divergence at canary `tid=6` ↔ ours `tid=1` idx 105,286 is **NOT a control-flow branch chosen by guest state**. It is a **scheduling-cadence divergence**: ours fires the first VSYNC graphics interrupt callback EARLIER than canary, inserting 6 extra events (`KeAcquireSpinLockAtRaisedIrql` + `KeReleaseSpinLockFromRaisedIrql`, ×3 events each) into ours's tid=1 stream between `VdSwap.return` and `VdGetCurrentDisplayGamma`. Canary fires the SAME interrupt path with the SAME r3=0 (VSYNC) argument, just at a different wall-clock / trajectory point.

Per tripstone #5 (escalation when divergence requires scheduler-determinism resolution), C+24 lands NO change. Main matched-prefix stays at 105,286.

## Event-context capture (Step 1)

### Pre-context (6 matched events)

Both engines bit-identical:

```
import.call   VdGetSystemCommandBuffer
kernel.call   VdGetSystemCommandBuffer
kernel.return VdGetSystemCommandBuffer
import.call   VdSwap
kernel.call   VdSwap
kernel.return VdSwap
```

### Divergent event

```
canary[105293]: import.call VdGetCurrentDisplayGamma       (ord 441)
ours  [105286]: import.call KeAcquireSpinLockAtRaisedIrql  (ord 77)
```

### Post-divergence flow (ours)

```
ours[105286-105288]: import/call/return KeAcquireSpinLockAtRaisedIrql
ours[105289-105291]: import/call/return KeReleaseSpinLockFromRaisedIrql
ours[105292-105294]: import/call/return VdGetCurrentDisplayGamma  ← realigns with canary[105293-105295]
```

### Streams re-converge at offset +6 in ours

After the 6 extra ours events, both streams call **the same** import sequence: `VdGetCurrentDisplayGamma → VdSetDisplayMode → VdGetCurrentDisplayInformation → VdQueryVideoFlags (returns 3, per C+23) → VdQueryVideoMode → ...`.
So the 6 events are an **inserted block in ours**, not a permanent trajectory split.

But **secondary divergences appear ~24 events later**: ours's post-block stream diverges from canary again with `canary: MmFreePhysicalMemory` vs `ours: KeEnterCriticalRegion` at offset +24. This pattern of "absorb-realign-diverge" repeats; a simple 6-event absorber would expose a chain of downstream divergences, each needing separate analysis.

## LR localisation (Step 2)

Ran ours with `--branch-probe=0x8284e1ec` (the KeAcquire import thunk). **First fire** at `cycle=5584980, lr=0x824bea14, r3=0x42453918`, within 19 cycles of the divergent event's `guest_cycle=5584999`. Caller PC = `lr - 4 = 0x824bea10`, inside function **`sub_824be9a0`**.

Cross-reference in `sylpheed.db`: `sub_824be9a0` has **zero `bl` callers** in the static disasm, so it is NOT called directly by guest code. It IS the **graphics interrupt callback** armed via `VdSetGraphicsInterruptCallback(0x824be9a0, ctx)` per `crates/xenia-kernel/src/exports.rs:4101` and confirmed in 10+ audit logs.

## Function body of `sub_824be9a0` (the guest ISR)

```ppc
0x824be9a0  mfspr r12, LR
0x824be9a4  bl __savegprlr_29
0x824be9a8  stwu r1, -128(r1)
0x824be9ac  or r31, r4, r4           ; r4 = user_data (ISR arg2)
0x824be9b0  cmpli cr6, 0, r3, 0x1    ; r3 = ISR source (arg1)
0x824be9b4  bc eq, 0x824BEA30        ; r3 == 1 → counter path
; --- r3 != 1 (i.e. r3 == 0, VSYNC) path: spinlock + bit-clear ---
0x824be9b8  lwz r10, 10772(r31)
...                                  ; load dispatch fn pointer
0x824be9f0  mtspr CTR, r30           ; first guest-handler dispatch
0x824be9f4  bcctrl
0x824be9f8  lbz r10, 268(r13)        ; per-CPU IRQL
0x824bea08  or r3, r30, r30
0x824bea0c  slw r29, r11, r10
0x824bea10  bl 0x8284E1EC            ; KeAcquireSpinLockAtRaisedIrql
0x824bea14  lwz r11, 0(r31)
...                                  ; clear pending-IRQ bit
0x824bea28  bl 0x8284E1DC            ; KeReleaseSpinLockFromRaisedIrql
0x824bea2c  b 0x824BEAAC             ; → epilogue
; --- r3 == 1 path: counter / no spinlock ---
0x824bea30  cmpli cr6, 0, r3, 0x0
0x824bea34  bc eq, 0x824BEAAC        ; r3==0 already handled above
0x824bea38  addis r11, r0, 0x7FC8    ; load D1MODE_V_COUNTER MMIO
0x824bea3c  lwz r11, 25924(r11)
...                                  ; counter update + optional callback
0x824beaa4  mtspr CTR, r11
0x824beaa8  bcctrl
0x824beaac  epilogue
```

## Cross-reference to canary's source

`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310`:

```cpp
void VdSetGraphicsInterruptCallback_entry(function_t callback,
                                          lpvoid_t user_data) {
  // callback takes 2 params
  // r3 = bool 0/1 - 0 is normal interrupt, 1 is some acquire/lock mumble
  // r4 = user_data (r4 of VdSetGraphicsInterruptCallback)
  ...
}
```

So per canary's own comments:

- `r3=0` (VSYNC / "normal interrupt") → guest takes the spinlock path
- `r3=1` ("acquire/lock mumble", presumably the CP-interrupt) → guest takes the counter path

In **both engines**, ours and canary, when the first VSYNC fires after VdSwap, the callback is invoked with `r3=0` and the spinlock path executes. **The only difference is timing.**

## Per-engine VSYNC dispatch model

### Ours

- `kernel.interrupts.tick_vsync_instr(instruction_count)` accumulates instructions; fires VSYNC when `vsync_accumulator >= 150_000`.
- `try_inject_graphics_interrupt` runs every scheduler round; injects the queued VSYNC into the first Ready (else Blocked) HW thread.
- Lockstep / diff-harness path uses `tick_vsync_instr` (not wall-clock).
- Net effect: ours fires VSYNC ~every 150k guest instructions ≈ every scheduler round once instruction count grows; the FIRST VSYNC is delivered right after VdSwap returns because that's when tid=1 becomes Ready and `is_in_callback==false`.

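
The instruction-count pacing listed for ours can be illustrated with a minimal Python sketch. This is not the engine's Rust code: the class name, the `pending` queue, and the choice to subtract the period (rather than reset the accumulator) are assumptions made for illustration; only the 150,000 threshold and the Ready / not-in-callback delivery condition come from the description above.

```python
VSYNC_INSTR_PERIOD = 150_000  # threshold quoted in the section above


class InstrVsyncClock:
    """Hypothetical stand-in for the accumulator behind tick_vsync_instr."""

    def __init__(self, period=VSYNC_INSTR_PERIOD):
        self.period = period
        self.accumulator = 0
        self.pending = 0  # VSYNCs queued but not yet injected

    def tick(self, instruction_count):
        """Add retired guest instructions; queue one VSYNC per full period."""
        self.accumulator += instruction_count
        while self.accumulator >= self.period:
            # Assumption: carry the remainder instead of resetting to zero.
            self.accumulator -= self.period
            self.pending += 1

    def try_inject(self, thread_ready, in_callback):
        """Deliver one queued VSYNC if the target thread can take it now."""
        if self.pending and thread_ready and not in_callback:
            self.pending -= 1
            return True
        return False


def first_delivery_round(rounds):
    """rounds: sequence of (instructions, thread_ready, in_callback) tuples.

    Returns the index of the scheduler round that delivers the first VSYNC,
    or None if no VSYNC is delivered.
    """
    clock = InstrVsyncClock()
    for index, (instructions, ready, in_cb) in enumerate(rounds):
        clock.tick(instructions)
        if clock.try_inject(ready, in_cb):
            return index
    return None
```

In this model a VSYNC that becomes due while the thread is not Ready stays queued and is delivered in the first later round where the thread is Ready and outside a callback, which mirrors the "delivered right after VdSwap returns" behaviour described above. The result depends only on the guest instruction stream, not on host speed.
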
### Canary

- A dedicated host thread `frame_limiter_worker_thread_` (`graphics_system.cc:148-237`) calls `MarkVblank()` → `DispatchInterruptCallback(0, 2)` → `EmulateCPInterruptDPC(callback, data, source=0, cpu=2)`.
- Wall-clock paced via `Clock::QueryGuestTickCount()` vs `vsync_duration_d = 16.67 ms` (60 Hz).
- First MarkVblank fires after at least 16.67 ms wall-clock from frame-limiter thread creation.
- The callback runs on whichever XThread is current at dispatch time (not tid-locked).

## Empirical counts (sanity)

| engine | total KeAcquire calls | first KeAcquire idx | first KeAcquire host_ns |
|---|---|---|---|
| canary | 16,000 | tid=6 idx 106,805 | 1,731,840,900 (~1.73 s) |
| ours | 32 | tid=1 idx 105,286 | 1,437,632,028 (~1.44 s) |

Canary's first VSYNC interrupt fires ~80 ms after canary idx 105,286 (host wall-clock from canary log), i.e. canary's tid=6 has time to make ~1,500 more events before the first interrupt arrives. Ours's first VSYNC arrives RIGHT at idx 105,286.

The total-count gap (16,000 vs 32) is largely a runtime-window artifact: canary ran 90 s of wall-clock; ours ran ~1.5 s of guest time before wedging at the C+22 cap (downstream). Within ours's runtime window, the *rate* of vsync delivery is similar to canary's; the issue is the OFFSET of the first delivery.

## Class triage

| class | description | applies? |
|---|---|---|
| A | Different LR → different caller, real control-flow branch | NO: LR identical, function identical, both engines take the SAME `r3=0` path |
| B | Same LR / computed call with different fn pointer | NO: bl to fixed import thunk |
| C | Game-state-dependent (state polled, branch taken) | NO: the branch in `sub_824be9a0` is on the ISR's `r3` arg, which is `0` (VSYNC) in BOTH engines |
| D | Phase A coverage gap | NO: events are accurately captured |

**Actual class: scheduler-cadence divergence.** The 6 events are not in the "main thread's compute" stream; they're in an **interrupt-context insertion** that ours delivers at a different wall-clock moment than canary.

## Why this is NOT a candidate for an engine-side fix

1. **Tripstone #5**: investigation reveals scheduler-determinism issue → STOP and report.
2. **MEMORY.md** explicitly lists "scheduler determinism" in the deferred bucket (review_a_boot_state_2026_05_21 entry: "Deferred: audio/HID/XAM/scheduler-determinism/diff-tool-canonicalization").
3. The two engines have **fundamentally different VSYNC clock sources**: ours's `tick_vsync_instr` uses guest-instruction counts, canary's `frame_limiter_worker_thread_` uses host wall-clock. To align ours's first-vsync moment with canary's would require either:
   - Adopting wall-clock pacing for the lockstep diff harness (invalidates 23 phases of digest stability, per Phase D forensics' explicit warning), or
   - Calibrating the instruction-count threshold per cold run (non-deterministic, defeats the diff-harness's purpose).
4. The natural-progression goal is to fix REAL game-logic bugs. Forcing this specific VSYNC moment to align would mask the actual scheduler-determinism problem rather than resolve it.

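
The clock-source mismatch in item 3 can be made concrete with a small Python sketch. Both functions are hypothetical models, not code from either engine: the per-event instruction counts and the `host_ns_per_instr` parameter are invented for illustration, while the 150,000-instruction threshold and the 16.67 ms period are the values quoted in this report.

```python
def first_vsync_event_instr(instr_per_event, period_instr=150_000):
    """Event index at which an instruction-count clock first fires."""
    total = 0
    for index, instructions in enumerate(instr_per_event):
        total += instructions
        if total >= period_instr:
            return index
    return None


def first_vsync_event_wallclock(instr_per_event, host_ns_per_instr,
                                period_ns=16_670_000):
    """Event index at which a host wall-clock timer first fires.

    host_ns_per_instr is a hypothetical host-speed knob: how many host
    nanoseconds it takes to emulate one guest instruction.
    """
    elapsed_ns = 0
    for index, instructions in enumerate(instr_per_event):
        elapsed_ns += instructions * host_ns_per_instr
        if elapsed_ns >= period_ns:
            return index
    return None


# Toy guest stream: 1,000 events of 50,000 instructions each (made-up numbers).
stream = [50_000] * 1000
```

With this toy stream the instruction-count clock always fires at the same event index, whereas the wall-clock model fires at index 33 on a host that spends 10 ns per instruction and at index 16 on one that spends 20 ns. That host dependence is why aligning the two models would trade away the determinism the diff harness relies on.
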
## Why this is NOT a candidate for a diff-tool absorber (at this layer)

A naïve 6-event absorber (`absorb KeAcquire + KeRelease pair if canary doesn't have one at the same position`) would advance the matched-prefix past idx 105,286, but **only by 24 events** before the next, different divergence: canary's `MmFreePhysicalMemory` vs ours's `KeEnterCriticalRegion` at the +24 offset.

The chain `absorb-realign-diverge` repeats. Each downstream divergence will need its own analysis. Adding an absorber here without first characterizing the downstream divergences risks:

1. **Reading-error #23 crossover** (band-aid masks real divergence).
2. **Reading-error #32 inflation** (timing-window absorbers should be narrow; this one would fire on every VSYNC-driven cadence offset).
3. **Spurious main-prefix advancement** that hides multiple genuine issues downstream.

The Phase D D-extension absorber (nested-CS-cleanup) was a **narrow, exhaustively-characterized** band-aid for a specific cap; this VSYNC-cadence shape lacks that characterization.

## Recommended next action

ESCALATE to a dedicated scheduler-determinism methodology pivot (reading-error #32 / phase-c23-scheduler-determinism-plan refresh). Options:

1. **Adopt wall-clock vsync in lockstep** under a feature flag, accept non-determinism in the diff harness, treat matched-prefix as a noisy metric, and re-baseline all Phase C+nn caps.
2. **Pin first-VSYNC delivery** to a guest-instruction landmark common to both engines (e.g. first `kernel.return VdSwap` on `VdSetGraphicsInterruptCallback`'s registered callback). Requires engine-side coordination + canary patch.
3. **Build a VSYNC-cadence-aware absorber** that absorbs interrupt-callback-induced event sequences on BOTH sides up to alignment landmarks. Requires characterizing the full set of guest-ISR shapes; `sub_824be9a0` is one of N callback bodies the absorber must recognize.

All three options are out-of-scope for C+24 per the original task's escalation rule.

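
For reference, the naïve 6-event absorber rejected in the absorber section can be sketched in Python on toy streams. Everything here is illustrative: `matched_prefix`, `triple`, and the two toy streams are hypothetical, the event tuples only mimic the `import.call` / `kernel.call` / `kernel.return` shape shown in the event-context capture, and the roughly 24 matching events before the secondary divergence are compressed to a single import for brevity.

```python
def triple(name):
    """The three trace events one import call produces in the diff stream."""
    return [("import.call", name), ("kernel.call", name), ("kernel.return", name)]


# The 6-event block that ours inserts: acquire triple plus release triple.
SPINLOCK_BLOCK = (triple("KeAcquireSpinLockAtRaisedIrql")
                  + triple("KeReleaseSpinLockFromRaisedIrql"))


def matched_prefix(canary, ours, absorb=False):
    """Count canary events matched before the first unabsorbed divergence.

    With absorb=True, a complete SPINLOCK_BLOCK that appears only in ours
    is skipped instead of being reported as a divergence.
    """
    i = j = 0
    while i < len(canary) and j < len(ours):
        if canary[i] == ours[j]:
            i += 1
            j += 1
        elif absorb and ours[j:j + len(SPINLOCK_BLOCK)] == SPINLOCK_BLOCK:
            j += len(SPINLOCK_BLOCK)
        else:
            break
    return i


# Toy streams (hypothetical, heavily shortened).
canary = (triple("VdSwap") + triple("VdGetCurrentDisplayGamma")
          + triple("MmFreePhysicalMemory"))
ours = (triple("VdSwap") + SPINLOCK_BLOCK + triple("VdGetCurrentDisplayGamma")
        + triple("KeEnterCriticalRegion"))
```

On these toy streams the absorber moves the matched prefix from 3 to 6 events and then stops at the `MmFreePhysicalMemory` vs `KeEnterCriticalRegion` mismatch, which is the absorb-realign-diverge shape described above: the metric advances, but the next divergence is still uncharacterized.
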
## Files inspected (read-only)

- `xenia-rs/audit-runs/phase-c23-VdQueryVideoFlags/diff-jitter-1.md` (predecessor diff report)
- `xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md` (schema / absorber inventory; v1.7)
- `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310, 438-523` (`VdSetGraphicsInterruptCallback_entry`, `VdSwap_entry`)
- `xenia-canary/src/xenia/gpu/graphics_system.cc:148-237, 352-374` (frame_limiter_worker, MarkVblank, DispatchInterruptCallback)
- `xenia-canary/src/xenia/kernel/kernel_state.cc:1365-1405` (`EmulateCPInterruptDPC`)
- `xenia-rs/crates/xenia-kernel/src/interrupts.rs` (full file: InterruptState, tick_vsync_instr, tick_vsync_wallclock)
- `xenia-rs/crates/xenia-app/src/main.rs:2440-2474, 3700-3812` (vsync ticker + injector)
- `xenia-rs/crates/xenia-kernel/src/exports.rs:4086-4108` (`vd_set_graphics_interrupt_callback`)
- `xenia-rs/sylpheed.db` (xrefs, instructions on `sub_824be9a0`/`sub_824ce4d0`/`sub_824cea80`)

## Files touched (changed)

NONE. C+24 is read-only investigation.

## Test suite

xenia-kernel: **226 PASS** (unchanged from C+23 baseline). No code edits, no test additions.

## Phase B `image_canonical_sha256`

Pinned hash `ea8d160e…` UNCHANGED: no XEX loader changes.

## Cascade

| | predicted | actual |
|---|---|---|
| A capture event context | 95% | **PASS** |
| B classify (A/B/C/D) | 75% | **PASS** (none of A/B/C/D; fifth class: scheduler-cadence) |
| C identify root cause | 60% | **PASS** (ours vsync_instr_period mistimed vs canary wall-clock frame-limiter) |
| D land fix or clean escalation | 65% | **PASS (clean escalation)** |
| E main > 105,286 | 55% | **N/A (no engine change)** |

## Tripstones honored

1. Reading-error #28: verified canary semantics by reading `xboxkrnl_video.cc:303-310` directly; the r3=0/1 contract is documented in canary's own source comments. NOT assumed.
2. Reading-error #23: explicitly chose NOT to land a downstream-risky absorber/fix. Main matched-prefix stays at 105,286.
3. Reading-error #31: no fresh canary run made; used the C+23 archived jitter set. State of `cache/` + `cache_host/` unchanged.
4. Reading-error #32: the cause IS scheduling-jitter on the interrupt-cadence axis. Confirmed by the empirical first-acquire-host-ns table above.
5. Escalation rule: TRIGGERED. Root cause requires scheduler-determinism methodology pivot, deferred per MEMORY.md.
6. `--mute=true`: N/A this session (one `xrs-c23 exec` probe run for `--branch-probe` capture; no canary run).