Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase C+24 — post-VdSwap KeAcquireSpinLockAtRaisedIrql divergence
Date: 2026-05-26
Mode: READ-only investigation. NO engine change, NO diff-tool change, NO test change.
Status: ESCALATED (scheduler-determinism deferred class).
TL;DR
The post-C+23 first divergence at canary tid=6 ↔ ours tid=1 idx
105,286 is NOT a control-flow branch chosen by guest state. It is a
scheduling-cadence divergence: ours fires the first VSYNC graphics
interrupt callback EARLIER than canary, inserting 6 extra events
(KeAcquireSpinLockAtRaisedIrql + KeReleaseSpinLockFromRaisedIrql,
×3 events each) into ours's tid=1 stream between VdSwap.return and
VdGetCurrentDisplayGamma. Canary fires the SAME interrupt path with
the SAME r3=0 (VSYNC) argument, just at a different wall-clock /
trajectory point. Per tripstone #5 (escalation when divergence
requires scheduler-determinism resolution), C+24 lands NO change. Main
matched-prefix stays at 105,286.
Event-context capture (Step 1)
Pre-context (6 matched events)
Both engines bit-identical:
import.call VdGetSystemCommandBuffer
kernel.call VdGetSystemCommandBuffer
kernel.return VdGetSystemCommandBuffer
import.call VdSwap
kernel.call VdSwap
kernel.return VdSwap
Divergent event
canary[105293]: import.call VdGetCurrentDisplayGamma (ord 441)
ours [105286]: import.call KeAcquireSpinLockAtRaisedIrql (ord 77)
Post-divergence flow (ours)
ours[105286-105288]: import/call/return KeAcquireSpinLockAtRaisedIrql
ours[105289-105291]: import/call/return KeReleaseSpinLockFromRaisedIrql
ours[105292-105294]: import/call/return VdGetCurrentDisplayGamma ← realigns with canary[105293-105295]
Streams re-converge at offset +6 in ours
After the 6 extra ours events, both streams call the same import
sequence: VdGetCurrentDisplayGamma → VdSetDisplayMode → VdGetCurrentDisplayInformation → VdQueryVideoFlags (returns 3, per C+23) → VdQueryVideoMode → .... So
the 6 events are an inserted block in ours, not a permanent
trajectory split.
But secondary divergences appear ~24 events later: ours's
post-block stream diverges from canary again with
canary: MmFreePhysicalMemory vs ours: KeEnterCriticalRegion at
offset +24. This pattern of "absorb-realign-diverge" repeats; a simple
6-event absorber would expose a chain of downstream divergences, each
needing separate analysis.
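The "inserted block, then realign" shape described above can be checked mechanically. The following is a minimal sketch, not the diff harness's actual code: `inserted_block_len` is a hypothetical helper name, and events are reduced to plain strings. It looks for the smallest number of events that must be skipped on ours's side before the next `confirm` events match canary again.

```rust
/// Hypothetical helper (not part of the diff harness): find the smallest
/// `skip` in 1..=max_skip such that, after dropping `skip` events from
/// `ours` starting at `at_ours`, the next `confirm` events equal the
/// `confirm` events of `canary` starting at `at_canary`.
fn inserted_block_len(
    canary: &[&str],
    ours: &[&str],
    at_canary: usize,
    at_ours: usize,
    max_skip: usize,
    confirm: usize,
) -> Option<usize> {
    // The window canary expects right at the divergence point.
    let want = canary.get(at_canary..at_canary + confirm)?;
    (1..=max_skip).find(|&skip| {
        ours.get(at_ours + skip..at_ours + skip + confirm)
            .map_or(false, |got| got == want)
    })
}
```

Applied to the streams in this note, a call at the divergence point with `max_skip >= 6` would be expected to report a 6-event block. A small `confirm` window only proves local realignment; it says nothing about the secondary divergence at offset +24.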
LR localisation (Step 2)
Ran ours with --branch-probe=0x8284e1ec (the KeAcquire import thunk).
First fire at cycle=5584980, lr=0x824bea14, r3=0x42453918, within 19
cycles of the divergent event's guest_cycle=5584999, so the probe hit
and the divergent event are the same call. Caller PC =
lr - 4 = 0x824bea10, inside function sub_824be9a0.
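The caller-PC arithmetic relies on two PowerPC facts: instructions are 4 bytes wide, and `bl` stores the address of the following instruction in LR. A minimal sketch of that step, with hypothetical helper names (`caller_pc`, `in_function`) that do not come from the engine:

```rust
/// Address of the `bl` that produced this link-register value.
/// PowerPC instructions are 4 bytes and `bl` sets LR to the next instruction.
fn caller_pc(lr: u32) -> u32 {
    lr.wrapping_sub(4)
}

/// True if `pc` lies within the inclusive address range [first, last].
fn in_function(pc: u32, first: u32, last: u32) -> bool {
    (first..=last).contains(&pc)
}
```

With the probe's `lr=0x824bea14`, `caller_pc` yields `0x824bea10`, which falls between `0x824be9a0` and `0x824beaac`, the first and last addresses shown in this note's disassembly of sub_824be9a0.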
Cross-reference in sylpheed.db: sub_824be9a0 has zero bl
callers in the static disasm — it's NOT called directly by guest
code. It IS the graphics interrupt callback armed via
VdSetGraphicsInterruptCallback(0x824be9a0, ctx) per
crates/xenia-kernel/src/exports.rs:4101 and confirmed in 10+ audit
logs.
Function body of sub_824be9a0 (the guest ISR)
0x824be9a0 mfspr r12, LR
0x824be9a4 bl __savegprlr_29
0x824be9a8 stwu r1, -128(r1)
0x824be9ac or r31, r4, r4 ; r4 = user_data (ISR arg2)
0x824be9b0 cmpli cr6, 0, r3, 0x1 ; r3 = ISR source (arg1)
0x824be9b4 bc eq, 0x824BEA30 ; r3 == 1 → counter path
; --- r3 != 1 (i.e. r3 == 0, VSYNC) path: spinlock + bit-clear ---
0x824be9b8 lwz r10, 10772(r31)
... ; load dispatch fn pointer
0x824be9f0 mtspr CTR, r30 ; first guest-handler dispatch
0x824be9f4 bcctrl
0x824be9f8 lbz r10, 268(r13) ; per-CPU IRQL
0x824bea08 or r3, r30, r30
0x824bea0c slw r29, r11, r10
0x824bea10 bl 0x8284E1EC ; KeAcquireSpinLockAtRaisedIrql
0x824bea14 lwz r11, 0(r31)
... ; clear pending-IRQ bit
0x824bea28 bl 0x8284E1DC ; KeReleaseSpinLockFromRaisedIrql
0x824bea2c b 0x824BEAAC ; → epilogue
; --- r3 == 1 path: counter / no spinlock ---
0x824bea30 cmpli cr6, 0, r3, 0x0
0x824bea34 bc eq, 0x824BEAAC ; r3==0 already handled above
0x824bea38 addis r11, r0, 0x7FC8 ; load D1MODE_V_COUNTER MMIO
0x824bea3c lwz r11, 25924(r11)
... ; counter update + optional callback
0x824beaa4 mtspr CTR, r11
0x824beaa8 bcctrl
0x824beaac epilogue
Cross-reference to canary's source
xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310:
void VdSetGraphicsInterruptCallback_entry(function_t callback,
lpvoid_t user_data) {
// callback takes 2 params
// r3 = bool 0/1 - 0 is normal interrupt, 1 is some acquire/lock mumble
// r4 = user_data (r4 of VdSetGraphicsInterruptCallback)
...
}
So per canary's own comments:
- r3=0 (VSYNC / "normal interrupt") → guest takes the spinlock path
- r3=1 ("acquire/lock mumble", presumably the CP-interrupt) → guest takes the counter path
In both engines, ours and canary, when the first VSYNC fires after
VdSwap, the callback is invoked with r3=0 and the spinlock path
executes. The only difference is timing.
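To make the "6 extra events" count concrete, here is a simplified model of the guest ISR's branch on r3, as read from the disassembly in this note. It is an illustration only: the function names are hypothetical, and the counter path is modelled as making no kernel imports, which ignores its optional callback.

```rust
/// The three trace events recorded per kernel import in this note's streams.
static EVENT_KINDS: [&str; 3] = ["import.call", "kernel.call", "kernel.return"];

/// Kernel imports the guest ISR calls for a given `source` (the r3 argument).
/// Simplification: the r3 == 1 counter path is treated as import-free.
fn isr_imports(source: u32) -> &'static [&'static str] {
    if source == 1 {
        &[]
    } else {
        &["KeAcquireSpinLockAtRaisedIrql", "KeReleaseSpinLockFromRaisedIrql"]
    }
}

/// Expand the imports into the trace events they would add to a thread's stream.
fn isr_trace_events(source: u32) -> Vec<String> {
    isr_imports(source)
        .iter()
        .flat_map(|name| EVENT_KINDS.iter().map(move |kind| format!("{kind} {name}")))
        .collect()
}
```

For `source = 0` (VSYNC) this yields two imports times three event kinds, i.e. the 6-event block seen in ours's tid=1 stream. The model produces the same block for either engine, which is the point: the content of the block is identical and only its position in the stream differs.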
Per-engine VSYNC dispatch model
Ours
- kernel.interrupts.tick_vsync_instr(instruction_count) accumulates instructions; fires VSYNC when vsync_accumulator >= 150_000.
- try_inject_graphics_interrupt runs every scheduler round; injects the queued VSYNC into the first Ready (else Blocked) HW thread.
- Lockstep / diff-harness path uses tick_vsync_instr (not wall-clock).
- Net effect: ours fires VSYNC ~every 150k guest instructions ≈ every scheduler round once instruction count grows; the FIRST VSYNC is delivered right after VdSwap returns because that's when tid=1 becomes Ready and is_in_callback==false.
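The instruction-count pacing described above can be sketched as a small accumulator. This is a hypothetical stand-in, not the code in interrupts.rs: the type name `VsyncAccumulator` is invented here, and carrying the remainder over after a fire (rather than resetting to zero) is an assumption the source does not confirm.

```rust
/// Hypothetical stand-in for the accumulator behind tick_vsync_instr.
struct VsyncAccumulator {
    accumulated: u64,
    period: u64,
}

impl VsyncAccumulator {
    fn new(period: u64) -> Self {
        Self { accumulated: 0, period }
    }

    /// Add retired guest instructions; return true when a VSYNC should be queued.
    fn tick(&mut self, instructions: u64) -> bool {
        self.accumulated += instructions;
        if self.accumulated >= self.period {
            // Assumption: keep the overshoot so the average cadence stays at `period`.
            self.accumulated -= self.period;
            true
        } else {
            false
        }
    }
}
```

Because the only input is the guest instruction count, the fire point is the same on every run of ours, which is what keeps the lockstep digests stable. It also means the fire point has no relationship to host wall-clock time.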
Canary
- A dedicated host thread frame_limiter_worker_thread_ (graphics_system.cc:148-237) calls MarkVblank() → DispatchInterruptCallback(0, 2) → EmulateCPInterruptDPC(callback, data, source=0, cpu=2).
- Wall-clock paced via Clock::QueryGuestTickCount() vs vsync_duration_d = 16.67 ms (60 Hz).
- First MarkVblank fires after at least 16.67 ms wall-clock from frame-limiter thread creation.
- The callback runs on whichever XThread is current at dispatch time (not tid-locked).
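For contrast with an instruction-count model, wall-clock pacing at 60 Hz reduces to counting whole 16.67 ms periods in an elapsed time. The sketch below is an illustration in Rust, not a translation of canary's C++; `vblanks_elapsed` and the constants are names assumed here.

```rust
const VSYNC_HZ: u64 = 60;
const NANOS_PER_SEC: u64 = 1_000_000_000;

/// Number of whole 60 Hz periods contained in `elapsed_ns` of wall-clock time.
/// Multiplying before dividing avoids rounding the 16.67 ms period itself.
fn vblanks_elapsed(elapsed_ns: u64) -> u64 {
    elapsed_ns * VSYNC_HZ / NANOS_PER_SEC
}
```

No vblank is due at 16 ms, the first is due once a full period has passed, and a 90 s run spans 5,400 periods. Since the input is host time, the guest event index at which the first vblank lands depends on how fast the host executed the guest up to that moment.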
Empirical counts (sanity)
| engine | total KeAcquire calls | first KeAcquire idx | first KeAcquire host_ns |
|---|---|---|---|
| canary | 16,000 | tid=6 idx 106,805 | 1,731,840,900 (~1.73 s) |
| ours | 32 | tid=1 idx 105,286 | 1,437,632,028 (~1.44 s) |
Canary's first VSYNC interrupt fires ~80 ms after canary idx 105,286 (host wall-clock from canary log) — i.e. canary's tid=6 has time to make ~1,500 more events before the first interrupt arrives. Ours's first VSYNC arrives RIGHT at idx 105,286.
The total-count gap (16,000 vs 32) is largely a runtime-window artifact: canary ran 90 s of wall-clock; ours ran ~1.5 s of guest time before wedging at the C+22 cap (downstream). Within ours's runtime window, the rate of vsync delivery is similar to canary's; the issue is the OFFSET of the first delivery.
Class triage
| class | description | applies? |
|---|---|---|
| A | Different LR → different caller, real control-flow branch | NO — LR identical, function identical, both engines take the SAME r3=0 path |
| B | Same LR / computed call with different fn pointer | NO — bl to fixed import thunk |
| C | Game-state-dependent (state polled, branch taken) | NO — the branch in sub_824be9a0 is on the ISR's r3 arg, which is 0 (VSYNC) in BOTH engines |
| D | Phase A coverage gap | NO — events are accurately captured |
Actual class: scheduler-cadence divergence. The 6 events are not in the "main thread's compute" stream; they're in an interrupt-context insertion that ours delivers at a different wall-clock moment than canary.
Why this is NOT a candidate for an engine-side fix
- Tripstone #5: investigation reveals scheduler-determinism issue → STOP and report.
- MEMORY.md explicitly lists "scheduler determinism" in the deferred bucket (review_a_boot_state_2026_05_21 entry: "Deferred: audio/HID/XAM/scheduler-determinism/diff-tool-canonicalization").
- The two engines have fundamentally different VSYNC clock sources: ours's tick_vsync_instr uses guest-instruction counts, canary's frame_limiter_worker_thread_ uses host wall-clock. To align ours's first-vsync moment with canary's would require either:
  - Adopting wall-clock pacing for the lockstep diff harness (invalidates 23 phases of digest stability, per Phase D forensics' explicit warning), or
  - Calibrating the instruction-count threshold per cold run (non-deterministic, defeats the diff-harness's purpose).
- The natural-progression goal is to fix REAL game-logic bugs. Forcing this specific VSYNC moment to align would mask the actual scheduler-determinism problem rather than resolve it.
Why this is NOT a candidate for a diff-tool absorber (at this layer)
A naïve 6-event absorber (absorb KeAcquire + KeRelease pair if canary doesn't have one at the same position) would advance the
matched-prefix past idx 105,286, but only by 24 events before
the next, different divergence: canary's MmFreePhysicalMemory vs
ours's KeEnterCriticalRegion at the +24 offset. The chain
absorb-realign-diverge repeats. Each downstream divergence will
need its own analysis. Adding an absorber here without first
characterizing the downstream divergences risks:
- Reading-error #23 crossover (band-aid masks real divergence).
- Reading-error #32 inflation (timing-window absorbers should be narrow; this one would fire on every VSYNC-driven cadence offset).
- Spurious main-prefix advancement that hides multiple genuine issues downstream.
The Phase D D-extension absorber (nested-CS-cleanup) was a narrow, exhaustively-characterized band-aid for a specific cap; this VSYNC-cadence shape lacks that characterization.
Recommended next action
ESCALATE to a dedicated scheduler-determinism methodology pivot (reading-error #32 / phase-c23-scheduler-determinism-plan refresh). Options:
- Adopt wall-clock vsync in lockstep under a feature flag, accept non-determinism in the diff harness, treat matched-prefix as a noisy metric — re-baseline all Phase C+nn caps.
- Pin first-VSYNC delivery to a guest-instruction landmark common to both engines (e.g. first kernel.return VdSwap on VdSetGraphicsInterruptCallback's registered callback). Requires engine-side coordination + canary patch.
- Build a VSYNC-cadence-aware absorber that absorbs interrupt-callback-induced event sequences on BOTH sides up to alignment landmarks. Requires characterizing the full set of guest-ISR shapes; sub_824be9a0 is one of N callback bodies the absorber must recognize.
All three options are out-of-scope for C+24 per the original task's escalation rule.
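As a rough illustration of what the third option would involve, the sketch below strips one recognized ISR shape from both streams before measuring the matched prefix. It is a hypothetical design, not an existing absorber in the diff harness: `strip_shape` and `matched_prefix` are invented names, events are plain strings, and only a single exact shape is handled.

```rust
/// Remove every contiguous occurrence of `shape` from `events`.
/// Hypothetical canonicalization step; a real absorber would need one
/// shape per guest ISR body and a way to tell ISR context from normal calls.
fn strip_shape<'a>(events: &[&'a str], shape: &[&'a str]) -> Vec<&'a str> {
    let mut out = Vec::with_capacity(events.len());
    let mut i = 0;
    while i < events.len() {
        if !shape.is_empty() && events[i..].starts_with(shape) {
            i += shape.len(); // skip the whole recognized block
        } else {
            out.push(events[i]);
            i += 1;
        }
    }
    out
}

/// Length of the common prefix of two event streams.
fn matched_prefix(a: &[&str], b: &[&str]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}
```

Stripping by shape alone would also remove a genuine, non-interrupt Acquire/Release pair issued by guest code, which is the masking risk this note raises under reading-error #23. That is one reason the full set of ISR shapes needs characterizing before anything like this lands.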
Files inspected (read-only)
- xenia-rs/audit-runs/phase-c23-VdQueryVideoFlags/diff-jitter-1.md (predecessor diff report)
- xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md (schema / absorber inventory; v1.7)
- xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310, 438-523 (VdSetGraphicsInterruptCallback_entry, VdSwap_entry)
- xenia-canary/src/xenia/gpu/graphics_system.cc:148-237, 352-374 (frame_limiter_worker, MarkVblank, DispatchInterruptCallback)
- xenia-canary/src/xenia/kernel/kernel_state.cc:1365-1405 (EmulateCPInterruptDPC)
- xenia-rs/crates/xenia-kernel/src/interrupts.rs (full file: InterruptState, tick_vsync_instr, tick_vsync_wallclock)
- xenia-rs/crates/xenia-app/src/main.rs:2440-2474, 3700-3812 (vsync ticker + injector)
- xenia-rs/crates/xenia-kernel/src/exports.rs:4086-4108 (vd_set_graphics_interrupt_callback)
- xenia-rs/sylpheed.db (xrefs, instructions on sub_824be9a0 / sub_824ce4d0 / sub_824cea80)
Files touched (changed)
NONE. C+24 is read-only investigation.
Test suite
xenia-kernel: 226 PASS (unchanged from C+23 baseline). No code edits, no test additions.
Phase B image_canonical_sha256
Pinned hash ea8d160e… UNCHANGED — no XEX loader changes.
Cascade
| | predicted | actual |
|---|---|---|
| A capture event context | 95% | PASS |
| B classify (A/B/C/D) | 75% | PASS (none of A/B/C/D — fifth class: scheduler-cadence) |
| C identify root cause | 60% | PASS (ours vsync_instr_period mistimed vs canary wall-clock frame-limiter) |
| D land fix or clean escalation | 65% | PASS — clean escalation |
| E main > 105,286 | 55% | N/A — no engine change |
Tripstones honored
- Reading-error #28: verified canary semantics by reading xboxkrnl_video.cc:303-310 directly; the r3=0/1 contract is documented in canary's own source comments. NOT assumed.
- Reading-error #23: explicitly chose NOT to land a downstream-risky absorber/fix. Main matched-prefix stays at 105,286.
- Reading-error #31: no fresh canary run made; used the C+23 archived jitter set. State of cache/ + cache_host/ unchanged.
- Reading-error #32: the cause IS scheduling-jitter on the interrupt-cadence axis. Confirmed by the empirical first-acquire-host-ns table above.
- Escalation rule: TRIGGERED. Root cause requires scheduler-determinism methodology pivot, deferred per MEMORY.md.
- --mute=true: N/A this session (one xrs-c23 exec probe run for --branch-probe capture; no canary run).