xenia-rs/audit-runs/phase-c24-post-vdswap-branch/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+24 — post-VdSwap KeAcquireSpinLockAtRaisedIrql divergence

Date: 2026-05-26
Mode: READ-only investigation. NO engine change, NO diff-tool change, NO test change.
Status: ESCALATED (scheduler-determinism deferred class).

TL;DR

The post-C+23 first divergence at canary tid=6 ↔ ours tid=1 idx 105,286 is NOT a control-flow branch chosen by guest state. It is a scheduling-cadence divergence: ours fires the first VSYNC graphics interrupt callback EARLIER than canary, inserting 6 extra events (KeAcquireSpinLockAtRaisedIrql + KeReleaseSpinLockFromRaisedIrql, ×3 events each) into ours's tid=1 stream between VdSwap.return and VdGetCurrentDisplayGamma. Canary fires the SAME interrupt path with the SAME r3=0 (VSYNC) argument, just at a different wall-clock / trajectory point. Per tripstone #5 (escalation when divergence requires scheduler-determinism resolution), C+24 lands NO change. Main matched-prefix stays at 105,286.

Event-context capture (Step 1)

Pre-context (6 matched events)

Both engines bit-identical:

import.call VdGetSystemCommandBuffer
kernel.call VdGetSystemCommandBuffer
kernel.return VdGetSystemCommandBuffer
import.call VdSwap
kernel.call VdSwap
kernel.return VdSwap

Divergent event

canary[105293]: import.call VdGetCurrentDisplayGamma            (ord 441)
ours  [105286]: import.call KeAcquireSpinLockAtRaisedIrql       (ord 77)

Post-divergence flow (ours)

ours[105286-105288]: import/call/return KeAcquireSpinLockAtRaisedIrql
ours[105289-105291]: import/call/return KeReleaseSpinLockFromRaisedIrql
ours[105292-105294]: import/call/return VdGetCurrentDisplayGamma   ← realigns with canary[105293-105295]

Streams re-converge at offset +6 in ours

After the 6 extra ours events, both streams call the same import sequence: VdGetCurrentDisplayGamma → VdSetDisplayMode → VdGetCurrentDisplayInformation → VdQueryVideoFlags (returns 3, per C+23) → VdQueryVideoMode → .... So the 6 events are an inserted block in ours, not a permanent trajectory split.

But secondary divergences appear ~24 events later: ours's post-block stream diverges from canary again with canary: MmFreePhysicalMemory vs ours: KeEnterCriticalRegion at offset +24. This pattern of "absorb-realign-diverge" repeats; a simple 6-event absorber would expose a chain of downstream divergences, each needing separate analysis.
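The "inserted block, then realign" shape described above can be checked mechanically. The following is a minimal sketch, not the diff harness's actual code: the `Event` struct, `matched_prefix`, `realign_offset`, and `triple` are hypothetical names introduced here for illustration, and events are reduced to a (kind, import name) pair.

```rust
/// Hypothetical, simplified trace event: kind plus import name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub kind: &'static str,
    pub name: &'static str,
}

/// Expand one import into the import.call / kernel.call / kernel.return triple.
pub fn triple(name: &'static str) -> Vec<Event> {
    vec![
        Event { kind: "import.call", name },
        Event { kind: "kernel.call", name },
        Event { kind: "kernel.return", name },
    ]
}

/// Length of the common prefix of the two event streams.
pub fn matched_prefix(canary: &[Event], ours: &[Event]) -> usize {
    canary.iter().zip(ours).take_while(|(c, o)| c == o).count()
}

/// Smallest number of leading `ours` events to skip so that the next
/// `window` events of `ours` equal the first `window` events of `canary`.
/// Returns None if no skip in 0..=max_skip realigns the streams.
pub fn realign_offset(
    canary: &[Event],
    ours: &[Event],
    window: usize,
    max_skip: usize,
) -> Option<usize> {
    if window == 0 || canary.len() < window {
        return None;
    }
    (0..=max_skip).find(|&skip| {
        ours.len() >= skip + window && ours[skip..skip + window] == canary[..window]
    })
}
```

Fed the two streams starting at the divergent event, `realign_offset` reports a skip of 6 for the spinlock acquire/release block. It says nothing about whether the streams stay aligned afterwards, which is the point of the +24 observation.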

LR localisation (Step 2)

Ran ours with --branch-probe=0x8284e1ec (the KeAcquire import thunk). First fire at cycle=5584980, lr=0x824bea14, r3=0x42453918. That is 19 cycles before the divergent event's guest_cycle=5584999, i.e. the same call. Caller PC = lr - 4 = 0x824bea10, inside function sub_824be9a0.
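The LR arithmetic relies on two PowerPC facts: instructions are 4 bytes wide, and `bl` stores the address of the following instruction in LR. As a hedged sketch, the helpers below (hypothetical names, not part of xenia-rs) recover the caller PC and decode the target of an I-form branch word. `encode_bl` exists only to build a test word; the encoded value is computed here, not read from the game binary.

```rust
/// Address of the `bl` that produced the given link-register value.
pub fn caller_pc(lr: u32) -> u32 {
    lr.wrapping_sub(4)
}

/// Decode the target of an I-form branch (primary opcode 18) located at `pc`.
/// Returns None if `insn` is not an I-form branch.
pub fn branch_target(pc: u32, insn: u32) -> Option<u32> {
    if insn >> 26 != 18 {
        return None;
    }
    // LI field plus two implied zero bits: a 26-bit signed byte displacement.
    let disp = (((insn & 0x03FF_FFFC) << 6) as i32) >> 6;
    let absolute = (insn & 0b10) != 0; // AA bit
    Some(if absolute {
        disp as u32
    } else {
        pc.wrapping_add(disp as u32)
    })
}

/// Encode a relative `bl` from `pc` to `target` (AA = 0, LK = 1).
/// Test helper only; assumes the displacement fits in 26 bits.
pub fn encode_bl(pc: u32, target: u32) -> u32 {
    let disp = target.wrapping_sub(pc) & 0x03FF_FFFC;
    (18 << 26) | disp | 1
}
```

With lr=0x824bea14 this gives a caller PC of 0x824bea10, and a `bl` at that address to the thunk at 0x8284E1EC decodes back to the same target.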

Cross-reference in sylpheed.db: sub_824be9a0 has zero bl callers in the static disasm — it's NOT called directly by guest code. It IS the graphics interrupt callback armed via VdSetGraphicsInterruptCallback(0x824be9a0, ctx) per crates/xenia-kernel/src/exports.rs:4101 and confirmed in 10+ audit logs.

Function body of sub_824be9a0 (the guest ISR)

0x824be9a0  mfspr   r12, LR
0x824be9a4  bl      __savegprlr_29
0x824be9a8  stwu    r1, -128(r1)
0x824be9ac  or      r31, r4, r4              ; r4 = user_data (ISR arg2)
0x824be9b0  cmpli   cr6, 0, r3, 0x1          ; r3 = ISR source (arg1)
0x824be9b4  bc      eq, 0x824BEA30           ; r3 == 1 → counter path
;   --- r3 != 1 (i.e. r3 == 0, VSYNC) path: spinlock + bit-clear ---
0x824be9b8  lwz     r10, 10772(r31)
   ...                                      ; load dispatch fn pointer
0x824be9f0  mtspr   CTR, r30                  ; first guest-handler dispatch
0x824be9f4  bcctrl
0x824be9f8  lbz     r10, 268(r13)             ; per-CPU IRQL
0x824bea08  or      r3, r30, r30
0x824bea0c  slw     r29, r11, r10
0x824bea10  bl      0x8284E1EC                ; KeAcquireSpinLockAtRaisedIrql
0x824bea14  lwz     r11, 0(r31)
   ...                                      ; clear pending-IRQ bit
0x824bea28  bl      0x8284E1DC                ; KeReleaseSpinLockFromRaisedIrql
0x824bea2c  b       0x824BEAAC                ; → epilogue
;   --- r3 == 1 path: counter / no spinlock ---
0x824bea30  cmpli   cr6, 0, r3, 0x0
0x824bea34  bc      eq, 0x824BEAAC            ; r3==0 already handled above
0x824bea38  addis   r11, r0, 0x7FC8           ; load D1MODE_V_COUNTER MMIO
0x824bea3c  lwz     r11, 25924(r11)
   ...                                      ; counter update + optional callback
0x824beaa4  mtspr   CTR, r11
0x824beaa8  bcctrl
0x824beaac  epilogue

Cross-reference to canary's source

xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310:

void VdSetGraphicsInterruptCallback_entry(function_t callback,
                                          lpvoid_t user_data) {
  // callback takes 2 params
  // r3 = bool 0/1 - 0 is normal interrupt, 1 is some acquire/lock mumble
  // r4 = user_data (r4 of VdSetGraphicsInterruptCallback)
  ...
}

So per canary's own comments:

  • r3=0 (VSYNC / "normal interrupt") → guest takes the spinlock path
  • r3=1 ("acquire/lock mumble", presumably the CP-interrupt) → guest takes the counter path

In both engines, ours and canary, when the first VSYNC fires after VdSwap, the callback is invoked with r3=0 and the spinlock path executes. The only difference is timing.
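The r3 contract can be restated as a small dispatch function. This is an illustrative sketch of the branch at 0x824be9b0/0x824be9b4 only, with hypothetical names (`IsrPath`, `isr_path`, `spinlock_events`); it does not model the handler dispatch or the counter update.

```rust
/// Which half of sub_824be9a0 runs for a given ISR source argument.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IsrPath {
    /// r3 != 1 (in practice r3 == 0, VSYNC): spinlock + pending-bit clear.
    Spinlock,
    /// r3 == 1: counter path, no spinlock.
    Counter,
}

/// Mirror of `cmpli cr6, 0, r3, 0x1` followed by `bc eq`.
pub fn isr_path(r3: u32) -> IsrPath {
    if r3 == 1 {
        IsrPath::Counter
    } else {
        IsrPath::Spinlock
    }
}

/// Number of spinlock-related trace events the path contributes:
/// two imports (acquire, release) times three events each
/// (import.call, kernel.call, kernel.return).
pub fn spinlock_events(r3: u32) -> usize {
    match isr_path(r3) {
        IsrPath::Spinlock => 2 * 3,
        IsrPath::Counter => 0,
    }
}
```

For r3=0 this yields the 6 events seen in ours at idx 105,286 to 105,291.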

Per-engine VSYNC dispatch model

Ours

  • kernel.interrupts.tick_vsync_instr(instruction_count) accumulates instructions; fires VSYNC when vsync_accumulator >= 150_000.
  • try_inject_graphics_interrupt runs every scheduler round; injects the queued VSYNC into the first Ready (else Blocked) HW thread.
  • Lockstep / diff-harness path uses tick_vsync_instr (not wall-clock).
  • Net effect: ours fires VSYNC ~every 150k guest instructions ≈ every scheduler round once instruction count grows; the FIRST VSYNC is delivered right after VdSwap returns because that's when tid=1 becomes Ready and is_in_callback==false.
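A minimal sketch of the instruction-count model described in the bullets above. It is not the code in interrupts.rs: the struct, the `pending` queue, the single-thread `try_inject` signature, and the choice to carry the remainder across periods are assumptions made for illustration. Only the 150_000 threshold and the Ready / not-in-callback condition come from the notes.

```rust
/// Guest instructions per VSYNC, as stated above.
pub const VSYNC_INSTR_PERIOD: u64 = 150_000;

/// Hypothetical stand-in for the VSYNC part of the interrupt state.
#[derive(Debug, Default)]
pub struct VsyncAccumulator {
    accumulated: u64,
    pending: u32,
}

impl VsyncAccumulator {
    /// Add retired guest instructions and queue one VSYNC per full period.
    /// Returns how many VSYNCs this call queued.
    /// Assumption: the remainder carries over to the next period.
    pub fn tick_vsync_instr(&mut self, instruction_count: u64) -> u32 {
        self.accumulated += instruction_count;
        let fired = (self.accumulated / VSYNC_INSTR_PERIOD) as u32;
        self.accumulated %= VSYNC_INSTR_PERIOD;
        self.pending += fired;
        fired
    }

    /// Deliver one queued VSYNC if the target thread can take it.
    /// Simplified to one thread; the real injector scans HW threads.
    pub fn try_inject(&mut self, thread_ready: bool, is_in_callback: bool) -> bool {
        if self.pending > 0 && thread_ready && !is_in_callback {
            self.pending -= 1;
            true
        } else {
            false
        }
    }
}
```

Under this model a VSYNC that became pending while tid=1 was inside VdSwap is delivered on the first scheduler round after VdSwap returns, which matches the placement of the 6 extra events.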

Canary

  • A dedicated host thread frame_limiter_worker_thread_ (graphics_system.cc:148-237) calls MarkVblank() → DispatchInterruptCallback(0, 2) → EmulateCPInterruptDPC(callback, data, source=0, cpu=2).
  • Wall-clock paced via Clock::QueryGuestTickCount() vs vsync_duration_d = 16.67 ms (60 Hz).
  • First MarkVblank fires after at least 16.67 ms wall-clock from frame-limiter thread creation.
  • The callback runs on whichever XThread is current at dispatch time (not tid-locked).
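For contrast with the instruction-count model, here is a hedged Rust sketch of wall-clock pacing. It is not a port of canary's C++ frame limiter: `FrameLimiter`, `poll`, and the rule of resetting the reference point to the poll time are assumptions. Time is passed in explicitly so the sketch stays deterministic.

```rust
/// 60 Hz period in nanoseconds (about 16.67 ms).
pub const VSYNC_PERIOD_NS: u64 = 16_666_667;

/// Hypothetical wall-clock pacer: fires once a full period has elapsed.
#[derive(Debug)]
pub struct FrameLimiter {
    last_vblank_ns: u64,
}

impl FrameLimiter {
    /// The first vblank cannot fire earlier than one period after creation.
    pub fn new(created_at_ns: u64) -> Self {
        FrameLimiter { last_vblank_ns: created_at_ns }
    }

    /// Returns true if a vblank should be dispatched at `now_ns`.
    /// Assumption: the reference point resets to the time of the poll.
    pub fn poll(&mut self, now_ns: u64) -> bool {
        if now_ns.saturating_sub(self.last_vblank_ns) >= VSYNC_PERIOD_NS {
            self.last_vblank_ns = now_ns;
            true
        } else {
            false
        }
    }
}
```

Because the input is host time rather than guest progress, the event index at which the first vblank lands depends on how fast the guest thread happens to run on the host, which is why the first delivery does not line up with ours.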

Empirical counts (sanity)

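| engine | total KeAcquire calls | first KeAcquire idx | first KeAcquire host_ns |
|--------|-----------------------|---------------------|-------------------------|
| canary | 16,000 | tid=6 idx 106,805 | 1,731,840,900 (~1.73 s) |
| ours | 32 | tid=1 idx 105,286 | 1,437,632,028 (~1.44 s) |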

Canary's first VSYNC interrupt fires ~80 ms after canary idx 105,286 (host wall-clock from canary log) — i.e. canary's tid=6 has time to make ~1,500 more events before the first interrupt arrives. Ours's first VSYNC arrives RIGHT at idx 105,286.

The total-count gap (16,000 vs 32) is largely a runtime-window artifact: canary ran 90 s of wall-clock; ours ran ~1.5 s of guest time before wedging at the C+22 cap (downstream). Within ours's runtime window, the rate of vsync delivery is similar to canary's; the issue is the OFFSET of the first delivery.

Class triage

| class | description | applies? |
|-------|-------------|----------|
| A | Different LR → different caller, real control-flow branch | NO: LR identical, function identical, both engines take the SAME r3=0 path |
| B | Same LR / computed call with different fn pointer | NO: bl to fixed import thunk |
| C | Game-state-dependent (state polled, branch taken) | NO: the branch in sub_824be9a0 is on the ISR's r3 arg, which is 0 (VSYNC) in BOTH engines |
| D | Phase A coverage gap | NO: events are accurately captured |

Actual class: scheduler-cadence divergence. The 6 events are not in the "main thread's compute" stream; they're in an interrupt-context insertion that ours delivers at a different wall-clock moment than canary.

Why this is NOT a candidate for an engine-side fix

  1. Tripstone #5: investigation reveals scheduler-determinism issue → STOP and report.
  2. MEMORY.md explicitly lists "scheduler determinism" in the deferred bucket (review_a_boot_state_2026_05_21 entry: "Deferred: audio/HID/XAM/scheduler-determinism/diff-tool-canonicalization").
  3. The two engines have fundamentally different VSYNC clock sources: ours's tick_vsync_instr uses guest-instruction counts, canary's frame_limiter_worker_thread_ uses host wall-clock. To align ours's first-vsync moment with canary's would require either:
    • Adopting wall-clock pacing for the lockstep diff harness (invalidates 23 phases of digest stability, per Phase D forensics' explicit warning), or
    • Calibrating the instruction-count threshold per cold run (non-deterministic, defeats the diff-harness's purpose).
  4. The natural-progression goal is to fix REAL game-logic bugs. Forcing this specific VSYNC moment to align would mask the actual scheduler-determinism problem rather than resolve it.

Why this is NOT a candidate for a diff-tool absorber (at this layer)

A naïve 6-event absorber (absorb KeAcquire + KeRelease pair if canary doesn't have one at the same position) would advance the matched-prefix past idx 105,286, but only by 24 events before the next, different divergence: canary's MmFreePhysicalMemory vs ours's KeEnterCriticalRegion at the +24 offset. The chain absorb-realign-diverge repeats. Each downstream divergence will need its own analysis. Adding an absorber here without first characterizing the downstream divergences risks:

  1. Reading-error #23 crossover (band-aid masks real divergence).
  2. Reading-error #32 inflation (timing-window absorbers should be narrow; this one would fire on every VSYNC-driven cadence offset).
  3. Spurious main-prefix advancement that hides multiple genuine issues downstream.
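To make the risk concrete, here is a sketch of what such a naïve absorber would look like. It is illustrative only and not proposed for the diff tool: the tuple event type, `ISR_IMPORTS`, `triple`, and `matched_prefix_with_absorber` are hypothetical names, and the absorber skips any ours-only event for the two spinlock imports.

```rust
/// Hypothetical simplified event: (kind, import name).
pub type Event = (&'static str, &'static str);

/// Imports the naive absorber treats as interrupt-induced noise.
pub const ISR_IMPORTS: [&str; 2] = [
    "KeAcquireSpinLockAtRaisedIrql",
    "KeReleaseSpinLockFromRaisedIrql",
];

/// Expand one import into its three trace events.
pub fn triple(name: &'static str) -> Vec<Event> {
    vec![("import.call", name), ("kernel.call", name), ("kernel.return", name)]
}

/// Walk both streams, skipping ours-only spinlock events.
/// Returns (canary events matched, ours events consumed) at the first
/// divergence the absorber cannot explain.
pub fn matched_prefix_with_absorber(canary: &[Event], ours: &[Event]) -> (usize, usize) {
    let (mut c, mut o) = (0, 0);
    while c < canary.len() && o < ours.len() {
        if canary[c] == ours[o] {
            c += 1;
            o += 1;
        } else if ISR_IMPORTS.contains(&ours[o].1) {
            o += 1; // absorb an ours-only ISR event
        } else {
            break; // a different divergence: the absorber stops here
        }
    }
    (c, o)
}
```

On a shortened stand-in for the real streams (one matched import instead of 24 events), the absorber steps over the 6 inserted events and then stops at the MmFreePhysicalMemory vs KeEnterCriticalRegion mismatch. The prefix number grows, but the next divergence is no better understood.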

The Phase D D-extension absorber (nested-CS-cleanup) was a narrow, exhaustively-characterized band-aid for a specific cap; this VSYNC-cadence shape lacks that characterization.

ESCALATE to a dedicated scheduler-determinism methodology pivot (reading-error #32 / phase-c23-scheduler-determinism-plan refresh). Options:

  1. Adopt wall-clock vsync in lockstep under a feature flag, accept non-determinism in the diff harness, treat matched-prefix as a noisy metric — re-baseline all Phase C+nn caps.
  2. Pin first-VSYNC delivery to a guest-instruction landmark common to both engines (e.g. first kernel.return VdSwap on VdSetGraphicsInterruptCallback's registered callback). Requires engine-side coordination + canary patch.
  3. Build a VSYNC-cadence-aware absorber that absorbs interrupt-callback-induced event sequences on BOTH sides up to alignment landmarks. Requires characterizing the full set of guest-ISR shapes — sub_824be9a0 is one of N callback bodies the absorber must recognize.

All three options are out-of-scope for C+24 per the original task's escalation rule.

Files inspected (read-only)

  • xenia-rs/audit-runs/phase-c23-VdQueryVideoFlags/diff-jitter-1.md (predecessor diff report)
  • xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md (schema / absorber inventory; v1.7)
  • xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310, 438-523 (VdSetGraphicsInterruptCallback_entry, VdSwap_entry)
  • xenia-canary/src/xenia/gpu/graphics_system.cc:148-237, 352-374 (frame_limiter_worker, MarkVblank, DispatchInterruptCallback)
  • xenia-canary/src/xenia/kernel/kernel_state.cc:1365-1405 (EmulateCPInterruptDPC)
  • xenia-rs/crates/xenia-kernel/src/interrupts.rs (full file — InterruptState, tick_vsync_instr, tick_vsync_wallclock)
  • xenia-rs/crates/xenia-app/src/main.rs:2440-2474, 3700-3812 (vsync ticker + injector)
  • xenia-rs/crates/xenia-kernel/src/exports.rs:4086-4108 (vd_set_graphics_interrupt_callback)
  • xenia-rs/sylpheed.db (xrefs, instructions on sub_824be9a0/sub_824ce4d0/sub_824cea80)

Files touched (changed)

NONE. C+24 is read-only investigation.

Test suite

xenia-kernel: 226 PASS (unchanged from C+23 baseline). No code edits, no test additions.

Phase B image_canonical_sha256

Pinned hash ea8d160e… UNCHANGED — no XEX loader changes.

Cascade

| step | predicted | actual |
|------|-----------|--------|
| A capture event context | 95% | PASS |
| B classify (A/B/C/D) | 75% | PASS (none of A/B/C/D; fifth class: scheduler-cadence) |
| C identify root cause | 60% | PASS (ours vsync_instr_period mistimed vs canary wall-clock frame-limiter) |
| D land fix or clean escalation | 65% | PASS (clean escalation) |
| E main > 105,286 | 55% | N/A (no engine change) |

Tripstones honored

  1. Reading-error #28 — verified canary semantics by reading xboxkrnl_video.cc:303-310 directly; the r3=0/1 contract is documented in canary's own source comments. NOT assumed.
  2. Reading-error #23 — explicitly chose NOT to land a downstream-risky absorber/fix. Main matched-prefix stays at 105,286.
  3. Reading-error #31 — no fresh canary run made; used the C+23 archived jitter set. State of cache/ + cache_host/ unchanged.
  4. Reading-error #32 — the cause IS scheduling-jitter on the interrupt-cadence axis. Confirmed by the empirical first-acquire-host-ns table above.
  5. Escalation rule — TRIGGERED. Root cause requires scheduler-determinism methodology pivot, deferred per MEMORY.md.
  6. --mute=true — N/A this session (one xrs-c23 exec probe run for --branch-probe capture; no canary run).