xenia-rs/audit-runs/phase-c24-post-vdswap-branch/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Phase C+24 — post-VdSwap KeAcquireSpinLockAtRaisedIrql divergence

**Date:** 2026-05-26

**Mode:** READ-only investigation. NO engine change, NO diff-tool change, NO test change.

**Status:** ESCALATED (scheduler-determinism deferred class).

## TL;DR
The post-C+23 first divergence at canary `tid=6` ↔ ours `tid=1` idx
105,286 is **NOT a control-flow branch chosen by guest state**. It is a
**scheduling-cadence divergence**: ours fires the first VSYNC graphics
interrupt callback EARLIER than canary, inserting 6 extra events
(`KeAcquireSpinLockAtRaisedIrql` + `KeReleaseSpinLockFromRaisedIrql`,
×3 events each) into ours's tid=1 stream between `VdSwap.return` and
`VdGetCurrentDisplayGamma`. Canary fires the SAME interrupt path with
the SAME r3=0 (VSYNC) argument, just at a different wall-clock /
trajectory point. Per tripstone #5 (escalation when divergence
requires scheduler-determinism resolution), C+24 lands NO change. Main
matched-prefix stays at 105,286.
## Event-context capture (Step 1)
### Pre-context (6 matched events)
Both engines bit-identical:
```
import.call VdGetSystemCommandBuffer
kernel.call VdGetSystemCommandBuffer
kernel.return VdGetSystemCommandBuffer
import.call VdSwap
kernel.call VdSwap
kernel.return VdSwap
```
### Divergent event
```
canary[105293]: import.call VdGetCurrentDisplayGamma (ord 441)
ours [105286]: import.call KeAcquireSpinLockAtRaisedIrql (ord 77)
```
### Post-divergence flow (ours)
```
ours[105286-105288]: import/call/return KeAcquireSpinLockAtRaisedIrql
ours[105289-105291]: import/call/return KeReleaseSpinLockFromRaisedIrql
ours[105292-105294]: import/call/return VdGetCurrentDisplayGamma ← realigns with canary[105293-105295]
```
### Streams re-converge at offset +6 in ours
After the 6 extra ours events, both streams call **the same** import
sequence: `VdGetCurrentDisplayGamma → VdSetDisplayMode → VdGetCurrentDisplayInformation
→ VdQueryVideoFlags (returns 3, per C+23) → VdQueryVideoMode → ...`. So
the 6 events are an **inserted block in ours**, not a permanent
trajectory split.

But **secondary divergences appear ~24 events later**: ours's
post-block stream diverges from canary again with
`canary: MmFreePhysicalMemory` vs `ours: KeEnterCriticalRegion` at
offset +24. This pattern of "absorb-realign-diverge" repeats; a simple
6-event absorber would expose a chain of downstream divergences, each
needing separate analysis.
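
The "+6" re-convergence can be checked mechanically. A minimal sketch, assuming each trace event is reduced to a `"<kind> <name>"` string; `realign_offset` is a hypothetical helper, not part of the diff harness:

```rust
/// Smallest number of leading events to skip in `ours` so that its next
/// `window` events equal the first `window` events of `canary`.
/// Both slices are assumed to start at the first divergent event.
fn realign_offset(
    canary: &[&str],
    ours: &[&str],
    max_skip: usize,
    window: usize,
) -> Option<usize> {
    if window == 0 || canary.len() < window {
        return None;
    }
    (0..=max_skip).find(|&skip| {
        ours.len() >= skip + window && ours[skip..skip + window] == canary[..window]
    })
}
```

Fed the three `VdGetCurrentDisplayGamma` events on the canary side and the nine events listed for ours, this returns `Some(6)`. It says nothing about what happens after the window, which is exactly where the secondary divergences appear.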
## LR localisation (Step 2)
Ran ours with `--branch-probe=0x8284e1ec` (the KeAcquire import thunk).
**First fire** at `cycle=5584980, lr=0x824bea14, r3=0x42453918`, within
19 cycles of the divergent event's `guest_cycle=5584999`. Caller PC =
`lr - 4 = 0x824bea10`, inside function **`sub_824be9a0`**.

Cross-reference in `sylpheed.db`: `sub_824be9a0` has **zero `bl`
callers** in the static disasm — it's NOT called directly by guest
code. It IS the **graphics interrupt callback** armed via
`VdSetGraphicsInterruptCallback(0x824be9a0, ctx)` per
`crates/xenia-kernel/src/exports.rs:4101` and confirmed in 10+ audit
logs.
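
The LR-to-caller step is plain address arithmetic. A minimal sketch; the range bounds are the first and last addresses of the `sub_824be9a0` listing in this note, and the helper names are hypothetical:

```rust
/// First and last instruction addresses shown for `sub_824be9a0`.
const ISR_START: u32 = 0x824b_e9a0;
const ISR_LAST_LISTED: u32 = 0x824b_eaac;

/// PowerPC instructions are 4 bytes wide and `bl` stores the address of the
/// following instruction in LR, so the call site is `lr - 4`.
fn caller_pc(lr: u32) -> u32 {
    lr.wrapping_sub(4)
}

/// True if `pc` falls inside the listed body of the guest ISR.
fn in_isr_listing(pc: u32) -> bool {
    (ISR_START..=ISR_LAST_LISTED).contains(&pc)
}
```

For the probed `lr=0x824bea14`, `caller_pc` yields `0x824bea10`, which `in_isr_listing` accepts.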
## Function body of `sub_824be9a0` (the guest ISR)
```ppc
0x824be9a0 mfspr r12, LR
0x824be9a4 bl __savegprlr_29
0x824be9a8 stwu r1, -128(r1)
0x824be9ac or r31, r4, r4 ; r4 = user_data (ISR arg2)
0x824be9b0 cmpli cr6, 0, r3, 0x1 ; r3 = ISR source (arg1)
0x824be9b4 bc eq, 0x824BEA30 ; r3 == 1 → counter path
; --- r3 != 1 (i.e. r3 == 0, VSYNC) path: spinlock + bit-clear ---
0x824be9b8 lwz r10, 10772(r31)
... ; load dispatch fn pointer
0x824be9f0 mtspr CTR, r30 ; first guest-handler dispatch
0x824be9f4 bcctrl
0x824be9f8 lbz r10, 268(r13) ; per-CPU IRQL
0x824bea08 or r3, r30, r30
0x824bea0c slw r29, r11, r10
0x824bea10 bl 0x8284E1EC ; KeAcquireSpinLockAtRaisedIrql
0x824bea14 lwz r11, 0(r31)
... ; clear pending-IRQ bit
0x824bea28 bl 0x8284E1DC ; KeReleaseSpinLockFromRaisedIrql
0x824bea2c b 0x824BEAAC ; → epilogue
; --- r3 == 1 path: counter / no spinlock ---
0x824bea30 cmpli cr6, 0, r3, 0x0
0x824bea34 bc eq, 0x824BEAAC ; r3==0 already handled above
0x824bea38 addis r11, r0, 0x7FC8 ; load D1MODE_V_COUNTER MMIO
0x824bea3c lwz r11, 25924(r11)
... ; counter update + optional callback
0x824beaa4 mtspr CTR, r11
0x824beaa8 bcctrl
0x824beaac epilogue
```
## Cross-reference to canary's source
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310`:
```cpp
void VdSetGraphicsInterruptCallback_entry(function_t callback,
lpvoid_t user_data) {
// callback takes 2 params
// r3 = bool 0/1 - 0 is normal interrupt, 1 is some acquire/lock mumble
// r4 = user_data (r4 of VdSetGraphicsInterruptCallback)
...
}
```
So per canary's own comments:

- `r3=0` (VSYNC / "normal interrupt") → guest takes the spinlock path
- `r3=1` ("acquire/lock mumble", presumably the CP-interrupt) → guest takes the counter path

In **both engines**, ours and canary, when the first VSYNC fires after
VdSwap, the callback is invoked with `r3=0` and the spinlock path
executes. **The only difference is timing.**
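
The callback's top-level dispatch and the trace events it implies can be modelled as below. This is a sketch: the enum and function names are hypothetical, and the counter path returning no events is an assumption (its optional callback is not traced here).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IsrPath {
    Spinlock, // r3 != 1, i.e. r3 == 0 (VSYNC)
    Counter,  // r3 == 1
}

/// Mirrors the first compare/branch in `sub_824be9a0`.
fn isr_path(r3: u32) -> IsrPath {
    if r3 == 1 {
        IsrPath::Counter
    } else {
        IsrPath::Spinlock
    }
}

/// Trace events contributed by the two spinlock imports on each path.
fn isr_trace_events(r3: u32) -> Vec<String> {
    let mut out = Vec::new();
    if isr_path(r3) == IsrPath::Spinlock {
        let imports = [
            "KeAcquireSpinLockAtRaisedIrql",
            "KeReleaseSpinLockFromRaisedIrql",
        ];
        let kinds = ["import.call", "kernel.call", "kernel.return"];
        for name in imports.iter() {
            for kind in kinds.iter() {
                out.push(format!("{} {}", kind, name));
            }
        }
    }
    out
}
```

`isr_trace_events(0)` produces the six events that ours inserts after `kernel.return VdSwap`.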
## Per-engine VSYNC dispatch model
### Ours
- `kernel.interrupts.tick_vsync_instr(instruction_count)` accumulates
instructions; fires VSYNC when `vsync_accumulator >= 150_000`.
- `try_inject_graphics_interrupt` runs every scheduler round; injects
the queued VSYNC into the first Ready (else Blocked) HW thread.
- Lockstep / diff-harness path uses `tick_vsync_instr` (not wall-clock).
- Net effect: ours fires VSYNC ~every 150k guest instructions ≈ every
scheduler round once instruction count grows; the FIRST VSYNC is
delivered right after VdSwap returns because that's when tid=1
becomes Ready and `is_in_callback==false`.
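
A minimal sketch of the instruction-count pacing and target selection described above. It is not the code in `interrupts.rs`: the struct and function names are hypothetical, and subtracting the threshold on fire is an assumption (the real accumulator may reset differently).

```rust
const VSYNC_INSTR_THRESHOLD: u64 = 150_000;

#[derive(Default)]
struct VsyncInstrTicker {
    accumulator: u64,
    pending: u32, // VSYNCs queued but not yet injected
}

impl VsyncInstrTicker {
    /// Adds retired guest instructions; queues one VSYNC per threshold
    /// crossed and returns how many were queued by this call.
    fn tick(&mut self, instruction_count: u64) -> u32 {
        self.accumulator += instruction_count;
        let mut fired = 0;
        while self.accumulator >= VSYNC_INSTR_THRESHOLD {
            self.accumulator -= VSYNC_INSTR_THRESHOLD;
            fired += 1;
        }
        self.pending += fired;
        fired
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadState {
    Ready,
    Blocked,
    Other,
}

/// First Ready thread, else first Blocked thread, else none.
fn pick_injection_target(threads: &[ThreadState]) -> Option<usize> {
    threads
        .iter()
        .position(|s| *s == ThreadState::Ready)
        .or_else(|| threads.iter().position(|s| *s == ThreadState::Blocked))
}
```

Because the only input is the retired-instruction count, two runs with the same instruction stream queue their VSYNCs at the same points.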
### Canary
- A dedicated host thread `frame_limiter_worker_thread_`
(`graphics_system.cc:148-237`) calls `MarkVblank()` →
`DispatchInterruptCallback(0, 2)` → `EmulateCPInterruptDPC(callback,
data, source=0, cpu=2)`.
- Wall-clock paced via `Clock::QueryGuestTickCount()` vs
`vsync_duration_d = 16.67 ms` (60 Hz).
- First MarkVblank fires after at least 16.67 ms wall-clock from
frame-limiter thread creation.
- The callback runs on whichever XThread is current at dispatch time
(not tid-locked).
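
For contrast, a sketch of wall-clock pacing in the same style. It is written in Rust for comparison with ours and does not transcribe canary's C++; the type name is hypothetical, time is passed in explicitly so the sketch stays deterministic, and catching up on several missed periods in one poll is an assumption.

```rust
const VSYNC_PERIOD_NS: u64 = 16_670_000; // 16.67 ms, 60 Hz

struct WallClockPacer {
    next_deadline_ns: u64,
}

impl WallClockPacer {
    /// The first vblank is due one full period after the worker starts.
    fn new(thread_start_ns: u64) -> Self {
        Self {
            next_deadline_ns: thread_start_ns + VSYNC_PERIOD_NS,
        }
    }

    /// Returns how many vblanks are due at `now_ns` and advances the deadline.
    fn poll(&mut self, now_ns: u64) -> u32 {
        let mut due = 0;
        while now_ns >= self.next_deadline_ns {
            self.next_deadline_ns += VSYNC_PERIOD_NS;
            due += 1;
        }
        due
    }
}
```

Here the guest's position in its event stream when `poll` first returns non-zero depends on host speed, which is why the first delivery lands at a different index than under instruction-count pacing.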
## Empirical counts (sanity)
| engine | total KeAcquire calls | first KeAcquire idx | first KeAcquire host_ns |
|---|---|---|---|
| canary | 16,000 | tid=6 idx 106,805 | 1,731,840,900 (~1.73 s) |
| ours | 32 | tid=1 idx 105,286 | 1,437,632,028 (~1.44 s) |

Canary's first VSYNC interrupt fires ~80 ms after canary idx 105,286
(host wall-clock from canary log), i.e. canary's tid=6 has time to
make ~1,500 more events before the first interrupt arrives. Ours's
first VSYNC arrives RIGHT at idx 105,286.

The total-count gap (16,000 vs 32) is largely a runtime-window
artifact: canary ran 90 s of wall-clock; ours ran ~1.5 s of guest
time before wedging at the C+22 cap (downstream). Within ours's
runtime window, the *rate* of vsync delivery is similar to canary's;
the issue is the OFFSET of the first delivery.
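
The "~1,500 more events" figure follows directly from the table. A small sketch of the arithmetic, with hypothetical helper names:

```rust
/// Events between the divergence index and the first interrupt-driven
/// KeAcquire on the same thread; `None` if the inputs are out of order.
fn event_gap(first_keacquire_idx: u64, divergence_idx: u64) -> Option<u64> {
    first_keacquire_idx.checked_sub(divergence_idx)
}

/// Host nanoseconds to seconds, for comparison with the rounded table values.
fn ns_to_secs(ns: u64) -> f64 {
    ns as f64 / 1_000_000_000.0
}
```

`event_gap(106_805, 105_286)` gives `Some(1_519)` for canary, while ours has a gap of zero because its first KeAcquire is the divergent event itself.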
## Class triage
| class | description | applies? |
|---|---|---|
| A | Different LR → different caller, real control-flow branch | NO — LR identical, function identical, both engines take the SAME `r3=0` path |
| B | Same LR / computed call with different fn pointer | NO — bl to fixed import thunk |
| C | Game-state-dependent (state polled, branch taken) | NO — the branch in `sub_824be9a0` is on the ISR's `r3` arg, which is `0` (VSYNC) in BOTH engines |
| D | Phase A coverage gap | NO — events are accurately captured |

**Actual class: scheduler-cadence divergence.** The 6 events are not
in the "main thread's compute" stream; they're in an
**interrupt-context insertion** that ours delivers at a different
wall-clock moment than canary.
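
The triage table reads as an ordered decision procedure. A sketch under that assumption; the field names, the evaluation order, and the `NoneOfAbcd` fallthrough are illustrative and not defined by the harness:

```rust
#[derive(Debug, PartialEq, Eq)]
enum Triage {
    A,
    B,
    C,
    D,
    NoneOfAbcd, // falls outside the table, e.g. scheduler cadence
}

struct Evidence {
    events_captured: bool,         // harness recorded the events accurately
    lr_identical: bool,            // same caller LR in both engines
    call_target_identical: bool,   // same callee, e.g. `bl` to a fixed thunk
    branch_inputs_identical: bool, // values the guest branches on agree
}

fn triage(e: &Evidence) -> Triage {
    if !e.events_captured {
        Triage::D
    } else if !e.lr_identical {
        Triage::A
    } else if !e.call_target_identical {
        Triage::B
    } else if !e.branch_inputs_identical {
        Triage::C
    } else {
        Triage::NoneOfAbcd
    }
}
```

For C+24 every field is `true`, so the result is `NoneOfAbcd`. The scheduler-cadence label comes from the timing evidence, not from this function.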
## Why this is NOT a candidate for an engine-side fix
1. **Tripstone #5**: investigation reveals scheduler-determinism
   issue → STOP and report.
2. **MEMORY.md** explicitly lists "scheduler determinism" in the
   deferred bucket (review_a_boot_state_2026_05_21 entry: "Deferred:
   audio/HID/XAM/scheduler-determinism/diff-tool-canonicalization").
3. The two engines have **fundamentally different VSYNC clock
   sources**: ours's `tick_vsync_instr` uses guest-instruction counts,
   canary's `frame_limiter_worker_thread_` uses host wall-clock. To
   align ours's first-vsync moment with canary's would require either:
   - Adopting wall-clock pacing for the lockstep diff harness
     (invalidates 23 phases of digest stability, per Phase D
     forensics' explicit warning), or
   - Calibrating the instruction-count threshold per cold run
     (non-deterministic, defeats the diff-harness's purpose).
4. The natural-progression goal is to fix REAL game-logic bugs.
   Forcing this specific VSYNC moment to align would mask the actual
   scheduler-determinism problem rather than resolve it.
## Why this is NOT a candidate for a diff-tool absorber (at this layer)
A naïve 6-event absorber (`absorb KeAcquire + KeRelease pair if
canary doesn't have one at the same position`) would advance the
matched-prefix past idx 105,286, but **only by 24 events** before
the next, different divergence: canary's `MmFreePhysicalMemory` vs
ours's `KeEnterCriticalRegion` at the +24 offset. The chain
`absorb-realign-diverge` repeats. Each downstream divergence will
need its own analysis. Adding an absorber here without first
characterizing the downstream divergences risks:
1. **Reading-error #23 crossover** (band-aid masks real divergence).
2. **Reading-error #32 inflation** (timing-window absorbers should be
narrow; this one would fire on every VSYNC-driven cadence offset).
3. **Spurious main-prefix advancement** that hides multiple genuine
issues downstream.

The Phase D D-extension absorber (nested-CS-cleanup) was a
**narrow, exhaustively-characterized** band-aid for a specific cap;
this VSYNC-cadence shape lacks that characterization.
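
The limited reach of the naive absorber can be shown on a toy stream. A sketch, assuming events are `"<kind> <name>"` strings and the matched prefix is counted in canary events; this is not the diff tool's absorber API:

```rust
fn is_spinlock_event(e: &str) -> bool {
    e.ends_with("KeAcquireSpinLockAtRaisedIrql")
        || e.ends_with("KeReleaseSpinLockFromRaisedIrql")
}

/// Matched-prefix length in canary events. With `absorb`, a spinlock event
/// that only ours has at the current position is skipped, not treated as a
/// divergence.
fn matched_prefix(canary: &[&str], ours: &[&str], absorb: bool) -> usize {
    let (mut c, mut o) = (0, 0);
    while c < canary.len() && o < ours.len() {
        if canary[c] == ours[o] {
            c += 1;
            o += 1;
        } else if absorb && is_spinlock_event(ours[o]) && !is_spinlock_event(canary[c]) {
            o += 1;
        } else {
            break; // a divergence the absorber does not cover
        }
    }
    c
}
```

On streams shaped like the ones in this note, enabling `absorb` moves the prefix past the inserted block and then stops at the first non-spinlock mismatch, such as `MmFreePhysicalMemory` vs `KeEnterCriticalRegion`.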
## Recommended next action
ESCALATE to a dedicated scheduler-determinism methodology pivot
(reading-error #32 / phase-c23-scheduler-determinism-plan refresh).
Options:
1. **Adopt wall-clock vsync in lockstep** under a feature flag, accept
non-determinism in the diff harness, treat matched-prefix as a
noisy metric — re-baseline all Phase C+nn caps.
2. **Pin first-VSYNC delivery** to a guest-instruction landmark common
to both engines (e.g. the first `kernel.return VdSwap` after the
callback has been registered via `VdSetGraphicsInterruptCallback`).
Requires engine-side coordination + canary patch.
3. **Build a VSYNC-cadence-aware absorber** that absorbs
interrupt-callback-induced event sequences on BOTH sides up to
alignment landmarks. Requires characterizing the full set of
guest-ISR shapes — `sub_824be9a0` is one of N callback bodies the
absorber must recognize.

All three options are out-of-scope for C+24 per the original task's
escalation rule.
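
For orientation only, option 2 could take roughly the following shape. This is a sketch of one possible reading: the type, the landmark string, and the rule of suppressing clock-driven VSYNCs until the pinned delivery are all assumptions, and nothing like it exists in either engine.

```rust
/// Hypothetical landmark: the event after which the first VSYNC is delivered.
const LANDMARK: &str = "kernel.return VdSwap";

#[derive(Default)]
struct FirstVsyncPin {
    first_delivered: bool,
}

impl FirstVsyncPin {
    /// Called for every trace event. Returns true exactly once, when the
    /// first VSYNC should be delivered right after this event.
    fn on_event(&mut self, event: &str) -> bool {
        if !self.first_delivered && event == LANDMARK {
            self.first_delivered = true;
            true
        } else {
            false
        }
    }

    /// Called when the engine's own clock says a VSYNC is due. Clock-driven
    /// delivery is held back until the pinned first delivery has happened.
    fn allow_clock_vsync(&self) -> bool {
        self.first_delivered
    }
}
```

Both engines would need the same gate in front of their respective clocks, which is why this option implies a canary patch.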
## Files inspected (read-only)
- `xenia-rs/audit-runs/phase-c23-VdQueryVideoFlags/diff-jitter-1.md`
(predecessor diff report)
- `xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md` (schema /
absorber inventory; v1.7)
- `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310,
438-523` (`VdSetGraphicsInterruptCallback_entry`, `VdSwap_entry`)
- `xenia-canary/src/xenia/gpu/graphics_system.cc:148-237, 352-374`
(frame_limiter_worker, MarkVblank, DispatchInterruptCallback)
- `xenia-canary/src/xenia/kernel/kernel_state.cc:1365-1405`
(`EmulateCPInterruptDPC`)
- `xenia-rs/crates/xenia-kernel/src/interrupts.rs` (full file —
InterruptState, tick_vsync_instr, tick_vsync_wallclock)
- `xenia-rs/crates/xenia-app/src/main.rs:2440-2474, 3700-3812`
(vsync ticker + injector)
- `xenia-rs/crates/xenia-kernel/src/exports.rs:4086-4108`
(`vd_set_graphics_interrupt_callback`)
- `xenia-rs/sylpheed.db` (xrefs, instructions on
`sub_824be9a0`/`sub_824ce4d0`/`sub_824cea80`)
## Files touched (changed)
NONE. C+24 is read-only investigation.
## Test suite
xenia-kernel: **226 PASS** (unchanged from C+23 baseline). No code
edits, no test additions.
## Phase B `image_canonical_sha256`
Pinned hash `ea8d160e…` UNCHANGED — no XEX loader changes.
## Cascade
| step | predicted | actual |
|---|---|---|
| A capture event context | 95% | **PASS** |
| B classify (A/B/C/D) | 75% | **PASS** (none of A/B/C/D — fifth class: scheduler-cadence) |
| C identify root cause | 60% | **PASS** (ours vsync_instr_period mistimed vs canary wall-clock frame-limiter) |
| D land fix or clean escalation | 65% | **PASS — clean escalation** |
| E main > 105,286 | 55% | **N/A — no engine change** |
## Tripstones honored
1. Reading-error #28 — verified canary semantics by reading
`xboxkrnl_video.cc:303-310` directly; the r3=0/1 contract is
documented in canary's own source comments. NOT assumed.
2. Reading-error #23 — explicitly chose NOT to land a downstream-
risky absorber/fix. Main matched-prefix stays at 105,286.
3. Reading-error #31 — no fresh canary run made; used the C+23
archived jitter set. State of `cache/` + `cache_host/` unchanged.
4. Reading-error #32 — the cause IS scheduling-jitter on the
interrupt-cadence axis. Confirmed by the empirical
first-acquire-host-ns table above.
5. Escalation rule — TRIGGERED. Root cause requires
scheduler-determinism methodology pivot, deferred per MEMORY.md.
6. `--mute=true` — N/A this session (one `xrs-c23 exec` probe run
for `--branch-probe` capture; no canary run).