handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions
--- a/audit-runs/phase-c24-post-vdswap-branch/investigation.md
+++ b/audit-runs/phase-c24-post-vdswap-branch/investigation.md
@@ -0,0 +1,314 @@
+# Phase C+24 — post-VdSwap KeAcquireSpinLockAtRaisedIrql divergence
+
+**Date:** 2026-05-26
+**Mode:** READ-only investigation. NO engine change, NO diff-tool change, NO test change.
+**Status:** ESCALATED (scheduler-determinism deferred class).
+
+## TL;DR
+
+The post-C+23 first divergence at canary `tid=6` ↔ ours `tid=1` idx
+105,286 is **NOT a control-flow branch chosen by guest state**. It is a
+**scheduling-cadence divergence**: ours fires the first VSYNC graphics
+interrupt callback EARLIER than canary, inserting 6 extra events
+(`KeAcquireSpinLockAtRaisedIrql` + `KeReleaseSpinLockFromRaisedIrql`,
+×3 events each) into ours's tid=1 stream between `VdSwap.return` and
+`VdGetCurrentDisplayGamma`. Canary fires the SAME interrupt path with
+the SAME r3=0 (VSYNC) argument, just at a different wall-clock /
+trajectory point. Per tripstone #5 (escalation when divergence
+requires scheduler-determinism resolution), C+24 lands NO change. Main
+matched-prefix stays at 105,286.
+
+## Event-context capture (Step 1)
+
+### Pre-context (6 matched events)
+
+Both engines bit-identical:
+
+```
+import.call VdGetSystemCommandBuffer
+kernel.call VdGetSystemCommandBuffer
+kernel.return VdGetSystemCommandBuffer
+import.call VdSwap
+kernel.call VdSwap
+kernel.return VdSwap
+```
+
+### Divergent event
+
+```
+canary[105293]: import.call VdGetCurrentDisplayGamma            (ord 441)
+ours  [105286]: import.call KeAcquireSpinLockAtRaisedIrql       (ord 77)
+```
+
+### Post-divergence flow (ours)
+
+```
+ours[105286-105288]: import/call/return KeAcquireSpinLockAtRaisedIrql
+ours[105289-105291]: import/call/return KeReleaseSpinLockFromRaisedIrql
+ours[105292-105294]: import/call/return VdGetCurrentDisplayGamma   ← realigns with canary[105293-105295]
+```
+
+### Streams re-converge at offset +6 in ours
+
+After the 6 extra ours events, both streams call **the same** import
+sequence: `VdGetCurrentDisplayGamma → VdSetDisplayMode → VdGetCurrentDisplayInformation
+→ VdQueryVideoFlags (returns 3, per C+23) → VdQueryVideoMode → ...`. So
+the 6 events are an **inserted block in ours**, not a permanent
+trajectory split.
+
+But **secondary divergences appear ~24 events later**: ours's
+post-block stream diverges from canary again with
+`canary: MmFreePhysicalMemory` vs `ours: KeEnterCriticalRegion` at
+offset +24. This pattern of "absorb-realign-diverge" repeats; a simple
+6-event absorber would expose a chain of downstream divergences, each
+needing separate analysis.
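+
+The inserted-block shape described above can be sketched as follows.
+This is a minimal illustration, not the diff harness's code:
+`matched_prefix` and `inserted_block` are hypothetical helpers, events
+are reduced to import names, and both streams share one index (the
+real streams are offset, canary 105,293 vs ours 105,286).
+
+```rust
+/// Number of leading events on which the two streams agree.
+fn matched_prefix(a: &[&str], b: &[&str]) -> usize {
+    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
+}
+
+/// Hypothetical detector: if skipping `k <= max_skip` events in `ours`
+/// at the first divergence realigns with `canary` for at least
+/// `window` events, return `(divergence_index, k)`.
+fn inserted_block(
+    canary: &[&str],
+    ours: &[&str],
+    max_skip: usize,
+    window: usize,
+) -> Option<(usize, usize)> {
+    let i = matched_prefix(canary, ours);
+    let canary_tail = canary.get(i..)?;
+    (1..=max_skip).find_map(|k| {
+        let ours_tail = ours.get(i + k..)?;
+        (matched_prefix(canary_tail, ours_tail) >= window).then_some((i, k))
+    })
+}
+
+fn main() {
+    // Toy streams: one event per import instead of three.
+    let canary = ["VdSwap", "VdGetCurrentDisplayGamma", "VdSetDisplayMode"];
+    let ours = [
+        "VdSwap",
+        "KeAcquireSpinLockAtRaisedIrql",
+        "KeReleaseSpinLockFromRaisedIrql",
+        "VdGetCurrentDisplayGamma",
+        "VdSetDisplayMode",
+    ];
+    println!("{:?}", inserted_block(&canary, &ours, 6, 2));
+}
+```
+
+A short `window` reports a realignment even when a second divergence
+follows soon after, which is the "absorb-realign-diverge" risk noted
+above.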
+
+## LR localisation (Step 2)
+
+Ran ours with `--branch-probe=0x8284e1ec` (the KeAcquire import thunk).
+**First fire** at `cycle=5584980, lr=0x824bea14, r3=0x42453918`, within
+19 cycles of the divergent event's `guest_cycle=5584999`. Caller PC =
+`lr - 4 = 0x824bea10`, inside function **`sub_824be9a0`**.
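+
+The `lr - 4` step works because PowerPC instructions are 4 bytes wide
+and `bl` stores the address of the following instruction in LR. A
+small sketch of that arithmetic is below; `caller_pc` and `in_listing`
+are hypothetical helper names, and the range bounds are the first and
+last addresses shown in the listing of `sub_824be9a0`, not a verified
+function extent.
+
+```rust
+/// Address of the `bl` that produced a given link-register value.
+fn caller_pc(lr: u32) -> u32 {
+    lr.wrapping_sub(4)
+}
+
+/// True if `pc` lies between the first and last listed addresses.
+fn in_listing(pc: u32, first: u32, last: u32) -> bool {
+    (first..=last).contains(&pc)
+}
+
+fn main() {
+    let lr = 0x824b_ea14_u32; // from the --branch-probe first fire
+    let pc = caller_pc(lr);
+    println!("caller pc = {:#010x}", pc);
+    println!(
+        "inside listed range: {}",
+        in_listing(pc, 0x824b_e9a0, 0x824b_eaac)
+    );
+}
+```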
+
+Cross-reference in `sylpheed.db`: `sub_824be9a0` has **zero `bl`
+callers** in the static disasm — it's NOT called directly by guest
+code. It IS the **graphics interrupt callback** armed via
+`VdSetGraphicsInterruptCallback(0x824be9a0, ctx)` per
+`crates/xenia-kernel/src/exports.rs:4101` and confirmed in 10+ audit
+logs.
+
+## Function body of `sub_824be9a0` (the guest ISR)
+
+```ppc
+0x824be9a0  mfspr   r12, LR
+0x824be9a4  bl      __savegprlr_29
+0x824be9a8  stwu    r1, -128(r1)
+0x824be9ac  or      r31, r4, r4              ; r4 = user_data (ISR arg2)
+0x824be9b0  cmpli   cr6, 0, r3, 0x1          ; r3 = ISR source (arg1)
+0x824be9b4  bc      eq, 0x824BEA30           ; r3 == 1 → counter path
+;   --- r3 != 1 (i.e. r3 == 0, VSYNC) path: spinlock + bit-clear ---
+0x824be9b8  lwz     r10, 10772(r31)
+   ...                                      ; load dispatch fn pointer
+0x824be9f0  mtspr   CTR, r30                  ; first guest-handler dispatch
+0x824be9f4  bcctrl
+0x824be9f8  lbz     r10, 268(r13)             ; per-CPU IRQL
+0x824bea08  or      r3, r30, r30
+0x824bea0c  slw     r29, r11, r10
+0x824bea10  bl      0x8284E1EC                ; KeAcquireSpinLockAtRaisedIrql
+0x824bea14  lwz     r11, 0(r31)
+   ...                                      ; clear pending-IRQ bit
+0x824bea28  bl      0x8284E1DC                ; KeReleaseSpinLockFromRaisedIrql
+0x824bea2c  b       0x824BEAAC                ; → epilogue
+;   --- r3 == 1 path: counter / no spinlock ---
+0x824bea30  cmpli   cr6, 0, r3, 0x0
+0x824bea34  bc      eq, 0x824BEAAC            ; r3==0 already handled above
+0x824bea38  addis   r11, r0, 0x7FC8           ; load D1MODE_V_COUNTER MMIO
+0x824bea3c  lwz     r11, 25924(r11)
+   ...                                      ; counter update + optional callback
+0x824beaa4  mtspr   CTR, r11
+0x824beaa8  bcctrl
+0x824beaac  epilogue
+```
+
+## Cross-reference to canary's source
+
+`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310`:
+
+```cpp
+void VdSetGraphicsInterruptCallback_entry(function_t callback,
+                                          lpvoid_t user_data) {
+  // callback takes 2 params
+  // r3 = bool 0/1 - 0 is normal interrupt, 1 is some acquire/lock mumble
+  // r4 = user_data (r4 of VdSetGraphicsInterruptCallback)
+  ...
+}
+```
+
+So per canary's own comments:
+- `r3=0` (VSYNC / "normal interrupt") → guest takes the spinlock path
+- `r3=1` ("acquire/lock mumble", presumably the CP-interrupt) → guest takes the counter path
+
+In **both engines**, ours and canary, when the first VSYNC fires after
+VdSwap, the callback is invoked with `r3=0` and the spinlock path
+executes. **The only difference is timing.**
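+
+The `r3` contract can be modelled in a few lines. This is a sketch
+under the reading above, with hypothetical names (`IsrPath`,
+`isr_path`, `spinlock_trace_events`); it only counts the trace events
+produced by the two spinlock imports and ignores anything else the
+callback does.
+
+```rust
+/// Which body of the guest ISR runs for a given interrupt source (r3).
+#[derive(Debug, PartialEq, Eq)]
+enum IsrPath {
+    Spinlock, // r3 != 1: KeAcquire/KeRelease around the pending-bit clear
+    Counter,  // r3 == 1: counter update, no spinlock
+}
+
+fn isr_path(r3: u32) -> IsrPath {
+    if r3 == 1 {
+        IsrPath::Counter
+    } else {
+        IsrPath::Spinlock
+    }
+}
+
+/// Trace events added by the spinlock imports on this path. Each import
+/// yields import.call + kernel.call + kernel.return.
+fn spinlock_trace_events(path: &IsrPath) -> usize {
+    match path {
+        IsrPath::Spinlock => 2 * 3,
+        IsrPath::Counter => 0,
+    }
+}
+
+fn main() {
+    let path = isr_path(0); // VSYNC
+    println!("{:?}: {} extra events", path, spinlock_trace_events(&path));
+}
+```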
+
+## Per-engine VSYNC dispatch model
+
+### Ours
+- `kernel.interrupts.tick_vsync_instr(instruction_count)` accumulates
+  instructions; fires VSYNC when `vsync_accumulator >= 150_000`.
+- `try_inject_graphics_interrupt` runs every scheduler round; injects
+  the queued VSYNC into the first Ready (else Blocked) HW thread.
+- Lockstep / diff-harness path uses `tick_vsync_instr` (not wall-clock).
+- Net effect: ours fires VSYNC ~every 150k guest instructions ≈ every
+  scheduler round once instruction count grows; the FIRST VSYNC is
+  delivered right after VdSwap returns because that's when tid=1
+  becomes Ready and `is_in_callback==false`.
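+
+A minimal sketch of instruction-count pacing as described in the
+bullets above. It is not the code in `interrupts.rs`: the
+`VsyncAccumulator` type, the carry-over of the remainder, and the
+single `thread_ready` flag (standing in for the Ready/Blocked thread
+selection) are assumptions made for illustration.
+
+```rust
+const VSYNC_INSTR_PERIOD: u64 = 150_000;
+
+#[derive(Default)]
+struct VsyncAccumulator {
+    accumulated: u64, // guest instructions since the last queued VSYNC
+    pending: u32,     // VSYNCs queued but not yet injected
+}
+
+impl VsyncAccumulator {
+    /// Add retired guest instructions; queue one VSYNC per full period.
+    fn tick(&mut self, instructions: u64) {
+        self.accumulated += instructions;
+        while self.accumulated >= VSYNC_INSTR_PERIOD {
+            self.accumulated -= VSYNC_INSTR_PERIOD;
+            self.pending += 1;
+        }
+    }
+
+    /// Called once per scheduler round; delivers at most one VSYNC,
+    /// and only when the target is ready and not already in a callback.
+    fn try_inject(&mut self, thread_ready: bool, in_callback: bool) -> bool {
+        if self.pending > 0 && thread_ready && !in_callback {
+            self.pending -= 1;
+            true
+        } else {
+            false
+        }
+    }
+}
+
+fn main() {
+    let mut vsync = VsyncAccumulator::default();
+    vsync.tick(100_000);
+    vsync.tick(60_000);
+    println!("pending after 160k instructions: {}", vsync.pending);
+    println!("blocked by callback: {}", !vsync.try_inject(true, true));
+    println!("delivered once free: {}", vsync.try_inject(true, false));
+}
+```
+
+Under this model a queued VSYNC waits for the first round in which the
+thread is deliverable, which matches the observation that the first
+delivery lands right after `VdSwap` returns.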
+
+### Canary
+- A dedicated host thread `frame_limiter_worker_thread_`
+  (`graphics_system.cc:148-237`) calls `MarkVblank()` →
+  `DispatchInterruptCallback(0, 2)` → `EmulateCPInterruptDPC(callback,
+  data, source=0, cpu=2)`.
+- Wall-clock paced via `Clock::QueryGuestTickCount()` vs
+  `vsync_duration_d = 16.67 ms` (60 Hz).
+- First MarkVblank fires after at least 16.67 ms wall-clock from
+  frame-limiter thread creation.
+- The callback runs on whichever XThread is current at dispatch time
+  (not tid-locked).
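+
+For contrast, wall-clock pacing depends only on elapsed time, not on
+how many guest instructions have retired. The sketch below is written
+in Rust for comparison with ours and is not canary's C++ code;
+`vblanks_due` is a hypothetical helper and the period is 16.67 ms
+rounded to the nearest nanosecond.
+
+```rust
+use std::time::Duration;
+
+/// 60 Hz vblank period (16.67 ms), rounded to whole nanoseconds.
+const VSYNC_PERIOD: Duration = Duration::from_nanos(16_666_667);
+
+/// Number of whole vblank periods that fit in `elapsed`.
+fn vblanks_due(elapsed: Duration) -> u64 {
+    (elapsed.as_nanos() / VSYNC_PERIOD.as_nanos()) as u64
+}
+
+fn main() {
+    for ms in [10u64, 17, 40] {
+        println!(
+            "{} ms elapsed -> {} vblank(s) due",
+            ms,
+            vblanks_due(Duration::from_millis(ms))
+        );
+    }
+}
+```
+
+No vblank is due before one full period has elapsed, so under this
+model the guest thread keeps running for at least 16.67 ms of host
+time before the first callback.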
+
+## Empirical counts (sanity)
+
+| engine | total KeAcquire calls | first KeAcquire idx | first KeAcquire host_ns |
+|---|---|---|---|
+| canary | 16,000 | tid=6 idx 106,805 | 1,731,840,900 (~1.73 s) |
+| ours   | 32     | tid=1 idx 105,286 | 1,437,632,028 (~1.44 s) |
+
+Canary's first VSYNC interrupt fires ~80 ms after canary idx 105,286
+(host wall-clock from canary log) — i.e. canary's tid=6 has time to
+make ~1,500 more events before the first interrupt arrives. Ours's
+first VSYNC arrives RIGHT at idx 105,286.
+
+The total-count gap (16,000 vs 32) is largely a runtime-window
+artifact: canary ran 90 s of wall-clock; ours ran ~1.5 s of guest
+time before wedging at the C+22 cap (downstream). Within ours's
+runtime window, the *rate* of vsync delivery is similar to canary's;
+the issue is the OFFSET of the first delivery.
+
+## Class triage
+
+| class | description | applies? |
+|---|---|---|
+| A | Different LR → different caller, real control-flow branch | NO — LR identical, function identical, both engines take the SAME `r3=0` path |
+| B | Same LR / computed call with different fn pointer | NO — bl to fixed import thunk |
+| C | Game-state-dependent (state polled, branch taken) | NO — the branch in `sub_824be9a0` is on the ISR's `r3` arg, which is `0` (VSYNC) in BOTH engines |
+| D | Phase A coverage gap | NO — events are accurately captured |
+
+**Actual class: scheduler-cadence divergence.** The 6 events are not
+in the "main thread's compute" stream; they're in an
+**interrupt-context insertion** that ours delivers at a different
+wall-clock moment than canary.
+
+## Why this is NOT a candidate for an engine-side fix
+
+1. **Tripstone #5**: investigation reveals scheduler-determinism
+   issue → STOP and report.
+2. **MEMORY.md** explicitly lists "scheduler determinism" in the
+   deferred bucket (review_a_boot_state_2026_05_21 entry: "Deferred:
+   audio/HID/XAM/scheduler-determinism/diff-tool-canonicalization").
+3. The two engines have **fundamentally different VSYNC clock
+   sources**: ours's `tick_vsync_instr` uses guest-instruction counts,
+   canary's `frame_limiter_worker_thread_` uses host wall-clock. To
+   align ours's first-vsync moment with canary's would require either:
+   - Adopting wall-clock pacing for the lockstep diff harness
+     (invalidates 23 phases of digest stability, per Phase D
+     forensics' explicit warning), or
+   - Calibrating the instruction-count threshold per cold run
+     (non-deterministic, defeats the diff-harness's purpose).
+4. The natural-progression goal is to fix REAL game-logic bugs.
+   Forcing this specific VSYNC moment to align would mask the actual
+   scheduler-determinism problem rather than resolve it.
+
+## Why this is NOT a candidate for a diff-tool absorber (at this layer)
+
+A naïve 6-event absorber (`absorb KeAcquire + KeRelease pair if
+canary doesn't have one at the same position`) would advance the
+matched-prefix past idx 105,286, but **only by 24 events** before
+the next, different divergence: canary's `MmFreePhysicalMemory` vs
+ours's `KeEnterCriticalRegion` at the +24 offset. The chain
+`absorb-realign-diverge` repeats. Each downstream divergence will
+need its own analysis. Adding an absorber here without first
+characterizing the downstream divergences risks:
+
+1. **Reading-error #23 crossover** (band-aid masks real divergence).
+2. **Reading-error #32 inflation** (timing-window absorbers should be
+   narrow; this one would fire on every VSYNC-driven cadence offset).
+3. **Spurious main-prefix advancement** that hides multiple genuine
+   issues downstream.
+
+The Phase D D-extension absorber (nested-CS-cleanup) was a
+**narrow, exhaustively-characterized** band-aid for a specific cap;
+this VSYNC-cadence shape lacks that characterization.
+
+## Recommended next action
+
+ESCALATE to a dedicated scheduler-determinism methodology pivot
+(reading-error #32 / phase-c23-scheduler-determinism-plan refresh).
+Options:
+
+1. **Adopt wall-clock vsync in lockstep** under a feature flag, accept
+   non-determinism in the diff harness, treat matched-prefix as a
+   noisy metric — re-baseline all Phase C+nn caps.
+2. **Pin first-VSYNC delivery** to a guest-instruction landmark common
+   to both engines (e.g. the first `kernel.return VdSwap` after the
+   callback is registered via `VdSetGraphicsInterruptCallback`). Requires
+   engine-side coordination + canary patch.
+3. **Build a VSYNC-cadence-aware absorber** that absorbs
+   interrupt-callback-induced event sequences on BOTH sides up to
+   alignment landmarks. Requires characterizing the full set of
+   guest-ISR shapes — `sub_824be9a0` is one of N callback bodies the
+   absorber must recognize.
+
+All three options are out-of-scope for C+24 per the original task's
+escalation rule.
+
+## Files inspected (read-only)
+
+- `xenia-rs/audit-runs/phase-c23-VdQueryVideoFlags/diff-jitter-1.md`
+  (predecessor diff report)
+- `xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md` (schema /
+  absorber inventory; v1.7)
+- `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:303-310,
+  438-523` (`VdSetGraphicsInterruptCallback_entry`, `VdSwap_entry`)
+- `xenia-canary/src/xenia/gpu/graphics_system.cc:148-237, 352-374`
+  (frame_limiter_worker, MarkVblank, DispatchInterruptCallback)
+- `xenia-canary/src/xenia/kernel/kernel_state.cc:1365-1405`
+  (`EmulateCPInterruptDPC`)
+- `xenia-rs/crates/xenia-kernel/src/interrupts.rs` (full file —
+  InterruptState, tick_vsync_instr, tick_vsync_wallclock)
+- `xenia-rs/crates/xenia-app/src/main.rs:2440-2474, 3700-3812`
+  (vsync ticker + injector)
+- `xenia-rs/crates/xenia-kernel/src/exports.rs:4086-4108`
+  (`vd_set_graphics_interrupt_callback`)
+- `xenia-rs/sylpheed.db` (xrefs, instructions on
+  `sub_824be9a0`/`sub_824ce4d0`/`sub_824cea80`)
+
+## Files touched (changed)
+
+NONE. C+24 is read-only investigation.
+
+## Test suite
+
+xenia-kernel: **226 PASS** (unchanged from C+23 baseline). No code
+edits, no test additions.
+
+## Phase B `image_canonical_sha256`
+
+Pinned hash `ea8d160e…` UNCHANGED — no XEX loader changes.
+
+## Cascade
+
+| | predicted | actual |
+|---|---|---|
+| A capture event context | 95% | **PASS** |
+| B classify (A/B/C/D) | 75% | **PASS** (none of A/B/C/D — fifth class: scheduler-cadence) |
+| C identify root cause | 60% | **PASS** (ours vsync_instr_period mistimed vs canary wall-clock frame-limiter) |
+| D land fix or clean escalation | 65% | **PASS — clean escalation** |
+| E main > 105,286 | 55% | **N/A — no engine change** |
+
+## Tripstones honored
+
+1. Reading-error #28 — verified canary semantics by reading
+   `xboxkrnl_video.cc:303-310` directly; the r3=0/1 contract is
+   documented in canary's own source comments. NOT assumed.
+2. Reading-error #23 — explicitly chose NOT to land a downstream-
+   risky absorber/fix. Main matched-prefix stays at 105,286.
+3. Reading-error #31 — no fresh canary run made; used the C+23
+   archived jitter set. State of `cache/` + `cache_host/` unchanged.
+4. Reading-error #32 — the cause IS scheduling-jitter on the
+   interrupt-cadence axis. Confirmed by the empirical
+   first-acquire-host-ns table above.
+5. Escalation rule — TRIGGERED. Root cause requires
+   scheduler-determinism methodology pivot, deferred per MEMORY.md.
+6. `--mute=true` — N/A this session (one `xrs-c23 exec` probe run
+   for `--branch-probe` capture; no canary run).