xenia-rs/audit-runs/iterate-2AX-isr-cadence/findings.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


2.AX: Why ours' VSync ISR stops after cycle 7.46M

Mechanism: HOST-TICKER-STALL (lockstep ticker keyed to guest instruction progress)

  • Audit run = LOCKSTEP -> coord_pre_round uses tick_vsync_instr(stats.instruction_count) (main.rs:2457), which fires 1 VSYNC per 150K (VSYNC_INSTR_PERIOD) guest instructions.
  • stats.instruction_count is bumped ONLY by real guest execution (main.rs:2868/2945/3056).
  • When round_schedule() returns empty (ALL threads Blocked/Exited) the round skips execution and calls coord_idle_advance (main.rs:3193), which advances guest timebase (scheduler.rs:1189-1196) for timer deadlines but NEVER bumps instruction_count.
  • => once tid=1 wedges on Event 0x10e8 and every other thread is Blocked/Exited, the guest executes 0 instructions/round, instruction_count FREEZES, tick_vsync_instr delta=0 -> no VSYNC queued -> try_inject_graphics_interrupt has nothing to inject -> ISR stops.
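
The stall can be reproduced in a toy model. The sketch below is not the code in main.rs: InstrTicker, simulate, and the per-round numbers are hypothetical, and only the 150K VSYNC_INSTR_PERIOD is taken from the notes above. It assumes the ticker is polled once per round with the current instruction count, and that the idle path advances the timebase without executing guest instructions.

```rust
/// Instruction period between VSYNCs (value from the notes above).
const VSYNC_INSTR_PERIOD: u64 = 150_000;

/// Hypothetical stand-in for the lockstep ticker; not the real tick_vsync_instr.
struct InstrTicker {
    /// Instruction count at the last VSYNC boundary that was consumed.
    last_fire_at: u64,
}

impl InstrTicker {
    fn new() -> Self {
        Self { last_fire_at: 0 }
    }

    /// Returns how many VSYNCs become due at `instruction_count`.
    /// If `instruction_count` does not move, the delta is 0 and nothing is queued.
    fn tick(&mut self, instruction_count: u64) -> u64 {
        let due = (instruction_count - self.last_fire_at) / VSYNC_INSTR_PERIOD;
        self.last_fire_at += due * VSYNC_INSTR_PERIOD;
        due
    }
}

/// Runs `rounds` coordinator rounds. The guest executes `instrs_per_round`
/// instructions for the first `live_rounds` rounds, then every thread is
/// Blocked and the round only advances the timebase (illustrative step size).
/// Returns (VSYNCs queued, final timebase).
fn simulate(rounds: u64, live_rounds: u64, instrs_per_round: u64) -> (u64, u64) {
    let mut ticker = InstrTicker::new();
    let mut instruction_count = 0u64;
    let mut timebase = 0u64;
    let mut vsyncs = 0u64;
    for round in 0..rounds {
        // Pre-round: the ticker only ever sees the instruction count.
        vsyncs += ticker.tick(instruction_count);
        if round < live_rounds {
            instruction_count += instrs_per_round; // real guest execution
        }
        timebase += instrs_per_round; // idle path still moves the timebase
    }
    (vsyncs, timebase)
}
```

With simulate(20, 10, 50_000) the model queues 3 VSYNCs during the live rounds and none afterwards, while the timebase still reaches 1,000,000. With all 20 rounds live it queues 6.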

Trace evidence (AQ lr-trace on 0x824be9a0)

  • 77 ISR fires total: 76 r3==0 (INTERRUPT_SOURCE_VSYNC=0), 1 r3==1 (INTERRUPT_SOURCE_CP=1).
  • First fire cyc 283,678; LAST fire cyc 7,461,492; then 0 fires for the rest of a 66M-event run.
  • Early fires tid=7; from cyc 5.58M on tid=1; stops exactly when all threads block.
  • The injector (main.rs:3729) HAS a Blocked-thread fallback (Pass 2), so it is NOT the blocker — it simply never receives a queued VSYNC after the ticker stalls.

r3==1 (CP) path

  • Fires exactly ONCE (cyc 5,577,159); this is the only CP interrupt ever queued (gpu.has_pending_interrupts, main.rs:2622). It does NOT reach 0x824bea80 (the r3==0 opt_callback branch); instead it takes the [user_data+10772]->[+16]/[+20] gfx-int sub-callback path. Even if that path called KeSetEvent on 0x10e8, it would do so once, not at 60Hz. NOT a viable sustained producer in ours.

Cross-engine symmetry

  • Canary delivers VSync continuously at 60Hz (tid=2 NtSetEvent 4660x @16.667ms) because canary's vsync is host-wall-clock / GPU-thread driven, independent of guest CPU progress. The lockstep ticker in ours is guest-instruction driven -> self-stalls. The stop IS a bug (the canary analog is sustained).

Fix surface (NAME ONLY, no patch)

  • File crates/xenia-app/src/main.rs coord_pre_round ~2454-2465 (and/or coord_idle_advance ~2528).
  • Condition to change: in LOCKSTEP, the VSync ticker must advance off a clock that keeps moving when the guest is wedged (the guest TIMEBASE that advance_all_timebases_to already advances during idle), NOT off stats.instruction_count. Options: (a) drive tick on timebase delta; (b) also call the ticker + injector from the idle path (coord_idle_advance) so a wedged-but-time-advancing guest still gets VSync injected on a Blocked thread (injector Pass-2 already supports Blocked victims).
  • LOC: ~10-30 (MEDIUM). Determinism: must derive cadence from the deterministic guest timebase, not host wall-clock, to keep golden oracles bit-stable.
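
Option (a) could look roughly like the sketch below. It is an illustration under assumptions, not a patch: TimebaseTicker, vsyncs_while_wedged, and the VSYNC_TIMEBASE_PERIOD value are all hypothetical (the notes do not state the timebase frequency; 833,333 is just 50,000,000 / 60 rounded down). The point is only that cadence derived from the guest timebase keeps firing while the guest executes nothing, and stays deterministic because no host clock is read.

```rust
/// ASSUMPTION: timebase ticks per VSYNC. Placeholder value, not from the notes.
const VSYNC_TIMEBASE_PERIOD: u64 = 833_333;

/// Hypothetical ticker keyed to the guest timebase instead of instruction_count.
struct TimebaseTicker {
    /// Timebase value at which the next VSYNC becomes due.
    next_deadline: u64,
}

impl TimebaseTicker {
    fn new() -> Self {
        Self { next_deadline: VSYNC_TIMEBASE_PERIOD }
    }

    /// Returns how many VSYNCs became due by `timebase`.
    /// Depends only on guest time, so repeated runs give identical results.
    fn tick(&mut self, timebase: u64) -> u64 {
        let mut due = 0;
        while timebase >= self.next_deadline {
            self.next_deadline += VSYNC_TIMEBASE_PERIOD;
            due += 1;
        }
        due
    }
}

/// Models a fully wedged guest: every round is idle, executes 0 instructions,
/// and only advances the timebase by `timebase_step` (illustrative).
/// Returns the number of VSYNCs the ticker would have queued.
fn vsyncs_while_wedged(idle_rounds: u64, timebase_step: u64) -> u64 {
    let mut ticker = TimebaseTicker::new();
    let mut timebase = 0u64;
    let mut queued = 0u64;
    for _ in 0..idle_rounds {
        timebase += timebase_step; // idle advance only, no guest execution
        queued += ticker.tick(timebase);
    }
    queued
}
```

In this model vsyncs_while_wedged(10, 500_000) queues 6 VSYNCs although no guest instruction ran. Whether the real injector then delivers them on a Blocked thread depends on the Pass-2 fallback noted above.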

Caveat

  • This unsticks ISR delivery cadence. Whether the delivered r3==0 ISR then actually signals 0x10e8 is the SEPARATE 2.AV question (opt_callback +44 is a dead-end; real 0x10e8 producer still unconfirmed). Fixing cadence is necessary but may not be sufficient.