handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,45 @@
# 2.AX: Why our VSync ISR stops after cycle 7.46M
## Mechanism: HOST-TICKER-STALL (lockstep ticker keyed to guest instruction progress)
- The audit run is LOCKSTEP, so coord_pre_round calls `tick_vsync_instr(stats.instruction_count)` (main.rs:2457),
which fires 1 VSYNC per 150K (VSYNC_INSTR_PERIOD) guest *instructions*.
- `stats.instruction_count` is bumped ONLY by real guest execution (main.rs:2868/2945/3056).
- When `round_schedule()` returns empty (ALL threads Blocked/Exited) the round skips execution and
calls `coord_idle_advance` (main.rs:3193), which advances guest *timebase* (scheduler.rs:1189-1196)
for timer deadlines but NEVER bumps instruction_count.
- Result: once tid=1 wedges on Event 0x10e8 and every other thread is Blocked/Exited, the guest executes
0 instructions per round, instruction_count FREEZES, the tick_vsync_instr delta stays 0, no VSYNC is queued,
try_inject_graphics_interrupt has nothing to inject, and the ISR stops.
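
The stall described above can be reproduced in a toy model. This is a minimal sketch, not the real coordinator: `InstrTicker`, `simulate`, and the one-timebase-unit-per-instruction step are hypothetical stand-ins, and only the 150K `VSYNC_INSTR_PERIOD` value and the `tick_vsync_instr` name come from the notes.

```rust
/// Guest instructions per VSYNC in lockstep mode (150K, per the notes).
const VSYNC_INSTR_PERIOD: u64 = 150_000;

/// Hypothetical stand-in for the lockstep ticker; not the real xenia-app type.
struct InstrTicker {
    last_fire_count: u64,
}

impl InstrTicker {
    fn new() -> Self {
        Self { last_fire_count: 0 }
    }

    /// Returns how many VSYNCs became due at this cumulative instruction count.
    fn tick_vsync_instr(&mut self, instruction_count: u64) -> u64 {
        let due = instruction_count.saturating_sub(self.last_fire_count) / VSYNC_INSTR_PERIOD;
        self.last_fire_count += due * VSYNC_INSTR_PERIOD;
        due
    }
}

/// Runs `rounds` coordinator rounds. The guest executes `step` instructions per
/// round for the first `live_rounds`; after that every thread is blocked and
/// only the timebase moves (the idle path).
/// Returns (vsyncs queued, final instruction_count, final timebase).
fn simulate(rounds: u64, live_rounds: u64, step: u64) -> (u64, u64, u64) {
    let mut ticker = InstrTicker::new();
    let mut instruction_count = 0u64;
    let mut timebase = 0u64;
    let mut vsyncs = 0u64;
    for round in 0..rounds {
        // Pre-round: the ticker only ever sees instruction_count.
        vsyncs += ticker.tick_vsync_instr(instruction_count);
        if round < live_rounds {
            instruction_count += step; // real guest execution
        }
        // Assumption for illustration: timebase advances by `step` every round,
        // whether the guest executed or the round was idle.
        timebase += step;
    }
    (vsyncs, instruction_count, timebase)
}

fn main() {
    let (vsyncs, instrs, tb) = simulate(100, 10, 50_000);
    println!("vsyncs={vsyncs} instruction_count={instrs} timebase={tb}");
}
```

In this model, 10 live rounds of 50,000 instructions queue 3 VSYNCs; the remaining 90 idle rounds queue none, even though the timebase keeps advancing.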
## Trace evidence (AQ lr-trace on 0x824be9a0)
- 77 ISR fires total: 76 r3==0 (INTERRUPT_SOURCE_VSYNC=0), 1 r3==1 (INTERRUPT_SOURCE_CP=1).
- First fire cyc 283,678; LAST fire cyc 7,461,492; then 0 fires for the rest of a 66M-event run.
- Early fires are on tid=7; from cyc 5.58M onward they are on tid=1; fires stop exactly when all threads block.
- The injector (main.rs:3729) HAS a Blocked-thread fallback (Pass 2), so it is NOT the blocker —
it simply never receives a queued VSYNC after the ticker stalls.
## r3==1 (CP) path
- Fires exactly ONCE (cyc 5,577,159), the only CP interrupt ever queued (gpu.has_pending_interrupts,
main.rs:2622). Does NOT reach 0x824bea80 (the r3==0 opt_callback branch). Takes the
[user_data+10772]->[+16]/[+20] gfx-int sub-callback path. Even if it called KeSetEvent on 0x10e8, it would
do so once, not at 60 Hz. NOT a viable sustained producer in ours.
## Cross-engine symmetry
- Canary delivers VSync at 60 Hz continuously (tid=2 NtSetEvent 4660x @16.667ms) because canary's vsync
is host-wall-clock / GPU-thread driven, independent of guest CPU progress. Our lockstep ticker
is guest-instruction driven, so it self-stalls. The stop IS a bug (canary analog is sustained).
## Fix surface (NAME ONLY, no patch)
- File crates/xenia-app/src/main.rs `coord_pre_round` ~2454-2465 (and/or coord_idle_advance ~2528).
- Condition to change: in LOCKSTEP, the VSync ticker must advance off a clock that keeps moving when
the guest is wedged (the guest TIMEBASE that advance_all_timebases_to already advances during idle),
NOT off stats.instruction_count. Options: (a) drive tick on timebase delta; (b) also call the
ticker + injector from the idle path (coord_idle_advance) so a wedged-but-time-advancing guest still
gets VSync injected on a Blocked thread (injector Pass-2 already supports Blocked victims).
- LOC: ~10-30 (MEDIUM). Determinism: must derive cadence from the deterministic guest timebase, not
host wall-clock, to keep golden oracles bit-stable.
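
Option (a) might look roughly like the sketch below. It is not a patch: `TimebaseTicker`, `tick_vsync_timebase`, `simulate`, and the `VSYNC_TIMEBASE_PERIOD` value are hypothetical, and the real period would have to be derived from the emulated timebase frequency, which these notes do not state. The point is only that cadence keyed to a deterministic guest clock keeps firing through idle rounds.

```rust
/// Hypothetical placeholder: guest timebase ticks per VSYNC. Not a real
/// constant from the codebase; the true value depends on the timebase rate.
const VSYNC_TIMEBASE_PERIOD: u64 = 150_000;

/// Hypothetical ticker keyed to the guest timebase instead of instruction count.
struct TimebaseTicker {
    last_fire_tb: u64,
}

impl TimebaseTicker {
    fn new() -> Self {
        Self { last_fire_tb: 0 }
    }

    /// Returns how many VSYNCs became due at this guest timebase value.
    /// Depends only on guest state, so repeated runs give identical results.
    fn tick_vsync_timebase(&mut self, timebase: u64) -> u64 {
        let due = timebase.saturating_sub(self.last_fire_tb) / VSYNC_TIMEBASE_PERIOD;
        self.last_fire_tb += due * VSYNC_TIMEBASE_PERIOD;
        due
    }
}

/// Runs `rounds` rounds; the guest is live for the first `live_rounds`, then
/// wedged. The timebase advances by `tb_step` every round in both cases.
/// Returns (vsyncs queued while live, vsyncs queued while wedged).
fn simulate(rounds: u64, live_rounds: u64, tb_step: u64) -> (u64, u64) {
    let mut ticker = TimebaseTicker::new();
    let mut timebase = 0u64;
    let (mut live, mut idle) = (0u64, 0u64);
    for round in 0..rounds {
        let due = ticker.tick_vsync_timebase(timebase);
        if round < live_rounds {
            live += due;
        } else {
            idle += due; // still queued while every thread is blocked
        }
        timebase += tb_step;
    }
    (live, idle)
}

fn main() {
    let (live, idle) = simulate(100, 10, 50_000);
    println!("vsyncs while live={live}, while wedged={idle}");
}
```

With the same 100-round shape as a 10-round-live guest, this model queues 3 VSYNCs while live and 30 more while wedged. Whether the injector then delivers them on a Blocked thread is a separate matter that this sketch does not model.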
## Caveat
- This unsticks ISR *delivery* cadence. Whether the delivered r3==0 ISR then actually signals 0x10e8
is the SEPARATE 2.AV question (opt_callback +44 is a dead-end; real 0x10e8 producer still unconfirmed).
Fixing cadence is necessary but may not be sufficient.