The title's per-frame loop (sub_822F1AA8) is clock-B-paced and only re-fires
when the swap count [controller+88] changes, which advances only on source=1
CP swap-complete interrupts. Each present batch the guest submits (via the
sub_824CE348 -> sub_824BF4D0 builder) ends with a WAIT_REG_MEM on a per-CPU
swap-acknowledge fence [GCTX+0] (GCTX = [device+10772]); the GPU parks there
until the graphics ISR (sub_824BE9A0) clears that CPU's bit. Two coupled gaps
left our implementation emitting only one source=1 interrupt and then
deadlocking (draws plateaued at 28, run halted at ~19.27M):
1. GPU MMIO register 0x1961 (AVIVO_D1MODE_VIEWPORT_SIZE) read as 0. The swap
callback sub_824CE2B8 divides by its low 12 bits (display height) as a
refresh-pacing term, so a 0 read tripped its `twi` divide-by-zero guard and
aborted the ISR before it reached the fence-clear. Fix: mirror canary's
GraphicsSystem::ReadRegister (graphics_system.cc:311) and return 0x050002D0
(1280x720).
2. The ISR ran on an arbitrary borrowed thread, so [r13+268] (the PCR
processor number) did not match the interrupt's target CPU. The ISR clears
`1 << current_cpu` from the fence; running on the wrong CPU cleared the
wrong bit and the fence (bit 2, from cpu_mask 0x4) never reached 0. Fix: carry
the target CPU through the interrupt queue (the bit index of the PM4_INTERRUPT
cpu_mask for CP; 2 for vsync, per canary DispatchInterruptCallback(0, 2)) and
impersonate it on the borrowed thread's PCR for the duration of the ISR,
mirroring canary EmulateCPInterruptDPC -> XThread::SetActiveCpu.
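
The arithmetic behind both gaps can be sketched as below. This is a minimal
illustration, not the emulator's code: every helper name is hypothetical, the
width field layout is an assumption (the text above only states that the low
12 bits hold the height), and CPU 0 stands in for "whatever CPU the borrowed
thread happened to report".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not the emulator's API.

// Value the fix returns for GPU MMIO register 0x1961.
constexpr uint32_t kViewportSize = 0x050002D0;

// Low 12 bits: display height, used as a divisor by the swap callback.
// A register read of 0 yields 0 here, which is what tripped the guard.
inline uint32_t ViewportHeight(uint32_t reg) { return reg & 0xFFFu; }

// Assumption: the width occupies the upper half-word of the register.
inline uint32_t ViewportWidth(uint32_t reg) { return reg >> 16; }

// Bit index of the lowest set bit in a PM4_INTERRUPT cpu_mask.
inline uint32_t TargetCpuFromMask(uint32_t cpu_mask) {
  assert(cpu_mask != 0);  // an empty mask has no target CPU
  uint32_t cpu = 0;
  while ((cpu_mask & 1u) == 0) {
    cpu_mask >>= 1;
    ++cpu;
  }
  return cpu;
}

// Models the ISR's fence update: clear the bit of the CPU the ISR
// believes it is running on.
inline uint32_t IsrClearFence(uint32_t fence, uint32_t current_cpu) {
  return fence & ~(1u << current_cpu);
}
```

With a fence of 0x4, IsrClearFence(0x4, 0) leaves 0x4 and the GPU stays
parked, while IsrClearFence(0x4, TargetCpuFromMask(0x4)) yields 0 and
releases the WAIT_REG_MEM.
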
With both fixes the fence clears, the GPU drains each present batch, source=1
sustains per-present, clock B advances, and the loop runs continuously. Draws
climb roughly linearly with the budget (no re-stall): 28->718 at 50M, 3411 at
200M, 18734 at 1B; swaps 2->147/950/6060 at the same budgets. No
"Unanticipated CPU_INTERRUPT" trap.
Inline-deterministic (--stable-digest byte-identical x2); n50m golden
re-baselined. 675 tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>