[iterate-2W] Sustain the title present loop: viewport-size register + ISR CPU impersonation

The title's per-frame loop (sub_822F1AA8) is clock-B-paced and only re-fires
when the swap count [controller+88] changes, which advances only on source=1
CP swap-complete interrupts. Each present batch the guest submits (via the
sub_824CE348 -> sub_824BF4D0 builder) ends with a WAIT_REG_MEM on a per-CPU
swap-acknowledge fence [GCTX+0] (GCTX = [device+10772]); the GPU parks there
until the graphics ISR (sub_824BE9A0) clears that CPU's bit. Two coupled gaps
kept ours emitting only ONE source=1 then dead-locking (draws plateaued at 28,
run halted ~19.27M):

1. GPU MMIO register 0x1961 (AVIVO_D1MODE_VIEWPORT_SIZE) read as 0. The swap
   callback sub_824CE2B8 divides by its low 12 bits (display height) as a
   refresh-pacing term, so a 0 read tripped its `twi` divide-by-zero guard and
   aborted the ISR before it reached the fence-clear. Mirror canary
   GraphicsSystem::ReadRegister (graphics_system.cc:311): return 0x050002D0
   (1280x720).

2. The ISR ran on an arbitrary borrowed thread, so [r13+268] (the PCR
   processor number) did not match the interrupt's target CPU. The ISR clears
   `1 << current_cpu` from the fence; running on the wrong CPU cleared the
   wrong bit and the fence (bit 2, from cpu_mask 0x4) never reached 0. Carry
   the target CPU through the interrupt queue (bit index of the PM4_INTERRUPT
   cpu_mask for CP, 2 for vsync per canary DispatchInterruptCallback(0, 2)) and
   impersonate it on the borrowed thread's PCR around the ISR, mirroring canary
   EmulateCPInterruptDPC -> XThread::SetActiveCpu.

With both fixes the fence clears, the GPU drains each present batch, source=1
sustains per-present, clock B advances, and the loop runs continuously. Draws
climb linearly with the budget (no re-stall): 50M 28->718, 200M ->3411,
1B ->18734; swaps 2->147/950/6060. No "Unanticipated CPU_INTERRUPT" trap.
Inline-deterministic (--stable-digest byte-identical x2); n50m golden
re-baselined. 675 tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-06-14 20:49:32 +02:00
parent 66bd805726
commit a91f4c550b
5 changed files with 97 additions and 24 deletions

View File

@@ -2338,10 +2338,22 @@ fn coord_post_round(
}
if kernel.gpu.has_pending_interrupts() {
for _pi in kernel.gpu.take_pending_interrupts() {
for pi in kernel.gpu.take_pending_interrupts() {
// Canary `ExecutePacketType3_INTERRUPT` dispatches the callback
// once per set bit of `cpu_mask` with that bit's index as the
// target CPU (`DispatchInterruptCallback(1, n)`). The guest's
// swap-acknowledge fence stores `cpu_mask`, and the ISR clears
// `1 << current_cpu` from it — so the ISR must run impersonating
// the masked CPU or the fence never reaches 0. Sylpheed uses a
// single-bit mask (`0x4` → CPU 2); take the lowest set bit.
let cpu = if pi.cpu_mask == 0 {
xenia_kernel::interrupts::VSYNC_TARGET_CPU
} else {
pi.cpu_mask.trailing_zeros().min(5) as u8
};
kernel
.interrupts
.queue_interrupt(xenia_kernel::INTERRUPT_SOURCE_CP);
.queue_interrupt(xenia_kernel::INTERRUPT_SOURCE_CP, cpu);
}
}
@@ -3545,7 +3557,17 @@ fn dispatch_graphics_interrupts(
None
};
/// X_KPCR offset of `prcb_data.current_cpu` (canary `xthread.cc`
/// `SetActiveCpu` → `pcr.prcb_data.current_cpu`). The guest graphics
/// ISR reads it via `lbz r10, 268(r13)` to decide which per-CPU bit of
/// the swap-acknowledge fence to clear.
const PCR_CURRENT_CPU_OFF: u32 = 268;
while let Some(source) = kernel.interrupts.peek_next() {
let target_cpu = kernel
.interrupts
.peek_next_cpu()
.unwrap_or(xenia_kernel::interrupts::VSYNC_TARGET_CPU);
// Victim selection: Ready first, then Blocked (canary's
// `XThread::GetCurrentThread()` analog — any live thread will
// do for borrowing context). Skip Idle/Exited/ServicingIrq.
@@ -3615,6 +3637,19 @@ fn dispatch_graphics_interrupts(
saved
};
// Impersonate the interrupt's target CPU on the borrowed thread's
// PCR, mirroring canary `EmulateCPInterruptDPC` →
// `XThread::SetActiveCpu(cpu)`. The guest swap-complete ISR clears
// `1 << [pcr.current_cpu]` from the per-present swap-acknowledge
// fence; if it runs on the wrong CPU it clears the wrong bit and
// the GPU's trailing `WAIT_REG_MEM` on that fence never releases —
// stranding the present/title loop. Save/restore so borrowing a
// thread doesn't permanently rewrite its processor number.
let pcr_addr = (kernel.scheduler.ctx_mut_ref(target_ref).gpr[13] as u32)
.wrapping_add(PCR_CURRENT_CPU_OFF);
let saved_cpu = mem.read_u8(pcr_addr);
mem.write_u8(pcr_addr, target_cpu);
// Stash the previous `scheduler.current` (call_export reaches
// it; imports the ISR calls must dispatch on the borrowed
// thread). Restore on the way out.
@@ -3707,6 +3742,7 @@ fn dispatch_graphics_interrupts(
// Restore the borrowed context.
saved.restore(kernel.scheduler.ctx_mut_ref(target_ref));
mem.write_u8(pcr_addr, saved_cpu);
kernel.scheduler.current = prev_current;
kernel.interrupts.delivered += 1;

View File

@@ -1,10 +1,10 @@
{
"instructions": 19274336,
"imports": 72513,
"instructions": 50000014,
"imports": 352251,
"unimpl": 0,
"draws": 28,
"swaps": 2,
"draws": 718,
"swaps": 147,
"unique_render_targets": 2,
"shader_blobs_live": 3,
"shader_blobs_live": 6,
"texture_cache_entries": 0
}