[iterate-2W] Sustain the title present loop: viewport-size register + ISR CPU impersonation
The title's per-frame loop (sub_822F1AA8) is clock-B-paced and only re-fires when the swap count [controller+88] changes, which advances only on source=1 CP swap-complete interrupts. Each present batch the guest submits (via the sub_824CE348 -> sub_824BF4D0 builder) ends with a WAIT_REG_MEM on a per-CPU swap-acknowledge fence [GCTX+0] (GCTX = [device+10772]); the GPU parks there until the graphics ISR (sub_824BE9A0) clears that CPU's bit. Two coupled gaps kept ours emitting only ONE source=1 then dead-locking (draws plateaued at 28, run halted ~19.27M): 1. GPU MMIO register 0x1961 (AVIVO_D1MODE_VIEWPORT_SIZE) read as 0. The swap callback sub_824CE2B8 divides by its low 12 bits (display height) as a refresh-pacing term, so a 0 read tripped its `twi` divide-by-zero guard and aborted the ISR before it reached the fence-clear. Mirror canary GraphicsSystem::ReadRegister (graphics_system.cc:311): return 0x050002D0 (1280x720). 2. The ISR ran on an arbitrary borrowed thread, so [r13+268] (the PCR processor number) did not match the interrupt's target CPU. The ISR clears `1 << current_cpu` from the fence; running on the wrong CPU cleared the wrong bit and the fence (bit 2, from cpu_mask 0x4) never reached 0. Carry the target CPU through the interrupt queue (bit index of the PM4_INTERRUPT cpu_mask for CP, 2 for vsync per canary DispatchInterruptCallback(0, 2)) and impersonate it on the borrowed thread's PCR around the ISR, mirroring canary EmulateCPInterruptDPC -> XThread::SetActiveCpu. With both fixes the fence clears, the GPU drains each present batch, source=1 sustains per-present, clock B advances, and the loop runs continuously. Draws climb linearly with the budget (no re-stall): 50M 28->718, 200M ->3411, 1B ->18734; swaps 2->147/950/6060. No "Unanticipated CPU_INTERRUPT" trap. Inline-deterministic (--stable-digest byte-identical x2); n50m golden re-baselined. 675 tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -2338,10 +2338,22 @@ fn coord_post_round(
|
||||
}
|
||||
|
||||
if kernel.gpu.has_pending_interrupts() {
|
||||
for _pi in kernel.gpu.take_pending_interrupts() {
|
||||
for pi in kernel.gpu.take_pending_interrupts() {
|
||||
// Canary `ExecutePacketType3_INTERRUPT` dispatches the callback
|
||||
// once per set bit of `cpu_mask` with that bit's index as the
|
||||
// target CPU (`DispatchInterruptCallback(1, n)`). The guest's
|
||||
// swap-acknowledge fence stores `cpu_mask`, and the ISR clears
|
||||
// `1 << current_cpu` from it — so the ISR must run impersonating
|
||||
// the masked CPU or the fence never reaches 0. Sylpheed uses a
|
||||
// single-bit mask (`0x4` → CPU 2); take the lowest set bit.
|
||||
let cpu = if pi.cpu_mask == 0 {
|
||||
xenia_kernel::interrupts::VSYNC_TARGET_CPU
|
||||
} else {
|
||||
pi.cpu_mask.trailing_zeros().min(5) as u8
|
||||
};
|
||||
kernel
|
||||
.interrupts
|
||||
.queue_interrupt(xenia_kernel::INTERRUPT_SOURCE_CP);
|
||||
.queue_interrupt(xenia_kernel::INTERRUPT_SOURCE_CP, cpu);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -3545,7 +3557,17 @@ fn dispatch_graphics_interrupts(
|
||||
None
|
||||
};
|
||||
|
||||
/// X_KPCR offset of `prcb_data.current_cpu` (canary `xthread.cc`
|
||||
/// `SetActiveCpu` → `pcr.prcb_data.current_cpu`). The guest graphics
|
||||
/// ISR reads it via `lbz r10, 268(r13)` to decide which per-CPU bit of
|
||||
/// the swap-acknowledge fence to clear.
|
||||
const PCR_CURRENT_CPU_OFF: u32 = 268;
|
||||
|
||||
while let Some(source) = kernel.interrupts.peek_next() {
|
||||
let target_cpu = kernel
|
||||
.interrupts
|
||||
.peek_next_cpu()
|
||||
.unwrap_or(xenia_kernel::interrupts::VSYNC_TARGET_CPU);
|
||||
// Victim selection: Ready first, then Blocked (canary's
|
||||
// `XThread::GetCurrentThread()` analog — any live thread will
|
||||
// do for borrowing context). Skip Idle/Exited/ServicingIrq.
|
||||
@@ -3615,6 +3637,19 @@ fn dispatch_graphics_interrupts(
|
||||
saved
|
||||
};
|
||||
|
||||
// Impersonate the interrupt's target CPU on the borrowed thread's
|
||||
// PCR, mirroring canary `EmulateCPInterruptDPC` →
|
||||
// `XThread::SetActiveCpu(cpu)`. The guest swap-complete ISR clears
|
||||
// `1 << [pcr.current_cpu]` from the per-present swap-acknowledge
|
||||
// fence; if it runs on the wrong CPU it clears the wrong bit and
|
||||
// the GPU's trailing `WAIT_REG_MEM` on that fence never releases —
|
||||
// stranding the present/title loop. Save/restore so borrowing a
|
||||
// thread doesn't permanently rewrite its processor number.
|
||||
let pcr_addr = (kernel.scheduler.ctx_mut_ref(target_ref).gpr[13] as u32)
|
||||
.wrapping_add(PCR_CURRENT_CPU_OFF);
|
||||
let saved_cpu = mem.read_u8(pcr_addr);
|
||||
mem.write_u8(pcr_addr, target_cpu);
|
||||
|
||||
// Stash the previous `scheduler.current` (call_export reaches
|
||||
// it; imports the ISR calls must dispatch on the borrowed
|
||||
// thread). Restore on the way out.
|
||||
@@ -3707,6 +3742,7 @@ fn dispatch_graphics_interrupts(
|
||||
|
||||
// Restore the borrowed context.
|
||||
saved.restore(kernel.scheduler.ctx_mut_ref(target_ref));
|
||||
mem.write_u8(pcr_addr, saved_cpu);
|
||||
kernel.scheduler.current = prev_current;
|
||||
kernel.interrupts.delivered += 1;
|
||||
|
||||
|
||||
Reference in New Issue
Block a user