[iterate-2K] GPU physical-mirror aliasing: ring/IB/RPtr/resolve read wrong host region

Root cause (physical-mirror aliasing gap → GPU read wrong region → ring never
truly drained → render worker ring-space wait → no frame → no draw):

The Xbox 360 maps its 512 MB of physical DRAM into several virtual mirror
windows differing only in cache policy — bare physical (0x0xxxxxxx),
write-combine (0x4xxxxxxx), and cached 0xA/0xC/0xExxxxxxx — all aliasing
addr & 0x1FFF_FFFF. Ours has one flat membase and `heap_alloc`
(MmAllocatePhysicalMemoryEx) commits physical backing in the 0x4xxxxxxx
window. The guest masks its CP-ring allocation base to bare physical
(0x4adcc000 & 0x1FFFFFFF = 0x0adcc000) before handing it to
VdInitializeRingBuffer, and PM4 INDIRECT_BUFFER / writeback / resolve
pointers are likewise bare-physical. Ours stored those verbatim and read
`membase + 0x0adcc000`, a never-committed zero-filled page — so the GPU
drained ~718k zero PM4 headers, never executed the real Type3/DRAW stream,
and the RPtr writeback landed on a zero page the render worker (tid=8) polls,
freezing it forever.

Fix (GPU/Vd-boundary translation, not memory-layer): add
`physical_to_backing(addr)` deriving the committed backing exactly from
`heap_alloc`'s placement (0x4000_0000 | (addr & 0x1FFF_FFFF), idempotent for
the WC window, flat for non-physical code/stack). Apply it at every point the
GPU/kernel consumes a guest physical address: ring base
(initialize_ring_buffer), RPtr writeback (enable_rptr_writeback), PM4
INDIRECT_BUFFER pointer, WAIT_REG_MEM / COND_WRITE memory poll+write,
REG_TO_MEM / MEM_WRITE / EVENT_WRITE* / LOAD_ALU_CONSTANT / IM_LOAD addresses,
the resolve dest write, and the vd_swap frontbuffer present read. This was
chosen over memory-layer aliasing because the latter re-projects every CPU
load/store and corrupts the guest's flat 0xA/0xC/0xE accesses (it caused an
early PC=0xfffffffc fault).

Two adjacent GPU-backend gates this exposed and also fixed (canary-faithful):
- WaitCmp::from_wait_info was off by one vs canary's MatchValueAndRef
  selector (it decoded wait_info&7==3 as NotEqual instead of Equal),
  inverting the standard CP coherency wait so the GPU parked forever on the
  first INDIRECT_BUFFER. Remapped to 1=Less..7=Always, 0=Never.
- Added MakeCoherent: a WAIT polling COHER_STATUS_HOST clears the status bit
  (mirrors command_processor.cc:801-838) so the coherency handshake resolves.

Result: the GPU now decodes the real Type3 packets at 0x4adcc000 (ME_INIT,
INDIRECT_BUFFER → real Type0/WAIT_REG_MEM at 0x4adf5080) instead of
zero-headers; RPtr at 0x408619fc advances (0x13, 0x16, … written by the GPU
worker); the frame loop sub_822F1AA8 actively writes the controller at
0x40d09a40 (0x20→0x21→0x23); no fault, full 200M/1B budget runs clean.

draws_seen is still 0: the remaining gate is upstream and separate — the main
frame loop never sets controller bit-28 (frame-ready) at [0x40d09a40] (stalls
at 0x23, the known iterate-2C state-divergence gate), so the guest never
enqueues a render IB; the GPU only ever replays the init IB. This fix
correctly unblocks the GPU ring/IB/RPtr data path (gate-2 GPU backend); the
bit-28 frame-ready gate is the next target.

Stable golden (sylpheed_n50m) unchanged (draws/swaps/RTs/shaders identical at
50M); regenerated twice byte-identical. cargo test --workspace: 672 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-06-13 13:39:57 +02:00
parent ed2e0e72fd
commit 2bdb93e51e
4 changed files with 145 additions and 25 deletions

View File

@@ -3169,13 +3169,18 @@ fn vd_swap(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
// safer to cap the read at the known total size to avoid OOB.
let mut tiled = Vec::with_capacity(total_tiled_bytes);
let mut ok = true;
// The frontbuffer is a guest *physical* address; project onto the
// committed backing window (see `xenia_gpu::physical_to_backing`)
// so the present reads the pixels the GPU resolved, not a stale /
// zero mirror page.
let fb_backing = xenia_gpu::physical_to_backing(swap.frontbuffer_phys);
for i in 0..total_tiled_bytes {
// read_u8 is cheap — the VirtualMemory handler returns 0
// for unmapped pages so we get a recognisable dark frame
// rather than a crash if the address turned out bogus.
let addr = swap.frontbuffer_phys.wrapping_add(i as u32);
let addr = fb_backing.wrapping_add(i as u32);
tiled.push(mem.read_u8(addr));
if addr < swap.frontbuffer_phys {
if addr < fb_backing {
ok = false;
break;
}