Commit Graph

2 Commits

Author SHA1 Message Date
MechaCat02
2bdb93e51e [iterate-2K] GPU physical-mirror aliasing: ring/IB/RPtr/resolve read wrong host region
Root cause (physical-mirror aliasing gap → GPU read wrong region → ring never
truly drained → render worker ring-space wait → no frame → no draw):

The Xbox 360 maps its 512 MB of physical DRAM into several virtual mirror
windows differing only in cache policy — bare physical (0x0xxxxxxx),
write-combine (0x4xxxxxxx), and cached 0xA/0xC/0xExxxxxxx — all aliasing
addr & 0x1FFF_FFFF. Ours has one flat membase and `heap_alloc`
(MmAllocatePhysicalMemoryEx) commits physical backing in the 0x4xxxxxxx
window. The guest masks its CP-ring allocation base to bare physical
(0x4adcc000 & 0x1FFFFFFF = 0x0adcc000) before handing it to
VdInitializeRingBuffer, and PM4 INDIRECT_BUFFER / writeback / resolve
pointers are likewise bare-physical. Ours stored those verbatim and read
`membase + 0x0adcc000`, a never-committed zero-filled page — so the GPU
drained ~718k zero PM4 headers, never executed the real Type3/DRAW stream,
and the RPtr writeback landed on a zero page the render worker (tid=8) polls,
freezing it forever.

Fix (GPU/Vd-boundary translation, not memory-layer): add
`physical_to_backing(addr)` deriving the committed backing exactly from
`heap_alloc`'s placement (0x4000_0000 | (addr & 0x1FFF_FFFF), idempotent for
the WC window, flat for non-physical code/stack). Apply it at every point the
GPU/kernel consumes a guest physical address: ring base
(initialize_ring_buffer), RPtr writeback (enable_rptr_writeback), PM4
INDIRECT_BUFFER pointer, WAIT_REG_MEM / COND_WRITE memory poll+write,
REG_TO_MEM / MEM_WRITE / EVENT_WRITE* / LOAD_ALU_CONSTANT / IM_LOAD addresses,
the resolve dest write, and the vd_swap frontbuffer present read. This was
chosen over memory-layer aliasing because the latter re-projects every CPU
load/store and corrupts the guest's flat 0xA/0xC/0xE accesses (it caused an
early PC=0xfffffffc fault).

Two adjacent GPU-backend gates this exposed and also fixed (canary-faithful):
- WaitCmp::from_wait_info was off by one vs canary's MatchValueAndRef
  selector (it decoded wait_info&7==3 as NotEqual instead of Equal),
  inverting the standard CP coherency wait so the GPU parked forever on the
  first INDIRECT_BUFFER. Remapped to 1=Less..7=Always, 0=Never.
- Added MakeCoherent: a WAIT polling COHER_STATUS_HOST clears the status bit
  (mirrors command_processor.cc:801-838) so the coherency handshake resolves.

Result: the GPU now decodes the real Type3 packets at 0x4adcc000 (ME_INIT,
INDIRECT_BUFFER → real Type0/WAIT_REG_MEM at 0x4adf5080) instead of
zero-headers; RPtr at 0x408619fc advances (0x13, 0x16, … written by the GPU
worker); the frame loop sub_822F1AA8 actively writes the controller at
0x40d09a40 (0x20→0x21→0x23); no fault, full 200M/1B budget runs clean.

draws_seen is still 0: the remaining gate is upstream and separate — the main
frame loop never sets controller bit-28 (frame-ready) at [0x40d09a40] (stalls
at 0x23, the known iterate-2C state-divergence gate), so the guest never
enqueues a render IB; the GPU only ever replays the init IB. This fix
correctly unblocks the GPU ring/IB/RPtr data path (gate-2 GPU backend); the
bit-28 frame-ready gate is the next target.

Stable golden (sylpheed_n50m) unchanged (draws/swaps/RTs/shaders identical at
50M); regenerated twice byte-identical. cargo test --workspace: 672 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 13:39:57 +02:00
MechaCat02
79eb52c378 xenia-gpu: end-to-end Xenos pipeline (PM4, ucode, EDRAM, resolve)
First real GPU implementation. Ring/PM4 frontend (ring_view,
ring_drain, pm4) drains the command processor; gpu_system owns the
threaded backend (DrainFence RPC + parker/fence helpers from M1) and
the MMIO-mapped register block (mmio_region).

Xenos shader frontend: ucode/{alu,control_flow,fetch,mod}.rs decode
the Xbox 360 microcode, translator.rs lowers it onto the WGSL
xenos_interp interpreter shader (shaders/xenos_interp.wgsl).
shader_metrics.rs counts decode/translate work.

Render state: draw_state, primitive, render_target_cache,
texture_cache, tiled_address (Xenos's swizzled tiled-memory layout),
xenos_constants (register field constants), edram (the 10 MiB EDRAM
model with MSAA), and resolve.rs (TILE_FLUSH copy-out — clear-resolve
plus bitwise-equivalent 32 bpp + 64 bpp paths landed). handle.rs
owns the typed GPU-resource handles the kernel hands out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:29:38 +02:00