The pre-fix VdSwap zero-filled the guest's reserved buffer with NOPs and
called `state.gpu.notify_xe_swap` directly — bypassing the ring, leaving
the PM4_XE_SWAP handler at gpu_system.rs:1232 dead code, and skipping
the PM4_TYPE0(SHADER_CONSTANT_FETCH_00_0, 6) patch. Sylpheed's bloom/
blur "sample frame N for frame N+1" path samples fetch-constant slot 0
expecting the frontbuffer descriptor; without the patch, slot 0 stayed
stale and any shader sampling it read garbage.
This commit writes the canary VdSwap PM4 sequence directly into the
primary ring at the current write pointer (read via the shared MMIO
atomic), then advances WPTR over the injection. The natural CP drain
consumes PM4_XE_SWAP — bumping `swaps_seen` and patching fetch-constant
slot 0 — without going through any direct kernel→GPU bypass.
Sequence per xenia-canary VdSwap_entry (xboxkrnl_video.cc:438-521):
1) PM4_TYPE0(0x4800, count=6) + 6 fetch-header dwords (with
base_address re-patched from virtual to physical >> 12).
2) PM4_TYPE3(PM4_XE_SWAP, count=4) + signature + frontbuffer_phys
+ width + height.
Mechanism notes:
- buffer_ptr in xenia-rs is in the system command buffer, NOT the
primary ring (verified empirically: buffer_ptr=0x4acd4df8 vs
ring_base=0x0accb000, size 4 KB). Canary's VdSwap writes to
buffer_ptr because its ring layout maps the reserved slot inside
the ring; xenia-rs's doesn't, so we have to write at the actual
ring WPTR address (cached on KernelState.ring_base from
VdInitializeRingBuffer).
- The original "buffer_ptr zero-fill + bump WPTR by 64" path is
preserved before the injection — it exposes any game-batched PM4
packets and keeps the buffer_ptr region skippable per existing
game compat behavior.
- A safety-net fallback at the end calls `notify_xe_swap` directly if
swaps_seen didn't advance during the drain (e.g. a ring-arithmetic
edge case). Idempotent — only fires when the PM4 path didn't.
- KRNBUG-Mm-04 deferred: virt→phys uses the masked stub
`virt & 0x1FFF_FFFF`, sufficient for the standard heap.
Mechanical changes:
- crates/xenia-gpu/src/pm4.rs: add make_packet_type0 / type2 / type3
helpers + round-trip unit test (mirrors canary xenos.h:1682-1709).
- crates/xenia-gpu/src/handle.rs: add mmio_cp_rb_wptr_load accessor
(Acquire-load) so the kernel can compute ring offsets.
- crates/xenia-kernel/src/state.rs: cache ring_base / ring_size_dwords
on KernelState (set by VdInitializeRingBuffer).
- crates/xenia-kernel/src/exports.rs: rewrite the vd_swap PM4-emit
block; patch fetch_dwords[1] base_address virt→phys before injection.
Verification at -n 100M lockstep:
swaps: 2 → 2 (game fires VdSwap exactly twice)
draws: 0 → 0 (gated by Phases D+E)
fallback warning: 0 occurrences (PM4 path consumed both swaps)
instructions: ~100M
Tests: 552 passing (553 with new pm4 round-trip test). Lockstep
stable-fields determinism: byte-identical across two 100M runs.
The "swaps > 2" prediction in the audit's plan assumed the game would
fire VdSwap more often once the path worked; empirically Sylpheed only
calls VdSwap twice within 100M instructions (this is the renderer
plateau the audit identified). The success criterion for Phase C is
that the PM4 path is now operational, which Phases D+E require for
visible draws.
Closes KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rust reimplementation of the xenia Xbox 360 emulator targeting reverse-
engineering and preservation, initially scoped to Project Sylpheed.
Includes:
- XEX2 loader (LZX decompression, AES decryption, PE parsing)
- XISO / XGD2 disc image VFS
- PPC interpreter with 200+ opcodes and VMX128 decoding
- Static analyzer: functions, cross-references, labels, asm + SQLite output
- HLE kernel covering the xboxkrnl/xam subset used by Sylpheed init
- Debugger with in-memory and SQLite-backed execution tracing
- `xenia-rs` CLI with extract/dis/exec commands that produce cumulative,
superset SQLite databases and opt-in instruction/import/branch traces
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>