The pre-fix VdSwap zero-filled the guest's reserved buffer with NOPs and
called `state.gpu.notify_xe_swap` directly — bypassing the ring, leaving
the PM4_XE_SWAP handler at gpu_system.rs:1232 dead code, and skipping
the PM4_TYPE0(SHADER_CONSTANT_FETCH_00_0, 6) patch. Sylpheed's bloom/
blur "sample frame N for frame N+1" path samples fetch-constant slot 0
expecting the frontbuffer descriptor; without the patch, slot 0 stayed
stale and any shader sampling it read garbage.
This commit writes the canary VdSwap PM4 sequence directly into the
primary ring at the current write pointer (read via the shared MMIO
atomic), then advances WPTR over the injection. The natural CP drain
consumes PM4_XE_SWAP — bumping `swaps_seen` and patching fetch-constant
slot 0 — without going through any direct kernel→GPU bypass.
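A minimal sketch of the injection step. It assumes the primary ring can be modeled as a flat array of `ring_size_dwords` dwords, with WPTR as a dword index that wraps at the end of the ring. `Ring`, `inject_at_wptr`, and `wptr_address` are hypothetical names used only for illustration. They are not the xenia-rs code, which writes big-endian dwords into guest memory and publishes WPTR through the shared MMIO atomic.

```rust
/// Hypothetical model of the primary ring: `dwords.len()` entries
/// (`ring_size_dwords`), addressed by a dword-granular write pointer.
struct Ring {
    dwords: Vec<u32>,
    wptr: u32,
}

impl Ring {
    /// Assumes a non-empty ring; a zero size would make the modulo panic.
    fn new(ring_size_dwords: usize) -> Self {
        assert!(ring_size_dwords > 0, "ring must hold at least one dword");
        Ring { dwords: vec![0; ring_size_dwords], wptr: 0 }
    }

    /// Guest byte address of the slot WPTR points at, given the ring base
    /// cached from VdInitializeRingBuffer.
    fn wptr_address(&self, ring_base: u32) -> u32 {
        ring_base + self.wptr * 4
    }

    /// Copy `packets` into the ring starting at WPTR, wrapping at the end,
    /// then advance WPTR past them. Returns the new WPTR.
    fn inject_at_wptr(&mut self, packets: &[u32]) -> u32 {
        let size = self.dwords.len() as u32;
        for (i, &dw) in packets.iter().enumerate() {
            let slot = (self.wptr + i as u32) % size;
            self.dwords[slot as usize] = dw;
        }
        self.wptr = (self.wptr + packets.len() as u32) % size;
        self.wptr
    }
}
```

In this model, the CP only consumes dwords up to WPTR, so advancing WPTR after the copy is what exposes the injected packets to the drain. The sketch does not check whether the write would overrun the CP read pointer; a real ring writer would need to.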
Sequence per xenia-canary VdSwap_entry (xboxkrnl_video.cc:438-521):
1) PM4_TYPE0(0x4800, count=6) + 6 fetch-header dwords (with
base_address re-patched from virtual to physical >> 12).
2) PM4_TYPE3(PM4_XE_SWAP, count=4) + signature + frontbuffer_phys
+ width + height.
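The two packets above can be assembled as a single 12-dword sequence: 7 dwords for the Type-0 fetch-constant write and 5 for the swap. This is a sketch. The two packet helpers are copied from `make_packet_type0` / `make_packet_type3` in pm4.rs so the block stands alone, and `build_swap_sequence` is a hypothetical name.

```rust
const PM4_XE_SWAP: u8 = 0x64;
const SHADER_CONSTANT_FETCH_00_0: u16 = 0x4800;

/// Type-0 header: count-1 in bits 29:16, register index in bits 14:0.
fn make_packet_type0(reg_index: u16, count: u16) -> u32 {
    (((count as u32 - 1) & 0x3FFF) << 16) | (reg_index as u32 & 0x7FFF)
}

/// Type-3 header: type bits 31:30 = 3, count-1 in bits 29:16, opcode in 14:8.
fn make_packet_type3(opcode: u8, count: u16) -> u32 {
    (3u32 << 30) | (((count as u32 - 1) & 0x3FFF) << 16) | ((opcode as u32 & 0x7F) << 8)
}

/// Hypothetical builder: fetch-constant slot 0 write, then PM4_XE_SWAP.
fn build_swap_sequence(
    fetch: [u32; 6],
    signature: u32,
    frontbuffer_phys: u32,
    width: u32,
    height: u32,
) -> Vec<u32> {
    let mut out = Vec::with_capacity(12);
    out.push(make_packet_type0(SHADER_CONSTANT_FETCH_00_0, 6));
    out.extend_from_slice(&fetch);
    out.push(make_packet_type3(PM4_XE_SWAP, 4));
    out.extend_from_slice(&[signature, frontbuffer_phys, width, height]);
    out
}
```

With these helpers, the Type-0 header works out to `0x0005_4800` and the swap header to `0xC003_6400`.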
Mechanism notes:
- buffer_ptr in xenia-rs is in the system command buffer, NOT the
primary ring (verified empirically: buffer_ptr=0x4acd4df8 vs
ring_base=0x0accb000, size 4 KB). Canary's VdSwap writes to
buffer_ptr because its ring layout maps the reserved slot inside
the ring; xenia-rs's doesn't, so we have to write at the actual
ring WPTR address (cached on KernelState.ring_base from
VdInitializeRingBuffer).
- The original "buffer_ptr zero-fill + bump WPTR by 64" path is
preserved before the injection — it exposes any game-batched PM4
packets and keeps the buffer_ptr region skippable per existing
game compat behavior.
- A safety-net fallback at the end calls `notify_xe_swap` directly if
swaps_seen didn't advance during the drain (e.g. a ring-arithmetic
edge case). Idempotent — only fires when the PM4 path didn't.
- KRNBUG-Mm-04 deferred: virt→phys uses the masked stub
`virt & 0x1FFF_FFFF`, sufficient for the standard heap.
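The safety-net fallback amounts to comparing `swaps_seen` before and after the drain. This is a minimal sketch. It assumes a simplified GPU state in which the PM4_XE_SWAP handler and the direct path both bump the same counter. `Gpu`, `direct_notifies`, and `drain_with_fallback` are hypothetical names.

```rust
/// Hypothetical stand-in for the GPU side. The PM4_XE_SWAP handler bumps
/// `swaps_seen`; `notify_xe_swap` is the direct (bypass) path.
struct Gpu {
    swaps_seen: u64,
    direct_notifies: u64,
}

impl Gpu {
    fn notify_xe_swap(&mut self) {
        self.swaps_seen += 1;
        self.direct_notifies += 1;
    }
}

/// Run `drain`, then take the direct path only if the drain did not
/// consume a PM4_XE_SWAP. Returns true if the fallback fired.
fn drain_with_fallback(gpu: &mut Gpu, drain: impl FnOnce(&mut Gpu)) -> bool {
    let before = gpu.swaps_seen;
    drain(&mut *gpu);
    if gpu.swaps_seen == before {
        gpu.notify_xe_swap();
        true
    } else {
        false
    }
}
```

Because the check compares the counter before and after the drain, a drain that does consume the injected packet never reaches the direct path. This keeps a swap from being counted twice.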
Mechanical changes:
- crates/xenia-gpu/src/pm4.rs: add make_packet_type0 / type2 / type3
helpers + round-trip unit test (mirrors canary xenos.h:1682-1709).
- crates/xenia-gpu/src/handle.rs: add mmio_cp_rb_wptr_load accessor
(Acquire-load) so the kernel can compute ring offsets.
- crates/xenia-kernel/src/state.rs: cache ring_base / ring_size_dwords
on KernelState (set by VdInitializeRingBuffer).
- crates/xenia-kernel/src/exports.rs: rewrite the vd_swap PM4-emit
block; patch fetch_dwords[1] base_address virt→phys before injection.
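The `fetch_dwords[1]` patch can be sketched as follows, using the masked KRNBUG-Mm-04 stub for virt→phys. The sketch assumes that `base_address` is the 20-bit field at bits 12..=31 of fetch dword 1, as in canary's `xe_gpu_texture_fetch_t`. Under that assumption, the low 12 bits carry other fetch fields that must survive the patch. Both function names are hypothetical.

```rust
/// KRNBUG-Mm-04 stub: mask off the upper virtual-address bits.
/// Only valid for the standard heap.
fn virt_to_phys_stub(virt: u32) -> u32 {
    virt & 0x1FFF_FFFF
}

/// Re-patch fetch dword 1 so `base_address` holds the physical address >> 12.
/// Assumes base_address occupies bits 12..=31 and the low 12 bits are other
/// fetch fields (format, endianness, ...) that must be preserved.
fn patch_fetch_base_address(fetch_dwords: &mut [u32; 6]) {
    let dword1 = fetch_dwords[1];
    // base_address << 12, i.e. the virtual address the guest wrote.
    let virt = dword1 & 0xFFFF_F000;
    let phys = virt_to_phys_stub(virt);
    // Store phys >> 12 back into the 20-bit field, keeping the low bits.
    fetch_dwords[1] = (dword1 & 0x0000_0FFF) | ((phys >> 12) << 12);
}
```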
Verification at -n 100M lockstep:
swaps: 2 → 2 (game fires VdSwap exactly twice)
draws: 0 → 0 (gated by Phases D+E)
fallback warning: 0 occurrences (PM4 path consumed both swaps)
instructions: ~100M
Tests: 552 passing (553 with new pm4 round-trip test). Lockstep
stable-fields determinism: byte-identical across two 100M runs.
The "swaps > 2" prediction in the audit's plan assumed the game would
fire VdSwap more often once the path worked; empirically Sylpheed only
calls VdSwap twice within 100M instructions (this is the renderer
plateau the audit identified). The success criterion for Phase C is
that the PM4 path is now operational, which Phases D+E require for
visible draws.
Closes KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
//! PM4 packet format: header decoding + Type-3 opcode set.
//!
//! Xenos PM4 packet layout mirrors `xenia-canary/src/xenia/gpu/packet_disassembler.cc`:
//!
//! - **Type 0** (`packet >> 30 == 0`): register-write run.
//!   `count = ((packet >> 16) & 0x3FFF) + 1`. Total dwords = `1 + count`.
//!   With `(packet >> 15) & 1 == 1`, all writes target the same register.
//! - **Type 1** (`packet >> 30 == 1`): two-register write. Total dwords = 3.
//! - **Type 2** (`packet >> 30 == 2`): NOP, a single skipped dword.
//! - **Type 3** (`packet >> 30 == 3`): command.
//!   `opcode = (packet >> 8) & 0x7F`,
//!   `count = ((packet >> 16) & 0x3FFF) + 1`.
//!   Total dwords = `1 + count`.
|
/// The cookie canary writes alongside `PM4_XE_SWAP` so tooling can recognize
/// swap packets. `'X','E','N','X'` big-endian (`kSwapSignature`).
pub const SWAP_SIGNATURE: u32 = 0x584E_4558;
|
// ── Named Type-3 opcodes (from xenia-canary/src/xenia/gpu/xenos.h:1617-1679) ──

pub const PM4_ME_INIT: u8 = 0x48;
pub const PM4_NOP: u8 = 0x10;
pub const PM4_INDIRECT_BUFFER: u8 = 0x3F;
pub const PM4_INDIRECT_BUFFER_PFD: u8 = 0x37;
pub const PM4_WAIT_FOR_IDLE: u8 = 0x26;
pub const PM4_WAIT_REG_MEM: u8 = 0x3C;
pub const PM4_REG_RMW: u8 = 0x21;
pub const PM4_REG_TO_MEM: u8 = 0x3E;
pub const PM4_MEM_WRITE: u8 = 0x3D;
pub const PM4_COND_WRITE: u8 = 0x45;
pub const PM4_EVENT_WRITE: u8 = 0x46;
pub const PM4_EVENT_WRITE_SHD: u8 = 0x58;
pub const PM4_EVENT_WRITE_EXT: u8 = 0x5A;
pub const PM4_EVENT_WRITE_ZPD: u8 = 0x5B;
pub const PM4_DRAW_INDX: u8 = 0x22;
pub const PM4_DRAW_INDX_2: u8 = 0x36;
pub const PM4_VIZ_QUERY: u8 = 0x23;
pub const PM4_SET_CONSTANT: u8 = 0x2D;
pub const PM4_SET_CONSTANT2: u8 = 0x55;
pub const PM4_SET_SHADER_CONSTANTS: u8 = 0x56;
pub const PM4_LOAD_ALU_CONSTANT: u8 = 0x2F;
pub const PM4_IM_LOAD: u8 = 0x27;
pub const PM4_IM_LOAD_IMMEDIATE: u8 = 0x2B;
pub const PM4_LOAD_CONSTANT_CONTEXT: u8 = 0x2E;
pub const PM4_INVALIDATE_STATE: u8 = 0x3B;
pub const PM4_INTERRUPT: u8 = 0x54;
pub const PM4_SET_SHADER_BASES: u8 = 0x4A;
pub const PM4_SET_BIN_MASK_LO: u8 = 0x60;
pub const PM4_SET_BIN_MASK_HI: u8 = 0x61;
pub const PM4_SET_BIN_SELECT_LO: u8 = 0x62;
pub const PM4_SET_BIN_SELECT_HI: u8 = 0x63;
pub const PM4_SET_BIN_MASK: u8 = 0x50;
pub const PM4_SET_BIN_SELECT: u8 = 0x51;
pub const PM4_CONTEXT_UPDATE: u8 = 0x5E;
/// Xenia-specific: `VdSwap` writes this to trigger a present.
pub const PM4_XE_SWAP: u8 = 0x64;
|
/// Human-readable name for a Type-3 opcode. Used for tracing spans.
pub fn type3_opcode_name(op: u8) -> &'static str {
    match op {
        PM4_ME_INIT => "ME_INIT",
        PM4_NOP => "NOP",
        PM4_INDIRECT_BUFFER => "INDIRECT_BUFFER",
        PM4_INDIRECT_BUFFER_PFD => "INDIRECT_BUFFER_PFD",
        PM4_WAIT_FOR_IDLE => "WAIT_FOR_IDLE",
        PM4_WAIT_REG_MEM => "WAIT_REG_MEM",
        PM4_REG_RMW => "REG_RMW",
        PM4_REG_TO_MEM => "REG_TO_MEM",
        PM4_MEM_WRITE => "MEM_WRITE",
        PM4_COND_WRITE => "COND_WRITE",
        PM4_EVENT_WRITE => "EVENT_WRITE",
        PM4_EVENT_WRITE_SHD => "EVENT_WRITE_SHD",
        PM4_EVENT_WRITE_EXT => "EVENT_WRITE_EXT",
        PM4_EVENT_WRITE_ZPD => "EVENT_WRITE_ZPD",
        PM4_DRAW_INDX => "DRAW_INDX",
        PM4_DRAW_INDX_2 => "DRAW_INDX_2",
        PM4_VIZ_QUERY => "VIZ_QUERY",
        PM4_SET_CONSTANT => "SET_CONSTANT",
        PM4_SET_CONSTANT2 => "SET_CONSTANT2",
        PM4_SET_SHADER_CONSTANTS => "SET_SHADER_CONSTANTS",
        PM4_LOAD_ALU_CONSTANT => "LOAD_ALU_CONSTANT",
        PM4_LOAD_CONSTANT_CONTEXT => "LOAD_CONSTANT_CONTEXT",
        PM4_IM_LOAD => "IM_LOAD",
        PM4_IM_LOAD_IMMEDIATE => "IM_LOAD_IMMEDIATE",
        PM4_INVALIDATE_STATE => "INVALIDATE_STATE",
        PM4_INTERRUPT => "INTERRUPT",
        PM4_SET_SHADER_BASES => "SET_SHADER_BASES",
        PM4_SET_BIN_MASK_LO => "SET_BIN_MASK_LO",
        PM4_SET_BIN_MASK_HI => "SET_BIN_MASK_HI",
        PM4_SET_BIN_SELECT_LO => "SET_BIN_SELECT_LO",
        PM4_SET_BIN_SELECT_HI => "SET_BIN_SELECT_HI",
        PM4_SET_BIN_MASK => "SET_BIN_MASK",
        PM4_SET_BIN_SELECT => "SET_BIN_SELECT",
        PM4_CONTEXT_UPDATE => "CONTEXT_UPDATE",
        PM4_XE_SWAP => "XE_SWAP",
        _ => "UNKNOWN",
    }
}
|
/// Decoded single PM4 packet header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PacketHeader {
    pub kind: PacketKind,
    /// Total size of the packet (including header) in dwords.
    pub total_dwords: u32,
}
|
/// Classification of a PM4 packet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PacketKind {
    /// Type-0 register-write run. `base_index` is the first register index
    /// (the register offset / 4). `write_one` is true if all `count` data
    /// dwords write to the same register.
    Type0 {
        base_index: u32,
        count: u32,
        write_one: bool,
    },
    /// Type-1 two-register write.
    Type1 { reg_index_1: u32, reg_index_2: u32 },
    /// Type-2 NOP (a single skipped dword).
    Type2,
    /// Type-3 command.
    Type3 {
        opcode: u8,
        count: u32,
        predicated: bool,
    },
}
|
/// Build a Type-0 register-write packet header. Mirrors canary's
/// `MakePacketType0` at `xenia-canary/src/xenia/gpu/xenos.h:1682`.
/// `count` is the number of data dwords that follow (inclusive: 1..=0x4000).
pub fn make_packet_type0(reg_index: u16, count: u16) -> u32 {
    debug_assert!(reg_index <= 0x7FFF);
    debug_assert!(count >= 1 && count as u32 <= 0x4000);
    (0u32 << 30) | (((count as u32 - 1) & 0x3FFF) << 16) | (reg_index as u32 & 0x7FFF)
}
|
/// Build a Type-2 NOP packet header. Single dword, no payload.
pub const fn make_packet_type2() -> u32 {
    2u32 << 30
}
|
/// Build a Type-3 command packet header. Mirrors canary's `MakePacketType3`.
/// `count` is the number of data dwords that follow (inclusive: 1..=0x4000).
pub fn make_packet_type3(opcode: u8, count: u16) -> u32 {
    debug_assert!(count >= 1 && count as u32 <= 0x4000);
    (3u32 << 30) | (((count as u32 - 1) & 0x3FFF) << 16) | ((opcode as u32 & 0x7F) << 8)
}
|
/// Decode a single PM4 packet header.
pub fn decode(header: u32) -> PacketHeader {
    match header >> 30 {
        0 => {
            let count = ((header >> 16) & 0x3FFF) + 1;
            PacketHeader {
                kind: PacketKind::Type0 {
                    base_index: header & 0x7FFF,
                    count,
                    write_one: (header >> 15) & 1 != 0,
                },
                total_dwords: 1 + count,
            }
        }
        1 => PacketHeader {
            kind: PacketKind::Type1 {
                reg_index_1: header & 0x7FF,
                reg_index_2: (header >> 11) & 0x7FF,
            },
            total_dwords: 3,
        },
        2 => PacketHeader {
            kind: PacketKind::Type2,
            total_dwords: 1,
        },
        3 => {
            let count = ((header >> 16) & 0x3FFF) + 1;
            PacketHeader {
                kind: PacketKind::Type3 {
                    opcode: ((header >> 8) & 0x7F) as u8,
                    count,
                    predicated: (header & 1) != 0,
                },
                total_dwords: 1 + count,
            }
        }
        _ => unreachable!(),
    }
}
|
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn type2_is_one_dword() {
        // 0x80000000 == type 2 header (bits 31:30 = 10)
        let hdr = decode(0x8000_0000);
        assert_eq!(hdr.kind, PacketKind::Type2);
        assert_eq!(hdr.total_dwords, 1);
    }
|
    #[test]
    fn type0_count_is_inclusive() {
        // count field (bits 29:16) = 5 → 6 data dwords. base_index = 0x100.
        // write_one = 0.
        let hdr = decode((5 << 16) | 0x100);
        match hdr.kind {
            PacketKind::Type0 {
                base_index,
                count,
                write_one,
            } => {
                assert_eq!(base_index, 0x100);
                assert_eq!(count, 6);
                assert!(!write_one);
            }
            _ => panic!("expected Type0"),
        }
        assert_eq!(hdr.total_dwords, 7);
    }
|
    #[test]
    fn type3_swap_packet() {
        // Build the exact header canary's VdSwap emits:
        // MakePacketType3(PM4_XE_SWAP, 4) → ((3<<30) | ((4-1)<<16) | (0x64<<8))
        let hdr_word = (3u32 << 30) | ((4u32 - 1) << 16) | ((PM4_XE_SWAP as u32) << 8);
        let hdr = decode(hdr_word);
        match hdr.kind {
            PacketKind::Type3 {
                opcode,
                count,
                predicated,
            } => {
                assert_eq!(opcode, PM4_XE_SWAP);
                assert_eq!(count, 4);
                assert!(!predicated);
            }
            _ => panic!("expected Type3"),
        }
        assert_eq!(hdr.total_dwords, 5);
    }
|
    #[test]
    fn opcode_names_are_present_for_common_ops() {
        assert_eq!(type3_opcode_name(PM4_NOP), "NOP");
        assert_eq!(type3_opcode_name(PM4_DRAW_INDX), "DRAW_INDX");
        assert_eq!(type3_opcode_name(PM4_XE_SWAP), "XE_SWAP");
        assert_eq!(type3_opcode_name(PM4_WAIT_REG_MEM), "WAIT_REG_MEM");
        assert_eq!(type3_opcode_name(0xFE), "UNKNOWN");
    }
|
    #[test]
    fn make_packet_helpers_round_trip_through_decode() {
        // Type-0: SHADER_CONSTANT_FETCH_00_0 (0x4800), count=6.
        let t0 = make_packet_type0(0x4800, 6);
        match decode(t0).kind {
            PacketKind::Type0 { base_index, count, write_one } => {
                assert_eq!(base_index, 0x4800);
                assert_eq!(count, 6);
                assert!(!write_one);
            }
            other => panic!("expected Type0, got {other:?}"),
        }
        assert_eq!(decode(t0).total_dwords, 7);

        // Type-3: PM4_XE_SWAP, count=4 (signature + addr + W + H).
        let t3 = make_packet_type3(PM4_XE_SWAP, 4);
        match decode(t3).kind {
            PacketKind::Type3 { opcode, count, predicated } => {
                assert_eq!(opcode, PM4_XE_SWAP);
                assert_eq!(count, 4);
                assert!(!predicated);
            }
            other => panic!("expected Type3, got {other:?}"),
        }
        assert_eq!(decode(t3).total_dwords, 5);

        // Type-2: NOP.
        let t2 = make_packet_type2();
        assert_eq!(t2, 0x8000_0000);
        assert_eq!(decode(t2).kind, PacketKind::Type2);
        assert_eq!(decode(t2).total_dwords, 1);
    }
}
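`decode` only describes one header, so a consumer walking a command stream has to advance by `total_dwords` per packet. Below is a self-contained sketch of that loop. It re-derives the size rule from the module docs rather than importing this crate. `packet_size_dwords` and `packet_offsets` are hypothetical helpers. Stopping on a packet that overruns the buffer is an assumed policy, not necessarily what the xenia-rs CP does.

```rust
/// Size rule from the module docs: Type 0 and Type 3 carry
/// `((header >> 16) & 0x3FFF) + 1` data dwords after the header,
/// Type 1 is always 3 dwords total, Type 2 is a single dword.
fn packet_size_dwords(header: u32) -> u32 {
    match header >> 30 {
        0 | 3 => 1 + ((header >> 16) & 0x3FFF) + 1,
        1 => 3,
        _ => 1, // Type 2 NOP
    }
}

/// Return the index of every packet header in `stream`, stopping early if a
/// packet claims more dwords than remain (truncated or misaligned stream).
fn packet_offsets(stream: &[u32]) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut i = 0usize;
    while i < stream.len() {
        let size = packet_size_dwords(stream[i]) as usize;
        if i + size > stream.len() {
            break;
        }
        offsets.push(i);
        i += size;
    }
    offsets
}
```

For example, a Type-0 fetch write (7 dwords), a Type-2 NOP, and a PM4_XE_SWAP packet (5 dwords) yield headers at offsets 0, 7, and 8.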