[iterate-3M] Fix Xenos shader CF/fetch decode so the textured logo binds
The publisher splash (title idx0) rendered FLAT in ours while canary samples a texture: ours never decoded the logo's textured pixel shader (E59B2B3D, a `tfetch2D` sprite) even though our guest IM_LOADs the exact same microcode canary does (verified byte-identical against the Wine oracle). The shader was misparsed as flat. Three coupled bugs in the ucode decoder, all off vs canary `gpu/ucode.h`: 1. CF opcode table was off-by-one (`control_flow.rs`): mapped opcode 0→Exec and 1→Exit, but Xenos has 0=kNop, 1=kExec, 2=kExecEnd, 3..6/13..14 the cond-exec variants, 7/8 loop, 9/10 call/return, 11 condjmp, 12 alloc, 15 mark-vs-fetch-done. So a real `kExec` clause was read as a terminal `Exit`, truncating the CF block and dropping every instruction (incl. the `tfetch`) after it. Added Nop/MarkVsFetchDone variants; parse now ends on an END-bit exec clause. 2. exec/loop `address` is an absolute instruction-triple index from shader dword 0, but indexed our post-CF `instructions` slice directly (`ucode/mod.rs`). Rebase addresses by the CF triple count so `address*3` lands on the right instruction. 3. Fetch instruction bitfields were wrong (`ucode/fetch.rs`): `const_index` read from bit 5 (actually `src_reg`) instead of bit 20, and texture `dimension` from dword1 instead of dword2 bit14. The logo's `tfetch ..,tf0` was read as `tf1`, whose empty fetch-constant failed to decode → no texture. Also the `sequence` fetch/ALU bit is bit[0] of each pair, not bit[1] (`shader_metrics.rs`, `translator.rs`, `xenos_interp.wgsl`). Result (--gpu-inline, deterministic 2x): the active PS's `tfetch_slots` now resolves slot 0, the tf0 fetch-constant decodes (fmt K8888), and `gpu.texture.decode` fires (137x at -n 50M; texture_cache_entries 0→1, the only golden field that changed — all draw/swap counts unchanged). The same fixes correct the WGSL uber-shader's fetch/CF walk for the threaded/--ui path. Added a regression test that parses the real E59B2B3D microcode and asserts a tfetch slot is found. Golden re-baselined (texture_cache_entries 0→1). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -48,6 +48,9 @@ pub mod cf_kind {
|
||||
pub const COND_JMP: u32 = 6;
|
||||
pub const COND_CALL: u32 = 7;
|
||||
pub const RETURN: u32 = 8;
|
||||
/// Non-executing CF clause: `kNop` padding or `kMarkVsFetchDone` hint.
|
||||
/// The WGSL CF walker treats this as a no-op (advance, do not reject).
|
||||
pub const NOP: u32 = 9;
|
||||
pub const UNKNOWN: u32 = 15;
|
||||
}
|
||||
|
||||
@@ -136,6 +139,7 @@ fn encode_cf(c: ControlFlowInstruction) -> (u32, u32, u32) {
|
||||
}
|
||||
CondCall { target } => (cf_kind::COND_CALL, target, 0),
|
||||
Return => (cf_kind::RETURN, 0, 0),
|
||||
Nop | MarkVsFetchDone => (cf_kind::NOP, 0, 0),
|
||||
Unknown { opcode } => (cf_kind::UNKNOWN, opcode as u32, 0),
|
||||
}
|
||||
}
|
||||
@@ -164,9 +168,11 @@ pub struct ParsedShader {
|
||||
}
|
||||
|
||||
/// Decode a shader blob. `raw_dwords` is a host-endian slice of the entire
|
||||
/// microcode buffer (control flow + instructions). Heuristic: CF dword count
|
||||
/// is encoded in the first word's low 12 bits of the last exec clause —
|
||||
/// canary iterates until it hits a clause of kind `Exit`. We do the same.
|
||||
/// microcode buffer (control flow + instructions). The CF block is implicitly
|
||||
/// bounded: we walk clause-pair rows until one terminates the shader (an
|
||||
/// `Exec`/`CondExec` clause with the END bit set, per Xenos). Everything after
|
||||
/// that row is the instruction block; exec/loop addresses are then rebased to
|
||||
/// be relative to it.
|
||||
pub fn parse_shader(raw_dwords: &[u32]) -> ParsedShader {
|
||||
let mut cf = Vec::new();
|
||||
// CF clauses are 48-bit (word1 lo 16 + word0 = 48 or so per canary's
|
||||
@@ -175,22 +181,50 @@ pub fn parse_shader(raw_dwords: &[u32]) -> ParsedShader {
|
||||
while i + 2 < raw_dwords.len() {
|
||||
let a = decode_cf_pair(raw_dwords[i], raw_dwords[i + 1], raw_dwords[i + 2]);
|
||||
let (first, second) = a;
|
||||
let seen_exit = matches!(
|
||||
first,
|
||||
ControlFlowInstruction::Exit | ControlFlowInstruction::Unknown { .. }
|
||||
) || matches!(
|
||||
second,
|
||||
ControlFlowInstruction::Exit | ControlFlowInstruction::Unknown { .. }
|
||||
);
|
||||
// The CF block ends after the clause that terminates the shader: an
|
||||
// `Exec` with the END bit set (Xenos `kExecEnd`/`kCondExec*End`), a
|
||||
// synthetic `Exit`, or an `Unknown` opcode (decode ran off the CF
|
||||
// block into instruction data — stop defensively). `Nop` padding
|
||||
// does NOT terminate. (Previously this stopped on the first `Exit`,
|
||||
// but with the corrected opcode table opcode 1 is `kExec`, not exit,
|
||||
// so real exec clauses kept the parse going as intended.)
|
||||
let terminates = |cf: &ControlFlowInstruction| {
|
||||
matches!(
|
||||
cf,
|
||||
ControlFlowInstruction::Exec { is_end: true, .. }
|
||||
| ControlFlowInstruction::Exit
|
||||
| ControlFlowInstruction::Unknown { .. }
|
||||
)
|
||||
};
|
||||
let seen_end = terminates(&first) || terminates(&second);
|
||||
cf.push(first);
|
||||
cf.push(second);
|
||||
i += 3;
|
||||
if seen_exit {
|
||||
if seen_end {
|
||||
break;
|
||||
}
|
||||
}
|
||||
// Everything after `i` dwords is the instruction block.
|
||||
let instructions = raw_dwords[i..].to_vec();
|
||||
// Xenos exec/loop `address` fields are absolute instruction-triple indices
|
||||
// counted from shader dword 0, but `instructions` here begins *after* the
|
||||
// CF block. Rebase those addresses to be relative to the instruction block
|
||||
// (subtract the CF triple count) so `address * 3` indexes `instructions`
|
||||
// directly. (Without this, every exec read 3 dwords too far per CF triple —
|
||||
// the publisher-logo `tfetch` triple was skipped → flat splash.)
|
||||
let cf_triples = (i / 3) as u32;
|
||||
for clause in cf.iter_mut() {
|
||||
match clause {
|
||||
ControlFlowInstruction::Exec { address, .. } => {
|
||||
*address = address.saturating_sub(cf_triples);
|
||||
}
|
||||
ControlFlowInstruction::LoopStart { address, .. }
|
||||
| ControlFlowInstruction::LoopEnd { address, .. } => {
|
||||
*address = address.saturating_sub(cf_triples);
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
ParsedShader { cf, instructions }
|
||||
}
|
||||
|
||||
@@ -235,15 +269,19 @@ mod tests {
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn trivial_exit_clause_stops_parsing() {
|
||||
// Two clauses: [NOP (kind=0), EXIT (kind=1)] encoded per canary.
|
||||
// Exit clause is opcode 1 in the top 4 bits of the upper 16 bits.
|
||||
let w0 = 0u32; // clause A body
|
||||
let w1 = (1u32 << 12) << 16; // upper 16 bits = 0x1000 → opcode=1 (EXIT) for clause A
|
||||
let w2 = 0u32;
|
||||
let p = parse_shader(&[w0, w1, w2, 0xDEAD_BEEF]);
|
||||
fn exec_end_clause_stops_parsing() {
|
||||
// Row: clause B = kExecEnd (opcode 2) terminates the CF block.
|
||||
// 48-bit payload of B occupies hi16(word1) + word2; opcode lives in
|
||||
// bits 44..47 of that payload. Put opcode 2 there: payload bit 44 set
|
||||
// for the `2` → (2 << 44). In B's framing, bits 16..47 come from
|
||||
// word2, so word2 bit (44-16)=28 region holds the opcode nibble.
|
||||
let b_payload: u64 = 2u64 << 44; // kExecEnd
|
||||
// B = lo16 from hi16(word1), hi from word2. Reconstruct word1/word2.
|
||||
let word1 = ((b_payload & 0xFFFF) as u32) << 16; // B's low 16 bits → hi16(word1)
|
||||
let word2 = ((b_payload >> 16) & 0xFFFF_FFFF) as u32;
|
||||
let p = parse_shader(&[0, word1, word2, 0xDEAD_BEEF]);
|
||||
assert!(!p.cf.is_empty());
|
||||
// Exit detected → remaining dword is instruction data.
|
||||
// ExecEnd detected in the first row → remaining dword is instruction data.
|
||||
assert_eq!(p.instructions, vec![0xDEAD_BEEF]);
|
||||
}
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user