[iterate-3M] Fix Xenos shader CF/fetch decode so the textured logo binds

The publisher splash (title idx0) rendered FLAT in ours while canary samples
a texture: ours never decoded the logo's textured pixel shader
(E59B2B3D, a `tfetch2D` sprite) even though our guest IM_LOADs the exact same
microcode canary does (verified byte-identical against the Wine oracle). The
shader was misparsed as flat. Three coupled bugs in the ucode decoder, all
off vs canary `gpu/ucode.h`:

1. CF opcode table was off-by-one (`control_flow.rs`): mapped opcode 0→Exec
   and 1→Exit, but Xenos has 0=kNop, 1=kExec, 2=kExecEnd, 3..6/13..14 the
   cond-exec variants, 7/8 loop, 9/10 call/return, 11 condjmp, 12 alloc,
   15 mark-vs-fetch-done. So a real `kExec` clause was read as a terminal
   `Exit`, truncating the CF block and dropping every instruction (incl. the
   `tfetch`) after it. Added Nop/MarkVsFetchDone variants; parse now ends on
   an END-bit exec clause.

2. exec/loop `address` is an absolute instruction-triple index from shader
   dword 0, but indexed our post-CF `instructions` slice directly
   (`ucode/mod.rs`). Rebase addresses by the CF triple count so `address*3`
   lands on the right instruction.

3. Fetch instruction bitfields were wrong (`ucode/fetch.rs`): `const_index`
   read from bit 5 (actually `src_reg`) instead of bit 20, and texture
   `dimension` from dword1 instead of dword2 bit14. The logo's `tfetch ..,tf0`
   was read as `tf1`, whose empty fetch-constant failed to decode → no
   texture. Also the `sequence` fetch/ALU bit is bit[0] of each pair, not
   bit[1] (`shader_metrics.rs`, `translator.rs`, `xenos_interp.wgsl`).

Result (--gpu-inline, deterministic 2x): the active PS's `tfetch_slots` now
resolves slot 0, the tf0 fetch-constant decodes (fmt K8888), and
`gpu.texture.decode` fires (137x at -n 50M; texture_cache_entries 0→1, the
only golden field that changed — all draw/swap counts unchanged). The same
fixes correct the WGSL uber-shader's fetch/CF walk for the threaded/--ui path.

Added a regression test that parses the real E59B2B3D microcode and asserts a
tfetch slot is found. Golden re-baselined (texture_cache_entries 0→1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-06-17 21:53:35 +02:00
parent 3f5d5cf5f7
commit 6bb4355e3d
7 changed files with 205 additions and 84 deletions

View File

@@ -48,6 +48,9 @@ pub mod cf_kind {
pub const COND_JMP: u32 = 6;
pub const COND_CALL: u32 = 7;
pub const RETURN: u32 = 8;
/// Non-executing CF clause: `kNop` padding or `kMarkVsFetchDone` hint.
/// The WGSL CF walker treats this as a no-op (advance, do not reject).
pub const NOP: u32 = 9;
pub const UNKNOWN: u32 = 15;
}
@@ -136,6 +139,7 @@ fn encode_cf(c: ControlFlowInstruction) -> (u32, u32, u32) {
}
CondCall { target } => (cf_kind::COND_CALL, target, 0),
Return => (cf_kind::RETURN, 0, 0),
Nop | MarkVsFetchDone => (cf_kind::NOP, 0, 0),
Unknown { opcode } => (cf_kind::UNKNOWN, opcode as u32, 0),
}
}
@@ -164,9 +168,11 @@ pub struct ParsedShader {
}
/// Decode a shader blob. `raw_dwords` is a host-endian slice of the entire
/// microcode buffer (control flow + instructions). Heuristic: CF dword count
/// is encoded in the first word's low 12 bits of the last exec clause —
/// canary iterates until it hits a clause of kind `Exit`. We do the same.
/// microcode buffer (control flow + instructions). The CF block is implicitly
/// bounded: we walk clause-pair rows until one terminates the shader (an
/// `Exec`/`CondExec` clause with the END bit set, per Xenos). Everything after
/// that row is the instruction block; exec/loop addresses are then rebased to
/// be relative to it.
pub fn parse_shader(raw_dwords: &[u32]) -> ParsedShader {
let mut cf = Vec::new();
// CF clauses are 48-bit (word1 lo 16 + word0 = 48 or so per canary's
@@ -175,22 +181,50 @@ pub fn parse_shader(raw_dwords: &[u32]) -> ParsedShader {
while i + 2 < raw_dwords.len() {
let a = decode_cf_pair(raw_dwords[i], raw_dwords[i + 1], raw_dwords[i + 2]);
let (first, second) = a;
let seen_exit = matches!(
first,
ControlFlowInstruction::Exit | ControlFlowInstruction::Unknown { .. }
) || matches!(
second,
ControlFlowInstruction::Exit | ControlFlowInstruction::Unknown { .. }
);
// The CF block ends after the clause that terminates the shader: an
// `Exec` with the END bit set (Xenos `kExecEnd`/`kCondExec*End`), a
// synthetic `Exit`, or an `Unknown` opcode (decode ran off the CF
// block into instruction data — stop defensively). `Nop` padding
// does NOT terminate. (Previously this stopped on the first `Exit`,
// but with the corrected opcode table opcode 1 is `kExec`, not exit,
// so real exec clauses kept the parse going as intended.)
let terminates = |cf: &ControlFlowInstruction| {
matches!(
cf,
ControlFlowInstruction::Exec { is_end: true, .. }
| ControlFlowInstruction::Exit
| ControlFlowInstruction::Unknown { .. }
)
};
let seen_end = terminates(&first) || terminates(&second);
cf.push(first);
cf.push(second);
i += 3;
if seen_exit {
if seen_end {
break;
}
}
// Everything after `i` dwords is the instruction block.
let instructions = raw_dwords[i..].to_vec();
// Xenos exec/loop `address` fields are absolute instruction-triple indices
// counted from shader dword 0, but `instructions` here begins *after* the
// CF block. Rebase those addresses to be relative to the instruction block
// (subtract the CF triple count) so `address * 3` indexes `instructions`
// directly. (Without this, every exec read 3 dwords too far per CF triple —
// the publisher-logo `tfetch` triple was skipped → flat splash.)
let cf_triples = (i / 3) as u32;
for clause in cf.iter_mut() {
match clause {
ControlFlowInstruction::Exec { address, .. } => {
*address = address.saturating_sub(cf_triples);
}
ControlFlowInstruction::LoopStart { address, .. }
| ControlFlowInstruction::LoopEnd { address, .. } => {
*address = address.saturating_sub(cf_triples);
}
_ => {}
}
}
ParsedShader { cf, instructions }
}
@@ -235,15 +269,19 @@ mod tests {
}
#[test]
fn trivial_exit_clause_stops_parsing() {
// Two clauses: [NOP (kind=0), EXIT (kind=1)] encoded per canary.
// Exit clause is opcode 1 in the top 4 bits of the upper 16 bits.
let w0 = 0u32; // clause A body
let w1 = (1u32 << 12) << 16; // upper 16 bits = 0x1000 → opcode=1 (EXIT) for clause A
let w2 = 0u32;
let p = parse_shader(&[w0, w1, w2, 0xDEAD_BEEF]);
fn exec_end_clause_stops_parsing() {
// Row: clause B = kExecEnd (opcode 2) terminates the CF block.
// 48-bit payload of B occupies hi16(word1) + word2; opcode lives in
// bits 44..47 of that payload. Put opcode 2 there: payload bit 44 set
// for the `2` → (2 << 44). In B's framing, bits 16..47 come from
// word2, so word2 bit (44-16)=28 region holds the opcode nibble.
let b_payload: u64 = 2u64 << 44; // kExecEnd
// B = lo16 from hi16(word1), hi from word2. Reconstruct word1/word2.
let word1 = ((b_payload & 0xFFFF) as u32) << 16; // B's low 16 bits → hi16(word1)
let word2 = ((b_payload >> 16) & 0xFFFF_FFFF) as u32;
let p = parse_shader(&[0, word1, word2, 0xDEAD_BEEF]);
assert!(!p.cf.is_empty());
// Exit detected → remaining dword is instruction data.
// ExecEnd detected in the first row → remaining dword is instruction data.
assert_eq!(p.instructions, vec![0xDEAD_BEEF]);
}
}