[iterate-3M] Fix Xenos shader CF/fetch decode so the textured logo binds

The publisher splash (title idx0) rendered FLAT in ours while canary samples
a texture: ours misparsed the logo's textured pixel shader (E59B2B3D, a
`tfetch2D` sprite) as flat, even though our guest IM_LOADs exactly the same
microcode as canary (verified byte-identical against the Wine oracle). Three
coupled bugs in the ucode decoder, each diverging from canary's `gpu/ucode.h`:

1. CF opcode table was off by one (`control_flow.rs`): it mapped opcode 0→Exec
   and 1→Exit and put loop/call/return/condjmp one below their real values,
   but Xenos has 0=kNop, 1=kExec, 2=kExecEnd, 3..6/13..14 the cond-exec
   variants, 7/8 loop, 9/10 call/return, 11 condjmp, 12 alloc, 15
   mark-vs-fetch-done. So a real `kExec` clause was read as a terminal `Exit`,
   truncating the CF block and dropping that clause's instructions and every
   later one (including the `tfetch`). Added Nop/MarkVsFetchDone variants;
   parsing now ends on an END-bit exec clause.

2. exec/loop `address` is an absolute instruction-triple index from shader
   dword 0, but we used it to index our post-CF `instructions` slice directly
   (`ucode/mod.rs`). Addresses are now rebased by subtracting the CF triple
   count, so `address*3` lands on the right instruction.

3. Fetch instruction bitfields were wrong (`ucode/fetch.rs`): `const_index`
   read from bit 5 (actually `src_reg`) instead of bit 20, and texture
   `dimension` from dword1 instead of dword2 bit14. The logo's `tfetch ..,tf0`
   was read as `tf1`, whose empty fetch-constant failed to decode → no
   texture. Also the `sequence` fetch/ALU bit is bit[0] of each pair, not
   bit[1] (`shader_metrics.rs`, `translator.rs`, `xenos_interp.wgsl`).
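To check fix 1 by hand against the logo microcode, here is a minimal standalone sketch of the CF walk. It assumes canary's packing of two 48-bit CF payloads per 96-bit triple (A = dw0 | low16(dw1) << 32, B = high16(dw1) | dw2 << 16) and takes the opcode from bits 44..47, as `decode_single` does. `CfOp`, `split_cf_triple` and `decode_cf_block` are hypothetical names, not the `control_flow.rs` API. Following the stopping rule described above, it ends after the triple holding an END exec clause.

```rust
/// Xenos CF opcodes, in the numeric order of canary's `ControlFlowOpcode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CfOp {
    Nop,                  // 0
    Exec,                 // 1
    ExecEnd,              // 2
    CondExec,             // 3
    CondExecEnd,          // 4
    CondExecPred,         // 5
    CondExecPredEnd,      // 6
    LoopStart,            // 7
    LoopEnd,              // 8
    CondCall,             // 9
    Return,               // 10
    CondJmp,              // 11
    Alloc,                // 12
    CondExecPredClean,    // 13
    CondExecPredCleanEnd, // 14
    MarkVsFetchDone,      // 15
}

impl CfOp {
    /// Map a 48-bit CF payload to its opcode (bits 44..47).
    pub fn of(payload: u64) -> CfOp {
        use CfOp::*;
        // Index = opcode value; the 4-bit field covers all 16 entries.
        let table = [
            Nop, Exec, ExecEnd, CondExec, CondExecEnd, CondExecPred,
            CondExecPredEnd, LoopStart, LoopEnd, CondCall, Return, CondJmp,
            Alloc, CondExecPredClean, CondExecPredCleanEnd, MarkVsFetchDone,
        ];
        table[((payload >> 44) & 0xF) as usize]
    }

    /// Only the `*End` exec variants terminate the shader.
    pub fn is_end(self) -> bool {
        matches!(
            self,
            CfOp::ExecEnd | CfOp::CondExecEnd | CfOp::CondExecPredEnd | CfOp::CondExecPredCleanEnd
        )
    }
}

/// Split one 96-bit CF triple into its two 48-bit payloads.
pub fn split_cf_triple(dw: &[u32]) -> [u64; 2] {
    let a = dw[0] as u64 | ((dw[1] & 0xFFFF) as u64) << 32;
    let b = (dw[1] >> 16) as u64 | (dw[2] as u64) << 16;
    [a, b]
}

/// Decode CF triples up to and including the one holding an END exec.
/// Returns the opcodes and the number of triples the CF block occupies.
pub fn decode_cf_block(ucode: &[u32]) -> (Vec<CfOp>, usize) {
    let mut ops = Vec::new();
    for (i, triple) in ucode.chunks_exact(3).enumerate() {
        let pair = split_cf_triple(triple).map(CfOp::of);
        ops.extend(pair);
        if pair.iter().any(|op| op.is_end()) {
            return (ops, i + 1);
        }
    }
    (ops, ucode.len() / 3)
}
```

On the logo's first two triples this yields `[Exec, Alloc, ExecEnd, Nop]` and a two-triple CF block; under the old table, the leading opcode 1 would have come back as a terminal `Exit` before any instruction was reached.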

Result (--gpu-inline, deterministic 2x): the active PS's `tfetch_slots` now
resolves slot 0, the tf0 fetch-constant decodes (fmt K8888), and
`gpu.texture.decode` fires (137x at -n 50M; texture_cache_entries 0→1 is the
only golden field that changed, and all draw/swap counts are unchanged). The
same fixes also correct the WGSL uber-shader's fetch/CF walk for the
threaded/--ui path.
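Fix 2, fix 3 and the sequence-bit fix all meet in the lookup that finds which fetch constants a shader samples. The following is a minimal sketch under stated assumptions: `ExecClause` and `texture_fetch_slots` are hypothetical and are not `shader_metrics::tfetch_slots`; texture-fetch opcode 1 is assumed from canary's fetch opcodes; and the exec values used with it are what the logo's two exec clauses decode to (address 2, count 1, sequence 0x001, then address 3, count 4, sequence 0x000) with a two-triple CF block.

```rust
/// Fields of one exec clause, at the bit positions `decode_single` reads
/// from the 48-bit CF payload (address 0..11, count 12..14, sequence 16..27).
#[derive(Debug, Clone, Copy)]
pub struct ExecClause {
    /// Absolute instruction-triple index, counted from shader dword 0.
    pub address: u32,
    /// Number of instructions the clause executes.
    pub count: u32,
    /// Two bits per instruction; bit 0 of each pair set = fetch, clear = ALU.
    pub sequence: u32,
}

/// Sorted, deduplicated fetch-constant slots (tfN) used by the texture
/// fetches that `execs` reach. `instrs` is the post-CF slice, so each
/// address is rebased by the CF block's triple count first.
pub fn texture_fetch_slots(instrs: &[u32], cf_triples: u32, execs: &[ExecClause]) -> Vec<u32> {
    let mut slots = Vec::new();
    for e in execs {
        for i in 0..e.count {
            // Bit 0 of the i-th pair marks a fetch (bit 1 was the old misread).
            if ((e.sequence >> (2 * i)) & 1) == 0 {
                continue;
            }
            // Rebase: absolute triple index -> index into the post-CF slice.
            let rel = match (e.address + i).checked_sub(cf_triples) {
                Some(r) => r as usize,
                None => continue, // address points into the CF block itself
            };
            let dw0 = match instrs.get(rel * 3) {
                Some(&w) => w,
                None => continue, // clause runs past the end of the microcode
            };
            // Fetch opcode in bits 0..4; 1 = texture fetch (assumed from canary).
            if (dw0 & 0x1F) == 1 {
                // Fetch-constant index: 5 bits starting at bit 20.
                slots.push((dw0 >> 20) & 0x1F);
            }
        }
    }
    slots.sort_unstable();
    slots.dedup();
    slots
}
```

With the logo microcode and `cf_triples = 2`, the lookup returns `[0]`, i.e. tf0. Passing `cf_triples = 0` (the old, unrebased indexing) lands the first exec on an ALU word of the second clause and finds no texture fetch at all.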

Added a regression test that parses the real E59B2B3D microcode and asserts a
tfetch slot is found. Golden re-baselined (texture_cache_entries 0→1).
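The misread behind fix 3 can also be reproduced from the same captured microcode, one level below the regression test. This is a minimal, hypothetical decoder (`TexFetch`, `decode_tfetch` and `old_const_index` are illustrative names, not `ucode/fetch.rs`). The bit positions are the ones given in fix 3; the field widths, the dimension numbering and the old read's 5-bit width are assumptions modeled on canary's layout.

```rust
/// The texture-fetch fields this commit re-derived. Bit positions follow the
/// commit message; widths and the dimension numbering are assumed from
/// canary's fetch-instruction layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TexFetch {
    /// dword0 bits 5..10: source register (the field the old code misread).
    pub src_reg: u32,
    /// dword0 bits 20..24: fetch-constant slot, i.e. the N in `tfN`.
    pub const_index: u32,
    /// dword2 bits 14..15: 0 = 1D, 1 = 2D, 2 = 3D/stacked, 3 = cube.
    pub dimension: u32,
}

/// Decode with the corrected bit positions.
pub fn decode_tfetch(dw: [u32; 3]) -> TexFetch {
    TexFetch {
        src_reg: (dw[0] >> 5) & 0x3F,
        const_index: (dw[0] >> 20) & 0x1F,
        dimension: (dw[2] >> 14) & 0x3,
    }
}

/// Hypothetical reconstruction of the pre-fix constant-index read: it took
/// the 5 bits at bit 5, which overlap `src_reg` instead of `const_index`.
pub fn old_const_index(dw: [u32; 3]) -> u32 {
    (dw[0] >> 5) & 0x1F
}
```

For the logo's fetch triple (dwords 6..8: `0x10082021, 0x1F1FF688, 0x00004000`), the corrected read gives tf0, a 2D fetch from source register 1, while the bit-5 read gives 1, i.e. the `tf1` whose fetch constant is empty.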

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-17 21:53:35 +02:00
parent 3f5d5cf5f7
commit 6bb4355e3d
7 changed files with 205 additions and 84 deletions

@@ -43,7 +43,15 @@ pub enum ControlFlowInstruction {
     Return,
     /// `kAlloc` — pre-allocate export registers (position, interpolators, colors).
     Alloc { size: u32, kind: AllocKind },
-    /// Exit the shader (terminal).
+    /// `kNop` — fills space in the CF block; executes nothing, does not end
+    /// the shader. (Xenos opcode 0.)
+    Nop,
+    /// `kMarkVsFetchDone` — hint that no more vertex fetches will be performed.
+    /// (Xenos opcode 15.) Non-terminating.
+    MarkVsFetchDone,
+    /// Exit the shader (terminal). Synthesized — Xenos has no dedicated exit
+    /// opcode; the shader ends after an `Exec`/`CondExec` clause with the
+    /// END bit set (`is_end`). Retained for callers/tests that reference it.
     Exit,
     /// Unknown / unhandled opcode.
     Unknown { opcode: u8 },
@@ -93,37 +101,45 @@ fn decode_single(payload: u64) -> ControlFlowInstruction {
     let predicated = ((payload >> 28) & 1) != 0;
     let predicate_condition = ((payload >> 29) & 1) != 0;
+    // Xenos `ControlFlowOpcode` (canary `ucode.h:86-160`):
+    //   0 kNop, 1 kExec, 2 kExecEnd, 3 kCondExec, 4 kCondExecEnd,
+    //   5 kCondExecPred, 6 kCondExecPredEnd, 7 kLoopStart, 8 kLoopEnd,
+    //   9 kCondCall, 10 kReturn, 11 kCondJmp, 12 kAlloc,
+    //   13 kCondExecPredClean, 14 kCondExecPredCleanEnd, 15 kMarkVsFetchDone.
+    // All exec variants share the address(12)/count(3)/sequence(12) layout
+    // of `ControlFlowExecInstruction`; the `*End` variants terminate the
+    // shader. (Prior table was off-by-one — it mapped 0→Exec and 1→Exit,
+    // so a real `kExec` clause was misread as a terminal `Exit`, truncating
+    // the CF block and dropping every `tfetch` in it.)
+    let exec = |is_end: bool| ControlFlowInstruction::Exec {
+        address: (payload & 0xFFF) as u32,
+        count: ((payload >> 12) & 0x7) as u32,
+        sequence: ((payload >> 16) & 0xFFF) as u32,
+        is_end,
+        predicated,
+        predicate_condition,
+    };
     match opcode {
-        0 => ControlFlowInstruction::Exec {
-            address: (payload & 0xFFF) as u32,
-            count: ((payload >> 12) & 0x7) as u32,
-            sequence: ((payload >> 16) & 0xFFF) as u32,
-            is_end: false,
-            predicated,
-            predicate_condition,
-        },
-        1 => ControlFlowInstruction::Exit,
-        2 => ControlFlowInstruction::Exec {
-            address: (payload & 0xFFF) as u32,
-            count: ((payload >> 12) & 0x7) as u32,
-            sequence: ((payload >> 16) & 0xFFF) as u32,
-            is_end: true,
-            predicated,
-            predicate_condition,
-        },
-        6 => ControlFlowInstruction::LoopStart {
+        0 => ControlFlowInstruction::Nop,
+        1 => exec(false),
+        2 => exec(true),
+        3 => exec(false),
+        4 => exec(true),
+        5 => exec(false),
+        6 => exec(true),
+        7 => ControlFlowInstruction::LoopStart {
             address: (payload & 0x3FF) as u32,
             loop_id: ((payload >> 16) & 0x1F) as u32,
         },
-        7 => ControlFlowInstruction::LoopEnd {
+        8 => ControlFlowInstruction::LoopEnd {
             address: (payload & 0x3FF) as u32,
             loop_id: ((payload >> 16) & 0x1F) as u32,
         },
-        8 => ControlFlowInstruction::CondCall {
+        9 => ControlFlowInstruction::CondCall {
             target: (payload & 0x3FF) as u32,
         },
-        9 => ControlFlowInstruction::Return,
-        10 => ControlFlowInstruction::CondJmp {
+        10 => ControlFlowInstruction::Return,
+        11 => ControlFlowInstruction::CondJmp {
             target: (payload & 0x3FF) as u32,
             predicated,
             predicate_condition,
@@ -132,6 +148,9 @@ fn decode_single(payload: u64) -> ControlFlowInstruction {
             size: (payload & 0x7) as u32,
             kind: AllocKind::from_bits(((payload >> 4) & 0x7) as u32),
         },
+        13 => exec(false),
+        14 => exec(true),
+        15 => ControlFlowInstruction::MarkVsFetchDone,
         other => ControlFlowInstruction::Unknown { opcode: other },
     }
 }
@@ -141,12 +160,49 @@ mod tests {
     use super::*;
     #[test]
-    fn opcode_exit_decodes() {
-        // opcode 1 (Exit) in bits 44..47 of A's 48-bit payload.
+    fn opcode_nop_and_exec_decode() {
+        // Xenos opcode 0 = kNop (non-terminating padding).
+        let payload: u64 = 0u64 << 44;
+        let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32);
+        assert_eq!(decode_cf_pair(hi, lo, 0).0, ControlFlowInstruction::Nop);
+        // Xenos opcode 1 = kExec (executes instructions; NOT a terminal exit).
         let payload: u64 = 1u64 << 44;
         let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32);
-        let cf = decode_cf_pair(hi, lo, 0).0;
-        assert_eq!(cf, ControlFlowInstruction::Exit);
+        match decode_cf_pair(hi, lo, 0).0 {
+            ControlFlowInstruction::Exec { is_end, .. } => assert!(!is_end),
+            other => panic!("opcode 1 should be non-end Exec, got {other:?}"),
+        }
+        // Xenos opcode 15 = kMarkVsFetchDone (non-terminating hint).
+        let payload: u64 = 15u64 << 44;
+        let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32);
+        assert_eq!(
+            decode_cf_pair(hi, lo, 0).0,
+            ControlFlowInstruction::MarkVsFetchDone
+        );
     }
+    #[test]
+    fn real_logo_shader_has_tfetch_clauses() {
+        // The publisher-logo pixel shader E59B2B3DA4AA9008 (captured from the
+        // canary oracle, byte-identical to the microcode our guest IM_LOADs).
+        // Regression for iterate-3M: the old off-by-one opcode table decoded
+        // its leading `kExec` (opcode 1) as a terminal `Exit`, truncating the
+        // CF block so the `tfetch2D` never appeared → flat splash.
+        let ucode: [u32; 24] = [
+            0x00011002, 0x00001200, 0xC4000000, 0x00004003, 0x00002200, 0x00000000,
+            0x10082021, 0x1F1FF688, 0x00004000, 0xC8080001, 0x001B1B00, 0xC1020000,
+            0xC8070000, 0x00C0C000, 0xC1020000, 0xC8070001, 0x00C01B00, 0xC1000100,
+            0xC80F8000, 0x00000000, 0xC2010100, 0x00000000, 0x00000000, 0x00000000,
+        ];
+        let p = crate::ucode::parse_shader(&ucode);
+        let exec_clauses = p
+            .cf
+            .iter()
+            .filter(|c| matches!(c, ControlFlowInstruction::Exec { .. }))
+            .count();
+        assert!(exec_clauses >= 1, "expected >=1 Exec clause, cf={:?}", p.cf);
+        let slots = crate::shader_metrics::tfetch_slots(&p);
+        assert!(!slots.is_empty(), "expected tfetch slots, got none; cf={:?}", p.cf);
+    }
     #[test]