[iterate-3M] Fix Xenos shader CF/fetch decode so the textured logo binds
The publisher splash (title idx0) rendered FLAT in ours while canary samples a texture: ours never decoded the logo's textured pixel shader (E59B2B3D, a `tfetch2D` sprite) even though our guest IM_LOADs the exact same microcode canary does (verified byte-identical against the Wine oracle). The shader was misparsed as flat. Three coupled bugs in the ucode decoder, all off vs canary `gpu/ucode.h`: 1. CF opcode table was off-by-one (`control_flow.rs`): mapped opcode 0→Exec and 1→Exit, but Xenos has 0=kNop, 1=kExec, 2=kExecEnd, 3..6/13..14 the cond-exec variants, 7/8 loop, 9/10 call/return, 11 condjmp, 12 alloc, 15 mark-vs-fetch-done. So a real `kExec` clause was read as a terminal `Exit`, truncating the CF block and dropping every instruction (incl. the `tfetch`) after it. Added Nop/MarkVsFetchDone variants; parse now ends on an END-bit exec clause. 2. exec/loop `address` is an absolute instruction-triple index from shader dword 0, but indexed our post-CF `instructions` slice directly (`ucode/mod.rs`). Rebase addresses by the CF triple count so `address*3` lands on the right instruction. 3. Fetch instruction bitfields were wrong (`ucode/fetch.rs`): `const_index` read from bit 5 (actually `src_reg`) instead of bit 20, and texture `dimension` from dword1 instead of dword2 bit14. The logo's `tfetch ..,tf0` was read as `tf1`, whose empty fetch-constant failed to decode → no texture. Also the `sequence` fetch/ALU bit is bit[0] of each pair, not bit[1] (`shader_metrics.rs`, `translator.rs`, `xenos_interp.wgsl`). Result (--gpu-inline, deterministic 2x): the active PS's `tfetch_slots` now resolves slot 0, the tf0 fetch-constant decodes (fmt K8888), and `gpu.texture.decode` fires (137x at -n 50M; texture_cache_entries 0→1, the only golden field that changed — all draw/swap counts unchanged). The same fixes correct the WGSL uber-shader's fetch/CF walk for the threaded/--ui path. Added a regression test that parses the real E59B2B3D microcode and asserts a tfetch slot is found. Golden re-baselined (texture_cache_entries 0→1). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -43,7 +43,15 @@ pub enum ControlFlowInstruction {
|
||||
Return,
|
||||
/// `kAlloc` — pre-allocate export registers (position, interpolators, colors).
|
||||
Alloc { size: u32, kind: AllocKind },
|
||||
/// Exit the shader (terminal).
|
||||
/// `kNop` — fills space in the CF block; executes nothing, does not end
|
||||
/// the shader. (Xenos opcode 0.)
|
||||
Nop,
|
||||
/// `kMarkVsFetchDone` — hint that no more vertex fetches will be performed.
|
||||
/// (Xenos opcode 15.) Non-terminating.
|
||||
MarkVsFetchDone,
|
||||
/// Exit the shader (terminal). Synthesized — Xenos has no dedicated exit
|
||||
/// opcode; the shader ends after an `Exec`/`CondExec` clause with the
|
||||
/// END bit set (`is_end`). Retained for callers/tests that reference it.
|
||||
Exit,
|
||||
/// Unknown / unhandled opcode.
|
||||
Unknown { opcode: u8 },
|
||||
@@ -93,37 +101,45 @@ fn decode_single(payload: u64) -> ControlFlowInstruction {
|
||||
let predicated = ((payload >> 28) & 1) != 0;
|
||||
let predicate_condition = ((payload >> 29) & 1) != 0;
|
||||
|
||||
// Xenos `ControlFlowOpcode` (canary `ucode.h:86-160`):
|
||||
// 0 kNop, 1 kExec, 2 kExecEnd, 3 kCondExec, 4 kCondExecEnd,
|
||||
// 5 kCondExecPred, 6 kCondExecPredEnd, 7 kLoopStart, 8 kLoopEnd,
|
||||
// 9 kCondCall, 10 kReturn, 11 kCondJmp, 12 kAlloc,
|
||||
// 13 kCondExecPredClean, 14 kCondExecPredCleanEnd, 15 kMarkVsFetchDone.
|
||||
// All exec variants share the address(12)/count(3)/sequence(12) layout
|
||||
// of `ControlFlowExecInstruction`; the `*End` variants terminate the
|
||||
// shader. (Prior table was off-by-one — it mapped 0→Exec and 1→Exit,
|
||||
// so a real `kExec` clause was misread as a terminal `Exit`, truncating
|
||||
// the CF block and dropping every `tfetch` in it.)
|
||||
let exec = |is_end: bool| ControlFlowInstruction::Exec {
|
||||
address: (payload & 0xFFF) as u32,
|
||||
count: ((payload >> 12) & 0x7) as u32,
|
||||
sequence: ((payload >> 16) & 0xFFF) as u32,
|
||||
is_end,
|
||||
predicated,
|
||||
predicate_condition,
|
||||
};
|
||||
match opcode {
|
||||
0 => ControlFlowInstruction::Exec {
|
||||
address: (payload & 0xFFF) as u32,
|
||||
count: ((payload >> 12) & 0x7) as u32,
|
||||
sequence: ((payload >> 16) & 0xFFF) as u32,
|
||||
is_end: false,
|
||||
predicated,
|
||||
predicate_condition,
|
||||
},
|
||||
1 => ControlFlowInstruction::Exit,
|
||||
2 => ControlFlowInstruction::Exec {
|
||||
address: (payload & 0xFFF) as u32,
|
||||
count: ((payload >> 12) & 0x7) as u32,
|
||||
sequence: ((payload >> 16) & 0xFFF) as u32,
|
||||
is_end: true,
|
||||
predicated,
|
||||
predicate_condition,
|
||||
},
|
||||
6 => ControlFlowInstruction::LoopStart {
|
||||
0 => ControlFlowInstruction::Nop,
|
||||
1 => exec(false),
|
||||
2 => exec(true),
|
||||
3 => exec(false),
|
||||
4 => exec(true),
|
||||
5 => exec(false),
|
||||
6 => exec(true),
|
||||
7 => ControlFlowInstruction::LoopStart {
|
||||
address: (payload & 0x3FF) as u32,
|
||||
loop_id: ((payload >> 16) & 0x1F) as u32,
|
||||
},
|
||||
7 => ControlFlowInstruction::LoopEnd {
|
||||
8 => ControlFlowInstruction::LoopEnd {
|
||||
address: (payload & 0x3FF) as u32,
|
||||
loop_id: ((payload >> 16) & 0x1F) as u32,
|
||||
},
|
||||
8 => ControlFlowInstruction::CondCall {
|
||||
9 => ControlFlowInstruction::CondCall {
|
||||
target: (payload & 0x3FF) as u32,
|
||||
},
|
||||
9 => ControlFlowInstruction::Return,
|
||||
10 => ControlFlowInstruction::CondJmp {
|
||||
10 => ControlFlowInstruction::Return,
|
||||
11 => ControlFlowInstruction::CondJmp {
|
||||
target: (payload & 0x3FF) as u32,
|
||||
predicated,
|
||||
predicate_condition,
|
||||
@@ -132,6 +148,9 @@ fn decode_single(payload: u64) -> ControlFlowInstruction {
|
||||
size: (payload & 0x7) as u32,
|
||||
kind: AllocKind::from_bits(((payload >> 4) & 0x7) as u32),
|
||||
},
|
||||
13 => exec(false),
|
||||
14 => exec(true),
|
||||
15 => ControlFlowInstruction::MarkVsFetchDone,
|
||||
other => ControlFlowInstruction::Unknown { opcode: other },
|
||||
}
|
||||
}
|
||||
@@ -141,12 +160,49 @@ mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn opcode_exit_decodes() {
|
||||
// opcode 1 (Exit) in bits 44..47 of A's 48-bit payload.
|
||||
fn opcode_nop_and_exec_decode() {
|
||||
// Xenos opcode 0 = kNop (non-terminating padding).
|
||||
let payload: u64 = 0u64 << 44;
|
||||
let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32);
|
||||
assert_eq!(decode_cf_pair(hi, lo, 0).0, ControlFlowInstruction::Nop);
|
||||
// Xenos opcode 1 = kExec (executes instructions; NOT a terminal exit).
|
||||
let payload: u64 = 1u64 << 44;
|
||||
let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32);
|
||||
let cf = decode_cf_pair(hi, lo, 0).0;
|
||||
assert_eq!(cf, ControlFlowInstruction::Exit);
|
||||
match decode_cf_pair(hi, lo, 0).0 {
|
||||
ControlFlowInstruction::Exec { is_end, .. } => assert!(!is_end),
|
||||
other => panic!("opcode 1 should be non-end Exec, got {other:?}"),
|
||||
}
|
||||
// Xenos opcode 15 = kMarkVsFetchDone (non-terminating hint).
|
||||
let payload: u64 = 15u64 << 44;
|
||||
let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32);
|
||||
assert_eq!(
|
||||
decode_cf_pair(hi, lo, 0).0,
|
||||
ControlFlowInstruction::MarkVsFetchDone
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn real_logo_shader_has_tfetch_clauses() {
|
||||
// The publisher-logo pixel shader E59B2B3DA4AA9008 (captured from the
|
||||
// canary oracle, byte-identical to the microcode our guest IM_LOADs).
|
||||
// Regression for iterate-3M: the old off-by-one opcode table decoded
|
||||
// its leading `kExec` (opcode 1) as a terminal `Exit`, truncating the
|
||||
// CF block so the `tfetch2D` never appeared → flat splash.
|
||||
let ucode: [u32; 24] = [
|
||||
0x00011002, 0x00001200, 0xC4000000, 0x00004003, 0x00002200, 0x00000000,
|
||||
0x10082021, 0x1F1FF688, 0x00004000, 0xC8080001, 0x001B1B00, 0xC1020000,
|
||||
0xC8070000, 0x00C0C000, 0xC1020000, 0xC8070001, 0x00C01B00, 0xC1000100,
|
||||
0xC80F8000, 0x00000000, 0xC2010100, 0x00000000, 0x00000000, 0x00000000,
|
||||
];
|
||||
let p = crate::ucode::parse_shader(&ucode);
|
||||
let exec_clauses = p
|
||||
.cf
|
||||
.iter()
|
||||
.filter(|c| matches!(c, ControlFlowInstruction::Exec { .. }))
|
||||
.count();
|
||||
assert!(exec_clauses >= 1, "expected >=1 Exec clause, cf={:?}", p.cf);
|
||||
let slots = crate::shader_metrics::tfetch_slots(&p);
|
||||
assert!(!slots.is_empty(), "expected tfetch slots, got none; cf={:?}", p.cf);
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
||||
Reference in New Issue
Block a user