fix(gpu): GPUBUG-101 — decode src1/2/3_sel temp-vs-constant selector
Per canary AluInstruction layout (xenia-canary/src/xenia/gpu/ucode.h:
2078-2086), word-0 bits 29-31 are the per-operand `srcN_sel` flags
selecting temp register (1) vs ALU constant (0); the corresponding
8-bit src byte indexes either:
- a temp register (bits 5:0 = index, bits 6/7 reserved for
relative-addressing / abs flags consumed by Phase D2), or
- an ALU constant (full 8-bit index).
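The layout above can be sketched as a small stand-alone decoder. This is only an illustration under stated assumptions: `SrcOperand` and `decode_src` are hypothetical names that do not appear in the codebase, and the bit 6 = relative / bit 7 = abs split follows the field comments added in this commit.

```rust
/// Illustrative decoded view of one ALU source operand (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SrcOperand {
    /// Temp register: bits 5:0 are the index, bit 6 the relative flag, bit 7 abs.
    Temp { index: u8, relative: bool, abs: bool },
    /// ALU constant: the full 8-bit byte is the index.
    Constant { index: u8 },
}

/// Decode one operand from word 0.
/// `byte_shift` is 0, 8, or 16 (position of the src byte);
/// `sel_bit` is 29, 30, or 31 (the matching srcN_sel bit).
fn decode_src(w0: u32, byte_shift: u32, sel_bit: u32) -> SrcOperand {
    let byte = ((w0 >> byte_shift) & 0xFF) as u8;
    let is_temp = (w0 >> sel_bit) & 1 != 0;
    if is_temp {
        SrcOperand::Temp {
            index: byte & 0x3F,
            relative: byte & 0x40 != 0,
            abs: byte & 0x80 != 0,
        }
    } else {
        SrcOperand::Constant { index: byte }
    }
}
```

For the low byte of w0 the call would be `decode_src(w0, 0, 29)`, matching the bit-29 pairing described under "Mechanical changes".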
Pre-fix, the WGSL interpreter and AOT translator both masked the src
byte with `& 0x7F` and emitted `r[low7]` regardless of the operand class.
Every shader read of a WVP matrix, light constant, or per-frame uniform
therefore came back as `r[low7]` (typically zero), which made rendering
invisible.
Mechanical changes:
- crates/xenia-gpu/src/ucode/alu.rs: decode src_a_is_temp /
src_b_is_temp / src_c_is_temp from w0 bits 29/30/31. Note that our
src_a (low byte of w0) is canary's third operand, hence its selector
is bit 29 (canary src3_sel), not bit 31.
- crates/xenia-gpu/src/shaders/xenos_interp.wgsl: `read_src` now takes
the is_temp flag; constants index xenos_consts.alu directly.
- crates/xenia-gpu/src/translator.rs: `src_operand` mirrors the
interpreter — `r[idx]` when temp, `xenos_consts.alu[idx]` when
constant.
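A rough sketch of the before/after operand emission, for illustration only. The function bodies here are assumptions and not the translator's actual code: the `u` suffix follows the `r[0u]` form quoted for the synthetic test, and masking temps to bits 5:0 is assumed from the layout description.

```rust
/// Hypothetical stand-in for the post-fix operand emitter.
fn src_operand(src_byte: u8, is_temp: bool) -> String {
    if is_temp {
        // Temp operand: bits 5:0 are the register index (flag bits 6/7 ignored here).
        format!("r[{}u]", src_byte & 0x3F)
    } else {
        // Constant operand: the full byte indexes the ALU constant array.
        format!("xenos_consts.alu[{}u]", src_byte)
    }
}

/// Pre-fix behavior as described above: always a temp, masked to the low 7 bits.
fn src_operand_pre_fix(src_byte: u8) -> String {
    format!("r[{}u]", src_byte & 0x7F)
}
```

With a constant byte of 0x85, the pre-fix path yields `r[5u]` while the post-fix path yields `xenos_consts.alu[133u]`, which is the aliasing this commit removes.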
The trivial-shader synthetic test was updated to set the temp flags so
its `r[0u] = (r[0u] + r[0u])` assertion remains valid; without the
flags set, all sources would now resolve as constants.
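One way a synthetic test might set the flags, shown as a sketch. `pack_w0_sources` is a hypothetical helper, not part of the test suite; it fills only the src bytes and selector bits and leaves the rest of word 0 zero.

```rust
/// Hypothetical helper: pack three src bytes (a, b, c) and their temp flags
/// into word 0. src_a sits in the low byte and pairs with bit 29, src_b with
/// bit 30, src_c with bit 31, as in the decoder.
fn pack_w0_sources(src: [u8; 3], is_temp: [bool; 3]) -> u32 {
    let mut w0 = 0u32;
    for i in 0..3 {
        w0 |= (src[i] as u32) << (8 * i);
        if is_temp[i] {
            w0 |= 1u32 << (29 + i);
        }
    }
    w0
}
```

Setting all three flags with zero src bytes gives 0xE000_0000, so every source resolves to `r[0u]` rather than a constant.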
Bank selection (cf-level relative addressing for the higher banks of the
512 ALU constants) remains a Phase G+ extension. This commit covers
c0..c127 in bank 0, which most Sylpheed shaders use directly.
Verification at -n 100M lockstep:
swaps: 2 → 2 (unchanged — gated by D2/D3/E for draws)
draws: 0 → 0
packets: ~61M (within noise)
Tests: 552 → 554 (+2 translator tests for the temp/constant decode).
Closes GPUBUG-101 (P0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -29,10 +29,22 @@ pub struct AluInstruction {
     pub vector_dest_is_export: bool,
     /// Selects `ps` (previous scalar result) as the scalar operand when set.
     pub scalar_src_is_ps: bool,
-    /// Source register indices (at most 3 for vector ops).
+    /// Source register indices (at most 3 for vector ops). The src bytes
+    /// are the canary `srcN_reg` fields (8 bits each); for **temp-typed**
+    /// operands (see `src_a_is_temp` etc.), bit 7 of the byte is the abs
+    /// flag and bit 6 is the loop-relative flag — bits 5:0 give the temp
+    /// index. For **constant-typed** operands the full byte is the
+    /// constant index.
     pub src_a: u8,
     pub src_b: u8,
     pub src_c: u8,
+    /// Per-operand "is temporary" flag — when true, the corresponding
+    /// `src_X` byte indexes a general register (r#); when false, it
+    /// indexes an ALU constant (c#). Decoded from word-0 bits 29-31
+    /// (canary's `src3_sel`/`src2_sel`/`src1_sel`). GPUBUG-101.
+    pub src_a_is_temp: bool,
+    pub src_b_is_temp: bool,
+    pub src_c_is_temp: bool,
     /// Set when the instruction is predicated; skipped if the predicate
     /// doesn't match `predicate_condition`.
     pub predicated: bool,
@@ -59,6 +71,13 @@ pub fn decode_alu(words: [u32; 3]) -> AluInstruction {
         src_a: (w0 & 0xFF) as u8,
         src_b: ((w0 >> 8) & 0xFF) as u8,
         src_c: ((w0 >> 16) & 0xFF) as u8,
+        // Word-0 bits 29-31 are the per-operand temp-vs-constant
+        // selector (canary `src3_sel`/`src2_sel`/`src1_sel`,
+        // ucode.h:2078-2086). Our `src_a` is canary's third operand
+        // (low byte of w0), so its selector is bit 29.
+        src_a_is_temp: ((w0 >> 29) & 1) != 0,
+        src_b_is_temp: ((w0 >> 30) & 1) != 0,
+        src_c_is_temp: ((w0 >> 31) & 1) != 0,
         predicated: ((w0 >> 27) & 1) != 0,
         predicate_condition: ((w0 >> 28) & 1) != 0,
         raw: words,