fix(gpu): GPUBUG-101 — decode src1/2/3_sel temp-vs-constant selector

Per canary AluInstruction layout (xenia-canary/src/xenia/gpu/ucode.h:
2078-2086), word-0 bits 29-31 are the per-operand `srcN_sel` flags
selecting temp register (1) vs ALU constant (0); the corresponding
8-bit src byte indexes either:
  - a temp register (bits 5:0 = index, bits 6/7 reserved for
    relative-addressing / abs flags consumed by Phase D2), or
  - an ALU constant (full 8-bit index).
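
The two readings of the src byte can be sketched in Rust as follows. This is an illustrative sketch, not the code in alu.rs: `SrcOperand` and `decode_src_byte` are hypothetical names, and assigning bit 6 to relative addressing and bit 7 to abs follows the comment in the interpreter diff rather than a check against canary.

```rust
/// How one 8-bit source byte is interpreted, depending on its `srcN_sel` flag.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SrcOperand {
    /// Temp register: bits 5:0 are the index; bits 6 and 7 are carried
    /// along as flags and not otherwise interpreted here.
    Temp { index: u8, relative: bool, abs: bool },
    /// ALU constant: the full byte is the index.
    Constant { index: u8 },
}

/// Classify a source byte given its temp-vs-constant selector.
pub fn decode_src_byte(src_byte: u8, is_temp: bool) -> SrcOperand {
    if is_temp {
        SrcOperand::Temp {
            index: src_byte & 0x3F,
            // Assumption: bit 6 = relative, bit 7 = abs.
            relative: (src_byte & 0x40) != 0,
            abs: (src_byte & 0x80) != 0,
        }
    } else {
        SrcOperand::Constant { index: src_byte }
    }
}
```

The same byte therefore names different storage depending on the selector: under these assumptions 0x85 is temp 5 with the abs flag set, or constant 133.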

Pre-fix, the WGSL interpreter and AOT translator both masked `& 0x7F`
on the src byte and emitted `r[low7]` regardless of the operand class.
Every shader's WVP matrix / light constant / per-frame uniform read
came back as r[low7] — typically zero — yielding invisible rendering.
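
A small CPU-side model of the two behaviours makes the failure concrete. It is a sketch under assumptions: the array sizes, the `Vec4` alias, and both function names are made up for illustration, and the real code is the WGSL `read_src`.

```rust
/// Illustrative stand-in for a WGSL vec4<f32>.
pub type Vec4 = [f32; 4];

/// Pre-fix behaviour: always read the register file through the low 7 bits,
/// whatever the operand class.
pub fn read_src_prefix(regs: &[Vec4; 128], src_byte: u32) -> Vec4 {
    regs[(src_byte & 0x7F) as usize]
}

/// Post-fix behaviour: pick the register file or the constant array
/// according to the selector flag.
pub fn read_src_fixed(
    regs: &[Vec4; 128],
    consts: &[Vec4; 256],
    src_byte: u32,
    is_temp: bool,
) -> Vec4 {
    if is_temp {
        regs[(src_byte & 0x3F) as usize]
    } else {
        consts[(src_byte & 0xFF) as usize]
    }
}
```

With a zeroed register file and a value stored only in constant 4, the pre-fix read of src byte 4 returns zeros, while the post-fix read with the selector cleared returns the constant.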

Mechanical changes:
- crates/xenia-gpu/src/ucode/alu.rs: decode src_a_is_temp /
  src_b_is_temp / src_c_is_temp from w0 bits 29/30/31. Note that our
  src_a (low byte of w0) is canary's third operand, hence its selector
  is bit 29 (canary src3_sel), not bit 31.
- crates/xenia-gpu/src/shaders/xenos_interp.wgsl: `read_src` now takes
  the is_temp flag; constants index xenos_consts.alu directly.
- crates/xenia-gpu/src/translator.rs: `src_operand` mirrors the
  interpreter — `r[idx]` when temp, `xenos_consts.alu[idx]` when
  constant.
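
The selector decode and the operand emission described in this list might look roughly like the sketch below. `SrcSelectors`, `decode_selectors`, and `src_operand_sketch` are hypothetical names, and the real `src_operand` in translator.rs may have a different signature; only the bit positions and the emitted `r[..]` / `xenos_consts.alu[..]` forms come from this commit.

```rust
/// Per-operand temp-vs-constant flags from ALU word 0 (illustrative type).
pub struct SrcSelectors {
    pub a_is_temp: bool,
    pub b_is_temp: bool,
    pub c_is_temp: bool,
}

/// src_a is the low byte of w0 (canary's third operand), so it pairs with
/// bit 29; src_b pairs with bit 30 and src_c with bit 31.
pub fn decode_selectors(w0: u32) -> SrcSelectors {
    SrcSelectors {
        a_is_temp: ((w0 >> 29) & 1) != 0,
        b_is_temp: ((w0 >> 30) & 1) != 0,
        c_is_temp: ((w0 >> 31) & 1) != 0,
    }
}

/// Emit a WGSL expression for one source operand (swizzle and modifiers
/// are ignored in this sketch).
pub fn src_operand_sketch(src_byte: u32, is_temp: bool) -> String {
    if is_temp {
        format!("r[{}u]", src_byte & 0x3F)
    } else {
        format!("xenos_consts.alu[{}u]", src_byte & 0xFF)
    }
}
```

Keeping the interpreter and the translator on the same rule matters because both consume the same word 0; a mismatch would make the two paths disagree on which storage an operand names.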

The trivial-shader synthetic test was updated to set the temp flags so
its `r[0u] = (r[0u] + r[0u])` assertion remains valid; without the
flags set, all sources would now resolve as constants.
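
For test fixtures, setting the flags amounts to OR-ing the top three bits into word 0. The helper below is a hypothetical sketch (the constant names and `pack_w0_sources` are not taken from the test suite), and it leaves the other word-0 fields at zero.

```rust
/// Selector bits in ALU word 0, using the bit positions from this commit.
pub const SRC_A_IS_TEMP: u32 = 1 << 29;
pub const SRC_B_IS_TEMP: u32 = 1 << 30;
pub const SRC_C_IS_TEMP: u32 = 1 << 31;

/// Pack three source bytes into the low 24 bits of word 0 and optionally
/// mark all three as temp registers. Bits 24-28 are left clear.
pub fn pack_w0_sources(src_a: u8, src_b: u8, src_c: u8, all_temps: bool) -> u32 {
    let bytes = (src_a as u32) | ((src_b as u32) << 8) | ((src_c as u32) << 16);
    if all_temps {
        bytes | SRC_A_IS_TEMP | SRC_B_IS_TEMP | SRC_C_IS_TEMP
    } else {
        bytes
    }
}
```

A word built with `all_temps = false` keeps bits 29-31 clear, so every source resolves as a constant under the new decode.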

Bank selection (cf-level relative addressing for the higher banks of
the 512 ALU constants) remains a Phase G+ extension; this commit covers
c0..c127 in bank 0, which most Sylpheed shaders use directly.

Verification at -n 100M lockstep:
  swaps:                2 → 2     (unchanged — gated by D2/D3/E for draws)
  draws:                0 → 0
  packets:              ~61M (within noise)
Tests: 552 → 554 (+2 translator tests for the temp/constant decode).

Closes GPUBUG-101 (P0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-03 14:10:11 +02:00
parent 1b74db6fa7
commit 78ea81c12a
3 changed files with 139 additions and 15 deletions

@@ -233,11 +233,24 @@ const SOP_SIN: u32 = 48u;
 const SOP_COS: u32 = 49u;
 const SOP_RETAIN_PREV: u32 = 50u;
-// Read a vec4 source from the register file. Treats the src index as a
-// direct r# reference (ignores c# selector + swizzle/modifiers for MVP).
-// M4+ will extend this to decode the full operand header.
-fn read_src(idx: u32) -> vec4<f32> {
-    return registers[idx & 0x7Fu];
+// Read a vec4 source. Per canary `xenia-canary/src/xenia/gpu/ucode.h`
+// the temp-vs-constant selector lives in word-0 bits 29-31
+// (`srcN_sel`); the corresponding 8-bit src_byte is **either** a temp
+// register descriptor (bit 7 = abs flag, bit 6 = relative, bits 5:0 =
+// temp index) **or** a flat constant index (full byte). Pre-fix, the
+// MVP masked `& 0x7F` and read `registers[low7]` regardless — every
+// shader's WVP matrix / light constant / per-frame uniform read came
+// back as r[low7] (typically zero → invisible rendering). GPUBUG-101.
+fn read_src(src_byte: u32, is_temp: bool) -> vec4<f32> {
+    if is_temp {
+        // Bits 5:0 of the byte give the temp index; bit 7 (abs) and
+        // bit 6 (relative) are handled in read_src_full when modifiers
+        // land in Phase D2.
+        return registers[src_byte & 0x3Fu];
+    }
+    // Constant index — full byte (covers c0..c127 in bank 0; higher
+    // banks via cf-level relative addressing land in a later phase).
+    return xenos_consts.alu[src_byte & 0xFFu];
 }
 fn exec_vector_op(op: u32, a: vec4<f32>, b: vec4<f32>, c: vec4<f32>) -> vec4<f32> {
@@ -520,11 +533,17 @@ fn interpret_alu(t: u32, is_vertex: bool) {
     let src_a = w0 & 0xFFu;
     let src_b = (w0 >> 8u) & 0xFFu;
     let src_c = (w0 >> 16u) & 0xFFu;
+    // GPUBUG-101: word-0 bits 29-31 are the per-operand temp-vs-constant
+    // selectors (canary `srcN_sel`, ucode.h:2078-2086). `src_a` is
+    // canary's third operand (low byte), so its selector is bit 29.
+    let src_a_is_temp = ((w0 >> 29u) & 1u) != 0u;
+    let src_b_is_temp = ((w0 >> 30u) & 1u) != 0u;
+    let src_c_is_temp = ((w0 >> 31u) & 1u) != 0u;
     let predicated = ((w0 >> 27u) & 1u) != 0u;
     let predicate_condition = ((w0 >> 28u) & 1u) != 0u;
     let scalar_src_is_ps = ((w0 >> 26u) & 1u) != 0u;
-    // `w1` holds per-operand swizzle + negate/abs/c-vs-r flags. The MVP
-    // treats every source as a full r#, no modifiers — M4+ decodes it.
+    // `w1` holds per-operand swizzle + negate/abs flags. Phase D2 decodes
+    // them; Phase D1 only resolved the temp/constant selector.
     _ = w1;
     // Honor per-instruction predicate: skip when predicated and the
@@ -534,9 +553,9 @@ fn interpret_alu(t: u32, is_vertex: bool) {
     }
     // Vector pipe.
-    let a = read_src(src_a);
-    let b = read_src(src_b);
-    let c = read_src(src_c);
+    let a = read_src(src_a, src_a_is_temp);
+    let b = read_src(src_b, src_b_is_temp);
+    let c = read_src(src_c, src_c_is_temp);
     let vec_result = exec_vector_op(vec_op, a, b, c);
     if vec_wm != 0u {
         write_reg_masked(vec_dst, vec_wm, vec_result);