xenia-rs

fabi/xenia-rs

Commit Graph

Author	SHA1	Message	Date

MechaCat02

c5c6713419

fix(gpu): GPUBUG-100 — apply per-operand swizzle + negate to ALU sources

Word-1 of every ALU triple holds three 8-bit component-relative
swizzles (`src1_swiz`/`src2_swiz`/`src3_swiz` at bits 16-23/8-15/0-7
per canary ucode.h:2064-2066) and three per-operand negate flags
(bits 24/25/26). Pre-fix, both the WGSL interpreter and the AOT
translator discarded word-1 entirely with `_ = w1;` — every ALU
result was missing its swizzle (broadcast/permute patterns like
`.zyxw`, `.xxxx`) and any negated operand was used positive instead.
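The word-1 layout described above can be sketched as a small decoder. This
is a minimal illustration, not the code in `alu.rs`: `SrcModifiers` and
`decode_word1` are hypothetical names, and the mapping of negate bits
24/25/26 to individual operands is an assumption (the message only lists
the bit positions, not which operand owns which bit).

```rust
/// Hypothetical container for the per-operand modifiers held in word 1.
/// Array order is [src1, src2, src3].
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SrcModifiers {
    pub swiz: [u8; 3],
    pub negate: [bool; 3],
}

/// Unpack the three 8-bit swizzles and the three negate flags from word 1.
pub fn decode_word1(w1: u32) -> SrcModifiers {
    SrcModifiers {
        swiz: [
            ((w1 >> 16) & 0xFF) as u8, // src1_swiz, bits 16-23
            ((w1 >> 8) & 0xFF) as u8,  // src2_swiz, bits 8-15
            (w1 & 0xFF) as u8,         // src3_swiz, bits 0-7
        ],
        // ASSUMPTION: bit 26 belongs to src1, bit 25 to src2, bit 24 to
        // src3. Only the bit range 24-26 is stated in the commit message.
        negate: [
            (w1 >> 26) & 1 != 0,
            (w1 >> 25) & 1 != 0,
            (w1 >> 24) & 1 != 0,
        ],
    }
}
```

For example, under these assumptions `decode_word1(0x0412_346C)` yields
swizzles `[0x12, 0x34, 0x6C]` with only the first negate flag set.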

Component-relative semantics (canary's
`AluInstruction::GetSwizzledComponentIndex`, ucode.h:1996): for output
component i, the source component is `((swizzle >> (2*i)) + i) & 3`.
Identity swizzle is 0x00, NOT 0xE4. The original `apply_swizzle` in
the interpreter shader treated the swizzle as absolute component
indices (identity 0xE4), which was also incorrect.
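A CPU-side sketch of the component-relative rule may help when checking
swizzle bytes by hand. The formula is the one quoted above; the function
names and the `[f32; 4]` representation are illustrative assumptions, not
the shader's actual code.

```rust
/// Source component index for output component `i` (0 = x .. 3 = w),
/// using the component-relative formula `((swizzle >> (2*i)) + i) & 3`.
pub fn swizzled_component(swizzle: u8, i: u32) -> u32 {
    debug_assert!(i < 4);
    ((u32::from(swizzle) >> (2 * i)) + i) & 3
}

/// Apply a component-relative swizzle to a 4-component vector.
pub fn apply_swizzle(v: [f32; 4], swizzle: u8) -> [f32; 4] {
    std::array::from_fn(|i| v[swizzled_component(swizzle, i as u32) as usize])
}
```

With `v = [1.0, 2.0, 3.0, 4.0]`, swizzle 0x00 returns `v` unchanged, 0x6C
broadcasts x (`.xxxx`), and 0x22 produces `.zyxw`. Feeding the absolute
identity 0xE4 through the relative rule gives `.xzxz`, which is why the
two encodings cannot be mixed.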

Mechanical changes:
- crates/xenia-gpu/src/ucode/alu.rs: extend AluInstruction with
  src_X_swiz (u8) and src_X_negate (bool) fields. decode_alu unpacks
  them from word 1.
- crates/xenia-gpu/src/shaders/xenos_interp.wgsl: apply_swizzle uses
  component-relative semantics. interpret_alu decodes the modifiers
  and applies via apply_swizzle + apply_modifiers (with abs=false).
- crates/xenia-gpu/src/translator.rs: src_operand emits the
  precomputed swizzle inline as `vec4<f32>(base.x, base.y, ...)`,
  then wraps in `(-…)` when negated. Identity swizzle (0x00) emits a
  bare base expression so it round-trips with the trivial-shader
  fixture.
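The translator-side behaviour in the last bullet can be sketched roughly
as follows. `emit_src` is a hypothetical stand-in for `src_operand`; the
real function's signature and exact output formatting are not shown in
this message, so treat the string layout as an assumption.

```rust
/// Emit a WGSL expression for one ALU source operand.
/// `base` is the already-resolved operand expression, e.g. "r[0u]".
pub fn emit_src(base: &str, swizzle: u8, negate: bool) -> String {
    const LANES: [char; 4] = ['x', 'y', 'z', 'w'];

    let expr = if swizzle == 0 {
        // Identity swizzle: emit the bare base expression.
        base.to_string()
    } else {
        // Precompute the component-relative swizzle at translate time.
        let parts: Vec<String> = (0..4u32)
            .map(|i| {
                let src = ((u32::from(swizzle) >> (2 * i)) + i) & 3;
                format!("{base}.{}", LANES[src as usize])
            })
            .collect();
        format!("vec4<f32>({})", parts.join(", "))
    };

    // Wrap in a negation only when the operand's negate flag is set.
    if negate {
        format!("(-{expr})")
    } else {
        expr
    }
}
```

Under these assumptions, `emit_src("r[1u]", 0x6C, false)` produces
`vec4<f32>(r[1u].x, r[1u].x, r[1u].x, r[1u].x)`, while an identity
swizzle with no negate leaves `r[1u]` untouched.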

Abs is omitted in this commit — the abs flag is dual-meaning (for
temps it lives at bit 7 of the src byte; for constants at word-2 bit
7 `abs_constants`). Wiring it up correctly requires more careful
case-split logic; deferred to Phase G.

Verification at -n 100M lockstep:
  swaps:                2 → 2     (gated by Phase E for draws)
  draws:                0 → 0
  packets:              ~58M (within noise)
Tests: 554 → 555 (+1 swizzle/negate test, no count change otherwise
because identity swizzle test merged into D1's parameterised test).
WGSL still validates via naga (combined_module_parses_as_wgsl).

Closes GPUBUG-100 (P0). Abs deferred to Phase G.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 14:15:07 +02:00

MechaCat02

78ea81c12a

fix(gpu): GPUBUG-101 — decode src1/2/3_sel temp-vs-constant selector

Per canary AluInstruction layout (xenia-canary/src/xenia/gpu/ucode.h:
2078-2086), word-0 bits 29-31 are the per-operand `srcN_sel` flags
selecting temp register (1) vs ALU constant (0); the corresponding
8-bit src byte indexes either:
  - a temp register (bits 5:0 = index, bits 6/7 reserved for
    relative-addressing / abs flags consumed by Phase D2), or
  - an ALU constant (full 8-bit index).
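The two interpretations of the src byte can be illustrated with a small
sketch. The `Operand` enum and `decode_operand` are hypothetical names
chosen for this example; bits 6/7 of a temp operand are simply masked off
here, since their handling is deferred to a later phase.

```rust
/// Hypothetical classification of an ALU source operand.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operand {
    /// Temp register: only bits 5:0 of the src byte form the index.
    Temp { index: u8 },
    /// ALU constant: the full 8-bit src byte is the index.
    Constant { index: u8 },
}

/// Interpret a src byte according to its `srcN_sel` flag
/// (1 = temp register, 0 = ALU constant).
pub fn decode_operand(src_byte: u8, is_temp: bool) -> Operand {
    if is_temp {
        Operand::Temp { index: src_byte & 0x3F }
    } else {
        Operand::Constant { index: src_byte }
    }
}
```

The same byte 0xC5 therefore resolves to temp register 5 when the
selector is set, and to constant 197 when it is clear.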

Pre-fix, the WGSL interpreter and AOT translator both masked `& 0x7F`
on the src byte and emitted `r[low7]` regardless of the operand class.
Every shader's WVP matrix / light constant / per-frame uniform read
came back as r[low7] — typically zero — yielding invisible rendering.

Mechanical changes:
- crates/xenia-gpu/src/ucode/alu.rs: decode src_a_is_temp /
  src_b_is_temp / src_c_is_temp from w0 bits 29/30/31. Note that our
  src_a (low byte of w0) is canary's third operand, hence its selector
  is bit 29 (canary src3_sel), not bit 31.
- crates/xenia-gpu/src/shaders/xenos_interp.wgsl: `read_src` now takes
  the is_temp flag; constants index xenos_consts.alu directly.
- crates/xenia-gpu/src/translator.rs: `src_operand` mirrors the
  interpreter — `r[idx]` when temp, `xenos_consts.alu[idx]` when
  constant.
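Putting the selector bits and the operand emission together, a minimal
sketch might look like this. The function names are illustrative, and the
`u` suffix on the constant index is an assumption carried over from the
`r[0u]` form used by the trivial-shader test.

```rust
/// Decode the temp-vs-constant selectors from word 0.
/// Returns [src_a_is_temp, src_b_is_temp, src_c_is_temp] from
/// bits 29, 30 and 31 respectively.
pub fn decode_selectors(w0: u32) -> [bool; 3] {
    [
        (w0 >> 29) & 1 != 0, // src_a (canary src3_sel)
        (w0 >> 30) & 1 != 0, // src_b
        (w0 >> 31) & 1 != 0, // src_c
    ]
}

/// Emit the WGSL expression for one operand: a temp register read or a
/// direct ALU-constant read.
pub fn src_operand(src_byte: u8, is_temp: bool) -> String {
    if is_temp {
        format!("r[{}u]", src_byte & 0x3F)
    } else {
        // ASSUMPTION: constants are emitted with the same `u` suffix.
        format!("xenos_consts.alu[{}u]", src_byte)
    }
}

/// Resolve src_a, which lives in the low byte of word 0.
pub fn src_a_expr(w0: u32) -> String {
    let [a_is_temp, _, _] = decode_selectors(w0);
    src_operand((w0 & 0xFF) as u8, a_is_temp)
}
```

With bit 29 set and a zero low byte, `src_a_expr` yields `r[0u]`; with
the bit clear and a low byte of 5 it yields `xenos_consts.alu[5u]`.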

The trivial-shader synthetic test was updated to set the temp flags so
its `r[0u] = (r[0u] + r[0u])` assertion remains valid; without the
flags set, all sources would now resolve as constants.

Bank selection (cf-level relative addressing for higher banks of the
512 ALU constants) remains a Phase G+ extension; this commit covers
c0..c127 in bank 0, which most Sylpheed shaders use directly.

Verification at -n 100M lockstep:
  swaps:                2 → 2     (unchanged — gated by D2/D3/E for draws)
  draws:                0 → 0
  packets:              ~61M (within noise)
Tests: 552 → 554 (+2 translator tests for the temp/constant decode).

Closes GPUBUG-101 (P0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 14:10:11 +02:00

MechaCat02

79eb52c378

xenia-gpu: end-to-end Xenos pipeline (PM4, ucode, EDRAM, resolve)

First real GPU implementation. Ring/PM4 frontend (ring_view,
ring_drain, pm4) drains the command processor; gpu_system owns the
threaded backend (DrainFence RPC + parker/fence helpers from M1) and
the MMIO-mapped register block (mmio_region).

Xenos shader frontend: ucode/{alu,control_flow,fetch,mod}.rs decode
the Xbox 360 microcode, translator.rs lowers it onto the WGSL
xenos_interp interpreter shader (shaders/xenos_interp.wgsl).
shader_metrics.rs counts decode/translate work.

Render state: draw_state, primitive, render_target_cache,
texture_cache, tiled_address (Xenos's swizzled tiled-memory layout),
xenos_constants (register field constants), edram (the 10 MiB EDRAM
model with MSAA), and resolve.rs (TILE_FLUSH copy-out — clear-resolve
plus bitwise-equivalent 32 bpp + 64 bpp paths landed). handle.rs
owns the typed GPU-resource handles the kernel hands out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 16:29:38 +02:00

3 Commits