The vertex fetch constant (canary `xe_gpu_vertex_fetch_t`,
xenos.h:1158-1172) holds an `endian` field (low 2 bits of dword_1)
selecting kNone/k8in16/k8in32/k16in32 swap patterns per
`GpuSwapInline` (xenos.h:1090-1109). Xbox 360 vertex data is stored
big-endian; the host is little-endian. Pre-fix every dword was
bitcast as-is — vertex positions decoded as byte-reversed garbage,
producing clipped or NaN positions in any draw that survived to the
host.
Mechanical changes:
- crates/xenia-gpu/src/translator.rs: AOT `emit_vfetch` reads
fetch_const dword 1 (endian) and wraps each lane's load in
`gpu_swap(value, endian)`. New `gpu_swap` helper added to the
emitted module header.
- crates/xenia-gpu/src/shaders/xenos_interp.wgsl: matching
`gpu_swap` helper added to the runtime interpreter shader.
`interpret_vertex_fetch` reads fc1, computes the endian, and wraps
every format's per-lane load (including 8_8_8_8 and 16_16_FLOAT
paths). Mirrors the AOT translator's emission.
Verification at -n 100M lockstep:
swaps: 2 → 2 (gated by Phase E for draws)
draws: 0 → 0
packets: ~60M (within noise)
Tests: +1 (vfetch_emit_includes_gpu_swap_helper_call).
Closes GPUBUG-102 (P0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>