The publisher splash (title idx0) rendered FLAT in ours while canary samples
a texture: ours never decoded the logo's textured pixel shader
(E59B2B3D, a `tfetch2D` sprite) even though our guest IM_LOADs the exact same
microcode canary does (verified byte-identical against the Wine oracle). The
shader was misparsed as flat. Three coupled bugs in the ucode decoder, each a
divergence from canary's `gpu/ucode.h`:
1. CF opcode table was off-by-one (`control_flow.rs`): mapped opcode 0→Exec
and 1→Exit, but Xenos has 0=kNop, 1=kExec, 2=kExecEnd, 3..6/13..14 the
cond-exec variants, 7/8 loop, 9/10 call/return, 11 condjmp, 12 alloc,
15 mark-vs-fetch-done. So a real `kExec` clause was read as a terminal
`Exit`, truncating the CF block and dropping every instruction (incl. the
`tfetch`) after it. Added Nop/MarkVsFetchDone variants; parse now ends on
an END-bit exec clause.
2. exec/loop `address` is an absolute instruction-triple index from shader
dword 0, but we used it to index our post-CF `instructions` slice directly
(`ucode/mod.rs`). Rebase addresses by the CF triple count so `address*3`
lands on the right instruction.
3. Fetch instruction bitfields were wrong (`ucode/fetch.rs`): `const_index`
read from bit 5 (actually `src_reg`) instead of bit 20, and texture
`dimension` from dword1 instead of dword2 bit14. The logo's `tfetch ..,tf0`
was read as `tf1`, whose empty fetch-constant failed to decode → no
texture. Also the `sequence` fetch/ALU bit is bit[0] of each pair, not
bit[1] (`shader_metrics.rs`, `translator.rs`, `xenos_interp.wgsl`).
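For reference, the corrected opcode numbering from item 1 and the address rebase from item 2 can be sketched roughly as below. This is an illustration only, not the code in `control_flow.rs` or `ucode/mod.rs`: the enum, its variant names, and both function names are hypothetical, the start/end and call/return ordering within 7/8 and 9/10 is assumed, and the sketch assumes the post-CF `instructions` slice is a flat run of dwords.

```rust
/// Control-flow opcodes, grouped the way item 1 groups them.
/// Variant names are illustrative, not the decoder's real identifiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CfOpcode {
    Nop,             // 0
    Exec,            // 1
    ExecEnd,         // 2: exec clause with the END bit set
    CondExec(u8),    // 3..=6 and 13..=14: cond-exec variants, raw opcode kept
    LoopStart,       // 7 (assumed ordering within "7/8 loop")
    LoopEnd,         // 8
    Call,            // 9 (assumed ordering within "9/10 call/return")
    Return,          // 10
    CondJmp,         // 11
    Alloc,           // 12
    MarkVsFetchDone, // 15
}

/// Maps a raw 4-bit opcode to its variant; values above 15 are rejected.
pub fn decode_cf_opcode(raw: u8) -> Option<CfOpcode> {
    Some(match raw {
        0 => CfOpcode::Nop,
        1 => CfOpcode::Exec,
        2 => CfOpcode::ExecEnd,
        3..=6 | 13 | 14 => CfOpcode::CondExec(raw),
        7 => CfOpcode::LoopStart,
        8 => CfOpcode::LoopEnd,
        9 => CfOpcode::Call,
        10 => CfOpcode::Return,
        11 => CfOpcode::CondJmp,
        12 => CfOpcode::Alloc,
        15 => CfOpcode::MarkVsFetchDone,
        _ => return None,
    })
}

/// Converts an absolute triple index (counted from shader dword 0) into a
/// dword offset inside the post-CF instruction slice. Returns None when the
/// address points back into the CF block itself.
pub fn rebase_exec_address(address: u32, cf_triple_count: u32) -> Option<usize> {
    address
        .checked_sub(cf_triple_count)
        .map(|triple| triple as usize * 3)
}
```

With the old table, raw opcode 1 decoded as a terminal exit; here it decodes as `Exec`, and only the END-bit clause (`ExecEnd`) stops the walk.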
Result (--gpu-inline, deterministic 2x): the active PS's `tfetch_slots` now
resolves slot 0, the tf0 fetch-constant decodes (fmt K8888), and
`gpu.texture.decode` fires (137x at -n 50M; texture_cache_entries 0→1, the
only golden field that changed — all draw/swap counts unchanged). The same
fixes correct the WGSL uber-shader's fetch/CF walk for the threaded/--ui path.
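The `sequence` fix that feeds this fetch/CF walk amounts to testing the low bit of each 2-bit pair. A minimal sketch, assuming one pair per instruction slot packed from the least significant bit upward; `is_fetch` is a hypothetical helper name, not a function in the tree:

```rust
/// Returns true when instruction `slot` of an exec clause is a fetch.
/// Each slot owns two bits of `sequence`; the fetch/ALU flag is bit 0 of
/// the pair (the buggy reading used bit 1).
pub fn is_fetch(sequence: u32, slot: u32) -> bool {
    (sequence >> (slot * 2)) & 1 != 0
}
```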
Added a regression test that parses the real E59B2B3D microcode and asserts a
tfetch slot is found. Golden re-baselined (texture_cache_entries 0→1).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two faithful, deterministic GPU-backend changes that make the texture path
correct for whatever textured draw the splash eventually dispatches. Both are
currently inert on Sylpheed (the textured logo draw is still gated downstream
— see below), but neither shifts the stable-digest golden, so they land safely.
1. Un-stub RectangleList primitive expansion (primitive.rs). The splash submits
2819 RectangleList draws at 200M, all of which were REJECTED by the P3 stub
(`gpu.primitive.rejected{rectangle_list}`) → only ~592 flat point/quad draws
rasterized. Mirror canary's intent (primitive_processor.cc:389-456
kRectangleListAsTriangleStrip) within our CPU index-rewrite idiom: emit each
rect's 3 real vertices as one TriangleList triangle (v0,v1,v2), rejected=false,
faithful host_vertex_count. The full quad (synthesized 4th corner v3=v0+v2-v1)
needs real vertex fetch in vs_main — left as a documented TODO. Rejection
warnings drop 2819→0.
2. Draw-time texture decode keyed off the active PS's real tfetch slots
(gpu_system.rs + exports.rs vd_swap). Previously vd_swap decoded a hardcoded
fetch-constant slot 0 at swap time. Now the DRAW handler parses the bound
pixel shader (ucode::parse_shader), collects its tfetch fetch_const slots via
new shader_metrics::tfetch_slots, reads each 6-dword fetch constant, and
decode+caches it into GpuSystem::last_draw_textures. vd_swap publishes the
first of these (UI binds one texture today), falling back to the legacy slot-0
probe on flat-only frames. New span_max_version helper walks page_version over
the trait (draw-time &dyn MemoryAccess lacks the heap's inherent
max_page_version). Pure function of guest writes — deterministic.
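The index rewrite in item 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `primitive.rs`: both function names are hypothetical, the draw is assumed to be non-indexed, and positions are treated as plain 2D floats. `fourth_corner` shows only the v3=v0+v2-v1 arithmetic that the TODO refers to; it is not wired into the expansion.

```rust
/// Rewrites a non-indexed RectangleList (3 guest vertices per rect) into
/// TriangleList indices, one triangle (v0, v1, v2) per rect.
/// A trailing partial rect (fewer than 3 vertices) is ignored.
pub fn expand_rectangle_list(guest_vertex_count: u32) -> Vec<u32> {
    let rect_count = guest_vertex_count / 3;
    let mut indices = Vec::with_capacity(rect_count as usize * 3);
    for rect in 0..rect_count {
        let base = rect * 3;
        indices.extend_from_slice(&[base, base + 1, base + 2]);
    }
    indices
}

/// Synthesizes the missing fourth corner of a rect: v3 = v0 + v2 - v1.
pub fn fourth_corner(v0: [f32; 2], v1: [f32; 2], v2: [f32; 2]) -> [f32; 2] {
    [v0[0] + v2[0] - v1[0], v0[1] + v2[1] - v1[1]]
}
```

In this sketch the host vertex count is simply the length of the returned index list, so it stays a multiple of 3.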
Status: texture_decodes stays 0 on Sylpheed because all 6 live shaders are flat
(no tfetch); canary's textured logo shaders E59B2B3D/F7B1457 are not yet
dispatched by ours (a downstream title-state gate, the next frontier). The full
P5 decode→publish→upload→sample path is already wired; this makes the decode
side key off the real shader instead of a guess.
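Keying the decode off the shader, with the slot-0 fallback on flat-only frames, might look roughly like this. The `Instr` enum and `slots_to_decode` are hypothetical stand-ins for the parsed-shader types, and this `tfetch_slots` is a simplified illustration rather than the actual `shader_metrics::tfetch_slots`:

```rust
/// Simplified stand-in for a parsed shader instruction (hypothetical).
pub enum Instr {
    Alu,
    VertexFetch { const_index: u8 },
    TextureFetch { const_index: u8 },
}

/// Collects the fetch-constant slots referenced by texture fetches,
/// in first-seen order and without duplicates.
pub fn tfetch_slots(instrs: &[Instr]) -> Vec<u8> {
    let mut slots = Vec::new();
    for instr in instrs {
        if let Instr::TextureFetch { const_index } = instr {
            if !slots.contains(const_index) {
                slots.push(*const_index);
            }
        }
    }
    slots
}

/// Slots to decode for a draw: the shader's real tfetch slots, or the
/// legacy slot 0 probe when the shader has no texture fetch.
pub fn slots_to_decode(instrs: &[Instr]) -> Vec<u8> {
    let slots = tfetch_slots(instrs);
    if slots.is_empty() { vec![0] } else { slots }
}
```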
Validation: stable-digest golden sylpheed_n50m unchanged (draws=718 swaps=147
tex=0), regenerated twice byte-identical; 200M run shows 0 RectangleList
rejections. cargo test --workspace green (677, +2: rectangle_list_expansion,
tfetch_slots_extracts_texture_fetch_constants). No temp hooks. Branch only;
not pushed/merged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>