Build the full texture-sampling chain for the publisher splash so the textured
logo CAN sample real artwork at the guest's real UVs. Measured with an env-gated
frontbuffer readback (since removed): the chain is correct end-to-end, but the
sampled K8888 1280x768 texture is ALL-ZERO in the UI window's reachable boot
range: the artwork is produced by an EDRAM resolve (RT->texture copy) that our
implementation does not yet perform (resolves=0). So this lands the correct
shader/UV/bind work and isolates the remaining blocker to the resolve gap, not
the shader path.
Translator (xenia-gpu/src/translator.rs), all UI-translator-only:
- Real Xenos export-index model (replaces the AllocKind heuristic that collapsed
every VS export to one color slot and DROPPED the texcoord). When export_data
is set the 6-bit vector_dest IS the export index: VS 62=oPos, 0..15=interps;
PS 0=RT0. The logo VS exports oPos(62), interp0(color), interp1(UV) distinctly.
- Real interpolator passthrough: VsOut carries 8 interpolator locations; the PS
seeds r[i] = in.interp[i] (Xenos PS-input-GPR mapping) so tfetch samples at the
real interpolated texcoord (r1) instead of (0,0).
- vfetch format 6 (k_16_16) packed-16 unpack + per-attribute dword offset, so the
3 vfetches sharing one fetch-constant (pos/UV/color in a 6-dword vertex) read
the right attribute. Previously this format rejected the whole logo VS to the
interpreter.
- QuadList/RectangleList host->guest vertex-index remap in the VS (replay is
non-indexed): QuadList 6 host verts -> guest [0,1,2,0,2,3] (full quad).
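A minimal sketch of two of the translator rules listed above, for illustration only. The names `ExportTarget`, `Stage`, `classify_export` and `quad_list_guest_index` are hypothetical and are not the identifiers used in translator.rs. The export mapping covers only the indices named in this message (VS 62, VS 0..15, PS 0); extending the QuadList remap to several quads per draw (4 guest vertices per 6 host vertices) is an assumption.

```rust
/// Where an exporting ALU instruction's result goes (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExportTarget {
    Position,
    Interpolator(u8),
    RenderTarget(u8),
    /// Any export index this sketch does not model.
    Unsupported(u8),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Vertex,
    Pixel,
}

/// When export_data is set, the 6-bit vector_dest is the export index.
pub fn classify_export(stage: Stage, vector_dest: u8) -> ExportTarget {
    let index = vector_dest & 0x3f; // keep only the 6-bit field
    match (stage, index) {
        (Stage::Vertex, 62) => ExportTarget::Position,
        (Stage::Vertex, 0..=15) => ExportTarget::Interpolator(index),
        (Stage::Pixel, 0) => ExportTarget::RenderTarget(0),
        _ => ExportTarget::Unsupported(index),
    }
}

/// Map a non-indexed host vertex id to the guest QuadList vertex it reads.
/// Each quad is replayed as two triangles: guest [0,1,2] and [0,2,3].
pub fn quad_list_guest_index(host_vertex: u32) -> u32 {
    const CORNERS: [u32; 6] = [0, 1, 2, 0, 2, 3];
    let quad = host_vertex / 6; // assumption: quads are packed back to back
    quad * 4 + CORNERS[(host_vertex % 6) as usize]
}
```

With this mapping the logo VS's three exports stay distinct: 62 is the position, 0 the color interpolator, 1 the UV interpolator.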
fetch.rs: decode vfetch `offset` (dword2[8:15], dwords), `is_signed`,
`is_normalized`.
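A rough sketch of the vfetch decode and k_16_16 unpack described above. The `offset` bit range (dword2[8:15], in dwords) comes from this message; everything else is an assumption for illustration: the function names are hypothetical, the first component is taken from the low 16 bits, endian swapping is ignored, and the normalization divisors (65535 unsigned, 32767 signed with a clamp at -1) are a common convention rather than a confirmed match for the hardware or for fetch.rs.

```rust
/// Extract the vfetch `offset` field: bits 8..=15 of dword2, in dwords.
pub fn vfetch_offset_dwords(dword2: u32) -> u32 {
    (dword2 >> 8) & 0xff
}

/// Dword index of one attribute of one vertex inside the fetched buffer.
/// Three vfetches sharing a fetch constant differ only in `offset_dwords`.
pub fn attribute_dword(vertex_index: u32, stride_dwords: u32, offset_dwords: u32) -> u32 {
    vertex_index * stride_dwords + offset_dwords
}

/// Unpack one k_16_16 dword into two floats.
/// Assumption: component 0 lives in the low half, component 1 in the high half.
pub fn unpack_16_16(packed: u32, is_signed: bool, is_normalized: bool) -> [f32; 2] {
    let halves = [(packed & 0xffff) as u16, (packed >> 16) as u16];
    halves.map(|raw| match (is_signed, is_normalized) {
        (false, false) => raw as f32,
        (false, true) => raw as f32 / 65535.0,
        (true, false) => raw as i16 as f32,
        // Clamp so that -32768 maps to -1.0 instead of slightly below it.
        (true, true) => (raw as i16 as f32 / 32767.0).max(-1.0),
    })
}
```

For a 6-dword vertex, `attribute_dword(i, 6, offset)` selects pos, UV or color depending on each vfetch's decoded offset.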
Per-draw textures: DrawCapture carries the decoded texture(s) (keyed off the
active PS's tfetch slots, attached in gpu_system after decode);
render.rs::dispatch_xenos_captures uploads + binds each capture's texture via the
host texture cache before its draw, instead of one last-draw primary_texture.
Determinism: all changes feed only the UI translator/capture path; frame_captures
is None headless. `check -n50m --gpu-inline --stable-digest --expect` is
byte-identical (exit 0). 681 tests pass (+2 regression: logo VS now translates
with interpolators; PS seeds interps into registers). Temp readback/dump probes
removed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 3O→3R real-render slice ran the guest's real translated VS/PS on real
captured vertices at full boot speed, but the --ui window stayed blank.
Bisected with an env-gated frontbuffer readback + per-vertex NDC dump
(both removed): the captured splash quads (RectangleList, k_32_32_FLOAT,
3 verts) were non-zero and sane, so this was a chain of transform/decode
bugs, not missing geometry. Four coupled root causes:
- GPUBUG-106 (ucode/alu.rs): decode_alu read EVERY field out of w2, but
canary's AluInstruction lays dest/write-mask/export/scalar-opcode in w0,
the vector opcode + source regs in w2, swizzle/negate/pred in w1. The
misread made every *export* ALU decode with vector_write_mask=0 → no
oPos/oColor export emitted → the translated VS collapsed every vertex to
the clip origin. Rewrote the field map to match ucode.h:2036-2086.
- GPUBUG-107 (ucode/fetch.rs + translator.rs): the translator hardcoded
R32G32B32A32_FLOAT (4 floats, stride 4); the splash quads are
k_32_32_FLOAT (2 floats, stride 2). Over-striding read the next vertex's
X into .w → negative W → the rectangle was clipped behind the camera. Now
decode the real VertexFormat + dword stride and emit the matching
component read (1/2/3/4 float formats; others reject to the interpreter).
- GPUBUG-108 (translator.rs + xenos_interp.wgsl): the vfetch recomputed
the buffer base from xenos_consts.fetch[], but that uniform carries the
last-published per-frame fetch constant, not this draw's (stale
0x8a000002 vs the real base). The captured window already begins at the
fetch base, so index from 0 (vertex i at i*stride) when a real window is
present; only the synthetic fallback consults the uniform.
- iterate-3S NDC transform (draw_capture.rs + xenos_pipeline.rs + WGSL):
the guest VS emits screen-space pixel coords (clip disabled, VTE viewport
scale/offset off). Added compute_ndc_xy (mirrors canary
GetHostViewportInfo): rescales render-target pixels to [-1,1] clip with
the Y-flip for wgpu, plumbed per-draw into DrawConstants and applied in
both the translated and interpreter VS.
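The stride fix (GPUBUG-107), the index-from-0 fetch (GPUBUG-108) and the NDC rescale can be sketched together as below. This is an illustration under stated assumptions, not the code in translator.rs or draw_capture.rs: `read_position` and `pixel_to_ndc` are hypothetical names, missing components are filled with (0, 0, 0, 1) by convention, and the NDC formula is the plain linear rescale with a Y flip, which may differ from what compute_ndc_xy does for render-target offsets or half-pixel adjustments.

```rust
/// Read one float vertex attribute from a captured window that already
/// begins at the fetch base, so vertex i starts at dword i * stride.
/// Returns None for unsupported component counts or out-of-range reads.
pub fn read_position(
    window: &[u32],
    vertex: usize,
    stride_dwords: usize,
    components: usize,
) -> Option<[f32; 4]> {
    if !(1..=4).contains(&components) {
        return None; // only the 1/2/3/4 float formats are handled here
    }
    let base = vertex.checked_mul(stride_dwords)?;
    // Assumption: unread components default to (0, 0, 0, 1), so a 2-float
    // format gets w = 1.0 instead of the next vertex's X.
    let mut out = [0.0, 0.0, 0.0, 1.0];
    for (i, slot) in out.iter_mut().take(components).enumerate() {
        *slot = f32::from_bits(*window.get(base + i)?);
    }
    Some(out)
}

/// Rescale render-target pixel coordinates to [-1, 1] clip space,
/// flipping Y because pixel Y grows downward while clip Y grows upward.
pub fn pixel_to_ndc(x: f32, y: f32, width: f32, height: f32) -> [f32; 2] {
    [2.0 * x / width - 1.0, 1.0 - 2.0 * y / height]
}
```

Reading a k_32_32_FLOAT vertex with stride 2 and 2 components keeps W at 1.0, which is what stops the rectangle from landing behind the camera.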
Result (env-gated readback, since removed): the real splash geometry now
fills ~50% of the frontbuffer in a clean triangular coverage pattern, real
positions from real guest vertices through the real translated shaders
(textures are the next stage — sampled color is still the magenta/white
texture stub, tex-cache=0). Headless-inert: draw_capture is only built
when frame_captures is Some (--ui); the changed decoders feed only the UI
translator/metrics. Golden byte-identical (check -n50m --gpu-inline
--stable-digest exit 0); 679 workspace tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the synthetic placeholder triangle in the --ui window with the
splash's REAL guest geometry, proving the faithful-render pipe end to end.
Architecture: Route A (UI-side replay). A per-draw capture channel carries
each PM4_DRAW_INDX*'s real state to the UI, which replays it through the
existing wgpu Xenos pipeline. The deterministic headless core is untouched:
capture is gated on an Option<Vec<DrawCapture>> that is None in headless
mode and only enabled on the --ui path, so the --gpu-inline n50m golden is
byte-identical (verified 2x).
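The gating described above can be sketched roughly as follows. Only `DrawCapture`, `frame_captures` and the `Option<Vec<DrawCapture>>` shape come from this message; the struct fields, the `GpuSystem` type and the method names are hypothetical stand-ins. The point of the sketch is that the capture is built lazily, so the headless path never runs the capture code at all.

```rust
/// Hypothetical, heavily reduced stand-in for the real per-draw capture.
#[derive(Debug, Clone, PartialEq)]
pub struct DrawCapture {
    pub prim_type: u32,
    pub vertex_count: u32,
}

#[derive(Default)]
pub struct GpuSystem {
    /// None in headless mode; Some only on the --ui path.
    frame_captures: Option<Vec<DrawCapture>>,
}

impl GpuSystem {
    /// Called once by the UI path to turn capturing on.
    pub fn enable_capture(&mut self) {
        if self.frame_captures.is_none() {
            self.frame_captures = Some(Vec::new());
        }
    }

    /// Called per draw. `build` runs only when capturing is enabled,
    /// so headless execution does no extra work and reads no extra state.
    pub fn on_draw(&mut self, build: impl FnOnce() -> DrawCapture) {
        if let Some(captures) = self.frame_captures.as_mut() {
            captures.push(build());
        }
    }

    /// Called at swap: hand the frame's captures to the UI and start fresh.
    pub fn drain_frame(&mut self) -> Vec<DrawCapture> {
        self.frame_captures
            .as_mut()
            .map(std::mem::take)
            .unwrap_or_default()
    }
}
```

Passing a closure to `on_draw` is one way to keep the capture cost off the deterministic path; an explicit `is_some()` check at the call site would work equally well.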
The hard part was sourcing real vertices. The WGSL VS already does
format-aware vertex fetch from the b4 storage buffer at the address from the
fetch constant -- but b4 was never populated and the fetch address is an
absolute guest dword address. The slice:
* xenia-gpu/draw_capture.rs: parse the active VS, find its first vertex
fetch, read that fetch constant, copy a bounded window of guest memory
at the fetch base. Best-effort: has_real_vertices=false falls back to
procedural geometry (never fabricated pixels).
* gpu_system.rs: accumulate one DrawCapture per draw into frame_captures.
* exports.rs (vd_swap): drain + publish the frame's captures to the UI.
* ui_bridge/bridge.rs: new publish_geometry channel + UiHandles.geometry.
* WGSL (interp + translator): rebase the absolute fetch address by a new
DrawConstants.vertex_base_dwords so it indexes the uploaded window.
* render.rs: dispatch_xenos_captures uploads each draw's real vertex
window + matching shader, issues real DrawRequests (real prim type,
host vertex count, vs/ps keys).
* app.rs: prefer the real-capture replay; HUD adds real-geo=N counter.
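A small sketch of the window copy and address rebase from the list above. The field names `vertex_base_dwords` and `has_real_vertices` come from this message; the `VertexWindow` type, the function names, and modeling guest memory as a flat `&[u32]` are assumptions made to keep the example self-contained.

```rust
/// A bounded copy of guest memory starting at the fetch base.
pub struct VertexWindow {
    /// Absolute guest dword address of dwords[0].
    pub vertex_base_dwords: u32,
    pub dwords: Vec<u32>,
    /// False when nothing could be copied; the caller then falls back
    /// to procedural geometry.
    pub has_real_vertices: bool,
}

/// Copy at most `max_dwords` dwords of guest memory at the fetch base.
/// Best-effort: an out-of-range base yields an empty window, not a panic.
pub fn capture_window(
    guest_memory: &[u32],
    fetch_base_dwords: u32,
    max_dwords: usize,
) -> VertexWindow {
    let start = fetch_base_dwords as usize;
    let dwords: Vec<u32> = guest_memory
        .get(start..)
        .map(|tail| tail[..tail.len().min(max_dwords)].to_vec())
        .unwrap_or_default();
    VertexWindow {
        vertex_base_dwords: fetch_base_dwords,
        has_real_vertices: !dwords.is_empty(),
        dwords,
    }
}

/// Rebase an absolute guest dword address into the uploaded window.
pub fn fetch_dword(window: &VertexWindow, absolute_dword: u32) -> Option<u32> {
    let local = absolute_dword.checked_sub(window.vertex_base_dwords)?;
    window.dwords.get(local as usize).copied()
}
```

The shader-side equivalent is the subtraction in `fetch_dword`: the absolute address minus `vertex_base_dwords` gives the index into the uploaded buffer.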
Verified in --ui on Sylpheed: "first Xenos capture batch replayed (real
geometry) captures=24 real_vertex_draws=24" -- all draws resolved a real
guest vertex window; WGSL compiles; no validation errors over 1616 swaps.
Still synthetic-free but not yet pixel-perfect: textures/UVs, DMA index
buffers (auto-index only for now), and kCopy resolve routing are staged
for follow-ups. Faithful: real vertex data, prim types, shaders, constants.
cargo test --workspace green; n50m golden unchanged (2x byte-identical).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Workspace gains a new xenia-ui member that owns the winit/wgpu
window, the Xenos display pipeline (xenos_pipeline + render +
texture_cache_host), HUD font/blit shaders, and the input-bridge
plumbing the app uses to surface guest framebuffers and overlays.
Workspace dependencies grow accordingly: rusqlite is replaced with
duckdb (analysis pipeline now writes DuckDB stores), and tracing /
metrics / pprof / winit / wgpu / gilrs / pollster / crossbeam /
bytemuck are added at workspace level so xenia-ui and xenia-app
share versions. Cargo.lock regenerated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>