Commit Graph

4 Commits

Author SHA1 Message Date
MechaCat02
1b9918450f [iterate-3T] Real UV interpolation + per-draw textures: shader/UV/bind chain complete
Build the full texture-sampling chain for the publisher splash so the textured
logo CAN sample real artwork at the guest's real UVs. Measured with an env-gated
frontbuffer readback (since removed): the chain is correct end-to-end, but the
sampled K8888 1280x768 texture is ALL-ZERO in the UI window's reachable boot
range — the artwork is produced by an EDRAM resolve (RT->texture copy) that ours
does not yet perform (resolves=0). So this lands the correct shader/UV/bind work
and isolates the remaining blocker to the resolve gap, not the shader path.

Translator (xenia-gpu/src/translator.rs), all UI-translator-only:
- Real Xenos export-index model (replaces the AllocKind heuristic that collapsed
  every VS export to one color slot and DROPPED the texcoord). When export_data
  is set the 6-bit vector_dest IS the export index: VS 62=oPos, 0..15=interps;
  PS 0=RT0. The logo VS exports oPos(62), interp0(color), interp1(UV) distinctly.
- Real interpolator passthrough: VsOut carries 8 interpolator locations; the PS
  seeds r[i] = in.interp[i] (Xenos PS-input-GPR mapping) so tfetch samples at the
  real interpolated texcoord (r1) instead of (0,0).
- vfetch format 6 (k_16_16) packed-16 unpack + per-attribute dword offset, so the
  3 vfetches sharing one fetch-constant (pos/UV/color in a 6-dword vertex) read
  the right attribute. Previously rejected the whole logo VS to the interpreter.
- QuadList/RectangleList host->guest vertex-index remap in the VS (replay is
  non-indexed): QuadList 6 host verts -> guest [0,1,2,0,2,3] (full quad).

fetch.rs: decode vfetch `offset` (dword2[8:15], dwords), `is_signed`,
`is_normalized`.

Per-draw textures: DrawCapture carries the decoded texture(s) (keyed off the
active PS's tfetch slots, attached in gpu_system after decode);
render.rs::dispatch_xenos_captures uploads + binds each capture's texture via the
host texture cache before its draw, instead of one last-draw primary_texture.

Determinism: all changes feed only the UI translator/capture path; frame_captures
is None headless. `check -n50m --gpu-inline --stable-digest --expect` byte-
identical (exit 0). 681 tests pass (+2 regression: logo VS now translates with
interpolators; PS seeds interps into registers). Temp readback/dump probes removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 17:12:16 +02:00
MechaCat02
80fbff8bd1 [iterate-3S] Real splash geometry renders: fix ALU/vfetch decode + per-draw NDC transform
The 3O→3R real-render slice ran the guest's real translated VS/PS on real
captured vertices at full boot speed, but the --ui window stayed blank.
Bifurcated with an env-gated frontbuffer readback + per-vertex NDC dump
(both removed): the captured splash quads (RectangleList, k_32_32_FLOAT,
3 verts) were non-zero and sane, so this was a transform/decode chain of
bugs, not missing geometry. Four coupled root causes:

- GPUBUG-106 (ucode/alu.rs): decode_alu read EVERY field out of w2, but
  canary's AluInstruction lays dest/write-mask/export/scalar-opcode in w0,
  the vector opcode + source regs in w2, swizzle/negate/pred in w1. The
  misread made every *export* ALU decode with vector_write_mask=0 → no
  oPos/oColor export emitted → the translated VS collapsed every vertex to
  the clip origin. Rewrote the field map to match ucode.h:2036-2086.

- GPUBUG-107 (ucode/fetch.rs + translator.rs): the translator hardcoded
  R32G32B32A32_FLOAT (4 floats, stride 4); the splash quads are
  k_32_32_FLOAT (2 floats, stride 2). Over-striding read the next vertex's
  X into .w → negative W → the rectangle clipped behind the camera. Decode
  the real VertexFormat + dword stride and emit the matching component
  read (1/2/3/4 float formats; others reject to the interpreter).

- GPUBUG-108 (translator.rs + xenos_interp.wgsl): the vfetch recomputed
  the buffer base from xenos_consts.fetch[], but that uniform carries the
  last-published per-frame fetch constant, not this draw's (stale
  0x8a000002 vs the real base). The captured window already begins at the
  fetch base, so index from 0 (vertex i at i*stride) when a real window is
  present; only the synthetic fallback consults the uniform.

- iterate-3S NDC transform (draw_capture.rs + xenos_pipeline.rs + WGSL):
  the guest VS emits screen-space pixel coords (clip disabled, VTE viewport
  scale/offset off). Added compute_ndc_xy (mirrors canary
  GetHostViewportInfo): rescales render-target pixels to [-1,1] clip with
  the Y-flip for wgpu, plumbed per-draw into DrawConstants and applied in
  both the translated and interpreter VS.

Result (env-gated readback, since removed): the real splash geometry now
fills ~50% of the frontbuffer in a clean triangular coverage pattern, real
positions from real guest vertices through the real translated shaders
(textures are the next stage — sampled color is still the magenta/white
texture stub, tex-cache=0). Headless-inert: draw_capture is only built
when frame_captures is Some (--ui); the changed decoders feed only the UI
translator/metrics. Golden byte-identical (check -n50m --gpu-inline
--stable-digest exit 0); 679 workspace tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 16:35:01 +02:00
MechaCat02
504592ac13 [iterate-3O] Real-render slice: replay guest geometry in --ui (Route A)
Replace the synthetic placeholder triangle in the --ui window with the
splash's REAL guest geometry, proving the faithful-render pipe end to end.

Architecture: Route A (UI-side replay). A per-draw capture channel carries
each PM4_DRAW_INDX*'s real state to the UI, which replays it through the
existing wgpu Xenos pipeline. The deterministic headless core is untouched:
capture is gated on an Option<Vec<DrawCapture>> that is None in headless
mode and only enabled on the --ui path, so the --gpu-inline n50m golden is
byte-identical (verified 2x).

The hard part was sourcing real vertices. The WGSL VS already does
format-aware vertex fetch from the b4 storage buffer at the address from the
fetch constant -- but b4 was never populated and the fetch address is an
absolute guest dword address. The slice:
  * xenia-gpu/draw_capture.rs: parse the active VS, find its first vertex
    fetch, read that fetch constant, copy a bounded window of guest memory
    at the fetch base. Best-effort: has_real_vertices=false falls back to
    procedural geometry (never fabricated pixels).
  * gpu_system.rs: accumulate one DrawCapture per draw into frame_captures.
  * exports.rs (vd_swap): drain + publish the frame's captures to the UI.
  * ui_bridge/bridge.rs: new publish_geometry channel + UiHandles.geometry.
  * WGSL (interp + translator): rebase the absolute fetch address by a new
    DrawConstants.vertex_base_dwords so it indexes the uploaded window.
  * render.rs: dispatch_xenos_captures uploads each draw's real vertex
    window + matching shader, issues real DrawRequests (real prim type,
    host vertex count, vs/ps keys).
  * app.rs: prefer the real-capture replay; HUD adds real-geo=N counter.

Verified in --ui on Sylpheed: "first Xenos capture batch replayed (real
geometry) captures=24 real_vertex_draws=24" -- all draws resolved a real
guest vertex window; WGSL compiles; no validation errors over 1616 swaps.

Still synthetic-free but not yet pixel-perfect: textures/UVs, DMA index
buffers (auto-index only for now), and kCopy resolve routing are staged
for follow-ups. Faithful: real vertex data, prim types, shaders, constants.

cargo test --workspace green; n50m golden unchanged (2x byte-identical).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 22:38:46 +02:00
MechaCat02
e2b8860e10 Add xenia-ui crate; switch analysis store to DuckDB
Workspace gains a new xenia-ui member that owns the winit/wgpu
window, the Xenos display pipeline (xenos_pipeline + render +
texture_cache_host), HUD font/blit shaders, and the input-bridge
plumbing the app uses to surface guest framebuffers and overlays.

Workspace dependencies grow accordingly: rusqlite is replaced with
duckdb (analysis pipeline now writes DuckDB stores), and tracing /
metrics / pprof / winit / wgpu / gilrs / pollster / crossbeam /
bytemuck are added at workspace level so xenia-ui and xenia-app
share versions. Cargo.lock regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:26:48 +02:00