Commit Graph

4 Commits

Author SHA1 Message Date
MechaCat02
2a992db47b [iterate-3Y] Replay per-draw blend + write-mask so the logo composites visible
The publisher logo rendered its real artwork in isolation (3X) but was
overpainted in the full composite: every replayed draw used ONE fixed
SrcAlpha/OneMinusSrcAlpha pipeline + an opaque-magenta texture stub, so the
textured RectangleList draws whose sampler slot is shadowed by a vertex-fetch
constant (no resolvable texture) wrote opaque magenta over the logo.

Per-draw render-state inventory at the splash (env-gated probe, removed):
  - logo  QuadList vs=0x03b7b020 ps=0x03b79001: bc0=0x07010701
    (One,OneMinusSrcAlpha — premultiplied alpha), cmask=0xF, ntex=1 (real K8888)
  - RectangleList vs=0xd4c14f46 ps=0x03b79001: SAME premult blend, ntex=0
    (slot 0 holds a type=3 vertex constant → texture decode rejects) → magenta
  - opaque fill vs=0x36660986 ps=0xed732b5a: bc0=0x00010001 (One,Zero) — green
Draw order: the logo is drawn LAST per group, so order was not the problem;
the fixed pipeline state was.

Change (UI-side capture/replay only):
  - draw_capture: capture RB_BLENDCONTROL0 + RB_COLOR_MASK (+ colorcontrol /
    depthcontrol for follow-ups) per draw.
  - xenos_pipeline: new RenderState{blend_control,color_mask}; map Xenos blend
    factors/ops -> wgpu mirroring canary kBlendFactorMap/kBlendFactorAlphaMap;
    One,Zero,Add => blend:None (opaque); zero-channel mask => ColorWrites; cache
    translator AND interpreter pipelines keyed on (vs,ps,RenderState) /
    RenderState so each draw composites with its real state.
  - render: pass each capture's RenderState through both replay paths.
  - dummy texture magenta(255,0,255,255) -> transparent(0,0,0,0): an
    unresolvable texture now contributes nothing under its real premult blend
    instead of fabricating opaque magenta (removes a fake, adds none).

Readback (env-gated, removed): full 1280x720 composite now shows the logo's
real artwork (maxR=255, 50-102 distinct colors/cell) in a centered strip; no
magenta anywhere. Background is uniform green (the 0xed732b5a opaque fill) — a
separate vertex-color/shader fidelity issue, NOT compositing (next iterate).

Determinism: UI-only; draw_capture additions only run when frame_captures=Some.
check -n50m --gpu-inline --stable-digest --expect = "matches golden" (2x).
cargo test --workspace = 682 passed. Temp probes removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 19:53:25 +02:00
MechaCat02
80fbff8bd1 [iterate-3S] Real splash geometry renders: fix ALU/vfetch decode + per-draw NDC transform
The 3O→3R real-render slice ran the guest's real translated VS/PS on real
captured vertices at full boot speed, but the --ui window stayed blank.
Bifurcated with an env-gated frontbuffer readback + per-vertex NDC dump
(both removed): the captured splash quads (RectangleList, k_32_32_FLOAT,
3 verts) were non-zero and sane, so this was a transform/decode chain of
bugs, not missing geometry. Four coupled root causes:

- GPUBUG-106 (ucode/alu.rs): decode_alu read EVERY field out of w2, but
  canary's AluInstruction lays dest/write-mask/export/scalar-opcode in w0,
  the vector opcode + source regs in w2, swizzle/negate/pred in w1. The
  misread made every *export* ALU decode with vector_write_mask=0 → no
  oPos/oColor export emitted → the translated VS collapsed every vertex to
  the clip origin. Rewrote the field map to match ucode.h:2036-2086.

- GPUBUG-107 (ucode/fetch.rs + translator.rs): the translator hardcoded
  R32G32B32A32_FLOAT (4 floats, stride 4); the splash quads are
  k_32_32_FLOAT (2 floats, stride 2). Over-striding read the next vertex's
  X into .w → negative W → the rectangle clipped behind the camera. Decode
  the real VertexFormat + dword stride and emit the matching component
  read (1/2/3/4 float formats; others reject to the interpreter).

- GPUBUG-108 (translator.rs + xenos_interp.wgsl): the vfetch recomputed
  the buffer base from xenos_consts.fetch[], but that uniform carries the
  last-published per-frame fetch constant, not this draw's (stale
  0x8a000002 vs the real base). The captured window already begins at the
  fetch base, so index from 0 (vertex i at i*stride) when a real window is
  present; only the synthetic fallback consults the uniform.

- iterate-3S NDC transform (draw_capture.rs + xenos_pipeline.rs + WGSL):
  the guest VS emits screen-space pixel coords (clip disabled, VTE viewport
  scale/offset off). Added compute_ndc_xy (mirrors canary
  GetHostViewportInfo): rescales render-target pixels to [-1,1] clip with
  the Y-flip for wgpu, plumbed per-draw into DrawConstants and applied in
  both the translated and interpreter VS.

Result (env-gated readback, since removed): the real splash geometry now
fills ~50% of the frontbuffer in a clean triangular coverage pattern, real
positions from real guest vertices through the real translated shaders
(textures are the next stage — sampled color is still the magenta/white
texture stub, tex-cache=0). Headless-inert: draw_capture is only built
when frame_captures is Some (--ui); the changed decoders feed only the UI
translator/metrics. Golden byte-identical (check -n50m --gpu-inline
--stable-digest exit 0); 679 workspace tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 16:35:01 +02:00
MechaCat02
504592ac13 [iterate-3O] Real-render slice: replay guest geometry in --ui (Route A)
Replace the synthetic placeholder triangle in the --ui window with the
splash's REAL guest geometry, proving the faithful-render pipe end to end.

Architecture: Route A (UI-side replay). A per-draw capture channel carries
each PM4_DRAW_INDX*'s real state to the UI, which replays it through the
existing wgpu Xenos pipeline. The deterministic headless core is untouched:
capture is gated on an Option<Vec<DrawCapture>> that is None in headless
mode and only enabled on the --ui path, so the --gpu-inline n50m golden is
byte-identical (verified 2x).

The hard part was sourcing real vertices. The WGSL VS already does
format-aware vertex fetch from the b4 storage buffer at the address from the
fetch constant -- but b4 was never populated and the fetch address is an
absolute guest dword address. The slice:
  * xenia-gpu/draw_capture.rs: parse the active VS, find its first vertex
    fetch, read that fetch constant, copy a bounded window of guest memory
    at the fetch base. Best-effort: has_real_vertices=false falls back to
    procedural geometry (never fabricated pixels).
  * gpu_system.rs: accumulate one DrawCapture per draw into frame_captures.
  * exports.rs (vd_swap): drain + publish the frame's captures to the UI.
  * ui_bridge/bridge.rs: new publish_geometry channel + UiHandles.geometry.
  * WGSL (interp + translator): rebase the absolute fetch address by a new
    DrawConstants.vertex_base_dwords so it indexes the uploaded window.
  * render.rs: dispatch_xenos_captures uploads each draw's real vertex
    window + matching shader, issues real DrawRequests (real prim type,
    host vertex count, vs/ps keys).
  * app.rs: prefer the real-capture replay; HUD adds real-geo=N counter.

Verified in --ui on Sylpheed: "first Xenos capture batch replayed (real
geometry) captures=24 real_vertex_draws=24" -- all draws resolved a real
guest vertex window; WGSL compiles; no validation errors over 1616 swaps.

Still synthetic-free but not yet pixel-perfect: textures/UVs, DMA index
buffers (auto-index only for now), and kCopy resolve routing are staged
for follow-ups. Faithful: real vertex data, prim types, shaders, constants.

cargo test --workspace green; n50m golden unchanged (2x byte-identical).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 22:38:46 +02:00
MechaCat02
e2b8860e10 Add xenia-ui crate; switch analysis store to DuckDB
Workspace gains a new xenia-ui member that owns the winit/wgpu
window, the Xenos display pipeline (xenos_pipeline + render +
texture_cache_host), HUD font/blit shaders, and the input-bridge
plumbing the app uses to surface guest framebuffers and overlays.

Workspace dependencies grow accordingly: rusqlite is replaced with
duckdb (analysis pipeline now writes DuckDB stores), and tracing /
metrics / pprof / winit / wgpu / gilrs / pollster / crossbeam /
bytemuck are added at workspace level so xenia-ui and xenia-app
share versions. Cargo.lock regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:26:48 +02:00