[iterate-3S] Real splash geometry renders: fix ALU/vfetch decode + per-draw NDC transform

The 3O→3R real-render slice ran the guest's real translated VS/PS on real
captured vertices at full boot speed, but the --ui window stayed blank.
Bisected with an env-gated frontbuffer readback + per-vertex NDC dump
(both removed): the captured splash quads (RectangleList, k_32_32_FLOAT,
3 verts) were non-zero and sane, so this was a transform/decode chain of
bugs, not missing geometry. Four coupled root causes:

- GPUBUG-106 (ucode/alu.rs): decode_alu read EVERY field out of w2, but
  canary's AluInstruction lays dest/write-mask/export/scalar-opcode in w0,
  the vector opcode + source regs in w2, swizzle/negate/pred in w1. The
  misread made every *export* ALU decode with vector_write_mask=0 → no
  oPos/oColor export emitted → the translated VS collapsed every vertex to
  the clip origin. Rewrote the field map to match ucode.h:2036-2086.

- GPUBUG-107 (ucode/fetch.rs + translator.rs): the translator hardcoded
  R32G32B32A32_FLOAT (4 floats, stride 4); the splash quads are
  k_32_32_FLOAT (2 floats, stride 2). Over-striding read the next vertex's
  X into .w → negative W → the rectangle clipped behind the camera. Decode
  the real VertexFormat + dword stride and emit the matching component
  read (1/2/3/4 float formats; others reject to the interpreter).

- GPUBUG-108 (translator.rs + xenos_interp.wgsl): the vfetch recomputed
  the buffer base from xenos_consts.fetch[], but that uniform carries the
  last-published per-frame fetch constant, not this draw's (stale
  0x8a000002 vs the real base). The captured window already begins at the
  fetch base, so index from 0 (vertex i at i*stride) when a real window is
  present; only the synthetic fallback consults the uniform.

- iterate-3S NDC transform (draw_capture.rs + xenos_pipeline.rs + WGSL):
  the guest VS emits screen-space pixel coords (clip disabled, VTE viewport
  scale/offset off). Added compute_ndc_xy (mirrors canary
  GetHostViewportInfo): rescales render-target pixels to [-1,1] clip with
  the Y-flip for wgpu, plumbed per-draw into DrawConstants and applied in
  both the translated and interpreter VS.
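
The over-striding failure described under GPUBUG-107, and the "index from 0"
rule from GPUBUG-108, can be illustrated with a small sketch. Everything here
is hypothetical: `fetch_vertex` and `QUAD` are illustration-only names, the
coordinates are made up, and the (0, 0, 0, 1) default for unsupplied components
is an assumption. The real fetch swizzle is not modelled, so which neighbouring
value lands in which component may differ from the actual translator.

```rust
/// Hypothetical stand-in for the vertex fetch: reads vertex `i` from a float
/// window that already starts at the fetch base, so vertex i sits at i * stride.
/// Assumption: components a format does not supply default to (0, 0, 0, 1).
fn fetch_vertex(window: &[f32], i: usize, stride: usize, components: usize) -> [f32; 4] {
    let mut out = [0.0, 0.0, 0.0, 1.0];
    let start = i * stride;
    for (c, slot) in out.iter_mut().enumerate().take(components.min(4)) {
        // Reads past the end of the window yield 0.0 instead of panicking.
        *slot = window.get(start + c).copied().unwrap_or(0.0);
    }
    out
}

/// Three 2-float vertices (made-up values, chosen so the effect is visible).
const QUAD: [f32; 6] = [0.0, 0.0, 640.0, -8.0, 0.0, 360.0];
```

With the matching layout, `fetch_vertex(&QUAD, 1, 2, 2)` reads only the second
vertex and keeps W at 1.0. With a hardcoded 4-float layout,
`fetch_vertex(&QUAD, 0, 4, 4)` pulls the neighbouring vertex's data into .z/.w,
which in this sketch makes W negative.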

Result (env-gated readback, since removed): the real splash geometry now
fills ~50% of the frontbuffer in a clean triangular coverage pattern, real
positions from real guest vertices through the real translated shaders
(textures are the next stage — sampled color is still the magenta/white
texture stub, tex-cache=0). Headless-inert: draw_capture is only built
when frame_captures is Some (--ui); the changed decoders feed only the UI
translator/metrics. Golden byte-identical (check -n50m --gpu-inline
--stable-digest exit 0); 679 workspace tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-18 16:35:01 +02:00
parent 6d8a2817a3
commit 80fbff8bd1
7 changed files with 308 additions and 85 deletions


@@ -663,6 +663,10 @@ impl RenderState {
prim_kind,
// Synthetic fallback path: no real vertex window.
vertex_base_dwords: 0,
// No real geometry → no NDC transform (procedural positions are
// already in clip space).
ndc_scale: [0.0, 0.0],
ndc_offset: [0.0, 0.0],
};
if use_translated
&& let Some(p) = self.xenos_pipeline.translated_pipeline(vs_key, ps_key) {
@@ -788,6 +792,11 @@ impl RenderState {
vertex_count: cap.host_vertex_count.max(3),
prim_kind: cap.prim_code,
vertex_base_dwords: base,
// iterate-3S: apply the per-draw guest viewport → host NDC
// transform only when we have real geometry (otherwise the
// procedural fallback already emits clip-space positions).
ndc_scale: if cap.has_real_vertices { cap.ndc_scale } else { [0.0, 0.0] },
ndc_offset: if cap.has_real_vertices { cap.ndc_offset } else { [0.0, 0.0] },
};
if use_translated
&& let Some(p) = self.xenos_pipeline.translated_pipeline(cap.vs_key, cap.ps_key)
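
The per-draw transform carried by `ndc_scale` / `ndc_offset` in the hunk above
can be sketched on the CPU as follows. This is a minimal sketch under stated
assumptions, not the commit's `compute_ndc_xy`: `ndc_from_target_size` and
`apply_ndc` are hypothetical names, and deriving the transform from a
render-target size alone is a simplification (the real function is not shown
here and may read more state). The formula and the all-zero pass-through follow
the `DrawConstants` / `DrawRequest` doc comments.

```rust
/// Hypothetical simplification: derive scale/offset from a render-target size
/// only, mapping pixel coordinates to [-1, 1] clip space with Y flipped.
fn ndc_from_target_size(width: f32, height: f32) -> ([f32; 2], [f32; 2]) {
    // x: [0, width] -> [-1, 1];  y: [0, height] -> [1, -1].
    ([2.0 / width, -2.0 / height], [-1.0, 1.0])
}

/// clip.xy = pos.xy * ndc_scale + ndc_offset * pos.w; z and w pass through.
/// All-zero scale and offset means "leave the position untransformed".
fn apply_ndc(pos: [f32; 4], scale: [f32; 2], offset: [f32; 2]) -> [f32; 4] {
    if scale == [0.0, 0.0] && offset == [0.0, 0.0] {
        return pos;
    }
    [
        pos[0] * scale[0] + offset[0] * pos[3],
        pos[1] * scale[1] + offset[1] * pos[3],
        pos[2],
        pos[3],
    ]
}
```

For a 1280x720 target, the top-left pixel (0, 0) maps to clip (-1, 1) and the
bottom-right pixel (1280, 720) maps to approximately (1, -1), while a
procedural position passed with zeroed constants comes back unchanged.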


@@ -39,6 +39,11 @@ struct DrawConstants {
/// iterate-3O: guest dword base of the uploaded `vertex_buffer` window.
/// The WGSL subtracts this from the absolute vertex-fetch address.
vertex_base_dwords: u32,
/// iterate-3S: guest→host NDC XY transform (mirrors canary
/// `GetHostViewportInfo`). `clip.xy = pos.xy * ndc_scale + ndc_offset*pos.w`.
/// Y is pre-flipped for wgpu. 16 bytes so the block stays 16-byte aligned.
ndc_scale: [f32; 2],
ndc_offset: [f32; 2],
}
/// Submitted to [`XenosPipeline::render_one`] to render one captured draw.
@@ -53,6 +58,10 @@ pub struct DrawRequest {
/// iterate-3O: guest dword base of the per-draw vertex window uploaded to
/// `vertex_buffer` (b4). 0 = no real vertex window (procedural fallback).
pub vertex_base_dwords: u32,
/// iterate-3S: guest→host NDC XY transform (Y pre-flipped). When all-zero
/// the shader leaves the position untransformed (procedural fallback).
pub ndc_scale: [f32; 2],
pub ndc_offset: [f32; 2],
}
/// Reasonable upper bound on a single shader blob (dwords). Most Xbox 360
@@ -199,6 +208,8 @@ impl XenosPipeline {
vertex_count: 3,
prim_kind: 4,
vertex_base_dwords: 0,
ndc_scale: [0.0, 0.0],
ndc_offset: [0.0, 0.0],
};
let draw_ctx_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
label: Some("xenos draw ctx"),
@@ -486,6 +497,8 @@ impl XenosPipeline {
vertex_count: req.vertex_count.max(3),
prim_kind: req.prim_kind,
vertex_base_dwords: req.vertex_base_dwords,
ndc_scale: req.ndc_scale,
ndc_offset: req.ndc_offset,
};
queue.write_buffer(&self.draw_ctx_buffer, 0, bytemuck::bytes_of(&cb));
@@ -612,6 +625,8 @@ impl XenosPipeline {
vertex_count: req.vertex_count.max(3),
prim_kind: req.prim_kind,
vertex_base_dwords: req.vertex_base_dwords,
ndc_scale: req.ndc_scale,
ndc_offset: req.ndc_offset,
};
queue.write_buffer(&self.draw_ctx_buffer, 0, bytemuck::bytes_of(&cb));
@@ -643,6 +658,6 @@ mod tests {
#[test]
fn draw_constants_layout_matches_wgsl_uniform() {
-assert_eq!(std::mem::size_of::<DrawConstants>(), 16);
+assert_eq!(std::mem::size_of::<DrawConstants>(), 32);
}
}
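
The 16 → 32 byte change asserted by the test can be reproduced in isolation.
The struct below is a hypothetical stand-in, not the real `DrawConstants`: the
diff shows three `u32` fields and a previous size of 16 bytes, so a fourth
dword is assumed here under the made-up name `hidden_dword`, and `#[repr(C)]`
is likewise an assumption.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical mirror of the uniform block after this commit.
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct DrawConstantsSketch {
    vertex_count: u32,
    prim_kind: u32,
    vertex_base_dwords: u32,
    /// ASSUMPTION: a fourth dword not visible in the diff (name is made up).
    hidden_dword: u32,
    /// Two f32 pairs add 16 bytes and start at byte offset 16.
    ndc_scale: [f32; 2],
    ndc_offset: [f32; 2],
}

/// Returns (size, alignment) of the sketch struct in bytes.
fn draw_constants_layout() -> (usize, usize) {
    (size_of::<DrawConstantsSketch>(), align_of::<DrawConstantsSketch>())
}
```

Under these assumptions the Rust-side size is 32 bytes with 4-byte alignment,
and both `[f32; 2]` members land on 8-byte offsets (16 and 24), which is what a
pair of `vec2<f32>` members would need on the WGSL side.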