[iterate-3S] Real splash geometry renders: fix ALU/vfetch decode + per-draw NDC transform
The 3O→3R real-render slice ran the guest's real translated VS/PS on real captured vertices at full boot speed, but the --ui window stayed blank. Bifurcated with an env-gated frontbuffer readback + per-vertex NDC dump (both removed): the captured splash quads (RectangleList, k_32_32_FLOAT, 3 verts) were non-zero and sane, so this was a transform/decode chain of bugs, not missing geometry. Four coupled root causes: - GPUBUG-106 (ucode/alu.rs): decode_alu read EVERY field out of w2, but canary's AluInstruction lays dest/write-mask/export/scalar-opcode in w0, the vector opcode + source regs in w2, swizzle/negate/pred in w1. The misread made every *export* ALU decode with vector_write_mask=0 → no oPos/oColor export emitted → the translated VS collapsed every vertex to the clip origin. Rewrote the field map to match ucode.h:2036-2086. - GPUBUG-107 (ucode/fetch.rs + translator.rs): the translator hardcoded R32G32B32A32_FLOAT (4 floats, stride 4); the splash quads are k_32_32_FLOAT (2 floats, stride 2). Over-striding read the next vertex's X into .w → negative W → the rectangle clipped behind the camera. Decode the real VertexFormat + dword stride and emit the matching component read (1/2/3/4 float formats; others reject to the interpreter). - GPUBUG-108 (translator.rs + xenos_interp.wgsl): the vfetch recomputed the buffer base from xenos_consts.fetch[], but that uniform carries the last-published per-frame fetch constant, not this draw's (stale 0x8a000002 vs the real base). The captured window already begins at the fetch base, so index from 0 (vertex i at i*stride) when a real window is present; only the synthetic fallback consults the uniform. - iterate-3S NDC transform (draw_capture.rs + xenos_pipeline.rs + WGSL): the guest VS emits screen-space pixel coords (clip disabled, VTE viewport scale/offset off). Added compute_ndc_xy (mirrors canary GetHostViewportInfo): rescales render-target pixels to [-1,1] clip with the Y-flip for wgpu, plumbed per-draw into DrawConstants and applied in both the translated and interpreter VS. Result (env-gated readback, since removed): the real splash geometry now fills ~50% of the frontbuffer in a clean triangular coverage pattern, real positions from real guest vertices through the real translated shaders (textures are the next stage — sampled color is still the magenta/white texture stub, tex-cache=0). Headless-inert: draw_capture is only built when frame_captures is Some (--ui); the changed decoders feed only the UI translator/metrics. Golden byte-identical (check -n50m --gpu-inline --stable-digest exit 0); 679 workspace tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -39,6 +39,11 @@ struct DrawConstants {
|
||||
/// iterate-3O: guest dword base of the uploaded `vertex_buffer` window.
|
||||
/// The WGSL subtracts this from the absolute vertex-fetch address.
|
||||
vertex_base_dwords: u32,
|
||||
/// iterate-3S: guest→host NDC XY transform (mirrors canary
|
||||
/// `GetHostViewportInfo`). `clip.xy = pos.xy * ndc_scale + ndc_offset*pos.w`.
|
||||
/// Y is pre-flipped for wgpu. 16 bytes so the block stays 16-byte aligned.
|
||||
ndc_scale: [f32; 2],
|
||||
ndc_offset: [f32; 2],
|
||||
}
|
||||
|
||||
/// Submitted to [`XenosPipeline::render_one`] to render one captured draw.
|
||||
@@ -53,6 +58,10 @@ pub struct DrawRequest {
|
||||
/// iterate-3O: guest dword base of the per-draw vertex window uploaded to
|
||||
/// `vertex_buffer` (b4). 0 = no real vertex window (procedural fallback).
|
||||
pub vertex_base_dwords: u32,
|
||||
/// iterate-3S: guest→host NDC XY transform (Y pre-flipped). When all-zero
|
||||
/// the shader leaves the position untransformed (procedural fallback).
|
||||
pub ndc_scale: [f32; 2],
|
||||
pub ndc_offset: [f32; 2],
|
||||
}
|
||||
|
||||
/// Reasonable upper bound on a single shader blob (dwords). Most Xbox 360
|
||||
@@ -199,6 +208,8 @@ impl XenosPipeline {
|
||||
vertex_count: 3,
|
||||
prim_kind: 4,
|
||||
vertex_base_dwords: 0,
|
||||
ndc_scale: [0.0, 0.0],
|
||||
ndc_offset: [0.0, 0.0],
|
||||
};
|
||||
let draw_ctx_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
|
||||
label: Some("xenos draw ctx"),
|
||||
@@ -486,6 +497,8 @@ impl XenosPipeline {
|
||||
vertex_count: req.vertex_count.max(3),
|
||||
prim_kind: req.prim_kind,
|
||||
vertex_base_dwords: req.vertex_base_dwords,
|
||||
ndc_scale: req.ndc_scale,
|
||||
ndc_offset: req.ndc_offset,
|
||||
};
|
||||
queue.write_buffer(&self.draw_ctx_buffer, 0, bytemuck::bytes_of(&cb));
|
||||
|
||||
@@ -612,6 +625,8 @@ impl XenosPipeline {
|
||||
vertex_count: req.vertex_count.max(3),
|
||||
prim_kind: req.prim_kind,
|
||||
vertex_base_dwords: req.vertex_base_dwords,
|
||||
ndc_scale: req.ndc_scale,
|
||||
ndc_offset: req.ndc_offset,
|
||||
};
|
||||
queue.write_buffer(&self.draw_ctx_buffer, 0, bytemuck::bytes_of(&cb));
|
||||
|
||||
@@ -643,6 +658,6 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn draw_constants_layout_matches_wgsl_uniform() {
|
||||
assert_eq!(std::mem::size_of::<DrawConstants>(), 16);
|
||||
assert_eq!(std::mem::size_of::<DrawConstants>(), 32);
|
||||
}
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user