[iterate-3O] Real-render slice: replay guest geometry in --ui (Route A)

Replace the synthetic placeholder triangle in the --ui window with the
splash's REAL guest geometry, proving the faithful-render pipe end to end.

Architecture: Route A (UI-side replay). A per-draw capture channel carries
each PM4_DRAW_INDX*'s real state to the UI, which replays it through the
existing wgpu Xenos pipeline. The deterministic headless core is untouched:
capture is gated on an Option<Vec<DrawCapture>> that is None in headless
mode and only enabled on the --ui path, so the --gpu-inline n50m golden is
byte-identical (verified 2x).
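
A minimal sketch of the gating described above, reduced to a standalone type. `Capture`, `CaptureChannel`, `record` and `drain` are hypothetical names for illustration; only the `Option<Vec<..>>` shape and the idempotent enable come from this commit, and the use of `std::mem::take` for the drain is an assumption about how `vd_swap` might take the list.

```rust
/// Hypothetical stand-in for the real `DrawCapture`; only the draw index is kept.
#[derive(Clone, Debug, PartialEq)]
pub struct Capture {
    pub draw_index: u32,
}

/// Hypothetical reduction of `GpuSystem` to just the capture channel.
#[derive(Default)]
pub struct CaptureChannel {
    /// `None` in headless mode, `Some` once the UI path opts in.
    frame_captures: Option<Vec<Capture>>,
}

impl CaptureChannel {
    /// Idempotent opt-in: a second call keeps captures already recorded.
    pub fn enable(&mut self) {
        if self.frame_captures.is_none() {
            self.frame_captures = Some(Vec::new());
        }
    }

    /// Record one draw. A no-op while capture is disabled.
    pub fn record(&mut self, draw_index: u32) {
        if let Some(caps) = self.frame_captures.as_mut() {
            caps.push(Capture { draw_index });
        }
    }

    /// Drain the frame's captures at present time, leaving capture enabled.
    /// Returns an empty list when capture is disabled.
    pub fn drain(&mut self) -> Vec<Capture> {
        match self.frame_captures.as_mut() {
            Some(caps) => std::mem::take(caps),
            None => Vec::new(),
        }
    }
}
```

Because every mutation sits behind the `Some` check, a headless run that never calls `enable` allocates nothing and records nothing.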

The hard part was sourcing real vertices. The WGSL VS already does
format-aware vertex fetch from the b4 storage buffer at the address from the
fetch constant -- but b4 was never populated and the fetch address is an
absolute guest dword address. The slice:
  * xenia-gpu/draw_capture.rs: parse the active VS, find its first vertex
    fetch, read that fetch constant, copy a bounded window of guest memory
    at the fetch base. Best-effort: has_real_vertices=false falls back to
    procedural geometry (never fabricated pixels).
  * gpu_system.rs: accumulate one DrawCapture per draw into frame_captures.
  * exports.rs (vd_swap): drain + publish the frame's captures to the UI.
  * ui_bridge/bridge.rs: new publish_geometry channel + UiHandles.geometry.
  * WGSL (interp + translator): rebase the absolute fetch address by a new
    DrawConstants.vertex_base_dwords so it indexes the uploaded window.
  * render.rs: dispatch_xenos_captures uploads each draw's real vertex
    window + matching shader, issues real DrawRequests (real prim type,
    host vertex count, vs/ps keys).
  * app.rs: prefer the real-capture replay; HUD adds real-geo=N counter.

Verified in --ui on Sylpheed: "first Xenos capture batch replayed (real
geometry) captures=24 real_vertex_draws=24" -- all draws resolved a real
guest vertex window; WGSL compiles; no validation errors over 1616 swaps.

Still synthetic-free but not yet pixel-perfect: textures/UVs, DMA index
buffers (auto-index only for now), and kCopy resolve routing are staged
for follow-ups. Faithful: real vertex data, prim types, shaders, constants.

cargo test --workspace green; n50m golden unchanged (2x byte-identical).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-17 22:38:46 +02:00
parent 6bb4355e3d
commit 504592ac13
12 changed files with 509 additions and 65 deletions

View File

@@ -0,0 +1,175 @@
//! Per-draw geometry capture for the host UI's faithful-render path.
//!
//! The deterministic headless core (`check --gpu-inline`) never touches this
//! module — it is populated only when a UI bridge is installed and consumed
//! only by `crates/xenia-ui`. The goal is to hand the UI the *real* guest
//! geometry behind each `PM4_DRAW_INDX*` packet so it can rasterize the
//! actual splash vertices instead of synthetic placeholder shapes.
//!
//! What the WGSL pipeline needs to reconstruct one draw (see
//! `shaders/xenos_interp.wgsl` `vs_main` / `interpret_vertex_fetch`):
//!   * the active VS/PS blob keys (already published as assets),
//!   * the primitive type + the host vertex count to issue,
//!   * the raw guest vertex-buffer bytes for the fetched window, and
//!   * the *dword base* of that window so the shader can rebase the absolute
//!     fetch-constant address into the uploaded buffer.
//!
//! The hard part is sourcing the vertex window: the VS reads a vertex-fetch
//! constant (`xe_gpu_vertex_fetch_t`) whose dword-0 carries the absolute
//! guest dword address. We parse the active VS, find its first vertex fetch,
//! read that fetch constant out of the register file, then copy a bounded
//! window of guest memory starting at the fetch base.
use xenia_memory::access::MemoryAccess;

use crate::draw_state::{IndexSize, IndexSource, PrimitiveType};
use crate::register_file::RegisterFile;

/// Texture-fetch / vertex-fetch constant region base, in register indices.
/// Each fetch constant is 6 dwords (`xe_gpu_*_fetch_t`).
const CONST_BASE_FETCH: u32 = 0x4800;

/// Upper bound (in dwords) on the vertex window we copy per draw. The splash
/// UI draws are tiny (34 verts × ≤4 dwords); 16 Ki dwords (64 KiB) is generous
/// slack while bounding the per-frame copy cost and the 16 MiB host buffer.
const MAX_WINDOW_DWORDS: u32 = 16 * 1024;
/// One captured draw, with enough real state for the UI to replay it through
/// the existing wgpu Xenos pipeline.
#[derive(Clone, Debug)]
pub struct DrawCapture {
    /// Monotonic global draw index (matches `GpuStats::draws_seen` at capture).
    pub draw_index: u32,
    /// Xenos primitive-type code (see `SwapInfo::last_draw_prim` encoding).
    pub prim_code: u32,
    /// Host vertex count to issue (post primitive-processor rewrite).
    pub host_vertex_count: u32,
    /// Active VS blob key at draw time (0 = none).
    pub vs_key: u32,
    /// Active PS blob key at draw time (0 = none).
    pub ps_key: u32,
    /// Raw guest dwords of the fetched vertex window, with bytes kept in guest
    /// memory order (the WGSL applies the per-format endian swap). Index 0 of
    /// this buffer corresponds to guest dword `window_base_dwords`.
    pub vertex_dwords: Vec<u32>,
    /// Guest dword address that maps to index 0 of `vertex_dwords`. The shader
    /// subtracts this from the fetch-constant base to index `vertex_dwords`.
    pub window_base_dwords: u32,
    /// `true` when we successfully resolved a real vertex window. When `false`
    /// the UI falls back to its procedural geometry for this draw (honest:
    /// nothing faked, just "couldn't source real vertices").
    pub has_real_vertices: bool,
}
/// Encode a [`PrimitiveType`] as the raw Xenos code used across the bridge.
pub fn prim_code(p: PrimitiveType) -> u32 {
    match p {
        PrimitiveType::None => 0,
        PrimitiveType::PointList => 1,
        PrimitiveType::LineList => 2,
        PrimitiveType::LineStrip => 3,
        PrimitiveType::TriangleList => 4,
        PrimitiveType::TriangleFan => 5,
        PrimitiveType::TriangleStrip => 6,
        PrimitiveType::RectangleList => 8,
        PrimitiveType::QuadList => 13,
        PrimitiveType::Unknown(x) => x as u32,
    }
}
/// Resolve the first vertex-fetch window referenced by the parsed VS.
///
/// Walks the VS instruction stream for the first `vfetch` (mini) instruction
/// whose fetch constant is typed as a vertex fetch, reads that constant from
/// `rf`, and copies a bounded window of guest memory starting at the fetch
/// base. Returns `(dwords, window_base_dwords)`, or `None` if the VS has no
/// such fetch, the fetch base address is zero, or no dwords could be copied.
fn resolve_vertex_window(
    parsed_vs: &crate::ucode::ParsedShader,
    rf: &RegisterFile,
    mem: &dyn MemoryAccess,
) -> Option<(Vec<u32>, u32)> {
    // The instruction block is 3 dwords per ALU/fetch triple. We don't have
    // per-triple kind flags here, so we scan every triple and accept the
    // first one that decodes as a *vertex* fetch with a plausible constant.
    let instrs = &parsed_vs.instructions;
    let mut fetch_const: Option<u8> = None;
    let mut t = 0usize;
    while t + 2 < instrs.len() {
        let w0 = instrs[t];
        let w1 = instrs[t + 1];
        let w2 = instrs[t + 2];
        if let crate::ucode::fetch::FetchInstruction::Vertex(vf) =
            crate::ucode::fetch::decode_fetch([w0, w1, w2])
        {
            // Validate the referenced fetch constant is a real vertex fetch
            // (type==3, kVertex) before trusting it.
            let fc = vf.fetch_const as u32;
            let dword0 = rf.read(CONST_BASE_FETCH + fc * 6);
            if dword0 & 0x3 == 3 {
                fetch_const = Some(vf.fetch_const);
                break;
            }
        }
        t += 3;
    }
    let fc = fetch_const? as u32;
    let dword0 = rf.read(CONST_BASE_FETCH + fc * 6);
    let dword1 = rf.read(CONST_BASE_FETCH + fc * 6 + 1);
    // address:30 at bits[31:2] of dword0 (in bytes once masked).
    let base_bytes = dword0 & 0xFFFF_FFFC;
    if base_bytes == 0 {
        return None;
    }
    // size:24 at bits[25:2] of dword1, in dwords. Clamp to our window cap.
    let size_dwords = ((dword1 >> 2) & 0x00FF_FFFF).clamp(1, MAX_WINDOW_DWORDS);
    let window_base_dwords = base_bytes >> 2;
    let mut dwords = Vec::with_capacity(size_dwords as usize);
    for i in 0..size_dwords {
        let addr = base_bytes.wrapping_add(i * 4);
        if addr < base_bytes {
            break; // wrap guard
        }
        // `read_u32` composes the guest's big-endian bytes into a u32 value.
        // The WGSL's `gpu_swap` expects the dword with its bytes in guest
        // memory order (those four bytes read as a little-endian u32), so
        // undo the BE composition with `swap_bytes`.
        dwords.push(mem.read_u32(addr).swap_bytes());
    }
    if dwords.is_empty() {
        return None;
    }
    Some((dwords, window_base_dwords))
}
/// Build a [`DrawCapture`] for one draw. Best-effort: when the vertex window
/// can't be resolved, `has_real_vertices` is `false` and the UI falls back to
/// procedural geometry (never fabricated pixels).
#[allow(clippy::too_many_arguments)]
pub fn build(
    draw_index: u32,
    primitive: PrimitiveType,
    host_vertex_count: u32,
    _index_source: IndexSource,
    _index_size: IndexSize,
    vs_key: u32,
    ps_key: u32,
    parsed_vs: Option<&crate::ucode::ParsedShader>,
    rf: &RegisterFile,
    mem: &dyn MemoryAccess,
) -> DrawCapture {
    let (vertex_dwords, window_base_dwords, has_real) = match parsed_vs
        .and_then(|vs| resolve_vertex_window(vs, rf, mem))
    {
        Some((d, base)) => (d, base, true),
        None => (Vec::new(), 0, false),
    };
    DrawCapture {
        draw_index,
        prim_code: prim_code(primitive),
        host_vertex_count,
        vs_key,
        ps_key,
        vertex_dwords,
        window_base_dwords,
        has_real_vertices: has_real,
    }
}
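
The bit-field decode and byte-order handling in `resolve_vertex_window` can be exercised without a register file or guest memory. The sketch below is a standalone restatement under assumptions: `decode_vertex_fetch` and `window_dword` are hypothetical helper names, and the field layout (type in bits[1:0] of dword 0, address in bits[31:2], size in bits[25:2] of dword 1) is taken from the comments in the listing above rather than from hardware documentation.

```rust
/// Assumed cap, mirroring `MAX_WINDOW_DWORDS` from the listing (16 Ki dwords).
const MAX_WINDOW_DWORDS: u32 = 16 * 1024;

/// Decode the first two dwords of a vertex-fetch constant into
/// `(base_bytes, size_dwords)`. Returns `None` when the type field is not
/// 3 (vertex) or the base address is zero.
pub fn decode_vertex_fetch(dword0: u32, dword1: u32) -> Option<(u32, u32)> {
    if dword0 & 0x3 != 3 {
        return None;
    }
    let base_bytes = dword0 & 0xFFFF_FFFC;
    if base_bytes == 0 {
        return None;
    }
    // A zero size is raised to 1 and an oversized one is cut to the cap.
    let size_dwords = ((dword1 >> 2) & 0x00FF_FFFF).clamp(1, MAX_WINDOW_DWORDS);
    Some((base_bytes, size_dwords))
}

/// Turn four guest bytes into the dword stored in the window: compose them
/// big-endian (what the listing says `read_u32` does), then `swap_bytes`.
/// The result equals the same four bytes read as a little-endian `u32`.
pub fn window_dword(guest_bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(guest_bytes).swap_bytes()
}
```

On a little-endian host, uploading these dwords verbatim therefore reproduces the guest's byte layout, which leaves the per-format swap to the shader.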

View File

@@ -436,6 +436,12 @@ pub struct GpuSystem {
    /// `GpuSystem::new` and lives for the whole GPU lifetime — no
    /// per-frame churn.
    pub edram: crate::edram::ShadowEdram,
    /// UI-only: when `Some`, every `PM4_DRAW_INDX*` appends a
    /// [`crate::draw_capture::DrawCapture`] here so the host UI can replay the
    /// real guest geometry. `None` in headless/deterministic mode — the
    /// `--gpu-inline` golden never enables this, so capture is entirely inert
    /// for `check`. Drained (taken) by `vd_swap` at each present.
    pub frame_captures: Option<Vec<crate::draw_capture::DrawCapture>>,
}
impl GpuSystem {
@@ -463,6 +469,15 @@ impl GpuSystem {
            texture_cache: crate::texture_cache::TextureCache::new(),
            last_draw_textures: Vec::new(),
            edram: crate::edram::ShadowEdram::new(),
            frame_captures: None,
        }
    }
    /// Enable per-draw geometry capture for the host UI. Inert (and never
    /// called) in headless/deterministic mode. Idempotent.
    pub fn enable_frame_capture(&mut self) {
        if self.frame_captures.is_none() {
            self.frame_captures = Some(Vec::new());
        }
    }
@@ -1295,8 +1310,56 @@ impl GpuSystem {
"gpu: DRAW_INDX captured"
);
                self.last_draw = Some(ds);
                let host_vertex_count = processed.host_vertex_count;
                self.last_primitive = Some(processed);
                // iterate-3O: UI-only per-draw geometry capture. Resolves the
                // real guest vertex window behind this draw (from the active
                // VS's vertex-fetch constant) so the host UI can replay the
                // actual splash geometry instead of synthetic shapes. Entirely
                // inert in headless/deterministic mode (`frame_captures` is
                // `None`), so the `--gpu-inline` golden is unaffected.
                if self.frame_captures.is_some() {
                    let vs_key = self.active_vs_key.unwrap_or(0);
                    let ps_key = self.active_ps_key.unwrap_or(0);
                    let parsed_vs = self
                        .active_vs_key
                        .and_then(|k| self.shader_blobs.get(&k))
                        .map(|b| crate::ucode::parse_shader(&b.dwords));
                    let (idx_src, idx_size) = match ds.index_source {
                        crate::draw_state::IndexSource::Dma { index_size, .. } => {
                            (ds.index_source, index_size)
                        }
                        crate::draw_state::IndexSource::Immediate { index_size } => {
                            (ds.index_source, index_size)
                        }
                        crate::draw_state::IndexSource::AutoIndex => {
                            (ds.index_source, crate::draw_state::IndexSize::Sixteen)
                        }
                    };
                    let cap = crate::draw_capture::build(
                        self.stats.draws_seen as u32,
                        ds.primitive,
                        host_vertex_count,
                        idx_src,
                        idx_size,
                        vs_key,
                        ps_key,
                        parsed_vs.as_ref(),
                        &self.register_file,
                        mem,
                    );
                    if let Some(caps) = self.frame_captures.as_mut() {
                        // Bound the per-frame list so a runaway frame can't grow
                        // host memory without limit; keep the most recent.
                        const MAX_CAPS: usize = 4096;
                        if caps.len() >= MAX_CAPS {
                            caps.remove(0);
                        }
                        caps.push(cap);
                    }
                }
                // P5b: decode the textures the *active pixel shader* actually
                // samples. Parse the bound PS, collect its `tfetch`
                // fetch-constant slots, read each 6-dword fetch constant from
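
The "keep the most recent" bound in the hunk above evicts with `Vec::remove(0)`, which shifts every remaining element. As an illustration only, the same policy can be sketched with a `VecDeque`, where eviction from the front is O(1). This is a swapped-in container, not what the commit uses, and `Bounded` is a hypothetical name.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded buffer that keeps only the newest `cap` items.
pub struct Bounded<T> {
    items: VecDeque<T>,
    cap: usize,
}

impl<T> Bounded<T> {
    pub fn new(cap: usize) -> Self {
        Self { items: VecDeque::with_capacity(cap), cap }
    }

    /// Push `item`, evicting the oldest entry first when the buffer is full.
    /// A zero-capacity buffer drops everything.
    pub fn push(&mut self, item: T) {
        if self.cap == 0 {
            return;
        }
        if self.items.len() >= self.cap {
            self.items.pop_front(); // O(1), unlike `Vec::remove(0)`
        }
        self.items.push_back(item);
    }

    /// Take the buffered items in oldest-to-newest order, leaving it empty.
    pub fn drain(&mut self) -> Vec<T> {
        self.items.drain(..).collect()
    }
}
```

With a cap of 4096 and a splash that issues a few dozen draws per frame, the eviction path is rarely hit, so the `Vec` version in the commit is unlikely to matter in practice.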

View File

@@ -12,6 +12,7 @@
//! [`gpu_system::GpuSystem`].
pub mod command_processor;
pub mod draw_capture;
pub mod draw_state;
pub mod edram;
pub mod gpu_system;

View File

@@ -20,7 +20,12 @@ struct XenosDrawConstants {
    draw_index: u32,
    vertex_count: u32,
    prim_kind: u32,
    _pad: u32,
    // iterate-3O: guest dword address that maps to index 0 of `vertex_buffer`.
    // The CPU uploads a bounded guest-memory window starting at the active
    // vertex-fetch base; the shader subtracts this base from the absolute
    // fetch-constant address so it indexes the uploaded window. 0 means "no
    // real vertex window" (procedural fallback path).
    vertex_base_dwords: u32,
};
struct XenosConstants {
@@ -652,7 +657,15 @@ fn interpret_vertex_fetch(t: u32) {
    // dword 1 carries (endian[1:0], size[25:2]).
    let fc0 = xenos_consts.fetch[fetch_const * 2u + 0u];
    let fc1 = xenos_consts.fetch[fetch_const * 2u + 1u];
    let base_dwords = (fc0 & 0xFFFFFFFCu) >> 2u;
    // iterate-3O: the fetch constant holds an *absolute* guest dword address.
    // The CPU uploaded a window of guest memory starting at
    // `draw_ctx.vertex_base_dwords`, so rebase the absolute address into that
    // window. When no real window was published (`vertex_base_dwords == 0`)
    // keep the absolute value (the `addr < n` guards below then skip the read
    // and the procedural fallback position is used).
    let abs_base = (fc0 & 0xFFFFFFFCu) >> 2u;
    let base_dwords = select(abs_base, abs_base - draw_ctx.vertex_base_dwords,
        draw_ctx.vertex_base_dwords != 0u && abs_base >= draw_ctx.vertex_base_dwords);
    // GPUBUG-102: per-format endian byte-swap. Xbox 360 vertex data is
    // big-endian; the host is little-endian. Pre-fix every dword was
    // bitcast as-is — vertex positions were byte-reversed garbage.
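
The rebase added in this hunk is plain integer arithmetic, so it can be mirrored on the CPU for testing. The sketch below restates the WGSL `select(...)` in Rust; `rebase_fetch_base` and `window_index` are hypothetical helpers, and the stride parameter is an assumption (the translator hunk further down uses a fixed `vidx * 4u`).

```rust
/// CPU-side mirror of the shader's rebase: subtract the window base when one
/// was published and the absolute address lies at or above it; otherwise
/// keep the absolute address unchanged.
pub fn rebase_fetch_base(abs_base: u32, vertex_base_dwords: u32) -> u32 {
    if vertex_base_dwords != 0 && abs_base >= vertex_base_dwords {
        abs_base - vertex_base_dwords
    } else {
        abs_base
    }
}

/// Index of the first dword of vertex `vidx` inside the uploaded window, or
/// `None` when it falls outside (the case the shader's `addr < n` guard skips).
pub fn window_index(
    base_dwords: u32,
    vidx: u32,
    stride_dwords: u32,
    window_len: usize,
) -> Option<usize> {
    let addr = vidx.checked_mul(stride_dwords)?.checked_add(base_dwords)? as usize;
    (addr < window_len).then_some(addr)
}
```

When no window was published the base stays absolute, which for any realistic guest address lands far beyond the uploaded buffer, so `window_index` returns `None` and the fallback position applies.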

View File

@@ -94,7 +94,7 @@ struct XenosDrawConstants {
    draw_index: u32,
    vertex_count: u32,
    prim_kind: u32,
    _pad: u32,
    vertex_base_dwords: u32,
};
struct XenosConstants {
@@ -418,7 +418,9 @@ impl EmitCtx {
"{{ let fc0 = xenos_consts.fetch[{fc0_idx}u]; \
let fc1 = xenos_consts.fetch[{fc1_idx}u]; \
let endian = fc1 & 0x3u; \
let base = (fc0 & 0xFFFFFFFCu) >> 2u; \
let abs_base = (fc0 & 0xFFFFFFFCu) >> 2u; \
let base = select(abs_base, abs_base - draw_ctx.vertex_base_dwords, \
draw_ctx.vertex_base_dwords != 0u && abs_base >= draw_ctx.vertex_base_dwords); \
let vidx = u32(r[{src_reg}u].x); \
let addr = base + vidx * 4u; \
let n = arrayLength(&vertex_buffer); \