[iterate-3O] Real-render slice: replay guest geometry in --ui (Route A)

Replace the synthetic placeholder triangle in the --ui window with the
splash's REAL guest geometry, proving the faithful-render pipe end to end.

Architecture: Route A (UI-side replay). A per-draw capture channel carries
each PM4_DRAW_INDX*'s real state to the UI, which replays it through the
existing wgpu Xenos pipeline. The deterministic headless core is untouched:
capture is gated on an Option<Vec<DrawCapture>> that is None in headless
mode and only enabled on the --ui path, so the --gpu-inline n50m golden is
byte-identical (verified 2x).
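
The Option gating described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `GpuState`, `enable_capture`, `on_draw`, and `drain_captures` are hypothetical names, and `DrawCapture` is cut down to two fields. It shows why the headless path stays deterministic: with capture left at `None`, a draw only updates the core counter.

```rust
/// Hypothetical, trimmed-down stand-in for the real `DrawCapture`.
#[derive(Debug, Clone, PartialEq)]
pub struct DrawCapture {
    pub draw_index: u32,
    pub prim_code: u32,
}

/// Hypothetical GPU-side state. `frame_captures == None` means capture is
/// disabled, which is the assumed headless default.
#[derive(Default)]
pub struct GpuState {
    frame_captures: Option<Vec<DrawCapture>>,
    draws_seen: u32,
}

impl GpuState {
    /// Opt in to capture; only the UI path would call this.
    pub fn enable_capture(&mut self) {
        self.frame_captures = Some(Vec::new());
    }

    /// Called once per draw. The core counter always advances; a capture is
    /// recorded only when the Option is populated.
    pub fn on_draw(&mut self, prim_code: u32) {
        let draw_index = self.draws_seen;
        self.draws_seen += 1;
        if let Some(caps) = self.frame_captures.as_mut() {
            caps.push(DrawCapture { draw_index, prim_code });
        }
    }

    /// Take the frame's captures at swap time, leaving an empty Vec behind.
    /// Returns an empty Vec when capture is disabled.
    pub fn drain_captures(&mut self) -> Vec<DrawCapture> {
        match self.frame_captures.as_mut() {
            Some(caps) => std::mem::take(caps),
            None => Vec::new(),
        }
    }

    pub fn draws_seen(&self) -> u32 {
        self.draws_seen
    }
}
```

In this sketch the headless path never allocates or pushes, so any state the golden depends on is unaffected by whether the capture type exists at all.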

The hard part was sourcing real vertices. The WGSL VS already does
format-aware vertex fetch from the b4 storage buffer at the address from the
fetch constant -- but b4 was never populated, and the fetch address is an
absolute guest dword address rather than an offset into b4. The slice:
  * xenia-gpu/draw_capture.rs: parse the active VS, find its first vertex
    fetch, read that fetch constant, copy a bounded window of guest memory
    at the fetch base. Best-effort: has_real_vertices=false falls back to
    procedural geometry (never fabricated pixels).
  * gpu_system.rs: accumulate one DrawCapture per draw into frame_captures.
  * exports.rs (vd_swap): drain + publish the frame's captures to the UI.
  * ui_bridge/bridge.rs: new publish_geometry channel + UiHandles.geometry.
  * WGSL (interp + translator): rebase the absolute fetch address by a new
    DrawConstants.vertex_base_dwords so it indexes the uploaded window.
  * render.rs: dispatch_xenos_captures uploads each draw's real vertex
    window + matching shader, issues real DrawRequests (real prim type,
    host vertex count, vs/ps keys).
  * app.rs: prefer the real-capture replay; HUD adds real-geo=N counter.
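
The bounded window copy and the address rebase from the list above can be sketched on the CPU side as follows. This is an illustration under stated assumptions, not the commit's code: `copy_vertex_window`, `fetch_dword`, and `MAX_WINDOW_DWORDS` are hypothetical names, the cap of 4 dwords is arbitrary (the real bound is not stated here), and the real rebase happens in WGSL, which has no `Option`.

```rust
/// Hypothetical cap on the copied window, in dwords.
const MAX_WINDOW_DWORDS: usize = 4;

/// Copy at most `MAX_WINDOW_DWORDS` dwords of guest memory starting at the
/// absolute dword address `base_dwords`. Returns `None` when the base lies
/// outside guest memory, which corresponds to the "no real vertices" case.
pub fn copy_vertex_window(guest: &[u32], base_dwords: u32) -> Option<Vec<u32>> {
    let start = base_dwords as usize;
    if start >= guest.len() {
        return None;
    }
    let end = guest.len().min(start.saturating_add(MAX_WINDOW_DWORDS));
    Some(guest[start..end].to_vec())
}

/// Rebase an absolute guest dword address into the uploaded window by
/// subtracting the window base, then bounds-check the result.
pub fn fetch_dword(window: &[u32], vertex_base_dwords: u32, abs_addr_dwords: u32) -> Option<u32> {
    let rel = abs_addr_dwords.checked_sub(vertex_base_dwords)?;
    window.get(rel as usize).copied()
}
```

With a window copied from base 6, an absolute fetch at dword 7 reads index 1 of the window; addresses below the base or past the window end yield nothing rather than reading unrelated data.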

Verified in --ui on Sylpheed: "first Xenos capture batch replayed (real
geometry) captures=24 real_vertex_draws=24" -- all draws resolved a real
guest vertex window; WGSL compiles; no validation errors over 1616 swaps.

Synthetic-free but not yet pixel-perfect: textures/UVs, DMA index buffers
(auto-index only for now), and kCopy resolve routing are staged for
follow-ups. Already faithful: real vertex data, prim types, shaders, constants.

cargo test --workspace green; n50m golden unchanged (2x byte-identical).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-17 22:38:46 +02:00
parent 6bb4355e3d
commit 504592ac13
12 changed files with 509 additions and 65 deletions


@@ -181,10 +181,11 @@ impl App {
y += line_h;
let (fbw, fbh) = rs.frontbuffer_size();
let render_line = format!(
"Render: xdispatch: xlated={:>5} interp={:>5} xlated-pipelines={:>3} tex-cache={:>3} fb={}x{}",
"Render: xdispatch: xlated={:>5} interp={:>5} xlated-pipelines={:>3} real-geo={:>5} tex-cache={:>3} fb={}x{}",
rs.xenos_dispatches_translator,
rs.xenos_dispatches_interpreter,
rs.translated_pipeline_count(),
rs.real_geometry_draws(),
rs.host_texture_count(),
fbw,
fbh,
@@ -372,49 +373,6 @@ impl ApplicationHandler<SwapEvent> for App {
self.last_xenos_swap_frame = frame_idx;
}
let delta = (draws_total - already) as u32;
let (verts_hint, prim_kind, vs_key, ps_key) = self
.last_swap_info
.map(|s| {
(
s.last_draw_vertex_count.max(3),
s.last_draw_prim,
s.vs_blob_key,
s.ps_blob_key,
)
})
.unwrap_or((3, 4, 0, 0));
// Look up blobs + constants from the bridge and
// pack into the WGSL-interpreter layout. Empty
// slices produce zero-clause packed buffers — the
// WGSL walker short-circuits and the placeholder
// export path still renders.
let raw_vs: Vec<u32> = self
.handles
.shader_blobs
.lock()
.ok()
.and_then(|g| g.get(&vs_key).cloned())
.unwrap_or_default();
let raw_ps: Vec<u32> = self
.handles
.shader_blobs
.lock()
.ok()
.and_then(|g| g.get(&ps_key).cloned())
.unwrap_or_default();
let parsed_vs = xenia_gpu::ucode::parse_shader(&raw_vs);
let parsed_ps = xenia_gpu::ucode::parse_shader(&raw_ps);
// First time we see a blob key, run the static
// metrics analyzer. Keyed on (stage_tag, blob_key)
// because the guest can reuse a key across stages.
if self.seen_shader_blobs.insert((0u8, vs_key)) {
xenia_gpu::shader_metrics::emit_for(&parsed_vs, "vs");
}
if self.seen_shader_blobs.insert((1u8, ps_key)) {
xenia_gpu::shader_metrics::emit_for(&parsed_ps, "ps");
}
let vs_packed = xenia_gpu::ucode::pack_for_wgsl(&parsed_vs);
let ps_packed = xenia_gpu::ucode::pack_for_wgsl(&parsed_ps);
let constants = self
.handles
.xenos_constants
@@ -431,19 +389,72 @@ impl ApplicationHandler<SwapEvent> for App {
.ok()
.and_then(|g| g.clone());
rs.bind_primary_texture(tex_payload);
rs.dispatch_xenos_draws(
already,
delta,
verts_hint,
prim_kind,
vs_key,
ps_key,
&parsed_vs,
&parsed_ps,
&vs_packed,
&ps_packed,
&constants,
);
// iterate-3O real-render slice: prefer replaying the
// *real* captured guest geometry. The kernel publishes
// one `DrawCapture` per `PM4_DRAW_INDX*` this frame
// (real vertices + prim type + shader keys). Fall back
// to the legacy synthetic dispatch only when no capture
// is available (e.g. capture disabled), so we never
// regress to a blank screen.
let captures: Vec<xenia_gpu::draw_capture::DrawCapture> = self
.handles
.geometry
.lock()
.map(|g| g.clone())
.unwrap_or_default();
let blobs: std::collections::HashMap<u32, Vec<u32>> = self
.handles
.shader_blobs
.lock()
.map(|g| g.clone())
.unwrap_or_default();
if !captures.is_empty() {
rs.dispatch_xenos_captures(
&captures,
&blobs,
&constants,
&mut self.seen_shader_blobs,
);
} else {
// Legacy synthetic-geometry fallback (placeholder).
let (verts_hint, prim_kind, vs_key, ps_key) = self
.last_swap_info
.map(|s| {
(
s.last_draw_vertex_count.max(3),
s.last_draw_prim,
s.vs_blob_key,
s.ps_blob_key,
)
})
.unwrap_or((3, 4, 0, 0));
let raw_vs = blobs.get(&vs_key).cloned().unwrap_or_default();
let raw_ps = blobs.get(&ps_key).cloned().unwrap_or_default();
let parsed_vs = xenia_gpu::ucode::parse_shader(&raw_vs);
let parsed_ps = xenia_gpu::ucode::parse_shader(&raw_ps);
if self.seen_shader_blobs.insert((0u8, vs_key)) {
xenia_gpu::shader_metrics::emit_for(&parsed_vs, "vs");
}
if self.seen_shader_blobs.insert((1u8, ps_key)) {
xenia_gpu::shader_metrics::emit_for(&parsed_ps, "ps");
}
let vs_packed = xenia_gpu::ucode::pack_for_wgsl(&parsed_vs);
let ps_packed = xenia_gpu::ucode::pack_for_wgsl(&parsed_ps);
rs.dispatch_xenos_draws(
already,
delta,
verts_hint,
prim_kind,
vs_key,
ps_key,
&parsed_vs,
&parsed_ps,
&vs_packed,
&ps_packed,
&constants,
);
}
}
} else {
Self::ingest_frontbuffer(


@@ -18,6 +18,7 @@ use std::sync::Mutex;
use crossbeam_utils::atomic::AtomicCell;
use winit::event_loop::EventLoopProxy;
use xenia_gpu::draw_capture::DrawCapture;
use xenia_gpu::texture_cache::TextureKey;
use xenia_gpu::xenos_constants::XenosConstantsBlock;
use xenia_hid::GamepadState;
@@ -66,6 +67,10 @@ pub struct UiHandles {
/// fetch-constant slot 0 into linear bytes that the UI should
/// upload into the host cache and bind at `@group(1) @binding(0)`.
pub primary_texture: Arc<Mutex<Option<(TextureKey, Vec<u8>)>>>,
/// iterate-3O: the most recent frame's captured per-draw geometry. The
/// redraw path clones this snapshot to replay real guest draws. Replaced
/// wholesale each `VdSwap`.
pub geometry: Arc<Mutex<Vec<DrawCapture>>>,
}
/// Swap event posted by the CPU-side `VdSwap` handler via
@@ -89,6 +94,7 @@ pub fn build(proxy: EventLoopProxy<SwapEvent>) -> (UiHandles, UiBridge) {
let xenos_constants = Arc::new(Mutex::new(XenosConstantsBlock::default()));
let primary_texture: Arc<Mutex<Option<(TextureKey, Vec<u8>)>>> =
Arc::new(Mutex::new(None));
let geometry: Arc<Mutex<Vec<DrawCapture>>> = Arc::new(Mutex::new(Vec::new()));
let kernel_bridge = UiBridge {
gamepad: {
@@ -144,6 +150,14 @@ pub fn build(proxy: EventLoopProxy<SwapEvent>) -> (UiHandles, UiBridge) {
}
})
},
publish_geometry: {
let geo = Arc::clone(&geometry);
Arc::new(move |caps| {
if let Ok(mut lock) = geo.lock() {
*lock = caps;
}
})
},
};
let handles = UiHandles {
@@ -155,6 +169,7 @@ pub fn build(proxy: EventLoopProxy<SwapEvent>) -> (UiHandles, UiBridge) {
shader_blobs,
xenos_constants,
primary_texture,
geometry,
};
(handles, kernel_bridge)
}
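
The publish/replace pattern used by `publish_geometry` above can be sketched in isolation as follows. This is a minimal stand-alone version, not the bridge code itself: `Capture`, `build_geometry_channel`, and `snapshot` are hypothetical names, and the real `UiBridge` field types are not shown in this diff.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `DrawCapture`.
#[derive(Debug, Clone, PartialEq)]
pub struct Capture(pub u32);

/// Assumed shape of the publisher: a shareable closure taking a whole frame.
pub type Publish = Arc<dyn Fn(Vec<Capture>) + Send + Sync>;

/// Build the shared slot plus a publisher closure that replaces the slot's
/// contents wholesale on every call.
pub fn build_geometry_channel() -> (Arc<Mutex<Vec<Capture>>>, Publish) {
    let geometry = Arc::new(Mutex::new(Vec::new()));
    let geo = Arc::clone(&geometry);
    let publish: Publish = Arc::new(move |caps: Vec<Capture>| {
        // A poisoned lock is silently skipped, as in the diff above.
        if let Ok(mut lock) = geo.lock() {
            *lock = caps;
        }
    });
    (geometry, publish)
}

/// Reader side: clone the latest frame without consuming it.
pub fn snapshot(geometry: &Mutex<Vec<Capture>>) -> Vec<Capture> {
    let snap = geometry.lock().map(|g| g.clone()).unwrap_or_default();
    snap
}
```

Because the reader clones rather than takes, a redraw that happens before the next swap replays the same frame again instead of finding the slot empty.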


@@ -84,6 +84,9 @@ pub struct RenderState {
/// the shader, or (c) we're running the slow interpreter path.
pub xenos_dispatches_translator: u64,
pub xenos_dispatches_interpreter: u64,
/// iterate-3O: running total of replayed draws that carried a real guest
/// vertex window (vs. the procedural fallback). Surfaced on the HUD.
real_geometry_draws: u64,
/// One-shot latch so we emit a tracing::info! on the **first** real
/// draw dispatch rather than spamming every frame. Pairs with the
/// "first translator compile" latch below.
@@ -447,6 +450,7 @@ impl RenderState {
fallback_rgb: [0.06, 0.06, 0.09],
xenos_pipeline,
xenos_draws_rendered: 0,
real_geometry_draws: 0,
xenos_dispatches_translator: 0,
xenos_dispatches_interpreter: 0,
first_dispatch_logged: false,
@@ -657,6 +661,8 @@ impl RenderState {
draw_index: idx,
vertex_count: vertex_count_hint.max(3),
prim_kind,
// Synthetic fallback path: no real vertex window.
vertex_base_dwords: 0,
};
if use_translated
&& let Some(p) = self.xenos_pipeline.translated_pipeline(vs_key, ps_key) {
@@ -707,12 +713,135 @@ impl RenderState {
}
}
/// iterate-3O real-render slice: replay a batch of *real* captured guest
/// draws. Unlike [`dispatch_xenos_draws`] (synthetic placeholder geometry),
/// each [`DrawCapture`] carries the actual guest vertex window, primitive
/// type, host vertex count, and the real (vs, ps) keys. Per capture we:
/// 1. upload the captured guest vertex bytes into `vertex_buffer` (b4),
/// 2. upload the matching VS/PS microcode + per-frame constants,
/// 3. render through the translated (P7) pipeline if it compiled, else
/// the interpreter — with `vertex_base_dwords` set so the shader
/// rebases its absolute fetch address into the uploaded window.
///
/// Returns the number of captures that had a real vertex window (vs. the
/// procedural fallback), for HUD reporting. `shader_blobs` / `constants`
/// come from the bridge; `seen` records which blobs have had static
/// metrics emitted (one-shot per blob, matching the legacy path).
pub fn dispatch_xenos_captures(
&mut self,
captures: &[xenia_gpu::draw_capture::DrawCapture],
shader_blobs: &std::collections::HashMap<u32, Vec<u32>>,
constants: &xenia_gpu::xenos_constants::XenosConstantsBlock,
seen: &mut std::collections::HashSet<(u8, u32)>,
) -> u32 {
if captures.is_empty() {
return 0;
}
let mut real_count = 0u32;
let mut encoder = self
.device
.create_command_encoder(&wgpu::CommandEncoderDescriptor {
label: Some("xenos capture replay"),
});
for cap in captures {
let raw_vs = shader_blobs.get(&cap.vs_key).cloned().unwrap_or_default();
let raw_ps = shader_blobs.get(&cap.ps_key).cloned().unwrap_or_default();
let parsed_vs = xenia_gpu::ucode::parse_shader(&raw_vs);
let parsed_ps = xenia_gpu::ucode::parse_shader(&raw_ps);
if seen.insert((0u8, cap.vs_key)) {
xenia_gpu::shader_metrics::emit_for(&parsed_vs, "vs");
}
if seen.insert((1u8, cap.ps_key)) {
xenia_gpu::shader_metrics::emit_for(&parsed_ps, "ps");
}
let vs_packed = xenia_gpu::ucode::pack_for_wgsl(&parsed_vs);
let ps_packed = xenia_gpu::ucode::pack_for_wgsl(&parsed_ps);
// Upload this draw's shader + constants + real vertex window.
self.xenos_pipeline.upload_shader_and_constants(
&self.queue,
&vs_packed,
&ps_packed,
constants,
);
if cap.has_real_vertices && !cap.vertex_dwords.is_empty() {
self.xenos_pipeline
.upload_vertex_data(&self.queue, &cap.vertex_dwords);
real_count += 1;
}
let use_translated = cap.vs_key != 0
&& cap.ps_key != 0
&& ensure_translated_pipeline(
&mut self.xenos_pipeline,
&self.device,
cap.vs_key,
cap.ps_key,
&parsed_vs,
&parsed_ps,
);
let base = if cap.has_real_vertices {
cap.window_base_dwords
} else {
0
};
let req = DrawRequest {
draw_index: cap.draw_index,
vertex_count: cap.host_vertex_count.max(3),
prim_kind: cap.prim_code,
vertex_base_dwords: base,
};
if use_translated
&& let Some(p) = self.xenos_pipeline.translated_pipeline(cap.vs_key, cap.ps_key)
{
self.xenos_pipeline.render_one_with_pipeline(
&self.queue,
&mut encoder,
&self.frontbuffer_view,
req,
p,
);
self.xenos_dispatches_translator =
self.xenos_dispatches_translator.saturating_add(1);
} else {
self.xenos_pipeline.render_one(
&self.queue,
&mut encoder,
&self.frontbuffer_view,
req,
);
self.xenos_dispatches_interpreter =
self.xenos_dispatches_interpreter.saturating_add(1);
}
}
self.queue.submit(std::iter::once(encoder.finish()));
self.xenos_draws_rendered = self
.xenos_draws_rendered
.saturating_add(captures.len() as u64);
self.real_geometry_draws = self
.real_geometry_draws
.saturating_add(real_count as u64);
if !self.first_dispatch_logged {
self.first_dispatch_logged = true;
tracing::info!(
captures = captures.len(),
real_vertex_draws = real_count,
"first Xenos capture batch replayed (real geometry)"
);
}
real_count
}
/// Count of distinct translator pipelines compiled so far. Surfaced
/// on the HUD as `xlated-pipelines=N` to make "is P7 working?" observable.
pub fn translated_pipeline_count(&self) -> usize {
self.xenos_pipeline.translated_pipeline_count()
}
/// Running count of captured draws that carried a real vertex window
/// (surfaced on the HUD). Updated by [`dispatch_xenos_captures`].
pub fn real_geometry_draws(&self) -> u64 {
self.real_geometry_draws
}
/// Clear the frontbuffer to `[r,g,b,a]` in linear space. Matches the
/// fallback clear the outer swapchain render does so the two stages
/// agree on "no draws yet = dark navy".


@@ -36,7 +36,9 @@ struct DrawConstants {
draw_index: u32,
vertex_count: u32,
prim_kind: u32,
_pad: u32,
/// iterate-3O: guest dword base of the uploaded `vertex_buffer` window.
/// The WGSL subtracts this from the absolute vertex-fetch address.
vertex_base_dwords: u32,
}
/// Submitted to [`XenosPipeline::render_one`] to render one captured draw.
@@ -48,6 +50,9 @@ pub struct DrawRequest {
pub vertex_count: u32,
/// Xenos primitive-type code; shader may branch on it in P3b+.
pub prim_kind: u32,
/// iterate-3O: guest dword base of the per-draw vertex window uploaded to
/// `vertex_buffer` (b4). 0 = no real vertex window (procedural fallback).
pub vertex_base_dwords: u32,
}
/// Reasonable upper bound on a single shader blob (dwords). Most Xbox 360
@@ -193,7 +198,7 @@ impl XenosPipeline {
draw_index: 0,
vertex_count: 3,
prim_kind: 4,
_pad: 0,
vertex_base_dwords: 0,
};
let draw_ctx_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
label: Some("xenos draw ctx"),
@@ -480,7 +485,7 @@ impl XenosPipeline {
draw_index: req.draw_index,
vertex_count: req.vertex_count.max(3),
prim_kind: req.prim_kind,
_pad: 0,
vertex_base_dwords: req.vertex_base_dwords,
};
queue.write_buffer(&self.draw_ctx_buffer, 0, bytemuck::bytes_of(&cb));
@@ -606,7 +611,7 @@ impl XenosPipeline {
draw_index: req.draw_index,
vertex_count: req.vertex_count.max(3),
prim_kind: req.prim_kind,
_pad: 0,
vertex_base_dwords: req.vertex_base_dwords,
};
queue.write_buffer(&self.draw_ctx_buffer, 0, bytemuck::bytes_of(&cb));
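
Since `vertex_base_dwords` takes over the slot previously held by `_pad`, the uniform stays four dwords wide. The sketch below illustrates that layout without external crates. It is an assumption-laden mirror, not the real type: the actual struct is serialized with `bytemuck::bytes_of`, whereas the hypothetical `to_bytes` helper here writes little-endian bytes by hand, which matches native layout only on a little-endian host.

```rust
/// Dependency-free mirror of the four-dword uniform from the diff.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DrawConstants {
    pub draw_index: u32,
    pub vertex_count: u32,
    pub prim_kind: u32,
    /// Occupies the slot that used to be `_pad`.
    pub vertex_base_dwords: u32,
}

impl DrawConstants {
    /// Hypothetical helper: serialize the fields in declaration order as
    /// little-endian dwords (16 bytes total).
    pub fn to_bytes(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        let fields = [
            self.draw_index,
            self.vertex_count,
            self.prim_kind,
            self.vertex_base_dwords,
        ];
        for (chunk, value) in out.chunks_exact_mut(4).zip(fields) {
            chunk.copy_from_slice(&value.to_le_bytes());
        }
        out
    }
}
```

The buffer size and binding therefore need no change; only the meaning of the last dword differs, and a value of 0 keeps the old behaviour for the synthetic fallback path.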