Replace the synthetic placeholder triangle in the --ui window with the
splash's REAL guest geometry, proving the faithful-render pipe end to end.

Architecture: Route A (UI-side replay). A per-draw capture channel carries
each PM4_DRAW_INDX*'s real state to the UI, which replays it through the
existing wgpu Xenos pipeline. The deterministic headless core is untouched:
capture is gated on an Option<Vec<DrawCapture>> that is None in headless
mode and only enabled on the --ui path, so the --gpu-inline n50m golden is
byte-identical (verified 2x).
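A minimal sketch of the gating described above, assuming a simplified
stand-in for DrawCapture and a hypothetical GpuSystem with on_draw and
drain_frame_captures methods (these names are illustrative, not the real
API). The point is that the headless path never allocates or pushes, while
the always-on counters still advance:

```rust
/// Hypothetical stand-in for the real `DrawCapture`; only two fields are modeled.
#[derive(Clone, Debug, PartialEq)]
pub struct DrawCapture {
    pub prim_type: u32,
    pub vertex_count: u32,
}

/// Hypothetical slice of GPU state that owns the per-frame capture buffer.
pub struct GpuSystem {
    /// `None` in headless mode, so recording is skipped entirely.
    frame_captures: Option<Vec<DrawCapture>>,
    /// Updated on every draw regardless of capture state.
    pub draws_total: u64,
}

impl GpuSystem {
    pub fn new(capture_enabled: bool) -> Self {
        Self {
            frame_captures: capture_enabled.then(Vec::new),
            draws_total: 0,
        }
    }

    /// Called once per draw packet.
    pub fn on_draw(&mut self, prim_type: u32, vertex_count: u32) {
        self.draws_total += 1;
        // Only the UI path has a buffer to push into.
        if let Some(caps) = self.frame_captures.as_mut() {
            caps.push(DrawCapture { prim_type, vertex_count });
        }
    }

    /// Take this frame's captures, leaving an empty buffer behind when enabled.
    pub fn drain_frame_captures(&mut self) -> Vec<DrawCapture> {
        match self.frame_captures.as_mut() {
            Some(caps) => std::mem::take(caps),
            None => Vec::new(),
        }
    }
}
```

With GpuSystem::new(false) every drain returns an empty Vec, which matches
the "empty in the degenerate case" contract on the publish side.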
The hard part was sourcing real vertices. The WGSL VS already does
format-aware vertex fetch from the b4 storage buffer at the address from the
fetch constant -- but b4 was never populated and the fetch address is an
absolute guest dword address. The slice:

* xenia-gpu/draw_capture.rs: parse the active VS, find its first vertex
  fetch, read that fetch constant, copy a bounded window of guest memory
  at the fetch base. Best-effort: has_real_vertices=false falls back to
  procedural geometry (never fabricated pixels).
* gpu_system.rs: accumulate one DrawCapture per draw into frame_captures.
* exports.rs (vd_swap): drain + publish the frame's captures to the UI.
* ui_bridge/bridge.rs: new publish_geometry channel + UiHandles.geometry.
* WGSL (interp + translator): rebase the absolute fetch address by a new
  DrawConstants.vertex_base_dwords so it indexes the uploaded window.
* render.rs: dispatch_xenos_captures uploads each draw's real vertex
  window + matching shader, issues real DrawRequests (real prim type,
  host vertex count, vs/ps keys).
* app.rs: prefer the real-capture replay; HUD adds real-geo=N counter.
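
The window copy and address rebase from the list above can be sketched on
the CPU side as follows. This is an illustration under stated assumptions:
guest memory is modeled as a flat dword slice indexed from address 0, byte
order is ignored, and MAX_WINDOW_DWORDS, VertexWindow, capture_window and
rebase are hypothetical names (only vertex_base_dwords appears in the
commit). The real rebase happens in WGSL.

```rust
/// Upper bound on dwords copied per draw (illustrative value, not from the commit).
const MAX_WINDOW_DWORDS: usize = 4096;

/// Hypothetical capture result: the copied window plus the base it was taken from.
pub struct VertexWindow {
    /// Absolute guest dword address of `dwords[0]`.
    pub vertex_base_dwords: u32,
    pub dwords: Vec<u32>,
}

/// Copy at most `MAX_WINDOW_DWORDS` dwords starting at `fetch_base_dwords`.
/// Returns `None` when the base is out of range, which corresponds to the
/// best-effort `has_real_vertices=false` fallback.
pub fn capture_window(guest: &[u32], fetch_base_dwords: u32) -> Option<VertexWindow> {
    let start = fetch_base_dwords as usize;
    if start >= guest.len() {
        return None;
    }
    // Clamp the window to both the size bound and the end of guest memory.
    let end = guest.len().min(start.saturating_add(MAX_WINDOW_DWORDS));
    Some(VertexWindow {
        vertex_base_dwords: fetch_base_dwords,
        dwords: guest[start..end].to_vec(),
    })
}

/// Rebase an absolute guest dword address into the uploaded window and read it.
/// Addresses below the base or past the window end yield `None`.
pub fn rebase(window: &VertexWindow, absolute_dword_addr: u32) -> Option<u32> {
    let idx = absolute_dword_addr.checked_sub(window.vertex_base_dwords)?;
    window.dwords.get(idx as usize).copied()
}
```

Subtracting the base before indexing is what lets the shader keep using the
fetch constant's absolute address unchanged while the host uploads only a
small window instead of all of guest memory.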
Verified in --ui on Sylpheed: "first Xenos capture batch replayed (real
geometry) captures=24 real_vertex_draws=24" -- all draws resolved a real
guest vertex window; WGSL compiles; no validation errors over 1616 swaps.

Still synthetic-free but not yet pixel-perfect: textures/UVs, DMA index
buffers (auto-index only for now), and kCopy resolve routing are staged
for follow-ups. Faithful: real vertex data, prim types, shaders, constants.

cargo test --workspace green; n50m golden unchanged (2x byte-identical).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
200 lines
9.2 KiB
Rust
//! Bridge between the kernel (CPU-thread side) and a host UI (main-thread side).
//!
//! The kernel side needs to:
//! - snapshot the latest host gamepad each time a guest calls
//!   `XamInputGetState`, and
//! - signal the UI when the guest calls `VdSwap` so the UI can upload the
//!   guest's frontbuffer to a wgpu texture and present it.
//!
//! Both directions are expressed as trait-object closures so that `xenia-kernel`
//! does not have to depend on winit/wgpu/gilrs. The [`UiBridge`] is installed
//! on [`KernelState::ui`] by `cmd_exec` when `--ui` is passed.

use std::collections::HashMap;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, AtomicU64};

use xenia_gpu::draw_capture::DrawCapture;
use xenia_gpu::texture_cache::TextureKey;
use xenia_gpu::xenos_constants::XenosConstantsBlock;
use xenia_hid::GamepadState;
use xenia_memory::MemoryAccess;

/// Information surfaced to the UI each time the guest presents a frame.
///
/// Fields mirror the seven "interesting" arguments to `VdSwap` in
/// `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc`: the raw
/// frontbuffer pointer, its dimensions, and the format/color-space enum values
/// the guest passed through.
#[derive(Clone, Copy, Debug)]
pub struct SwapInfo {
    /// Guest physical/virtual address of the frontbuffer to present.
    pub frontbuffer_addr: u32,
    /// Width in pixels as reported by the guest.
    pub width: u32,
    /// Height in pixels as reported by the guest.
    pub height: u32,
    /// Xenos texture format enum (the guest passes a pointer; we dereference
    /// it here). 0 means "unknown / guest passed a null pointer".
    pub texture_format: u32,
    /// Color-space enum (sRGB / BT.709 / …).
    pub color_space: u32,
    /// Monotonically increasing frame counter maintained by the kernel; useful
    /// for HUD display and deduping.
    pub frame_index: u64,
    /// Total PM4 `DRAW_INDX*` packets the GPU has captured since boot.
    /// Surfaced so the UI HUD can show progress even before the full
    /// uber-shader pipeline is wired in.
    pub draws_total: u64,
    /// Total PM4 packets executed, across all opcodes — useful signal for
    /// "is the GPU actually getting anything at all to consume?".
    pub packets_total: u64,
    /// Most-recent draw's Xenos primitive-type code (0 = none yet).
    pub last_draw_prim: u32,
    /// Most-recent draw's vertex count.
    pub last_draw_vertex_count: u32,
    /// Indirect-buffer jumps so far (useful "is the game driving the ring
    /// buffer through IBs?" signal).
    pub indirect_buffer_jumps: u64,
    /// WAIT_REG_MEM stalls observed on the GPU slot.
    pub wait_reg_mem_blocks: u64,
    /// Summed CPU instruction count across all 6 HW threads. Mirrors the
    /// `cycle_count` field each `PpcContext` maintains; gives the HUD a live
    /// "how far has the guest run?" readout.
    pub instructions_total: u64,
    /// Active VS shader blob key at the most recent DRAW_INDX* (0 = none).
    /// P3b: the UI uses this to index into `handles.shader_blobs` so the
    /// Xenos uber-shader interpreter can upload the matching microcode.
    pub vs_blob_key: u32,
    /// Active PS shader blob key at the most recent DRAW_INDX*.
    pub ps_blob_key: u32,
    /// P4: total EDRAM→memory resolves fired since boot (TILE_FLUSH
    /// events). Non-zero means the game is committing pixels.
    pub resolves_total: u64,
    /// Subset of `resolves_total` whose byte-copy path succeeded and wrote
    /// at least one sample into guest memory.
    pub resolves_copied_total: u64,
    /// Subset of `resolves_total` that were skipped by the byte-copy path
    /// due to an unsupported format / MSAA mode / 3D destination.
    pub resolves_skipped_total: u64,
    /// P4: unique RT keys seen (from the GPU's internal render-target
    /// cache). Grows as the game exercises new RT footprints.
    pub unique_render_targets: u64,
    /// P6: total graphics-interrupt callbacks delivered (v-sync + CP).
    /// Non-zero means `VdSetGraphicsInterruptCallback` has been wired end
    /// to end and callbacks are actually running.
    pub interrupts_delivered: u64,
    /// P6: graphics-interrupts queued but dropped (callback unset,
    /// thread 0 blocked, or already inside another callback).
    pub interrupts_dropped: u64,
}

/// Handles the kernel uses to talk to a running host UI.
///
/// None of the closures are allowed to block for long — they are called from
/// the CPU interpreter thread on the hot path.
#[derive(Clone)]
pub struct UiBridge {
    /// Snapshot the host gamepad. Called from `XamInputGetState`.
    pub gamepad: Arc<dyn Fn() -> GamepadState + Send + Sync>,
    /// Report that the guest completed a frame. The closure gets the swap
    /// metadata plus a borrow of guest memory so it can copy the frontbuffer
    /// bytes into a UI-owned staging buffer before returning. Called from
    /// `VdSwap` on the CPU thread.
    pub post_swap: Arc<dyn Fn(SwapInfo, &dyn MemoryAccess) + Send + Sync>,
    /// Indicates the UI wants the CPU loop to stop. Checked periodically by
    /// the interpreter loop.
    pub shutdown: Arc<AtomicBool>,
    /// Set to `true` when a gamepad is present. `XamInputGetState` returns
    /// `ERROR_DEVICE_NOT_CONNECTED` when this is `false`.
    pub gamepad_connected: Arc<AtomicBool>,
    /// Live CPU instruction counter mirror. The app's run loop publishes
    /// the sum of `ctx.cycle_count` across HW threads here every ~8k
    /// instructions so the HUD can report progress between VdSwap events.
    pub instructions_counter: Arc<AtomicU64>,
    /// P3b asset publish: `vd_swap` snapshots the GPU's `shader_blobs` and
    /// constants register region and feeds them to the UI so the Xenos
    /// uber-shader interpreter has the microcode + constants needed to
    /// execute the guest draw. Split from `post_swap` so the asset wire
    /// stays optional — if the UI doesn't need them (headless mode) the
    /// closure is a no-op.
    pub publish_xenos_assets:
        Arc<dyn Fn(HashMap<u32, Vec<u32>>, XenosConstantsBlock) + Send + Sync>,
    /// P4 frontbuffer publish: at each `VdSwap`, the kernel CPU-side
    /// detiles the guest frontbuffer (k_8_8_8_8 Tiled2D) into a linear
    /// RGBA8 buffer and hands it to the UI. The closure receives
    /// `(width, height, bytes)` — the UI uploads it as a texture.
    pub publish_frontbuffer:
        Arc<dyn Fn(u32, u32, Vec<u8>) + Send + Sync>,
    /// P5 primary texture publish: at each `VdSwap`, the kernel thread
    /// decodes the PS shader's primary-texture fetch constant (slot 0
    /// for now) and hands the decoded linear bytes + key to the UI so
    /// the xenos pipeline can bind a real texture at `@group(1)`.
    /// Receives `(TextureKey, bytes)`; when `None` is sent the UI
    /// reverts to its magenta stub.
    pub publish_texture:
        Arc<dyn Fn(Option<(TextureKey, Vec<u8>)>) + Send + Sync>,
    /// iterate-3O real-render slice: at each `VdSwap`, the kernel hands the
    /// UI the per-draw geometry captured this frame (one [`DrawCapture`] per
    /// `PM4_DRAW_INDX*`), including the real guest vertex window. The UI
    /// replays them through the Xenos wgpu pipeline so the splash renders its
    /// actual geometry instead of synthetic placeholder shapes. Empty in the
    /// degenerate case (no draws or capture disabled).
    pub publish_geometry:
        Arc<dyn Fn(Vec<DrawCapture>) + Send + Sync>,
}

impl UiBridge {
    /// Snapshot input state (user 0 only; higher indices are unconnected).
    pub fn snapshot_gamepad(&self) -> GamepadState {
        (self.gamepad)()
    }

    /// True iff a gamepad is connected for user 0.
    pub fn is_connected(&self, user_index: u32) -> bool {
        user_index == 0
            && self
                .gamepad_connected
                .load(std::sync::atomic::Ordering::Relaxed)
    }

    /// Push a swap event to the UI thread.
    pub fn notify_swap(&self, info: SwapInfo, mem: &dyn MemoryAccess) {
        (self.post_swap)(info, mem);
    }

    /// Snapshot current shader blobs + constants and hand them to the UI.
    /// Call from `vd_swap` so the UI has the matching assets for every
    /// draw captured in this frame.
    pub fn publish_assets(
        &self,
        blobs: HashMap<u32, Vec<u32>>,
        constants: XenosConstantsBlock,
    ) {
        (self.publish_xenos_assets)(blobs, constants);
    }

    /// True iff the UI asked for shutdown.
    pub fn should_shutdown(&self) -> bool {
        self.shutdown.load(std::sync::atomic::Ordering::Relaxed)
    }

    /// Hand a detiled frontbuffer frame to the UI. Called at most once per
    /// `VdSwap`. `bytes` must be `width * height * 4` bytes in
    /// `Rgba8Unorm` order (the UI pipeline's expected layout).
    pub fn publish_frontbuffer(&self, width: u32, height: u32, bytes: Vec<u8>) {
        (self.publish_frontbuffer)(width, height, bytes);
    }

    /// Hand one decoded guest texture to the UI. `Some` = update the bound
    /// slot; `None` = revert to the magenta stub.
    pub fn publish_texture(&self, tex: Option<(TextureKey, Vec<u8>)>) {
        (self.publish_texture)(tex);
    }

    /// Hand this frame's captured per-draw geometry to the UI.
    pub fn publish_geometry(&self, caps: Vec<DrawCapture>) {
        (self.publish_geometry)(caps);
    }
}
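
The module docs above describe both directions as trait-object closures so
the kernel crate stays free of UI dependencies. As a rough illustration of
how a UI might back one such closure, the sketch below wires a
publish_geometry-shaped closure to a std mpsc channel. Everything here is an
assumption for illustration: DrawCapture is a one-field stand-in, and
GeometryBridge and make_geometry_bridge are hypothetical names, not part of
the real bridge.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};

/// Hypothetical one-field stand-in for the real `DrawCapture`.
#[derive(Clone, Debug, PartialEq)]
pub struct DrawCapture {
    pub vertex_count: u32,
}

/// Hypothetical cut-down bridge carrying only the geometry closure.
pub struct GeometryBridge {
    pub publish_geometry: Arc<dyn Fn(Vec<DrawCapture>) + Send + Sync>,
}

/// Build a bridge whose closure forwards each frame's captures to a channel
/// that the UI thread can poll.
pub fn make_geometry_bridge() -> (GeometryBridge, Receiver<Vec<DrawCapture>>) {
    let (tx, rx) = channel::<Vec<DrawCapture>>();
    // Wrapping the sender in a Mutex keeps the closure `Sync` on any toolchain.
    let tx = Mutex::new(tx);
    let publish: Arc<dyn Fn(Vec<DrawCapture>) + Send + Sync> =
        Arc::new(move |caps: Vec<DrawCapture>| {
            if let Ok(tx) = tx.lock() {
                // An unbounded channel send does not wait for the receiver;
                // a dropped receiver is ignored rather than treated as fatal.
                let _ = tx.send(caps);
            }
        });
    (GeometryBridge { publish_geometry: publish }, rx)
}
```

On the UI side, a loop over rx.try_recv() once per redraw would pick up
whatever frames have been published without stalling the CPU thread.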