[iterate-2X] Texture pipeline: un-stub RectangleList + draw-time texture decode

Two faithful, deterministic GPU-backend changes that make the texture path
correct for whatever textured draw the splash eventually dispatches. Both are
currently inert on Sylpheed (the textured logo draw is still gated downstream
— see below), but neither shifts the stable-digest golden, so they land safely.

1. Un-stub RectangleList primitive expansion (primitive.rs). The splash submits
   2819 RectangleList draws at 200M, all of which were REJECTED by the P3 stub
   (`gpu.primitive.rejected{rectangle_list}`) → only ~592 flat point/quad draws
   rasterized. Mirror canary's intent (primitive_processor.cc:389-456
   kRectangleListAsTriangleStrip) within our CPU index-rewrite idiom: emit each
   rect's 3 real vertices as one TriangleList triangle (v0,v1,v2), rejected=false,
   faithful host_vertex_count. The full quad (synthesized 4th corner v3=v0+v2-v1)
   needs real vertex fetch in vs_main — left as a documented TODO. Rejection
   warnings drop 2819→0.

2. Draw-time texture decode keyed off the active PS's real tfetch slots
   (gpu_system.rs + exports.rs vd_swap). Previously vd_swap decoded a hardcoded
   fetch-constant slot 0 at swap time. Now the DRAW handler parses the bound
   pixel shader (ucode::parse_shader), collects its tfetch fetch_const slots via
   new shader_metrics::tfetch_slots, reads each 6-dword fetch constant, and
   decode+caches it into GpuSystem::last_draw_textures. vd_swap publishes the
   first of these (UI binds one texture today), falling back to the legacy slot-0
   probe on flat-only frames. New span_max_version helper walks page_version over
   the trait (draw-time &dyn MemoryAccess lacks the heap's inherent
   max_page_version). Pure function of guest writes — deterministic.

Status: texture_decodes stays 0 on Sylpheed because all 6 live shaders are flat
(no tfetch); canary's textured logo shaders E59B2B3D/F7B1457 are not yet
dispatched by ours (a downstream title-state gate, the next frontier). The full
P5 decode→publish→upload→sample path is already wired; this makes the decode
side key off the real shader instead of a guess.

Validation: stable-digest golden sylpheed_n50m unchanged (draws=718 swaps=147
tex=0), regenerated twice byte-identical; 200M run shows 0 RectangleList
rejections. cargo test --workspace green (677, +2: rectangle_list_expansion,
tfetch_slots_extracts_texture_fetch_constants). No temp hooks. Branch only;
not pushed/merged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-06-14 21:34:43 +02:00
parent a91f4c550b
commit 2f55d1fd7d
4 changed files with 238 additions and 37 deletions

View File

@@ -5,9 +5,8 @@
//! rectangles) we rewrite indices on the CPU side so the host just sees a
//! triangle list. Ground truth: `xenia-canary/src/xenia/gpu/primitive_processor.h/cc`.
//!
//! P3 scope: only the shapes Sylpheed's UI + early gameplay paths need
//! (list, strip, fan). Rectangle + quad expansions are stubs logged via
//! `tracing::warn!` for later.
//! Scope: list, strip, fan, quad, and rectangle expansions are all handled
//! (rectangles via CPU triangle-list rewrite — see `expand_rectangles`).
use crate::draw_state::{IndexSize, PrimitiveType};
@@ -138,18 +137,43 @@ fn expand_quads(indices: Option<&[u32]>, vertex_count: u32) -> ProcessedPrimitiv
}
/// Rectangle lists: a Xenos-specific primitive where each group of 3
/// vertices defines a right-angle rectangle by its three non-repeated
/// corners (the 4th is derived). The uber-shader doesn't support this yet;
/// the ucode translator will emulate it as a geometry-stage fake. For P3
/// we emit an empty draw.
fn expand_rectangles(_indices: Option<&[u32]>, _vertex_count: u32) -> ProcessedPrimitive {
tracing::warn!("gpu: rectangle list primitive not yet implemented (P3 stub)");
metrics::counter!("gpu.primitive.rejected", "reason" => "rectangle_list").increment(1);
/// vertices defines a rectangle; the 4th corner is extrapolated as
/// `v3 = v0 + v2 - v1` (parallelogram completion). Canary expands this in a
/// host vertex-shader variant (`kRectangleListAsTriangleStrip`,
/// `primitive_processor.cc:389-456`): a 4-vertex triangle strip per rect with
/// the 4th corner synthesized *in the VS* from the host-vertex index.
///
/// Our replay pipeline has no host-VS corner synthesis (and the procedural
/// `vs_main` does not consume `rewritten_indices` yet), so we mirror the
/// `expand_quads`/`expand_fan` CPU idiom and emit the 3 real vertices of each
/// rect as one triangle list `(v0,v1,v2)` — the visible lower half of the
/// rect. This un-rejects the draw and gives a faithful `host_vertex_count`.
///
/// TODO: once `vs_main` does real vertex fetch + interpolation, upgrade to the
/// full quad — 6 indices `[v0,v1,v2, v2,v1,v3]` with a synthesized `v3` corner
/// — mirroring canary's `kRectangleListAsTriangleStrip`.
fn expand_rectangles(indices: Option<&[u32]>, vertex_count: u32) -> ProcessedPrimitive {
let rect_count = vertex_count / 3;
let mut out = Vec::with_capacity(3 * rect_count as usize);
let get = |i: u32| -> u32 {
match indices {
Some(buf) => buf[i as usize],
None => i,
}
};
for r in 0..rect_count {
let base = r * 3;
out.push(get(base));
out.push(get(base + 1));
out.push(get(base + 2));
}
let host_vertex_count = out.len() as u32;
metrics::counter!("gpu.primitive.expanded", "shape" => "rectangle_list").increment(1);
ProcessedPrimitive {
topology: HostTopology::TriangleList,
rewritten_indices: Some(Vec::new()),
host_vertex_count: 0,
rejected: true,
rewritten_indices: Some(out),
host_vertex_count,
rejected: false,
}
}
@@ -213,6 +237,17 @@ mod tests {
assert_eq!(idx, vec![0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7]);
}
#[test]
fn rectangle_list_expansion() {
// 2 rects (6 verts) → one triangle (v0,v1,v2) per rect, not rejected.
let p = process(PrimitiveType::RectangleList, 6, None);
let idx = p.rewritten_indices.unwrap();
assert_eq!(idx, vec![0, 1, 2, 3, 4, 5]);
assert_eq!(p.topology, HostTopology::TriangleList);
assert_eq!(p.host_vertex_count, 6);
assert!(!p.rejected);
}
#[test]
fn widen_u16_indices_big_endian() {
// 3 indices [1, 2, 0x1234] in BE u16.