[iterate-2X] Texture pipeline: un-stub RectangleList + draw-time texture decode

Two faithful, deterministic GPU-backend changes that make the texture path
correct for whatever textured draw the splash eventually dispatches. Both are
currently inert on Sylpheed (the textured logo draw is still gated downstream
— see below), but neither shifts the stable-digest golden, so they land safely.

1. Un-stub RectangleList primitive expansion (primitive.rs). The splash submits
   2819 RectangleList draws in the 200M run, all of which were REJECTED by the
   P3 stub (`gpu.primitive.rejected{rectangle_list}`), so only ~592 flat
   point/quad draws rasterized. Mirror canary's intent
   (primitive_processor.cc:389-456, kRectangleListAsTriangleStrip) within our
   CPU index-rewrite idiom: emit each rect's 3 real vertices as one
   TriangleList triangle (v0,v1,v2) with rejected=false and a faithful
   host_vertex_count. The full quad (synthesized 4th corner v3=v0+v2-v1)
   needs real vertex fetch in vs_main and is left as a documented TODO.
   Rejection warnings drop 2819→0.

2. Draw-time texture decode keyed off the active PS's real tfetch slots
   (gpu_system.rs + exports.rs vd_swap). Previously vd_swap decoded a hardcoded
   fetch-constant slot 0 at swap time. Now the DRAW handler parses the bound
   pixel shader (ucode::parse_shader), collects its tfetch fetch_const slots via
   new shader_metrics::tfetch_slots, reads each 6-dword fetch constant, and
   decodes and caches it into GpuSystem::last_draw_textures. vd_swap publishes the
   first of these (UI binds one texture today), falling back to the legacy slot-0
   probe on flat-only frames. New span_max_version helper walks page_version over
   the trait (draw-time &dyn MemoryAccess lacks the heap's inherent
   max_page_version). Pure function of guest writes — deterministic.
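
A minimal sketch of the item 1 index rewrite, for illustration only. The
`Expanded` struct, both function names, and the choice to drop a trailing
partial rect are assumptions made here; only `rejected`, `host_vertex_count`,
the (v0,v1,v2) triangle, and the v3=v0+v2-v1 formula come from the message
above. The real code in primitive.rs may be shaped differently.

```rust
/// Hypothetical result type; field names echo the commit message.
#[derive(Debug, PartialEq)]
struct Expanded {
    indices: Vec<u32>,
    host_vertex_count: u32,
    rejected: bool,
}

/// Rewrite RectangleList indices (3 guest vertices per rect) as a
/// TriangleList: one triangle (v0, v1, v2) per rect, no synthesized corner.
fn expand_rectangle_list(guest_indices: &[u32]) -> Expanded {
    let mut indices = Vec::with_capacity(guest_indices.len());
    // Assumption: an incomplete trailing rect (fewer than 3 indices) is dropped.
    for rect in guest_indices.chunks_exact(3) {
        indices.extend_from_slice(rect);
    }
    let host_vertex_count = indices.len() as u32;
    Expanded { indices, host_vertex_count, rejected: false }
}

/// The fourth corner the full quad would need (still a TODO in the commit):
/// v3 = v0 + v2 - v1, computed per component.
fn fourth_corner(v0: [f32; 2], v1: [f32; 2], v2: [f32; 2]) -> [f32; 2] {
    [v0[0] + v2[0] - v1[0], v0[1] + v2[1] - v1[1]]
}
```

With v0=(0,0), v1=(1,0), v2=(1,1) the formula gives v3=(0,1), the corner
opposite v1. Until vs_main can fetch real vertices, only the first function's
half of each rect is drawn.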

Status: texture_decodes stays 0 on Sylpheed because all 6 live shaders are flat
(no tfetch); canary's textured logo shaders E59B2B3D/F7B1457 are not yet
dispatched by our build (a downstream title-state gate, the next frontier). The
full P5 decode→publish→upload→sample path is already wired; this makes the
decode side key off the real shader instead of a guess.
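
For orientation, a hedged sketch of the two pieces the decode side relies on:
reading a slot's 6-dword fetch constant at `CONST_BASE_FETCH + slot*6`, and
publishing the first draw-time texture with the legacy slot-0 probe as the
fallback. The flat `&[u32]` register file, the function names, and the
`CONST_BASE_FETCH` value are placeholders chosen for this example; the real
constant lives in gpu_system.rs and the real register file is not a slice.

```rust
/// Placeholder value for illustration; the real CONST_BASE_FETCH is defined
/// in gpu_system.rs.
const CONST_BASE_FETCH: u32 = 0x4800;

/// Read the six dwords of texture fetch constant `slot` from a flat register
/// file, or None if the slot lies outside it.
fn read_fetch6(regs: &[u32], slot: u32) -> Option<[u32; 6]> {
    let start = (CONST_BASE_FETCH + slot * 6) as usize;
    let mut fetch6 = [0u32; 6];
    fetch6.copy_from_slice(regs.get(start..start + 6)?);
    Some(fetch6)
}

/// Publish the first texture decoded at draw time; if the last draw produced
/// none (flat shader), fall back to the legacy slot-0 probe.
fn pick_published<T: Clone>(
    last_draw_textures: &[T],
    legacy_slot0_probe: impl FnOnce() -> Option<T>,
) -> Option<T> {
    last_draw_textures.first().cloned().or_else(legacy_slot0_probe)
}
```

The fallback closure only runs when the draw-time list is empty, so frames
with a textured last draw never touch the slot-0 guess.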

Validation: stable-digest golden sylpheed_n50m unchanged (draws=718 swaps=147
tex=0), regenerated twice byte-identical; 200M run shows 0 RectangleList
rejections. cargo test --workspace green (677, +2: rectangle_list_expansion,
tfetch_slots_extracts_texture_fetch_constants). No temp hooks. Branch only;
not pushed/merged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-14 21:34:43 +02:00
parent a91f4c550b
commit 2f55d1fd7d
4 changed files with 238 additions and 37 deletions


@@ -3116,27 +3116,27 @@ fn vd_swap(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
);
ui.publish_assets(blobs, constants);
// P5: try to decode the primary texture (fetch constant slot 0).
// Slot 0 is the convention most games use for their main bound
// texture at draw time; full N-slot binding waits for P6+. If the
// slot is unset or the format isn't supported (magenta stub kicks
// in host-side), we skip.
//
// Texture fetch constants live at `CONST_BASE_FETCH + slot*6` in
// the register file; we read the 6 dwords, decode the key, hit
// the CPU cache (with page-version freshness), and clone the
// decoded bytes across the bridge.
const TEX_SLOT: u32 = 0;
let mut fetch6 = [0u32; 6];
for (i, slot) in fetch6.iter_mut().enumerate() {
*slot = gpu_inline
.register_file
.read(xenia_gpu::gpu_system::CONST_BASE_FETCH + TEX_SLOT * 6 + i as u32);
}
let published = if let Some(key) = xenia_gpu::texture_cache::decode_fetch_constant(fetch6)
{
// Span over the entire tiled texture footprint to pick the
// max page version covering it.
// P5b: publish the texture the last draw's *active pixel shader*
// actually sampled. The GPU draw handler decodes the PS's real
// `tfetch` fetch-constant slots into `last_draw_textures`; we publish
// the first (the UI binds a single texture today). When the last draw
// used a flat (no-tfetch) shader the list is empty, so we fall back to
// the legacy slot-0 probe to preserve behavior on flat-only frames.
let published = gpu_inline.last_draw_textures.first().cloned().or_else(|| {
// Fallback: probe fetch constant slot 0 directly. Texture fetch
// constants live at `CONST_BASE_FETCH + slot*6` in the register
// file; read 6 dwords, decode the key, hit the CPU cache with
// page-version freshness, clone the bytes across the bridge.
const TEX_SLOT: u32 = 0;
let mut fetch6 = [0u32; 6];
for (i, slot) in fetch6.iter_mut().enumerate() {
*slot = gpu_inline
.register_file
.read(xenia_gpu::gpu_system::CONST_BASE_FETCH + TEX_SLOT * 6 + i as u32);
}
let key = xenia_gpu::texture_cache::decode_fetch_constant(fetch6)?;
// Span over the entire tiled texture footprint to pick the max
// page version covering it.
let bi = key.format.block_info();
let span_bytes = (key.pitch_texels as u32)
* (key.height as u32)
@@ -3154,9 +3154,7 @@ fn vd_swap(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
None
}
}
} else {
None
};
});
metrics::gauge!("gpu.texture_cache.entries")
.set(gpu_inline.texture_cache.len() as f64);
ui.publish_texture(published);