[iterate-2X] Texture pipeline: un-stub RectangleList + draw-time texture decode

Two faithful, deterministic GPU-backend changes that make the texture path
correct for whatever textured draw the splash eventually dispatches. Both are
currently inert on Sylpheed (the textured logo draw is still gated downstream
— see below), but neither shifts the stable-digest golden, so they land safely.

1. Un-stub RectangleList primitive expansion (primitive.rs). The splash submits
   2819 RectangleList draws at 200M, all of which were REJECTED by the P3 stub
   (`gpu.primitive.rejected{rectangle_list}`) → only ~592 flat point/quad draws
   rasterized. Mirror canary's intent (primitive_processor.cc:389-456
   kRectangleListAsTriangleStrip) within our CPU index-rewrite idiom: emit each
   rect's 3 real vertices as one TriangleList triangle (v0,v1,v2), rejected=false,
   faithful host_vertex_count. The full quad (synthesized 4th corner v3=v0+v2-v1)
   needs real vertex fetch in vs_main — left as a documented TODO. Rejection
   warnings drop 2819→0.

2. Draw-time texture decode keyed off the active PS's real tfetch slots
   (gpu_system.rs + exports.rs vd_swap). Previously vd_swap decoded a hardcoded
   fetch-constant slot 0 at swap time. Now the DRAW handler parses the bound
   pixel shader (ucode::parse_shader), collects its tfetch fetch_const slots via
   new shader_metrics::tfetch_slots, reads each 6-dword fetch constant, and
   decode+caches it into GpuSystem::last_draw_textures. vd_swap publishes the
   first of these (UI binds one texture today), falling back to the legacy slot-0
   probe on flat-only frames. New span_max_version helper walks page_version over
   the trait (draw-time &dyn MemoryAccess lacks the heap's inherent
   max_page_version). Pure function of guest writes — deterministic.

Status: texture_decodes stays 0 on Sylpheed because all 6 live shaders are flat
(no tfetch); canary's textured logo shaders E59B2B3D/F7B1457 are not yet
dispatched by ours (a downstream title-state gate, the next frontier). The full
P5 decode→publish→upload→sample path is already wired; this makes the decode
side key off the real shader instead of a guess.

Validation: stable-digest golden sylpheed_n50m unchanged (draws=718 swaps=147
tex=0), regenerated twice byte-identical; 200M run shows 0 RectangleList
rejections. cargo test --workspace green (677, +2: rectangle_list_expansion,
tfetch_slots_extracts_texture_fetch_constants). No temp hooks. Branch only;
not pushed/merged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-06-14 21:34:43 +02:00
parent a91f4c550b
commit 2f55d1fd7d
4 changed files with 238 additions and 37 deletions

View File

@@ -78,6 +78,30 @@ pub fn physical_to_backing(addr: u32) -> u32 {
}
}
/// Max guest page-version over the `[base, base+len)` span, walking 4 KiB
/// pages via the `MemoryAccess` trait's `page_version`.
///
/// The concrete heap exposes an inherent `max_page_version(base, len)`, but
/// the draw handler only holds `&dyn MemoryAccess` (which carries the coarser
/// `page_version(addr)` accessor). This is byte-equivalent to
/// `heap::max_page_version` and stays a pure function of the per-page write
/// counters (no wall-clock), so texture-decode timing remains deterministic.
fn span_max_version(mem: &dyn MemoryAccess, base: u32, len: u32) -> u64 {
const PAGE: u32 = 0x1000;
let last = base.saturating_add(len.saturating_sub(1));
let mut page = base & !(PAGE - 1);
let last_page = last & !(PAGE - 1);
let mut max = 0u64;
loop {
max = max.max(mem.page_version(page));
if page >= last_page {
break;
}
page = page.wrapping_add(PAGE);
}
max
}
/// Cached Xenos microcode blob, produced by `PM4_IM_LOAD*` packets.
#[derive(Debug, Clone)]
pub struct ShaderBlob {
@@ -400,6 +424,12 @@ pub struct GpuSystem {
/// on every texture-fetch resolution; the UI thread sees the decoded
/// bytes via `UiBridge::publish_texture`.
pub texture_cache: crate::texture_cache::TextureCache,
/// P5b: textures decoded at the most recent `PM4_DRAW_INDX*`, keyed off
/// the *active* pixel shader's real `tfetch` fetch-constant slots (not a
/// hardcoded slot). `vd_swap` publishes the first of these to the UI so
/// the replay binds the texture the draw actually samples. Cleared and
/// repopulated each draw; empty when the active PS issues no `tfetch`.
pub last_draw_textures: Vec<(crate::texture_cache::TextureKey, Vec<u8>)>,
/// 10 MiB shadow of the Xenos EDRAM. Written by clear-resolves and
/// (future) host-render-target readback; read by the resolve byte-copy
/// path that writes tiled pixels into guest memory. Allocated once at
@@ -431,6 +461,7 @@ impl GpuSystem {
rt_cache: crate::render_target_cache::RenderTargetCache::new(),
last_resolve: None,
texture_cache: crate::texture_cache::TextureCache::new(),
last_draw_textures: Vec::new(),
edram: crate::edram::ShadowEdram::new(),
}
}
@@ -1265,6 +1296,60 @@ impl GpuSystem {
);
self.last_draw = Some(ds);
self.last_primitive = Some(processed);
// P5b: decode the textures the *active pixel shader* actually
// samples. Parse the bound PS, collect its `tfetch`
// fetch-constant slots, read each 6-dword fetch constant from
// the register file, and decode+cache it. `vd_swap` publishes
// the result. Empty for flat (no-tfetch) shaders — the
// dominant case on Sylpheed's current splash, where this stays
// inert until the textured logo draw is reached.
self.last_draw_textures.clear();
if let Some(ps_key) = self.active_ps_key {
// Collect slots under an immutable borrow of `shader_blobs`,
// then drop it before mutating `texture_cache`.
let slots: Vec<u8> = match self.shader_blobs.get(&ps_key) {
Some(blob) => {
let parsed = crate::ucode::parse_shader(&blob.dwords);
crate::shader_metrics::tfetch_slots(&parsed)
}
None => Vec::new(),
};
for slot in slots {
let mut fetch6 = [0u32; 6];
for (k, w) in fetch6.iter_mut().enumerate() {
*w = self
.register_file
.read(CONST_BASE_FETCH + slot as u32 * 6 + k as u32);
}
let Some(key) = crate::texture_cache::decode_fetch_constant(fetch6) else {
continue;
};
let bi = key.format.block_info();
let span_bytes = (key.pitch_texels as u32)
* (key.height as u32)
* (bi.bytes_per_block as u32)
/ (bi.block_w as u32);
let version = span_max_version(mem, key.base_address, span_bytes.max(4));
match self.texture_cache.ensure_cached(key, version, mem) {
Ok(entry) => {
self.last_draw_textures.push((entry.key, entry.bytes.clone()));
metrics::counter!(
"gpu.texture.decode",
"fmt" => format!("{:?}", key.format),
)
.increment(1);
}
Err(e) => {
metrics::counter!(
"gpu.texture.reject",
"reason" => format!("{e:?}"),
)
.increment(1);
}
}
}
}
}
pm4::PM4_SET_CONSTANT | pm4::PM4_SET_SHADER_CONSTANTS => {
// payload[0] = offset_type — bits[10:0] index, bits[23:16] type