[iterate-2X] Texture pipeline: un-stub RectangleList + draw-time texture decode
Two faithful, deterministic GPU-backend changes that make the texture path
correct for whatever textured draw the splash eventually dispatches. Both are
currently inert on Sylpheed (the textured logo draw is still gated downstream
— see below), but neither shifts the stable-digest golden, so they land safely.
1. Un-stub RectangleList primitive expansion (primitive.rs). The splash submits
2819 RectangleList draws at 200M, all of which were REJECTED by the P3 stub
(`gpu.primitive.rejected{rectangle_list}`) → only ~592 flat point/quad draws
rasterized. Mirror canary's intent (primitive_processor.cc:389-456
kRectangleListAsTriangleStrip) within our CPU index-rewrite idiom: emit each
rect's 3 real vertices as one TriangleList triangle (v0,v1,v2), rejected=false,
faithful host_vertex_count. The full quad (synthesized 4th corner v3=v0+v2-v1)
needs real vertex fetch in vs_main — left as a documented TODO. Rejection
warnings drop 2819→0.
2. Draw-time texture decode keyed off the active PS's real tfetch slots
(gpu_system.rs + exports.rs vd_swap). Previously vd_swap decoded a hardcoded
fetch-constant slot 0 at swap time. Now the DRAW handler parses the bound
pixel shader (ucode::parse_shader), collects its tfetch fetch_const slots via
new shader_metrics::tfetch_slots, reads each 6-dword fetch constant, and
decode+caches it into GpuSystem::last_draw_textures. vd_swap publishes the
first of these (UI binds one texture today), falling back to the legacy slot-0
probe on flat-only frames. New span_max_version helper walks page_version over
the trait (draw-time &dyn MemoryAccess lacks the heap's inherent
max_page_version). Pure function of guest writes — deterministic.
Status: texture_decodes stays 0 on Sylpheed because all 6 live shaders are flat
(no tfetch); canary's textured logo shaders E59B2B3D/F7B1457 are not yet
dispatched by ours (a downstream title-state gate, the next frontier). The full
P5 decode→publish→upload→sample path is already wired; this makes the decode
side key off the real shader instead of a guess.
Validation: stable-digest golden sylpheed_n50m unchanged (draws=718 swaps=147
tex=0), regenerated twice byte-identical; 200M run shows 0 RectangleList
rejections. cargo test --workspace green (677, +2: rectangle_list_expansion,
tfetch_slots_extracts_texture_fetch_constants). No temp hooks. Branch only;
not pushed/merged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -174,6 +174,49 @@ pub fn emit_for(parsed: &ParsedShader, stage: &'static str) {
|
||||
}
|
||||
}
|
||||
|
||||
/// Collect the unique texture-fetch-constant slot indices a shader samples.
|
||||
///
|
||||
/// Walks the same exec-clause / sequence-bitmap path as [`emit_for`] but only
|
||||
/// extracts `TextureFetch.fetch_const` slots, deduplicated and in first-seen
|
||||
/// order. The GPU draw handler uses this to decide which fetch constants to
|
||||
/// decode + cache at draw time (keyed off the *active* pixel shader's real
|
||||
/// `tfetch` instructions rather than a hardcoded slot).
|
||||
pub fn tfetch_slots(parsed: &ParsedShader) -> Vec<u8> {
|
||||
let mut slots: Vec<u8> = Vec::new();
|
||||
for clause in &parsed.cf {
|
||||
if let ControlFlowInstruction::Exec {
|
||||
address,
|
||||
count,
|
||||
sequence,
|
||||
..
|
||||
} = clause
|
||||
{
|
||||
for i in 0..(*count as usize) {
|
||||
let base = (*address as usize + i) * 3;
|
||||
if base + 2 >= parsed.instructions.len() {
|
||||
break;
|
||||
}
|
||||
// sequence bit layout: 2 bits per triple, hi bit = is-fetch.
|
||||
let is_fetch = ((sequence >> (i * 2 + 1)) & 1) != 0;
|
||||
if !is_fetch {
|
||||
continue;
|
||||
}
|
||||
let words = [
|
||||
parsed.instructions[base],
|
||||
parsed.instructions[base + 1],
|
||||
parsed.instructions[base + 2],
|
||||
];
|
||||
if let FetchInstruction::Texture(tf) = decode_fetch(words) {
|
||||
if !slots.contains(&tf.fetch_const) {
|
||||
slots.push(tf.fetch_const);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
slots
|
||||
}
|
||||
|
||||
fn mark_feature(buf: &mut Vec<&'static str>, name: &'static str) {
|
||||
if !buf.contains(&name) {
|
||||
buf.push(name);
|
||||
@@ -298,6 +341,46 @@ mod tests {
|
||||
emit_for(&shader, "vs");
|
||||
}
|
||||
|
||||
/// `tfetch_slots` should extract the fetch-constant slot of a texture
|
||||
/// fetch (and dedup), and return empty for a flat ALU-only shader.
|
||||
#[test]
|
||||
fn tfetch_slots_extracts_texture_fetch_constants() {
|
||||
// word0: opcode TEXTURE_FETCH (0x01) in low 5 bits, fetch_const=3 in
|
||||
// bits[9:5] → 0x01 | (3 << 5) = 0x61.
|
||||
let tfetch_w0: u32 = 0x01 | (3u32 << 5);
|
||||
let shader = ParsedShader {
|
||||
cf: vec![
|
||||
ControlFlowInstruction::Exec {
|
||||
address: 0,
|
||||
count: 2,
|
||||
// triple 0 is a fetch (hi bit of its 2-bit field set),
|
||||
// triple 1 is ALU. is_fetch = (sequence >> (i*2+1)) & 1.
|
||||
sequence: 0b00_10,
|
||||
is_end: false,
|
||||
predicated: false,
|
||||
predicate_condition: false,
|
||||
},
|
||||
ControlFlowInstruction::Exit,
|
||||
],
|
||||
instructions: vec![tfetch_w0, 0, 0, /* ALU triple */ 0, 0, 0],
|
||||
};
|
||||
assert_eq!(tfetch_slots(&shader), vec![3]);
|
||||
|
||||
// Flat shader: no fetch bits → no slots.
|
||||
let flat = ParsedShader {
|
||||
cf: vec![ControlFlowInstruction::Exec {
|
||||
address: 0,
|
||||
count: 1,
|
||||
sequence: 0,
|
||||
is_end: false,
|
||||
predicated: false,
|
||||
predicate_condition: false,
|
||||
}],
|
||||
instructions: vec![0, 0, 0],
|
||||
};
|
||||
assert!(tfetch_slots(&flat).is_empty());
|
||||
}
|
||||
|
||||
/// P8: a shader containing `LoopStart` should mark `cf_loop` as used
|
||||
/// so the HUD can surface which deferred feature a game triggers.
|
||||
#[test]
|
||||
|
||||
Reference in New Issue
Block a user