Compare commits

4 Commits

Author SHA1 Message Date
MechaCat02
6bb4355e3d [iterate-3M] Fix Xenos shader CF/fetch decode so the textured logo binds
The publisher splash (title idx0) rendered FLAT in ours while canary samples
a texture: ours never decoded the logo's textured pixel shader
(E59B2B3D, a `tfetch2D` sprite) even though our guest IM_LOADs the exact same
microcode canary does (verified byte-identical against the Wine oracle). The
shader was misparsed as flat. Three coupled bugs in the ucode decoder, all
diverging from canary `gpu/ucode.h`:

1. CF opcode table was off-by-one (`control_flow.rs`): mapped opcode 0→Exec
   and 1→Exit, but Xenos has 0=kNop, 1=kExec, 2=kExecEnd, 3..6/13..14 the
   cond-exec variants, 7/8 loop, 9/10 call/return, 11 condjmp, 12 alloc,
   15 mark-vs-fetch-done. So a real `kExec` clause was read as a terminal
   `Exit`, truncating the CF block and dropping every instruction (incl. the
   `tfetch`) after it. Added Nop/MarkVsFetchDone variants; parse now ends on
   an END-bit exec clause.

2. exec/loop `address` is an absolute instruction-triple index from shader
   dword 0, but we used it to index our post-CF `instructions` slice directly
   (`ucode/mod.rs`). Rebase addresses by the CF triple count so `address*3`
   lands on the right instruction.

3. Fetch instruction bitfields were wrong (`ucode/fetch.rs`): `const_index`
   read from bit 5 (actually `src_reg`) instead of bit 20, and texture
   `dimension` from dword1 instead of dword2 bit14. The logo's `tfetch ..,tf0`
   was read as `tf1`, whose empty fetch-constant failed to decode → no
   texture. Also the `sequence` fetch/ALU bit is bit[0] of each pair, not
   bit[1] (`shader_metrics.rs`, `translator.rs`, `xenos_interp.wgsl`).
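
A minimal sketch of the numbering in item 1 and the rebase in item 2, not
the actual `control_flow.rs` / `ucode/mod.rs` code. `CfOpcode`,
`decode_cf_opcode` and `rebase_exec_address` are hypothetical names, and the
cond-exec variants are collapsed into one arm for brevity:

```rust
/// Hypothetical mirror of the 4-bit CF opcode numbering listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CfOpcode {
    Nop,
    Exec,
    ExecEnd,
    CondExec, // opcodes 3..6 and 13..14, collapsed for this sketch
    LoopStart,
    LoopEnd,
    Call,
    Return,
    CondJmp,
    Alloc,
    MarkVsFetchDone,
}

/// Decode the opcode field; 0 is a no-op, 1 is the first exec variant.
pub fn decode_cf_opcode(raw: u32) -> CfOpcode {
    match raw & 0xF {
        0 => CfOpcode::Nop,
        1 => CfOpcode::Exec,
        2 => CfOpcode::ExecEnd,
        3..=6 | 13 | 14 => CfOpcode::CondExec,
        7 => CfOpcode::LoopStart,
        8 => CfOpcode::LoopEnd,
        9 => CfOpcode::Call,
        10 => CfOpcode::Return,
        11 => CfOpcode::CondJmp,
        12 => CfOpcode::Alloc,
        _ => CfOpcode::MarkVsFetchDone, // only 15 remains after the mask
    }
}

/// Turn an absolute triple address (counted from shader dword 0) into a
/// dword offset into a slice that starts after `cf_triples` CF triples.
/// Returns `None` if the address points back into the CF block.
pub fn rebase_exec_address(address: u32, cf_triples: u32) -> Option<usize> {
    let rel = address.checked_sub(cf_triples)?;
    Some(rel as usize * 3)
}
```

With the old table, opcode 1 decoded as a terminal exit; here it decodes as
`Exec`, so the walk continues into the clause that holds the `tfetch`.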

Result (--gpu-inline, deterministic 2x): the active PS's `tfetch_slots` now
resolves slot 0, the tf0 fetch-constant decodes (fmt K8888), and
`gpu.texture.decode` fires (137x at -n 50M; texture_cache_entries 0→1, the
only golden field that changed — all draw/swap counts unchanged). The same
fixes correct the WGSL uber-shader's fetch/CF walk for the threaded/--ui path.
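
The bit positions behind this result can be illustrated with a small sketch.
The helper names are hypothetical and the sample words in the note below are
made up for illustration; they are not the real E59B2B3D microcode:

```rust
/// Low 5 bits of fetch word 0; 0x01 is a texture fetch.
const OPCODE_TEXTURE_FETCH: u32 = 0x01;

/// `sequence` carries 2 bits per instruction; bit 0 of each pair marks a
/// fetch (1) versus an ALU instruction (0).
pub fn is_fetch(sequence: u32, i: u32) -> bool {
    sequence
        .checked_shr(i.saturating_mul(2))
        .map_or(false, |bits| bits & 1 != 0)
}

/// Source register: word 0 bits [10:5].
pub fn fetch_src_reg(w0: u32) -> u32 {
    (w0 >> 5) & 0x3F
}

/// Fetch-constant index: word 0 bits [24:20], texture fetches only.
pub fn tfetch_const_index(w0: u32) -> Option<u32> {
    if w0 & 0x1F != OPCODE_TEXTURE_FETCH {
        return None;
    }
    Some((w0 >> 20) & 0x1F)
}
```

For a word such as `0x01 | (1 << 5)` (source register 1, fetch constant 0),
reading the index from bit 5 yields 1, while `tfetch_const_index` yields 0.
That is the same kind of `tf0`-read-as-`tf1` confusion described in item 3.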

Added a regression test that parses the real E59B2B3D microcode and asserts a
tfetch slot is found. Golden re-baselined (texture_cache_entries 0→1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 21:53:35 +02:00
MechaCat02
3f5d5cf5f7 [iterate-2Z] Implement NtSetInformationFile FileRenameInformation for cache: files
The GamePart title-logo gate first-divergence: Sylpheed's asset cache
decompresses each packed resource to a staging `cache:\<hash><tail>.tmp`
file, then renames it into its final nested path `cache:\<hash>\<dir>\<file>`
(e.g. the title logo texture `\69d8e45c\e\534ffea`) via
NtSetInformationFile class 10 (XFileRenameInformation). Our handler treated
class 10 as a permissive no-op (catch-all `_ => STATUS_SUCCESS`), so the host
rename never happened: the nested target directories were created but left
EMPTY while the decompressed data stayed in the flat `.tmp` file. When the
title later reads back `\69d8e45c\...` to build the logo texture the read
misses, so the textured logo pixel shader (canary `E59B2B3D`, tfetch2D) is
never dispatched and the logo never renders.

Fix: implement class 10 faithfully, mirroring canary
`xboxkrnl_io_info.cc:226` (`X_FILE_RENAME_INFORMATION{ replace_existing@0,
root_dir_handle@4, ANSI_STRING@8 }` -> `file->Rename(TranslateAnsiPath)`).
Read the target path from the embedded ANSI_STRING at info_ptr+8, resolve it
against the host cache backing dir (`resolve_cache_path`), create the parent
dirs, `std::fs::rename` the backing file, and update the handle's `path` +
`host_path`. Non-cache (read-only VFS) sources keep the prior permissive
acknowledgement. Verified at runtime: 20 renames over an 80M run now move
`69d8e45ce534ffea.tmp -> 69d8e45c/e/534ffea` etc., and the nested cache tree
now matches canary's HostPathDevice layout byte-for-byte (data present, not
empty dirs).
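
The host side of that fix can be sketched with the standard library alone.
This is an illustration under stated assumptions, not the handler itself:
`resolve_under_root` is a hypothetical stand-in for `resolve_cache_path`,
and the guest-memory parsing of the ANSI_STRING is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `resolve_cache_path`: map a guest
/// `cache:\a\b` path to a host path under `root`, rejecting anything
/// that is not a `cache:` path or that tries to escape the root.
pub fn resolve_under_root(root: &Path, guest: &str) -> Option<PathBuf> {
    let rest = guest.strip_prefix("cache:")?;
    let mut out = root.to_path_buf();
    for part in rest.split('\\').filter(|p| !p.is_empty()) {
        if part == "." || part == ".." {
            return None;
        }
        out.push(part);
    }
    Some(out)
}

/// Create the target's parent directories, then move the backing file.
pub fn rename_cache_file(root: &Path, from: &str, to: &str) -> io::Result<PathBuf> {
    let bad = || io::Error::new(io::ErrorKind::InvalidInput, "not a cache: path");
    let src = resolve_under_root(root, from).ok_or_else(bad)?;
    let dst = resolve_under_root(root, to).ok_or_else(bad)?;
    if let Some(parent) = dst.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::rename(&src, &dst)?;
    Ok(dst)
}
```

Creating the parent directories before the rename matters because the
staging file is flat while the target is nested.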

Made `path::read_ansi_string` pub so the handler can parse the rename target.

Deterministic + golden-invariant: two `check --gpu-inline --stable-digest
-n 50000000` runs are byte-identical and the 50M stable digest is unchanged
(draws=718/swaps=147/6 shaders/tex=0); the logo read-back occurs later than
the observable window so GPU counters at 1B/2.5B are unchanged
(2.5B: draws=48734, swaps=16060, still 6 flat shaders, texture_decodes=0).
The fix is a verified-necessary precondition — without it the nested asset
read-back is guaranteed to miss. A downstream gate (the 2nd title thread's
load-completion post skipped when its notify target `[r29+8]==0`, and the
later read-back phase being beyond 2.5B) remains for follow-up.

New test: `nt_set_information_file_rename_moves_cache_file` (678 total, was
677).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 21:33:25 +02:00
MechaCat02
2f55d1fd7d [iterate-2X] Texture pipeline: un-stub RectangleList + draw-time texture decode
Two faithful, deterministic GPU-backend changes that make the texture path
correct for whatever textured draw the splash eventually dispatches. Both are
currently inert on Sylpheed (the textured logo draw is still gated downstream
— see below), but neither shifts the stable-digest golden, so they land safely.

1. Un-stub RectangleList primitive expansion (primitive.rs). The splash submits
   2819 RectangleList draws at 200M, all of which were REJECTED by the P3 stub
   (`gpu.primitive.rejected{rectangle_list}`) → only ~592 flat point/quad draws
   rasterized. Mirror canary's intent (primitive_processor.cc:389-456
   kRectangleListAsTriangleStrip) within our CPU index-rewrite idiom: emit each
   rect's 3 real vertices as one TriangleList triangle (v0,v1,v2), rejected=false,
   faithful host_vertex_count. The full quad (synthesized 4th corner v3=v0+v2-v1)
   needs real vertex fetch in vs_main — left as a documented TODO. Rejection
   warnings drop 2819→0.

2. Draw-time texture decode keyed off the active PS's real tfetch slots
   (gpu_system.rs + exports.rs vd_swap). Previously vd_swap decoded a hardcoded
   fetch-constant slot 0 at swap time. Now the DRAW handler parses the bound
   pixel shader (ucode::parse_shader), collects its tfetch fetch_const slots via
   new shader_metrics::tfetch_slots, reads each 6-dword fetch constant, and
   decode+caches it into GpuSystem::last_draw_textures. vd_swap publishes the
   first of these (UI binds one texture today), falling back to the legacy slot-0
   probe on flat-only frames. New span_max_version helper walks page_version over
   the trait (draw-time &dyn MemoryAccess lacks the heap's inherent
   max_page_version). Pure function of guest writes — deterministic.
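
For the TODO in item 1, the synthesized corner follows directly from the
formula `v3 = v0 + v2 - v1`. A minimal sketch, assuming 2D positions and a
hypothetical `Vec2` type (the real fix would live in the vertex path, not
in a CPU helper like this):

```rust
/// Hypothetical 2D position type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vec2 {
    pub x: f32,
    pub y: f32,
}

/// Parallelogram completion: the corner opposite `v1`.
pub fn fourth_corner(v0: Vec2, v1: Vec2, v2: Vec2) -> Vec2 {
    Vec2 {
        x: v0.x + v2.x - v1.x,
        y: v0.y + v2.y - v1.y,
    }
}
```

For the unit square with v0=(0,0), v1=(1,0), v2=(1,1) this gives (0,1).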

Status: texture_decodes stays 0 on Sylpheed because all 6 live shaders are flat
(no tfetch); canary's textured logo shaders E59B2B3D/F7B1457 are not yet
dispatched by ours (a downstream title-state gate, the next frontier). The full
P5 decode→publish→upload→sample path is already wired; this makes the decode
side key off the real shader instead of a guess.

Validation: stable-digest golden sylpheed_n50m unchanged (draws=718 swaps=147
tex=0), regenerated twice byte-identical; 200M run shows 0 RectangleList
rejections. cargo test --workspace green (677, +2: rectangle_list_expansion,
tfetch_slots_extracts_texture_fetch_constants). No temp hooks. Branch only;
not pushed/merged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 21:34:43 +02:00
MechaCat02
a91f4c550b [iterate-2W] Sustain the title present loop: viewport-size register + ISR CPU impersonation
The title's per-frame loop (sub_822F1AA8) is clock-B-paced and only re-fires
when the swap count [controller+88] changes, which advances only on source=1
CP swap-complete interrupts. Each present batch the guest submits (via the
sub_824CE348 -> sub_824BF4D0 builder) ends with a WAIT_REG_MEM on a per-CPU
swap-acknowledge fence [GCTX+0] (GCTX = [device+10772]); the GPU parks there
until the graphics ISR (sub_824BE9A0) clears that CPU's bit. Two coupled gaps
left ours emitting only ONE source=1 interrupt and then deadlocking (draws
plateaued at 28, run halted ~19.27M):

1. GPU MMIO register 0x1961 (AVIVO_D1MODE_VIEWPORT_SIZE) read as 0. The swap
   callback sub_824CE2B8 divides by its low 12 bits (display height) as a
   refresh-pacing term, so a 0 read tripped its `twi` divide-by-zero guard and
   aborted the ISR before it reached the fence-clear. Mirror canary
   GraphicsSystem::ReadRegister (graphics_system.cc:311): return 0x050002D0
   (1280x720).

2. The ISR ran on an arbitrary borrowed thread, so [r13+268] (the PCR
   processor number) did not match the interrupt's target CPU. The ISR clears
   `1 << current_cpu` from the fence; running on the wrong CPU cleared the
   wrong bit and the fence (bit 2, from cpu_mask 0x4) never reached 0. Carry
   the target CPU through the interrupt queue (bit index of the PM4_INTERRUPT
   cpu_mask for CP, 2 for vsync per canary DispatchInterruptCallback(0, 2)) and
   impersonate it on the borrowed thread's PCR around the ISR, mirroring canary
   EmulateCPInterruptDPC -> XThread::SetActiveCpu.
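
Both gaps come down to small bit manipulations, sketched here with
hypothetical helper names (`unpack_viewport`, `target_cpu`, `isr_ack`); the
constants are the ones quoted above:

```rust
/// Value served for register 0x1961: `(width << 16) | height`.
pub const VIEWPORT_SIZE_1280X720: u32 = 0x0500_02D0;

/// CPU used for vsync interrupts in this commit.
pub const VSYNC_TARGET_CPU: u8 = 2;

/// Split the packed register into its 12-bit (width, height) fields.
pub fn unpack_viewport(value: u32) -> (u32, u32) {
    ((value >> 16) & 0xFFF, value & 0xFFF)
}

/// Pick the target CPU from a PM4_INTERRUPT cpu_mask (lowest set bit),
/// falling back to the vsync CPU for an empty mask.
pub fn target_cpu(cpu_mask: u32) -> u8 {
    if cpu_mask == 0 {
        VSYNC_TARGET_CPU
    } else {
        cpu_mask.trailing_zeros().min(5) as u8
    }
}

/// What the guest ISR does to the fence: clear the running CPU's bit.
pub fn isr_ack(fence: u32, current_cpu: u8) -> u32 {
    fence & !(1u32 << current_cpu)
}
```

With `cpu_mask = 0x4`, `isr_ack(0x4, target_cpu(0x4))` reaches 0 and the
`WAIT_REG_MEM` can release, whereas `isr_ack(0x4, 0)` leaves the fence at
0x4, which is the deadlock described in item 2.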

With both fixes the fence clears, the GPU drains each present batch, source=1
sustains per-present, clock B advances, and the loop runs continuously. Draws
climb linearly with the budget (no re-stall): 50M 28->718, 200M ->3411,
1B ->18734; swaps 2->147/950/6060. No "Unanticipated CPU_INTERRUPT" trap.
Inline-deterministic (--stable-digest byte-identical x2); n50m golden
re-baselined. 675 tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 20:49:32 +02:00
14 changed files with 667 additions and 138 deletions

View File

@@ -2338,10 +2338,22 @@ fn coord_post_round(
     }
     if kernel.gpu.has_pending_interrupts() {
-        for _pi in kernel.gpu.take_pending_interrupts() {
+        for pi in kernel.gpu.take_pending_interrupts() {
+            // Canary `ExecutePacketType3_INTERRUPT` dispatches the callback
+            // once per set bit of `cpu_mask` with that bit's index as the
+            // target CPU (`DispatchInterruptCallback(1, n)`). The guest's
+            // swap-acknowledge fence stores `cpu_mask`, and the ISR clears
+            // `1 << current_cpu` from it — so the ISR must run impersonating
+            // the masked CPU or the fence never reaches 0. Sylpheed uses a
+            // single-bit mask (`0x4` → CPU 2); take the lowest set bit.
+            let cpu = if pi.cpu_mask == 0 {
+                xenia_kernel::interrupts::VSYNC_TARGET_CPU
+            } else {
+                pi.cpu_mask.trailing_zeros().min(5) as u8
+            };
             kernel
                 .interrupts
-                .queue_interrupt(xenia_kernel::INTERRUPT_SOURCE_CP);
+                .queue_interrupt(xenia_kernel::INTERRUPT_SOURCE_CP, cpu);
         }
     }
@@ -3545,7 +3557,17 @@ fn dispatch_graphics_interrupts(
         None
     };
+    /// X_KPCR offset of `prcb_data.current_cpu` (canary `xthread.cc`
+    /// `SetActiveCpu` → `pcr.prcb_data.current_cpu`). The guest graphics
+    /// ISR reads it via `lbz r10, 268(r13)` to decide which per-CPU bit of
+    /// the swap-acknowledge fence to clear.
+    const PCR_CURRENT_CPU_OFF: u32 = 268;
     while let Some(source) = kernel.interrupts.peek_next() {
+        let target_cpu = kernel
+            .interrupts
+            .peek_next_cpu()
+            .unwrap_or(xenia_kernel::interrupts::VSYNC_TARGET_CPU);
         // Victim selection: Ready first, then Blocked (canary's
         // `XThread::GetCurrentThread()` analog — any live thread will
         // do for borrowing context). Skip Idle/Exited/ServicingIrq.
@@ -3615,6 +3637,19 @@ fn dispatch_graphics_interrupts(
             saved
         };
+        // Impersonate the interrupt's target CPU on the borrowed thread's
+        // PCR, mirroring canary `EmulateCPInterruptDPC` →
+        // `XThread::SetActiveCpu(cpu)`. The guest swap-complete ISR clears
+        // `1 << [pcr.current_cpu]` from the per-present swap-acknowledge
+        // fence; if it runs on the wrong CPU it clears the wrong bit and
+        // the GPU's trailing `WAIT_REG_MEM` on that fence never releases —
+        // stranding the present/title loop. Save/restore so borrowing a
+        // thread doesn't permanently rewrite its processor number.
+        let pcr_addr = (kernel.scheduler.ctx_mut_ref(target_ref).gpr[13] as u32)
+            .wrapping_add(PCR_CURRENT_CPU_OFF);
+        let saved_cpu = mem.read_u8(pcr_addr);
+        mem.write_u8(pcr_addr, target_cpu);
         // Stash the previous `scheduler.current` (call_export reaches
         // it; imports the ISR calls must dispatch on the borrowed
         // thread). Restore on the way out.
@@ -3707,6 +3742,7 @@ fn dispatch_graphics_interrupts(
         // Restore the borrowed context.
         saved.restore(kernel.scheduler.ctx_mut_ref(target_ref));
+        mem.write_u8(pcr_addr, saved_cpu);
         kernel.scheduler.current = prev_current;
         kernel.interrupts.delivered += 1;

View File

@@ -1,10 +1,10 @@
 {
-  "instructions": 19274336,
-  "imports": 72513,
+  "instructions": 50000014,
+  "imports": 352251,
   "unimpl": 0,
-  "draws": 28,
-  "swaps": 2,
+  "draws": 718,
+  "swaps": 147,
   "unique_render_targets": 2,
-  "shader_blobs_live": 3,
-  "texture_cache_entries": 0
+  "shader_blobs_live": 6,
+  "texture_cache_entries": 1
 }

View File

@@ -78,6 +78,30 @@ pub fn physical_to_backing(addr: u32) -> u32 {
     }
 }
+/// Max guest page-version over the `[base, base+len)` span, walking 4 KiB
+/// pages via the `MemoryAccess` trait's `page_version`.
+///
+/// The concrete heap exposes an inherent `max_page_version(base, len)`, but
+/// the draw handler only holds `&dyn MemoryAccess` (which carries the coarser
+/// `page_version(addr)` accessor). This is byte-equivalent to
+/// `heap::max_page_version` and stays a pure function of the per-page write
+/// counters (no wall-clock), so texture-decode timing remains deterministic.
+fn span_max_version(mem: &dyn MemoryAccess, base: u32, len: u32) -> u64 {
+    const PAGE: u32 = 0x1000;
+    let last = base.saturating_add(len.saturating_sub(1));
+    let mut page = base & !(PAGE - 1);
+    let last_page = last & !(PAGE - 1);
+    let mut max = 0u64;
+    loop {
+        max = max.max(mem.page_version(page));
+        if page >= last_page {
+            break;
+        }
+        page = page.wrapping_add(PAGE);
+    }
+    max
+}
 /// Cached Xenos microcode blob, produced by `PM4_IM_LOAD*` packets.
 #[derive(Debug, Clone)]
 pub struct ShaderBlob {
@@ -400,6 +424,12 @@ pub struct GpuSystem {
     /// on every texture-fetch resolution; the UI thread sees the decoded
     /// bytes via `UiBridge::publish_texture`.
     pub texture_cache: crate::texture_cache::TextureCache,
+    /// P5b: textures decoded at the most recent `PM4_DRAW_INDX*`, keyed off
+    /// the *active* pixel shader's real `tfetch` fetch-constant slots (not a
+    /// hardcoded slot). `vd_swap` publishes the first of these to the UI so
+    /// the replay binds the texture the draw actually samples. Cleared and
+    /// repopulated each draw; empty when the active PS issues no `tfetch`.
+    pub last_draw_textures: Vec<(crate::texture_cache::TextureKey, Vec<u8>)>,
     /// 10 MiB shadow of the Xenos EDRAM. Written by clear-resolves and
     /// (future) host-render-target readback; read by the resolve byte-copy
     /// path that writes tiled pixels into guest memory. Allocated once at
@@ -431,6 +461,7 @@ impl GpuSystem {
             rt_cache: crate::render_target_cache::RenderTargetCache::new(),
             last_resolve: None,
             texture_cache: crate::texture_cache::TextureCache::new(),
+            last_draw_textures: Vec::new(),
             edram: crate::edram::ShadowEdram::new(),
         }
     }
@@ -1265,6 +1296,60 @@ impl GpuSystem {
                 );
                 self.last_draw = Some(ds);
                 self.last_primitive = Some(processed);
+                // P5b: decode the textures the *active pixel shader* actually
+                // samples. Parse the bound PS, collect its `tfetch`
+                // fetch-constant slots, read each 6-dword fetch constant from
+                // the register file, and decode+cache it. `vd_swap` publishes
+                // the result. Empty for flat (no-tfetch) shaders — the
+                // dominant case on Sylpheed's current splash, where this stays
+                // inert until the textured logo draw is reached.
+                self.last_draw_textures.clear();
+                if let Some(ps_key) = self.active_ps_key {
+                    // Collect slots under an immutable borrow of `shader_blobs`,
+                    // then drop it before mutating `texture_cache`.
+                    let slots: Vec<u8> = match self.shader_blobs.get(&ps_key) {
+                        Some(blob) => {
+                            let parsed = crate::ucode::parse_shader(&blob.dwords);
+                            crate::shader_metrics::tfetch_slots(&parsed)
+                        }
+                        None => Vec::new(),
+                    };
+                    for slot in slots {
+                        let mut fetch6 = [0u32; 6];
+                        for (k, w) in fetch6.iter_mut().enumerate() {
+                            *w = self
+                                .register_file
+                                .read(CONST_BASE_FETCH + slot as u32 * 6 + k as u32);
+                        }
+                        let Some(key) = crate::texture_cache::decode_fetch_constant(fetch6) else {
+                            continue;
+                        };
+                        let bi = key.format.block_info();
+                        let span_bytes = (key.pitch_texels as u32)
+                            * (key.height as u32)
+                            * (bi.bytes_per_block as u32)
+                            / (bi.block_w as u32);
+                        let version = span_max_version(mem, key.base_address, span_bytes.max(4));
+                        match self.texture_cache.ensure_cached(key, version, mem) {
+                            Ok(entry) => {
+                                self.last_draw_textures.push((entry.key, entry.bytes.clone()));
+                                metrics::counter!(
+                                    "gpu.texture.decode",
+                                    "fmt" => format!("{:?}", key.format),
+                                )
+                                .increment(1);
+                            }
+                            Err(e) => {
+                                metrics::counter!(
+                                    "gpu.texture.reject",
+                                    "reason" => format!("{e:?}"),
+                                )
+                                .increment(1);
+                            }
+                        }
+                    }
+                }
             }
             pm4::PM4_SET_CONSTANT | pm4::PM4_SET_SHADER_CONSTANTS => {
                 // payload[0] = offset_type — bits[10:0] index, bits[23:16] type
@@ -1544,6 +1629,15 @@ pub mod reg {
     /// `XE_GPU_REG_D1MODE_VBLANK_VLINE_STATUS` (Canary register_table.inc:1126).
     /// Bit 0 = VBLANK_INT_OCCURRED.
     pub const D1MODE_VBLANK_VLINE_STATUS: u32 = 0x1951;
+    /// `XE_GPU_REG_D1MODE_VIEWPORT_SIZE` / `AVIVO_D1MODE_VIEWPORT_SIZE`
+    /// (Canary `register_table.inc:1134`). Packs the active display resolution
+    /// as `(width << 16) | height` with 12-bit fields. The guest's
+    /// swap-complete interrupt callback (`sub_824CE2B8`) divides by the low
+    /// 12 bits (`height`) as a refresh-pacing term, so a 0 read makes its
+    /// `twi` divide-by-zero guard trap and abort the ISR before it clears the
+    /// swap-acknowledge fence. Canary returns the constant below from
+    /// `GraphicsSystem::ReadRegister` (graphics_system.cc:311).
+    pub const D1MODE_VIEWPORT_SIZE: u32 = 0x1961;
     /// `XE_GPU_REG_VGT_EVENT_INITIATOR` — set by EVENT_WRITE.
     pub const VGT_EVENT_INITIATOR: u32 = 0x21F9;
     /// `XE_GPU_REG_COHER_STATUS_HOST` — coherency bits

View File

@@ -58,6 +58,15 @@ pub fn build_region(mmio: &GpuMmio) -> MmioRegion {
             reg::D1MODE_VBLANK_VLINE_STATUS => {
                 read_vblank_status.load(Ordering::Relaxed)
             }
+            // AVIVO_D1MODE_VIEWPORT_SIZE: the active display resolution
+            // (1280x720) packed as `(width << 16) | height`. Canary
+            // serves this constant from `GraphicsSystem::ReadRegister`
+            // (graphics_system.cc:311). The guest swap-complete interrupt
+            // callback divides by the low 12 bits (`height = 0x2D0`); a 0
+            // read trips its `twi` divide-guard and aborts the ISR before
+            // it acknowledges the per-present swap fence — which strands
+            // the present/title loop. Mirror canary exactly.
+            reg::D1MODE_VIEWPORT_SIZE => 0x0500_02D0,
             _ => {
                 tracing::trace!(
                     reg = format_args!("{reg_index:#x}"),

View File

@@ -5,9 +5,8 @@
 //! rectangles) we rewrite indices on the CPU side so the host just sees a
 //! triangle list. Ground truth: `xenia-canary/src/xenia/gpu/primitive_processor.h/cc`.
 //!
-//! P3 scope: only the shapes Sylpheed's UI + early gameplay paths need
-//! (list, strip, fan). Rectangle + quad expansions are stubs logged via
-//! `tracing::warn!` for later.
+//! Scope: list, strip, fan, quad, and rectangle expansions are all handled
+//! (rectangles via CPU triangle-list rewrite — see `expand_rectangles`).
 use crate::draw_state::{IndexSize, PrimitiveType};
@@ -138,18 +137,43 @@ fn expand_quads(indices: Option<&[u32]>, vertex_count: u32) -> ProcessedPrimitiv
 }
 /// Rectangle lists: a Xenos-specific primitive where each group of 3
-/// vertices defines a right-angle rectangle by its three non-repeated
-/// corners (the 4th is derived). The uber-shader doesn't support this yet;
-/// the ucode translator will emulate it as a geometry-stage fake. For P3
-/// we emit an empty draw.
-fn expand_rectangles(_indices: Option<&[u32]>, _vertex_count: u32) -> ProcessedPrimitive {
-    tracing::warn!("gpu: rectangle list primitive not yet implemented (P3 stub)");
-    metrics::counter!("gpu.primitive.rejected", "reason" => "rectangle_list").increment(1);
+/// vertices defines a rectangle; the 4th corner is extrapolated as
+/// `v3 = v0 + v2 - v1` (parallelogram completion). Canary expands this in a
+/// host vertex-shader variant (`kRectangleListAsTriangleStrip`,
+/// `primitive_processor.cc:389-456`): a 4-vertex triangle strip per rect with
+/// the 4th corner synthesized *in the VS* from the host-vertex index.
+///
+/// Our replay pipeline has no host-VS corner synthesis (and the procedural
+/// `vs_main` does not consume `rewritten_indices` yet), so we mirror the
+/// `expand_quads`/`expand_fan` CPU idiom and emit the 3 real vertices of each
+/// rect as one triangle list `(v0,v1,v2)` — the visible lower half of the
+/// rect. This un-rejects the draw and gives a faithful `host_vertex_count`.
+///
+/// TODO: once `vs_main` does real vertex fetch + interpolation, upgrade to the
+/// full quad — 6 indices `[v0,v1,v2, v2,v1,v3]` with a synthesized `v3` corner
+/// — mirroring canary's `kRectangleListAsTriangleStrip`.
+fn expand_rectangles(indices: Option<&[u32]>, vertex_count: u32) -> ProcessedPrimitive {
+    let rect_count = vertex_count / 3;
+    let mut out = Vec::with_capacity(3 * rect_count as usize);
+    let get = |i: u32| -> u32 {
+        match indices {
+            Some(buf) => buf[i as usize],
+            None => i,
+        }
+    };
+    for r in 0..rect_count {
+        let base = r * 3;
+        out.push(get(base));
+        out.push(get(base + 1));
+        out.push(get(base + 2));
+    }
+    let host_vertex_count = out.len() as u32;
+    metrics::counter!("gpu.primitive.expanded", "shape" => "rectangle_list").increment(1);
     ProcessedPrimitive {
         topology: HostTopology::TriangleList,
-        rewritten_indices: Some(Vec::new()),
-        host_vertex_count: 0,
-        rejected: true,
+        rewritten_indices: Some(out),
+        host_vertex_count,
+        rejected: false,
     }
 }
@@ -213,6 +237,17 @@ mod tests {
         assert_eq!(idx, vec![0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7]);
     }
+    #[test]
+    fn rectangle_list_expansion() {
+        // 2 rects (6 verts) → one triangle (v0,v1,v2) per rect, not rejected.
+        let p = process(PrimitiveType::RectangleList, 6, None);
+        let idx = p.rewritten_indices.unwrap();
+        assert_eq!(idx, vec![0, 1, 2, 3, 4, 5]);
+        assert_eq!(p.topology, HostTopology::TriangleList);
+        assert_eq!(p.host_vertex_count, 6);
+        assert!(!p.rejected);
+    }
     #[test]
     fn widen_u16_indices_big_endian() {
         // 3 indices [1, 2, 0x1234] in BE u16.

View File

@@ -45,8 +45,9 @@ pub fn emit_for(parsed: &ParsedShader, stage: &'static str) {
                     parsed.instructions[base + 1],
                     parsed.instructions[base + 2],
                 ];
-                // sequence bit layout: 2 bits per triple, hi bit = is-fetch.
-                let is_fetch = ((sequence >> (i * 2 + 1)) & 1) != 0;
+                // sequence: 2 bits per instruction — bit[0]=fetch(1)/ALU(0),
+                // bit[1]=serialize (Xenos `ucode.h:226`).
+                let is_fetch = ((sequence >> (i * 2)) & 1) != 0;
                 if is_fetch {
                     match decode_fetch(words) {
                         FetchInstruction::Vertex(_) => vfetch_count += 1,
@@ -174,6 +175,50 @@ pub fn emit_for(parsed: &ParsedShader, stage: &'static str) {
     }
 }
+/// Collect the unique texture-fetch-constant slot indices a shader samples.
+///
+/// Walks the same exec-clause / sequence-bitmap path as [`emit_for`] but only
+/// extracts `TextureFetch.fetch_const` slots, deduplicated and in first-seen
+/// order. The GPU draw handler uses this to decide which fetch constants to
+/// decode + cache at draw time (keyed off the *active* pixel shader's real
+/// `tfetch` instructions rather than a hardcoded slot).
+pub fn tfetch_slots(parsed: &ParsedShader) -> Vec<u8> {
+    let mut slots: Vec<u8> = Vec::new();
+    for clause in &parsed.cf {
+        if let ControlFlowInstruction::Exec {
+            address,
+            count,
+            sequence,
+            ..
+        } = clause
+        {
+            for i in 0..(*count as usize) {
+                let base = (*address as usize + i) * 3;
+                if base + 2 >= parsed.instructions.len() {
+                    break;
+                }
+                // sequence: 2 bits per instruction — bit[0]=fetch(1)/ALU(0),
+                // bit[1]=serialize (Xenos `ucode.h:226`).
+                let is_fetch = ((sequence >> (i * 2)) & 1) != 0;
+                if !is_fetch {
+                    continue;
+                }
+                let words = [
+                    parsed.instructions[base],
+                    parsed.instructions[base + 1],
+                    parsed.instructions[base + 2],
+                ];
+                if let FetchInstruction::Texture(tf) = decode_fetch(words) {
+                    if !slots.contains(&tf.fetch_const) {
+                        slots.push(tf.fetch_const);
+                    }
+                }
+            }
+        }
+    }
+    slots
+}
 fn mark_feature(buf: &mut Vec<&'static str>, name: &'static str) {
     if !buf.contains(&name) {
         buf.push(name);
@@ -298,6 +343,46 @@ mod tests {
         emit_for(&shader, "vs");
     }
+    /// `tfetch_slots` should extract the fetch-constant slot of a texture
+    /// fetch (and dedup), and return empty for a flat ALU-only shader.
+    #[test]
+    fn tfetch_slots_extracts_texture_fetch_constants() {
+        // word0: opcode TEXTURE_FETCH (0x01) in low 5 bits, const_index=3 in
+        // bits[24:20] (Xenos `ucode.h:844`) → 0x01 | (3 << 20).
+        let tfetch_w0: u32 = 0x01 | (3u32 << 20);
+        let shader = ParsedShader {
+            cf: vec![
+                ControlFlowInstruction::Exec {
+                    address: 0,
+                    count: 2,
+                    // instruction 0 is a fetch (bit[0] of its 2-bit field set),
+                    // instruction 1 is ALU. is_fetch = (sequence >> (i*2)) & 1.
+                    sequence: 0b00_01,
+                    is_end: false,
+                    predicated: false,
+                    predicate_condition: false,
+                },
+                ControlFlowInstruction::Exit,
+            ],
+            instructions: vec![tfetch_w0, 0, 0, /* ALU triple */ 0, 0, 0],
+        };
+        assert_eq!(tfetch_slots(&shader), vec![3]);
+        // Flat shader: no fetch bits → no slots.
+        let flat = ParsedShader {
+            cf: vec![ControlFlowInstruction::Exec {
+                address: 0,
+                count: 1,
+                sequence: 0,
+                is_end: false,
+                predicated: false,
+                predicate_condition: false,
+            }],
+            instructions: vec![0, 0, 0],
+        };
+        assert!(tfetch_slots(&flat).is_empty());
+    }
     /// P8: a shader containing `LoopStart` should mark `cf_loop` as used
     /// so the HUD can surface which deferred feature a game triggers.
     #[test]

View File

@@ -56,6 +56,7 @@ const CF_KIND_LOOP_END: u32 = 5u;
 const CF_KIND_COND_JMP: u32 = 6u;
 const CF_KIND_COND_CALL: u32 = 7u;
 const CF_KIND_RETURN: u32 = 8u;
+const CF_KIND_NOP: u32 = 9u;
 const CF_KIND_UNKNOWN: u32 = 15u;
 // ── Alloc-kind codes (mirrors `xenia_gpu::ucode::cf_alloc_kind`). ──────
@@ -628,8 +629,8 @@ const VFMT_32_32_32_FLOAT: u32 = 57u;
 // layout in `ucode.h:690`):
 // w0 [4:0] opcode
 // w0 [10:5] src_reg[5:0]
-// w0 [17:11] dst_reg[6:0] + must-be-one
-// w0 [21:17] const_index[4:0], [23:22] const_index_sel[1:0]
+// w0 [17:12] dst_reg[5:0]
+// w0 [24:20] const_index[4:0], [26:25] const_index_sel[1:0]
 // w1 [21:16] format[5:0]
 // w2 [7:0] stride (in dwords)
 // w2 [30:8] offset (signed, in dwords)
@@ -641,9 +642,9 @@ fn interpret_vertex_fetch(t: u32) {
     let w0 = vs_instr_dword(t, 0u);
     let w1 = vs_instr_dword(t, 1u);
     let w2 = vs_instr_dword(t, 2u);
-    let fetch_const = (w0 >> 5u) & 0x1Fu;
-    let dst_reg = (w0 >> 10u) & 0x7Fu;
-    let src_reg = (w0 >> 17u) & 0x7Fu;
+    let fetch_const = (w0 >> 20u) & 0x1Fu;
+    let dst_reg = (w0 >> 12u) & 0x3Fu;
+    let src_reg = (w0 >> 5u) & 0x3Fu;
     let format = (w1 >> 16u) & 0x3Fu;
     let stride = w2 & 0xFFu;
@@ -773,20 +774,20 @@ fn interpret_texture_fetch(t: u32, is_vertex: bool) {
     } else {
         w0 = ps_instr_dword(t, 0u);
     }
-    let dst_reg = (w0 >> 10u) & 0x7Fu;
-    let src_reg = (w0 >> 17u) & 0x7Fu;
-    let uv = registers[src_reg & 0x7Fu].xy;
+    let dst_reg = (w0 >> 12u) & 0x3Fu;
+    let src_reg = (w0 >> 5u) & 0x3Fu;
+    let uv = registers[src_reg & 0x3Fu].xy;
     let sample = textureSampleLevel(xenos_tex, xenos_samp, uv, 0.0);
-    registers[dst_reg & 0x7Fu] = sample;
+    registers[dst_reg & 0x3Fu] = sample;
 }
 // Walk an Exec clause's instruction triples.
-// sequence: 2-bit-per-triple bitmap. Bit 0 of a pair = serialize flag
-// (we ignore in MVP); bit 1 = is-fetch.
+// sequence: 2-bit-per-instruction bitmap. Bit 0 of a pair = fetch(1)/ALU(0);
+// bit 1 = serialize (ignored). (Xenos `ucode.h:226`.)
 fn exec_vs(address: u32, count: u32, sequence: u32) {
     for (var i: u32 = 0u; i < count; i = i + 1u) {
         let t = address + i;
-        let is_fetch = ((sequence >> (i * 2u + 1u)) & 1u) != 0u;
+        let is_fetch = ((sequence >> (i * 2u)) & 1u) != 0u;
         if is_fetch {
             let opcode = vs_instr_dword(t, 0u) & 0x1Fu;
             // 0x00 = vertex fetch, 0x01 = texture fetch.
@@ -803,7 +804,7 @@ fn exec_vs(address: u32, count: u32, sequence: u32) {
 fn exec_ps(address: u32, count: u32, sequence: u32) {
     for (var i: u32 = 0u; i < count; i = i + 1u) {
         let t = address + i;
-        let is_fetch = ((sequence >> (i * 2u + 1u)) & 1u) != 0u;
+        let is_fetch = ((sequence >> (i * 2u)) & 1u) != 0u;
         if is_fetch {
             interpret_texture_fetch(t, false);
         } else {
@@ -962,6 +963,9 @@ fn walk_cf_vs() {
                 // No call stack — mark and continue.
                 reject_mask |= REJECT_CF_CALL;
             }
+            case CF_KIND_NOP: {
+                // kNop padding / kMarkVsFetchDone hint — no-op, just advance.
+            }
             default: { reject_mask |= REJECT_CF_JUMP; }
         }
         if stop { break; }

@@ -237,6 +237,10 @@ impl EmitCtx {
current_alloc = *kind; current_alloc = *kind;
} }
ControlFlowInstruction::Exit => break, ControlFlowInstruction::Exit => break,
// Non-executing CF clauses: padding (`kNop`) and the
// vertex-fetch-done hint (`kMarkVsFetchDone`). Skip them.
ControlFlowInstruction::Nop
| ControlFlowInstruction::MarkVsFetchDone => {}
ControlFlowInstruction::LoopStart { .. } ControlFlowInstruction::LoopStart { .. }
| ControlFlowInstruction::LoopEnd { .. } => return Err(reject::CF_LOOP), | ControlFlowInstruction::LoopEnd { .. } => return Err(reject::CF_LOOP),
ControlFlowInstruction::CondJmp { .. } => return Err(reject::CF_COND), ControlFlowInstruction::CondJmp { .. } => return Err(reject::CF_COND),
@@ -284,7 +288,9 @@ impl EmitCtx {
parsed.instructions[base + 1], parsed.instructions[base + 1],
parsed.instructions[base + 2], parsed.instructions[base + 2],
]; ];
let is_fetch = ((sequence >> (i * 2 + 1)) & 1) != 0; // sequence: 2 bits per instruction — bit[0]=fetch(1)/ALU(0),
// bit[1]=serialize (Xenos `ucode.h:226`).
let is_fetch = ((sequence >> (i * 2)) & 1) != 0;
if is_fetch { if is_fetch {
match decode_fetch(words) { match decode_fetch(words) {
FetchInstruction::Vertex(vf) => self.emit_vfetch(&vf)?, FetchInstruction::Vertex(vf) => self.emit_vfetch(&vf)?,
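The corrected `sequence` decode can be sketched in isolation. This is a minimal illustration, not code from the patch: `SlotFlags` and `slot_flags` are hypothetical names, and the only facts taken from the change are that each instruction owns two bits, with bit 0 selecting fetch (1) or ALU (0) and bit 1 carrying the serialize flag.

```rust
/// What an exec clause's `sequence` bitmap says about one instruction slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SlotFlags {
    /// Bit 0 of the pair: 1 = fetch instruction, 0 = ALU instruction.
    pub is_fetch: bool,
    /// Bit 1 of the pair: serialize flag (the decoders in this change ignore it).
    pub serialize: bool,
}

/// Hypothetical helper: read the two bits belonging to instruction `i` of a
/// clause. Slots past the end of the bitmap read as all-zero.
pub fn slot_flags(sequence: u32, i: u32) -> SlotFlags {
    // `checked_mul` / `checked_shr` return `None` on overflow or on shifts
    // of 32 or more, so a bad slot index cannot panic.
    let pair = i
        .checked_mul(2)
        .and_then(|shift| sequence.checked_shr(shift))
        .unwrap_or(0)
        & 0b11;
    SlotFlags {
        is_fetch: (pair & 0b01) != 0,
        serialize: (pair & 0b10) != 0,
    }
}
```

Applying the patch's own CF field layout to the first row of the logo test vector (`0x00011002, 0x00001200`) gives `sequence = 0x001`, so slot 0 is a fetch under the bit-0 rule; testing bit 1 instead would classify it as ALU.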

@@ -43,7 +43,15 @@ pub enum ControlFlowInstruction {
Return, Return,
/// `kAlloc` — pre-allocate export registers (position, interpolators, colors). /// `kAlloc` — pre-allocate export registers (position, interpolators, colors).
Alloc { size: u32, kind: AllocKind }, Alloc { size: u32, kind: AllocKind },
/// Exit the shader (terminal). /// `kNop` — fills space in the CF block; executes nothing, does not end
/// the shader. (Xenos opcode 0.)
Nop,
/// `kMarkVsFetchDone` — hint that no more vertex fetches will be performed.
/// (Xenos opcode 15.) Non-terminating.
MarkVsFetchDone,
/// Exit the shader (terminal). Synthesized — Xenos has no dedicated exit
/// opcode; the shader ends after an `Exec`/`CondExec` clause with the
/// END bit set (`is_end`). Retained for callers/tests that reference it.
Exit, Exit,
/// Unknown / unhandled opcode. /// Unknown / unhandled opcode.
Unknown { opcode: u8 }, Unknown { opcode: u8 },
@@ -93,37 +101,45 @@ fn decode_single(payload: u64) -> ControlFlowInstruction {
let predicated = ((payload >> 28) & 1) != 0; let predicated = ((payload >> 28) & 1) != 0;
let predicate_condition = ((payload >> 29) & 1) != 0; let predicate_condition = ((payload >> 29) & 1) != 0;
// Xenos `ControlFlowOpcode` (canary `ucode.h:86-160`):
// 0 kNop, 1 kExec, 2 kExecEnd, 3 kCondExec, 4 kCondExecEnd,
// 5 kCondExecPred, 6 kCondExecPredEnd, 7 kLoopStart, 8 kLoopEnd,
// 9 kCondCall, 10 kReturn, 11 kCondJmp, 12 kAlloc,
// 13 kCondExecPredClean, 14 kCondExecPredCleanEnd, 15 kMarkVsFetchDone.
// All exec variants share the address(12)/count(3)/sequence(12) layout
// of `ControlFlowExecInstruction`; the `*End` variants terminate the
// shader. (Prior table was off-by-one — it mapped 0→Exec and 1→Exit,
// so a real `kExec` clause was misread as a terminal `Exit`, truncating
// the CF block and dropping every `tfetch` in it.)
let exec = |is_end: bool| ControlFlowInstruction::Exec {
address: (payload & 0xFFF) as u32,
count: ((payload >> 12) & 0x7) as u32,
sequence: ((payload >> 16) & 0xFFF) as u32,
is_end,
predicated,
predicate_condition,
};
match opcode { match opcode {
0 => ControlFlowInstruction::Exec { 0 => ControlFlowInstruction::Nop,
address: (payload & 0xFFF) as u32, 1 => exec(false),
count: ((payload >> 12) & 0x7) as u32, 2 => exec(true),
sequence: ((payload >> 16) & 0xFFF) as u32, 3 => exec(false),
is_end: false, 4 => exec(true),
predicated, 5 => exec(false),
predicate_condition, 6 => exec(true),
}, 7 => ControlFlowInstruction::LoopStart {
1 => ControlFlowInstruction::Exit,
2 => ControlFlowInstruction::Exec {
address: (payload & 0xFFF) as u32,
count: ((payload >> 12) & 0x7) as u32,
sequence: ((payload >> 16) & 0xFFF) as u32,
is_end: true,
predicated,
predicate_condition,
},
6 => ControlFlowInstruction::LoopStart {
address: (payload & 0x3FF) as u32, address: (payload & 0x3FF) as u32,
loop_id: ((payload >> 16) & 0x1F) as u32, loop_id: ((payload >> 16) & 0x1F) as u32,
}, },
7 => ControlFlowInstruction::LoopEnd { 8 => ControlFlowInstruction::LoopEnd {
address: (payload & 0x3FF) as u32, address: (payload & 0x3FF) as u32,
loop_id: ((payload >> 16) & 0x1F) as u32, loop_id: ((payload >> 16) & 0x1F) as u32,
}, },
8 => ControlFlowInstruction::CondCall { 9 => ControlFlowInstruction::CondCall {
target: (payload & 0x3FF) as u32, target: (payload & 0x3FF) as u32,
}, },
9 => ControlFlowInstruction::Return, 10 => ControlFlowInstruction::Return,
10 => ControlFlowInstruction::CondJmp { 11 => ControlFlowInstruction::CondJmp {
target: (payload & 0x3FF) as u32, target: (payload & 0x3FF) as u32,
predicated, predicated,
predicate_condition, predicate_condition,
@@ -132,6 +148,9 @@ fn decode_single(payload: u64) -> ControlFlowInstruction {
size: (payload & 0x7) as u32, size: (payload & 0x7) as u32,
kind: AllocKind::from_bits(((payload >> 4) & 0x7) as u32), kind: AllocKind::from_bits(((payload >> 4) & 0x7) as u32),
}, },
13 => exec(false),
14 => exec(true),
15 => ControlFlowInstruction::MarkVsFetchDone,
other => ControlFlowInstruction::Unknown { opcode: other }, other => ControlFlowInstruction::Unknown { opcode: other },
} }
} }
@@ -141,12 +160,49 @@ mod tests {
use super::*; use super::*;
#[test] #[test]
fn opcode_exit_decodes() { fn opcode_nop_and_exec_decode() {
// opcode 1 (Exit) in bits 44..47 of A's 48-bit payload. // Xenos opcode 0 = kNop (non-terminating padding).
let payload: u64 = 0u64 << 44;
let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32);
assert_eq!(decode_cf_pair(hi, lo, 0).0, ControlFlowInstruction::Nop);
// Xenos opcode 1 = kExec (executes instructions; NOT a terminal exit).
let payload: u64 = 1u64 << 44; let payload: u64 = 1u64 << 44;
let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32); let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32);
let cf = decode_cf_pair(hi, lo, 0).0; match decode_cf_pair(hi, lo, 0).0 {
assert_eq!(cf, ControlFlowInstruction::Exit); ControlFlowInstruction::Exec { is_end, .. } => assert!(!is_end),
other => panic!("opcode 1 should be non-end Exec, got {other:?}"),
}
// Xenos opcode 15 = kMarkVsFetchDone (non-terminating hint).
let payload: u64 = 15u64 << 44;
let (hi, lo) = ((payload & 0xFFFF_FFFF) as u32, ((payload >> 32) & 0xFFFF) as u32);
assert_eq!(
decode_cf_pair(hi, lo, 0).0,
ControlFlowInstruction::MarkVsFetchDone
);
}
#[test]
fn real_logo_shader_has_tfetch_clauses() {
// The publisher-logo pixel shader E59B2B3DA4AA9008 (captured from the
// canary oracle, byte-identical to the microcode our guest IM_LOADs).
// Regression for iterate-3M: the old off-by-one opcode table decoded
// its leading `kExec` (opcode 1) as a terminal `Exit`, truncating the
// CF block so the `tfetch2D` never appeared → flat splash.
let ucode: [u32; 24] = [
0x00011002, 0x00001200, 0xC4000000, 0x00004003, 0x00002200, 0x00000000,
0x10082021, 0x1F1FF688, 0x00004000, 0xC8080001, 0x001B1B00, 0xC1020000,
0xC8070000, 0x00C0C000, 0xC1020000, 0xC8070001, 0x00C01B00, 0xC1000100,
0xC80F8000, 0x00000000, 0xC2010100, 0x00000000, 0x00000000, 0x00000000,
];
let p = crate::ucode::parse_shader(&ucode);
let exec_clauses = p
.cf
.iter()
.filter(|c| matches!(c, ControlFlowInstruction::Exec { .. }))
.count();
assert!(exec_clauses >= 1, "expected >=1 Exec clause, cf={:?}", p.cf);
let slots = crate::shader_metrics::tfetch_slots(&p);
assert!(!slots.is_empty(), "expected tfetch slots, got none; cf={:?}", p.cf);
} }
#[test] #[test]
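For reference, the opcode table described in the `decode_single` comment can be written as a small classifier. This is a sketch under stated assumptions: the `CfOpcode` enum and its methods are hypothetical names introduced here, and the numbering is copied from the comment in this diff rather than checked against the header it cites.

```rust
/// The sixteen control-flow opcodes, in the order listed in the diff comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CfOpcode {
    Nop,
    Exec,
    ExecEnd,
    CondExec,
    CondExecEnd,
    CondExecPred,
    CondExecPredEnd,
    LoopStart,
    LoopEnd,
    CondCall,
    Return,
    CondJmp,
    Alloc,
    CondExecPredClean,
    CondExecPredCleanEnd,
    MarkVsFetchDone,
}

impl CfOpcode {
    /// The opcode sits in bits 44..47 of the 48-bit clause payload.
    pub fn from_payload(payload: u64) -> CfOpcode {
        use CfOpcode::*;
        match (payload >> 44) & 0xF {
            0 => Nop,
            1 => Exec,
            2 => ExecEnd,
            3 => CondExec,
            4 => CondExecEnd,
            5 => CondExecPred,
            6 => CondExecPredEnd,
            7 => LoopStart,
            8 => LoopEnd,
            9 => CondCall,
            10 => Return,
            11 => CondJmp,
            12 => Alloc,
            13 => CondExecPredClean,
            14 => CondExecPredCleanEnd,
            _ => MarkVsFetchDone, // 15 is the only value left after the mask
        }
    }

    /// True for the `*End` exec variants, which terminate the shader.
    pub fn ends_shader(self) -> bool {
        matches!(
            self,
            Self::ExecEnd | Self::CondExecEnd | Self::CondExecPredEnd | Self::CondExecPredCleanEnd
        )
    }

    /// True for every clause that executes instructions, terminal or not.
    pub fn is_exec(self) -> bool {
        self.ends_shader()
            || matches!(
                self,
                Self::Exec | Self::CondExec | Self::CondExecPred | Self::CondExecPredClean
            )
    }
}
```

With this table the logo shader's leading clause (payload `0x1200_0001_1002`) classifies as a non-terminal `Exec`, which is the case the old off-by-one mapping read as `Exit`.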

@@ -54,23 +54,32 @@ pub mod op {
} }
pub fn decode_fetch(words: [u32; 3]) -> FetchInstruction { pub fn decode_fetch(words: [u32; 3]) -> FetchInstruction {
// Fetch dword0 bitfields (Xenos `ucode.h:740-749` vfetch / `844-845`
// tfetch): opcode_value:5, src_reg:6, src_reg_am:1, dst_reg:6,
// dst_reg_am:1, (fetch_valid_only|must_be_one):1, const_index:5 @ bit20,
// ... The prior decoder read `const_index` from bit 5 (which is actually
// `src_reg`), so every fetch reported the wrong fetch-constant slot — the
// logo `tfetch2D ..., tf0` was read as `tf1`, and slot 1's empty constant
// failed to decode → no texture. The texture-fetch `dimension` lives in
// dword2 bits 14..15, not dword1.
let w0 = words[0]; let w0 = words[0];
let w1 = words[1]; let w1 = words[1];
let w2 = words[2];
let opcode = (w0 & 0x1F) as u8; let opcode = (w0 & 0x1F) as u8;
match opcode { match opcode {
op::VERTEX_FETCH => FetchInstruction::Vertex(VertexFetch { op::VERTEX_FETCH => FetchInstruction::Vertex(VertexFetch {
fetch_const: ((w0 >> 5) & 0x1F) as u8, fetch_const: ((w0 >> 20) & 0x1F) as u8,
src_register: ((w0 >> 17) & 0x7F) as u8, src_register: ((w0 >> 5) & 0x3F) as u8,
dest_register: ((w0 >> 10) & 0x7F) as u8, dest_register: ((w0 >> 12) & 0x3F) as u8,
dest_write_mask: ((w1 >> 23) & 0xF) as u8, dest_write_mask: (w1 & 0xF) as u8,
raw: words, raw: words,
}), }),
op::TEXTURE_FETCH => FetchInstruction::Texture(TextureFetch { op::TEXTURE_FETCH => FetchInstruction::Texture(TextureFetch {
fetch_const: ((w0 >> 5) & 0x1F) as u8, fetch_const: ((w0 >> 20) & 0x1F) as u8,
src_register: ((w0 >> 17) & 0x7F) as u8, src_register: ((w0 >> 5) & 0x3F) as u8,
dest_register: ((w0 >> 10) & 0x7F) as u8, dest_register: ((w0 >> 12) & 0x3F) as u8,
dest_write_mask: ((w1 >> 23) & 0xF) as u8, dest_write_mask: (w1 & 0xF) as u8,
dimension: ((w1 >> 29) & 0x3) as u8, dimension: ((w2 >> 14) & 0x3) as u8,
raw: words, raw: words,
}), }),
_ => FetchInstruction::Unknown { opcode, raw: words }, _ => FetchInstruction::Unknown { opcode, raw: words },
@@ -83,8 +92,9 @@ mod tests {
#[test] #[test]
fn decode_vertex_fetch() { fn decode_vertex_fetch() {
// opcode=0 (vertex), fetch_const=5, src=2, dest=7. // opcode=0 (vertex). Xenos dword0: src_reg@bit5, dst_reg@bit12,
let w0 = 0u32 | (5 << 5) | (7 << 10) | (2 << 17); // const_index@bit20. fetch_const=5, src=2, dest=7.
let w0 = 0u32 | (2 << 5) | (7 << 12) | (5 << 20);
let v = decode_fetch([w0, 0, 0]); let v = decode_fetch([w0, 0, 0]);
match v { match v {
FetchInstruction::Vertex(vf) => { FetchInstruction::Vertex(vf) => {
@@ -98,11 +108,16 @@ mod tests {
#[test] #[test]
fn decode_texture_fetch() { fn decode_texture_fetch() {
let w0 = 1u32 | (3 << 5) | (4 << 10) | (1 << 17); // opcode=1 (texture). const_index@bit20=3, src@bit5=1, dst@bit12=4.
let t = decode_fetch([w0, (2u32 << 29), 0]); // dimension lives in dword2 bits 14..15.
let w0 = 1u32 | (1 << 5) | (4 << 12) | (3 << 20);
let w2 = 2u32 << 14;
let t = decode_fetch([w0, 0, w2]);
match t { match t {
FetchInstruction::Texture(tf) => { FetchInstruction::Texture(tf) => {
assert_eq!(tf.fetch_const, 3); assert_eq!(tf.fetch_const, 3);
assert_eq!(tf.src_register, 1);
assert_eq!(tf.dest_register, 4);
assert_eq!(tf.dimension, 2); assert_eq!(tf.dimension, 2);
} }
other => panic!("expected Texture, got {other:?}"), other => panic!("expected Texture, got {other:?}"),
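The dword0 field positions named in the `decode_fetch` comment can be checked against the logo shader's first instruction word. The sketch below is illustrative only: `FetchDword0`, `decode_fetch_dword0` and `legacy_const_index` are hypothetical names, and the bit positions are the ones stated in this diff (opcode in bits 0..4, `src_reg` at bit 5, `dst_reg` at bit 12, `const_index` at bit 20).

```rust
/// The dword0 fields the corrected decoder reads from a fetch instruction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FetchDword0 {
    pub opcode: u8,      // bits 0..4
    pub src_reg: u8,     // bits 5..10
    pub dst_reg: u8,     // bits 12..17
    pub const_index: u8, // bits 20..24
}

/// Hypothetical helper mirroring the corrected bit positions.
pub fn decode_fetch_dword0(w0: u32) -> FetchDword0 {
    FetchDword0 {
        opcode: (w0 & 0x1F) as u8,
        src_reg: ((w0 >> 5) & 0x3F) as u8,
        dst_reg: ((w0 >> 12) & 0x3F) as u8,
        const_index: ((w0 >> 20) & 0x1F) as u8,
    }
}

/// What the pre-fix decoder reported as the fetch-constant slot: five bits
/// starting at bit 5, which actually overlap `src_reg`.
pub fn legacy_const_index(w0: u32) -> u8 {
    ((w0 >> 5) & 0x1F) as u8
}
```

For the word `0x10082021` from the regression vector this yields opcode 1 (texture fetch), `src_reg` 1, `dst_reg` 2 and `const_index` 0, while the legacy read returns 1, matching the `tf0` versus `tf1` mix-up described in the commit message.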

@@ -48,6 +48,9 @@ pub mod cf_kind {
pub const COND_JMP: u32 = 6; pub const COND_JMP: u32 = 6;
pub const COND_CALL: u32 = 7; pub const COND_CALL: u32 = 7;
pub const RETURN: u32 = 8; pub const RETURN: u32 = 8;
/// Non-executing CF clause: `kNop` padding or `kMarkVsFetchDone` hint.
/// The WGSL CF walker treats this as a no-op (advance, do not reject).
pub const NOP: u32 = 9;
pub const UNKNOWN: u32 = 15; pub const UNKNOWN: u32 = 15;
} }
@@ -136,6 +139,7 @@ fn encode_cf(c: ControlFlowInstruction) -> (u32, u32, u32) {
} }
CondCall { target } => (cf_kind::COND_CALL, target, 0), CondCall { target } => (cf_kind::COND_CALL, target, 0),
Return => (cf_kind::RETURN, 0, 0), Return => (cf_kind::RETURN, 0, 0),
Nop | MarkVsFetchDone => (cf_kind::NOP, 0, 0),
Unknown { opcode } => (cf_kind::UNKNOWN, opcode as u32, 0), Unknown { opcode } => (cf_kind::UNKNOWN, opcode as u32, 0),
} }
} }
@@ -164,9 +168,11 @@ pub struct ParsedShader {
} }
/// Decode a shader blob. `raw_dwords` is a host-endian slice of the entire /// Decode a shader blob. `raw_dwords` is a host-endian slice of the entire
/// microcode buffer (control flow + instructions). Heuristic: CF dword count /// microcode buffer (control flow + instructions). The CF block is implicitly
/// is encoded in the first word's low 12 bits of the last exec clause — /// bounded: we walk clause-pair rows until one terminates the shader (an
/// canary iterates until it hits a clause of kind `Exit`. We do the same. /// `Exec`/`CondExec` clause with the END bit set, per Xenos). Everything after
/// that row is the instruction block; exec/loop addresses are then rebased to
/// be relative to it.
pub fn parse_shader(raw_dwords: &[u32]) -> ParsedShader { pub fn parse_shader(raw_dwords: &[u32]) -> ParsedShader {
let mut cf = Vec::new(); let mut cf = Vec::new();
// CF clauses are 48-bit (word1 lo 16 + word0 = 48 or so per canary's // CF clauses are 48-bit (word1 lo 16 + word0 = 48 or so per canary's
@@ -175,22 +181,50 @@ pub fn parse_shader(raw_dwords: &[u32]) -> ParsedShader {
while i + 2 < raw_dwords.len() { while i + 2 < raw_dwords.len() {
let a = decode_cf_pair(raw_dwords[i], raw_dwords[i + 1], raw_dwords[i + 2]); let a = decode_cf_pair(raw_dwords[i], raw_dwords[i + 1], raw_dwords[i + 2]);
let (first, second) = a; let (first, second) = a;
let seen_exit = matches!( // The CF block ends after the clause that terminates the shader: an
first, // `Exec` with the END bit set (Xenos `kExecEnd`/`kCondExec*End`), a
ControlFlowInstruction::Exit | ControlFlowInstruction::Unknown { .. } // synthetic `Exit`, or an `Unknown` opcode (decode ran off the CF
) || matches!( // block into instruction data — stop defensively). `Nop` padding
second, // does NOT terminate. (Previously this stopped on the first `Exit`,
ControlFlowInstruction::Exit | ControlFlowInstruction::Unknown { .. } // but with the corrected opcode table opcode 1 is `kExec`, not exit,
); // so real exec clauses now keep the parse going as intended.)
let terminates = |cf: &ControlFlowInstruction| {
matches!(
cf,
ControlFlowInstruction::Exec { is_end: true, .. }
| ControlFlowInstruction::Exit
| ControlFlowInstruction::Unknown { .. }
)
};
let seen_end = terminates(&first) || terminates(&second);
cf.push(first); cf.push(first);
cf.push(second); cf.push(second);
i += 3; i += 3;
if seen_exit { if seen_end {
break; break;
} }
} }
// Everything after `i` dwords is the instruction block. // Everything after `i` dwords is the instruction block.
let instructions = raw_dwords[i..].to_vec(); let instructions = raw_dwords[i..].to_vec();
// Xenos exec/loop `address` fields are absolute instruction-triple indices
// counted from shader dword 0, but `instructions` here begins *after* the
// CF block. Rebase those addresses to be relative to the instruction block
// (subtract the CF triple count) so `address * 3` indexes `instructions`
// directly. (Without this, every exec read 3 dwords too far per CF triple —
// the publisher-logo `tfetch` triple was skipped → flat splash.)
let cf_triples = (i / 3) as u32;
for clause in cf.iter_mut() {
match clause {
ControlFlowInstruction::Exec { address, .. } => {
*address = address.saturating_sub(cf_triples);
}
ControlFlowInstruction::LoopStart { address, .. }
| ControlFlowInstruction::LoopEnd { address, .. } => {
*address = address.saturating_sub(cf_triples);
}
_ => {}
}
}
ParsedShader { cf, instructions } ParsedShader { cf, instructions }
} }
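The address rebase at the end of `parse_shader` amounts to one subtraction followed by a bounds-checked slice. A minimal sketch, assuming the CF block length is known in dwords; `rebase_address` and `instruction_triple` are hypothetical helpers, not functions from this change.

```rust
/// Convert an absolute instruction-triple index (counted from shader dword 0)
/// into an index relative to the instruction block that follows the CF block.
/// `cf_dwords` is the CF block length in dwords, i.e. 3 per clause-pair row.
pub fn rebase_address(address: u32, cf_dwords: usize) -> u32 {
    let cf_triples = (cf_dwords / 3) as u32;
    // Saturate instead of wrapping if a malformed clause points into the CF block.
    address.saturating_sub(cf_triples)
}

/// Fetch the three dwords of the instruction at a rebased index, or `None`
/// if the index runs past the end of the instruction block.
pub fn instruction_triple(instructions: &[u32], rebased: u32) -> Option<[u32; 3]> {
    let base = (rebased as usize).checked_mul(3)?;
    let words = instructions.get(base..base.checked_add(3)?)?;
    Some([words[0], words[1], words[2]])
}
```

In the logo shader the CF block is two rows (6 dwords) and the first exec clause carries `address = 2`, so the rebased index is 0 and the clause starts at the first instruction triple, which is the `tfetch`.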
@@ -235,15 +269,19 @@ mod tests {
} }
#[test] #[test]
fn trivial_exit_clause_stops_parsing() { fn exec_end_clause_stops_parsing() {
// Two clauses: [NOP (kind=0), EXIT (kind=1)] encoded per canary. // Row: clause B = kExecEnd (opcode 2) terminates the CF block.
// Exit clause is opcode 1 in the top 4 bits of the upper 16 bits. // 48-bit payload of B occupies hi16(word1) + word2; opcode lives in
let w0 = 0u32; // clause A body // bits 44..47 of that payload. Put opcode 2 there: (2 << 44), which
let w1 = (1u32 << 12) << 16; // upper 16 bits = 0x1000 → opcode=1 (EXIT) for clause A // sets payload bit 45. In B's framing, bits 16..47 come from word2,
let w2 = 0u32; // so the opcode nibble lands in word2 bits 28..31 (44-16 .. 47-16).
let p = parse_shader(&[w0, w1, w2, 0xDEAD_BEEF]); let b_payload: u64 = 2u64 << 44; // kExecEnd
// B = lo16 from hi16(word1), hi from word2. Reconstruct word1/word2.
let word1 = ((b_payload & 0xFFFF) as u32) << 16; // B's low 16 bits → hi16(word1)
let word2 = ((b_payload >> 16) & 0xFFFF_FFFF) as u32;
let p = parse_shader(&[0, word1, word2, 0xDEAD_BEEF]);
assert!(!p.cf.is_empty()); assert!(!p.cf.is_empty());
// Exit detected → remaining dword is instruction data. // ExecEnd detected in the first row → remaining dword is instruction data.
assert_eq!(p.instructions, vec![0xDEAD_BEEF]); assert_eq!(p.instructions, vec![0xDEAD_BEEF]);
} }
} }
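The row framing these tests rely on (clause A in word0 plus the low half of word1, clause B in the high half of word1 plus word2) can be sketched as a pure function. `split_cf_row` is a hypothetical name, and the layout is inferred from the test code in this diff rather than from the real `decode_cf_pair`.

```rust
/// Split one 3-dword CF row into its two 48-bit clause payloads `(a, b)`.
///
/// Layout assumed from the tests in this change:
///   A = word0 (bits 0..31)         | lo16(word1) (bits 32..47)
///   B = hi16(word1) (bits 0..15)   | word2 (bits 16..47)
pub fn split_cf_row(word0: u32, word1: u32, word2: u32) -> (u64, u64) {
    let a = (word0 as u64) | (((word1 & 0xFFFF) as u64) << 32);
    let b = ((word1 >> 16) as u64) | ((word2 as u64) << 16);
    (a, b)
}

/// Opcode nibble of a 48-bit clause payload (bits 44..47).
pub fn cf_opcode_bits(payload: u64) -> u8 {
    ((payload >> 44) & 0xF) as u8
}
```

For the first row of the logo shader this gives opcode 1 for clause A and opcode 12 for clause B. For the `exec_end_clause_stops_parsing` input, `word2 = 0x2000_0000` puts opcode 2 in clause B.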

@@ -1652,6 +1652,79 @@ fn nt_set_information_file(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut
return; return;
} }
// XFileRenameInformation (10): move the backing file to a new path.
// Sylpheed's asset-cache decompresses each packed resource to a staging
// `cache:\<hash><tail>.tmp` then renames it into its final nested path
// `cache:\<hash>\<dir>\<file>`. Without an actual host-FS rename the
// nested target stays empty, the later read-back of the decompressed
// asset (e.g. the title logo texture `\69d8e45c\e\534ffea`) misses, and
// the logo never loads. Mirror canary `xboxkrnl_io_info.cc:226`
// (`X_FILE_RENAME_INFORMATION{ replace_existing@0, root_dir_handle@4,
// ansi_string@8 }` → `file->Rename(TranslateAnsiPath(ansi_string))`).
if info_class == 10 {
// Read the target path from the embedded ANSI_STRING at info_ptr+8.
let target_raw = match crate::path::read_ansi_string(mem, info_ptr + 8) {
Some(s) if !s.is_empty() => s,
_ => {
const STATUS_OBJECT_NAME_INVALID: u64 = 0xC000_0033;
ctx.gpr[3] = STATUS_OBJECT_NAME_INVALID;
return;
}
};
// Resolve the destination against the host cache backing dir. We only
// support renames within the writable `cache:` mount (the only place
// a guest can create files); disc/synth entries are read-only.
let new_host = state.resolve_cache_path(&target_raw);
// Current backing host path of the handle.
let old_host = match state.objects.get(&handle) {
Some(KernelObject::File { host_path: Some(hp), .. }) => Some(hp.clone()),
Some(KernelObject::File { .. }) => None,
_ => {
ctx.gpr[3] = STATUS_INVALID_HANDLE;
return;
}
};
let status: u64 = match (old_host, new_host) {
(Some(old), Some(new)) => {
if let Some(parent) = new.parent() {
let _ = std::fs::create_dir_all(parent);
}
match std::fs::rename(&old, &new) {
Ok(()) => {
// Update the handle so subsequent I/O targets the new
// host path + guest path.
if let Some(KernelObject::File { path, host_path, .. }) =
state.objects.get_mut(&handle)
{
*path = crate::path::normalize_path(&target_raw);
*host_path = Some(new.clone());
}
tracing::info!(
"NtSetInformationFile rename cache {:?} -> {:?} ({:?})",
old, new, target_raw
);
STATUS_SUCCESS
}
Err(e) => {
tracing::warn!(
"NtSetInformationFile rename {:?} -> {:?} failed: {}",
old, new, e
);
STATUS_UNSUCCESSFUL
}
}
}
// Non-cache (read-only VFS) source/target: acknowledge without a
// host move, matching the prior permissive behaviour.
_ => STATUS_SUCCESS,
};
if iosb_ptr != 0 {
write_io_status_block(mem, iosb_ptr, status as u32, info_length);
}
ctx.gpr[3] = status;
return;
}
// Handle lookup. // Handle lookup.
let Some(KernelObject::File { size, position, host_path, .. }) = state.objects.get_mut(&handle) else { let Some(KernelObject::File { size, position, host_path, .. }) = state.objects.get_mut(&handle) else {
ctx.gpr[3] = STATUS_INVALID_HANDLE; ctx.gpr[3] = STATUS_INVALID_HANDLE;
@@ -3116,27 +3189,27 @@ fn vd_swap(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
); );
ui.publish_assets(blobs, constants); ui.publish_assets(blobs, constants);
// P5: try to decode the primary texture (fetch constant slot 0). // P5b: publish the texture the last draw's *active pixel shader*
// Slot 0 is the convention most games use for their main bound // actually sampled. The GPU draw handler decodes the PS's real
// texture at draw time; full N-slot binding waits for P6+. If the // `tfetch` fetch-constant slots into `last_draw_textures`; we publish
// slot is unset or the format isn't supported (magenta stub kicks // the first (the UI binds a single texture today). When the last draw
// in host-side), we skip. // used a flat (no-tfetch) shader the list is empty, so we fall back to
// // the legacy slot-0 probe to preserve behavior on flat-only frames.
// Texture fetch constants live at `CONST_BASE_FETCH + slot*6` in let published = gpu_inline.last_draw_textures.first().cloned().or_else(|| {
// the register file; we read the 6 dwords, decode the key, hit // Fallback: probe fetch constant slot 0 directly. Texture fetch
// the CPU cache (with page-version freshness), and clone the // constants live at `CONST_BASE_FETCH + slot*6` in the register
// decoded bytes across the bridge. // file; read 6 dwords, decode the key, hit the CPU cache with
const TEX_SLOT: u32 = 0; // page-version freshness, clone the bytes across the bridge.
let mut fetch6 = [0u32; 6]; const TEX_SLOT: u32 = 0;
for (i, slot) in fetch6.iter_mut().enumerate() { let mut fetch6 = [0u32; 6];
*slot = gpu_inline for (i, slot) in fetch6.iter_mut().enumerate() {
.register_file *slot = gpu_inline
.read(xenia_gpu::gpu_system::CONST_BASE_FETCH + TEX_SLOT * 6 + i as u32); .register_file
} .read(xenia_gpu::gpu_system::CONST_BASE_FETCH + TEX_SLOT * 6 + i as u32);
let published = if let Some(key) = xenia_gpu::texture_cache::decode_fetch_constant(fetch6) }
{ let key = xenia_gpu::texture_cache::decode_fetch_constant(fetch6)?;
// Span over the entire tiled texture footprint to pick the // Span over the entire tiled texture footprint to pick the max
// max page version covering it. // page version covering it.
let bi = key.format.block_info(); let bi = key.format.block_info();
let span_bytes = (key.pitch_texels as u32) let span_bytes = (key.pitch_texels as u32)
* (key.height as u32) * (key.height as u32)
@@ -3154,9 +3227,7 @@ fn vd_swap(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
None None
} }
} }
} else { });
None
};
metrics::gauge!("gpu.texture_cache.entries") metrics::gauge!("gpu.texture_cache.entries")
.set(gpu_inline.texture_cache.len() as f64); .set(gpu_inline.texture_cache.len() as f64);
ui.publish_texture(published); ui.publish_texture(published);
@@ -5583,6 +5654,67 @@ mod tests {
} }
} }
/// `NtSetInformationFile` class 10 (`XFileRenameInformation`) must move
/// the backing host file to the new `cache:` path and update the handle.
/// Mirrors Sylpheed's asset-cache `.tmp` → `\<hash>\<dir>\<file>` move;
/// without it the nested target stays empty and the decompressed asset
/// (logo texture) never reads back. Faithful to canary `file->Rename`.
#[test]
fn nt_set_information_file_rename_moves_cache_file() {
let (mut ctx, mut mem, mut state) = fresh();
// Real temp cache root + a staging `.tmp` file with known bytes.
let root = std::env::temp_dir().join(format!("xenia-rs-rename-test-{}", std::process::id()));
let _ = std::fs::remove_dir_all(&root);
std::fs::create_dir_all(&root).unwrap();
let old_host = root.join("69d8e45ce534ffea.tmp");
std::fs::write(&old_host, b"LOGOTEX!").unwrap();
state.cache_root = Some(root.clone());
// Open handle whose backing host_path is the staging file.
let handle = state.alloc_handle_for(KernelObject::File {
path: "69d8e45ce534ffea.tmp".to_string(),
size: 8,
position: 0,
data: Arc::new(Vec::new()),
dir_enum_pos: None,
host_path: Some(old_host.clone()),
});
// X_FILE_RENAME_INFORMATION { replace@0, root_dir@4, ANSI_STRING@8 }.
// ANSI_STRING { len u16, max u16, buf u32 } at info_ptr+8; buffer holds
// the target path "cache:\69d8e45c\e\534ffea".
let info_ptr = SCRATCH_BASE + 0x100;
let str_buf = SCRATCH_BASE + 0x200;
let target = b"cache:\\69d8e45c\\e\\534ffea";
for (i, b) in target.iter().enumerate() {
mem.write_u8(str_buf + i as u32, *b);
}
mem.write_u32(info_ptr, 0); // replace_existing
mem.write_u32(info_ptr + 4, 0); // root_dir_handle
mem.write_u16(info_ptr + 8, target.len() as u16); // ANSI_STRING.Length
mem.write_u16(info_ptr + 10, target.len() as u16); // MaximumLength
mem.write_u32(info_ptr + 12, str_buf); // Buffer
let iosb_ptr = SCRATCH_BASE + 0x140;
ctx.gpr[3] = handle as u64;
ctx.gpr[4] = iosb_ptr as u64;
ctx.gpr[5] = info_ptr as u64;
ctx.gpr[6] = 16;
ctx.gpr[7] = 10; // XFileRenameInformation
nt_set_information_file(&mut ctx, &mut mem, &mut state);
assert_eq!(ctx.gpr[3], STATUS_SUCCESS);
// Staging file gone; nested target exists with the same bytes.
let new_host = root.join("69d8e45c").join("e").join("534ffea");
assert!(!old_host.exists(), "staging .tmp should be moved away");
assert_eq!(std::fs::read(&new_host).unwrap(), b"LOGOTEX!");
// Handle now points at the new host + guest path.
match state.objects.get(&handle) {
Some(KernelObject::File { host_path: Some(hp), path, .. }) => {
assert_eq!(hp, &new_host);
assert_eq!(path, "cache:/69d8e45c/e/534ffea");
}
_ => panic!("file handle lost or host_path missing"),
}
let _ = std::fs::remove_dir_all(&root);
}
/// Read-only VFS — truncating to a different size must fail with /// Read-only VFS — truncating to a different size must fail with
/// `STATUS_UNSUCCESSFUL`, matching Canary's error path when /// `STATUS_UNSUCCESSFUL`, matching Canary's error path when
/// `file->SetLength(...)` can't honour the request. /// `file->SetLength(...)` can't honour the request.
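The rename test above expects a guest path such as `cache:\69d8e45c\e\534ffea` to land at `<root>/69d8e45c/e/534ffea` on the host. The body of `resolve_cache_path` is not shown in this diff, so the following is only a sketch of one way such a mapping could work; `resolve_under_cache_root` is a hypothetical function, and rejecting `..` components is an assumption made here for safety, not behaviour confirmed by the source.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical mapping from a guest `cache:` path to a host path under `root`.
/// Returns `None` for paths outside the `cache:` mount or containing `..`.
pub fn resolve_under_cache_root(root: &Path, guest_path: &str) -> Option<PathBuf> {
    const PREFIX: &str = "cache:";
    // `get` returns `None` if the string is too short or the cut is not on a
    // character boundary, so the slice below cannot panic.
    let head = guest_path.get(..PREFIX.len())?;
    if !head.eq_ignore_ascii_case(PREFIX) {
        return None;
    }
    let tail = &guest_path[PREFIX.len()..];

    let mut host = root.to_path_buf();
    // Guest paths use backslashes; accept forward slashes as well.
    for part in tail.split(|c| c == '\\' || c == '/') {
        match part {
            "" | "." => continue,  // leading separator or no-op component
            ".." => return None,   // assumption: never escape the cache root
            name => host.push(name),
        }
    }
    Some(host)
}
```

A caller would create the parent directory of the result before calling `std::fs::rename`, as the handler in this change does with `create_dir_all`.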

@@ -30,6 +30,12 @@ use xenia_cpu::ThreadRef;
pub const INTERRUPT_SOURCE_VSYNC: u32 = 0; pub const INTERRUPT_SOURCE_VSYNC: u32 = 0;
pub const INTERRUPT_SOURCE_CP: u32 = 1; pub const INTERRUPT_SOURCE_CP: u32 = 1;
/// The processor the graphics ISR impersonates for a v-sync interrupt.
/// Canary hard-codes this: `MarkVblank` → `DispatchInterruptCallback(0, 2)`
/// (graphics_system.cc:478). CP interrupts instead use the bit index of the
/// `PM4_INTERRUPT` `cpu_mask`.
pub const VSYNC_TARGET_CPU: u8 = 2;
/// Guest-registered V-sync / graphics-interrupt callback (from /// Guest-registered V-sync / graphics-interrupt callback (from
/// `VdSetGraphicsInterruptCallback`). /// `VdSetGraphicsInterruptCallback`).
#[derive(Debug, Clone, Copy)] #[derive(Debug, Clone, Copy)]
@@ -145,9 +151,16 @@ pub type PendingLocalIrq = [std::sync::atomic::AtomicU8;
pub struct InterruptState { pub struct InterruptState {
/// Registered callback (set by `VdSetGraphicsInterruptCallback`). /// Registered callback (set by `VdSetGraphicsInterruptCallback`).
pub callback: Option<GraphicsInterruptCallback>, pub callback: Option<GraphicsInterruptCallback>,
/// Bounded FIFO of pending interrupt sources awaiting injection. /// Bounded FIFO of pending interrupts awaiting injection, as
/// Push-back on queue, pop-front on inject. Over-cap pushes drop. /// `(source, target_cpu)`. Push-back on queue, pop-front on inject.
pub pending: VecDeque<u32>, /// Over-cap pushes drop. `target_cpu` is the processor the graphics
/// ISR must impersonate (canary `XThread::SetActiveCpu` / the
/// `DispatchInterruptCallback(source, cpu)` argument): the bit index
/// of the CP `PM4_INTERRUPT` `cpu_mask` for source=1, and a fixed `2`
/// for vsync (canary `DispatchInterruptCallback(0, 2)`). The ISR reads
/// it from the PCR (`[r13+268]`) to clear the matching per-CPU bit of
/// the swap-acknowledge fence.
pub pending: VecDeque<(u32, u8)>,
/// When `Some`, some HW thread is currently running a callback; on /// When `Some`, some HW thread is currently running a callback; on
/// return-to-sentinel we restore this and clear the flag. /// return-to-sentinel we restore this and clear the flag.
pub saved: Option<SavedCallbackCtx>, pub saved: Option<SavedCallbackCtx>,
@@ -211,8 +224,9 @@ impl InterruptState {
}); });
} }
/// Queue an interrupt for the next safe injection point. /// Queue an interrupt for the next safe injection point. `cpu` is the
pub fn queue_interrupt(&mut self, source: u32) { /// processor the ISR must impersonate (see `pending`).
pub fn queue_interrupt(&mut self, source: u32, cpu: u8) {
if self.callback.is_none() { if self.callback.is_none() {
self.dropped += 1; self.dropped += 1;
return; return;
@@ -221,18 +235,23 @@ impl InterruptState {
self.dropped += 1; self.dropped += 1;
return; return;
} }
self.pending.push_back(source); self.pending.push_back((source, cpu));
} }
/// Peek at the next pending source without removing it. /// Peek at the next pending source without removing it.
pub fn peek_next(&self) -> Option<u32> { pub fn peek_next(&self) -> Option<u32> {
self.pending.front().copied() self.pending.front().map(|&(source, _)| source)
}
/// Peek at the target CPU of the next pending interrupt.
pub fn peek_next_cpu(&self) -> Option<u8> {
self.pending.front().map(|&(_, cpu)| cpu)
} }
/// Pop the next pending source (called by the injector after it has /// Pop the next pending source (called by the injector after it has
/// committed to dispatching it). /// committed to dispatching it).
pub fn take_next(&mut self) -> Option<u32> { pub fn take_next(&mut self) -> Option<u32> {
self.pending.pop_front() self.pending.pop_front().map(|(source, _)| source)
} }
/// **Legacy** — instruction-count v-sync ticker. Kept for unit tests /// **Legacy** — instruction-count v-sync ticker. Kept for unit tests
@@ -249,7 +268,7 @@ impl InterruptState {
let periods = self.vsync_accumulator / VSYNC_INSTR_PERIOD; let periods = self.vsync_accumulator / VSYNC_INSTR_PERIOD;
self.vsync_accumulator %= VSYNC_INSTR_PERIOD; self.vsync_accumulator %= VSYNC_INSTR_PERIOD;
for _ in 0..periods { for _ in 0..periods {
self.queue_interrupt(INTERRUPT_SOURCE_VSYNC); self.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
} }
true true
} }
@@ -288,7 +307,7 @@ impl InterruptState {
self.last_vsync_instant = Some(anchor + advance); self.last_vsync_instant = Some(anchor + advance);
let to_queue = (periods as usize).min(INTERRUPT_QUEUE_CAP); let to_queue = (periods as usize).min(INTERRUPT_QUEUE_CAP);
for _ in 0..to_queue { for _ in 0..to_queue {
self.queue_interrupt(INTERRUPT_SOURCE_VSYNC); self.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
} }
true true
} }
@@ -306,7 +325,7 @@ mod tests {
#[test] #[test]
fn queue_interrupt_drops_without_callback() { fn queue_interrupt_drops_without_callback() {
let mut s = InterruptState::default(); let mut s = InterruptState::default();
s.queue_interrupt(INTERRUPT_SOURCE_VSYNC); s.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
assert_eq!(s.dropped, 1); assert_eq!(s.dropped, 1);
assert!(s.pending.is_empty()); assert!(s.pending.is_empty());
} }
@@ -315,9 +334,9 @@ mod tests {
fn queue_interrupt_fifo_preserves_order() { fn queue_interrupt_fifo_preserves_order() {
let mut s = InterruptState::default(); let mut s = InterruptState::default();
s.set_callback(0x1000, 0xAB); s.set_callback(0x1000, 0xAB);
s.queue_interrupt(INTERRUPT_SOURCE_VSYNC); s.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
s.queue_interrupt(INTERRUPT_SOURCE_CP); s.queue_interrupt(INTERRUPT_SOURCE_CP, 2);
s.queue_interrupt(INTERRUPT_SOURCE_VSYNC); s.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
assert_eq!(s.dropped, 0); assert_eq!(s.dropped, 0);
// FIFO: take_next hands them out in push order. // FIFO: take_next hands them out in push order.
assert_eq!(s.take_next(), Some(INTERRUPT_SOURCE_VSYNC)); assert_eq!(s.take_next(), Some(INTERRUPT_SOURCE_VSYNC));
@@ -331,11 +350,11 @@ mod tests {
let mut s = InterruptState::default(); let mut s = InterruptState::default();
s.set_callback(0x1000, 0xAB); s.set_callback(0x1000, 0xAB);
for _ in 0..INTERRUPT_QUEUE_CAP { for _ in 0..INTERRUPT_QUEUE_CAP {
s.queue_interrupt(INTERRUPT_SOURCE_VSYNC); s.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
} }
// Over-cap: drops rather than evicting the oldest. // Over-cap: drops rather than evicting the oldest.
s.queue_interrupt(INTERRUPT_SOURCE_VSYNC); s.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
s.queue_interrupt(INTERRUPT_SOURCE_VSYNC); s.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
assert_eq!(s.dropped, 2); assert_eq!(s.dropped, 2);
assert_eq!(s.pending.len(), INTERRUPT_QUEUE_CAP); assert_eq!(s.pending.len(), INTERRUPT_QUEUE_CAP);
} }
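The `(source, target_cpu)` queue semantics exercised by these tests (FIFO order, drop on over-cap, no eviction) can be shown in a standalone form. This is a sketch, not the patch's `InterruptState`: `IrqQueue` and `cpus_in_mask` are hypothetical, the capacity is a constructor parameter rather than `INTERRUPT_QUEUE_CAP`, and expanding a CP `cpu_mask` into one entry per set bit among six hardware threads is an assumption about how the bit index is derived.

```rust
use std::collections::VecDeque;

/// Bounded FIFO of pending interrupts, stored as `(source, target_cpu)`.
pub struct IrqQueue {
    cap: usize,
    pending: VecDeque<(u32, u8)>,
    /// Number of interrupts discarded because the queue was full.
    pub dropped: u64,
}

impl IrqQueue {
    pub fn with_capacity(cap: usize) -> Self {
        IrqQueue { cap, pending: VecDeque::with_capacity(cap), dropped: 0 }
    }

    /// Push to the back; when full, drop the new entry rather than evicting
    /// the oldest. Returns whether the interrupt was queued.
    pub fn queue(&mut self, source: u32, cpu: u8) -> bool {
        if self.pending.len() >= self.cap {
            self.dropped += 1;
            return false;
        }
        self.pending.push_back((source, cpu));
        true
    }

    /// Pop the oldest pending interrupt together with its target CPU.
    pub fn take_next(&mut self) -> Option<(u32, u8)> {
        self.pending.pop_front()
    }
}

/// Assumption: a CP interrupt targets every CPU whose bit is set in
/// `cpu_mask`, considering bits 0..5 (six hardware threads).
pub fn cpus_in_mask(cpu_mask: u32) -> Vec<u8> {
    (0u8..6).filter(|&n| (cpu_mask & (1u32 << n)) != 0).collect()
}
```

Unlike `take_next` in the patch, which returns only the source, this sketch returns the pair so the caller does not need a separate `peek_next_cpu` step.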

@@ -13,7 +13,7 @@ use xenia_memory::{GuestMemory, MemoryAccess};
/// u16 Length /// u16 Length
/// u16 MaximumLength /// u16 MaximumLength
/// u32 Buffer (guest pointer) /// u32 Buffer (guest pointer)
fn read_ansi_string(mem: &GuestMemory, ptr: u32) -> Option<String> { pub fn read_ansi_string(mem: &GuestMemory, ptr: u32) -> Option<String> {
if ptr == 0 { if ptr == 0 {
return None; return None;
} }
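The `ANSI_STRING` layout in the doc comment (`u16 Length`, `u16 MaximumLength`, `u32 Buffer`) can be decoded from a plain byte slice as follows. This is a sketch and not the body of `read_ansi_string`: `read_ansi_string_from` is a hypothetical name, guest memory is modelled as `&[u8]` indexed by guest address, fields are read big-endian (the guest CPU is big-endian PowerPC), and both the `Length <= MaximumLength` check and the byte-to-char conversion are assumptions.

```rust
/// Hypothetical decoder for a guest `ANSI_STRING` stored in `mem` at `ptr`.
/// Returns `None` for a null pointer, a null buffer, or any out-of-range read.
pub fn read_ansi_string_from(mem: &[u8], ptr: usize) -> Option<String> {
    if ptr == 0 {
        return None;
    }
    // Header: u16 Length, u16 MaximumLength, u32 Buffer (8 bytes total).
    let header = mem.get(ptr..ptr.checked_add(8)?)?;
    let length = u16::from_be_bytes([header[0], header[1]]) as usize;
    let max_length = u16::from_be_bytes([header[2], header[3]]) as usize;
    let buffer = u32::from_be_bytes([header[4], header[5], header[6], header[7]]) as usize;

    // Assumption: treat an inconsistent header as invalid rather than guessing.
    if buffer == 0 || length > max_length {
        return None;
    }
    let bytes = mem.get(buffer..buffer.checked_add(length)?)?;
    // Map each byte to the code point of the same value (Latin-1 style).
    Some(bytes.iter().map(|&b| b as char).collect())
}
```

The rename test in this change writes exactly this shape at `info_ptr + 8`: length and maximum length as `u16`, followed by the guest pointer to the path bytes.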