[iterate-4A] Milestone-2: XMA audio decoder + RE tooling (dispatch recorder, analyzer vtable-fix, non-perturbing probes)
Milestone-2 (intro video dat/movie/ADV.wmv) audio path + major RE tooling.

XMA AUDIO (built, working, deterministic, tested):
- APU MMIO 0x7FEA0000 + 320x64B register-mapped context array; real XMACreateContext/Release (xma.rs); real FFmpeg xma2 decoder XMA_CONTEXT_DATA->S16BE PCM (xma_decode.rs, xma2_codec.rs, ffmpeg-sys-next). Decode runs synchronously on the CPU thread (deterministic, no host thread).
- Audio-worker scheduler fix (main.rs LR_HALT restore + scheduler.rs): the XAudio render-callback worker was wrongly exited after ~2 deliveries; now survives -> guest drives XMA decode (70 kicks).
- XAudioSubmitRenderDriverFrame made faithful. Golden sylpheed_n50m re-baselined; tests pass.

RE TOOLING:
- Runtime indirect-dispatch recorder (dispatch_rec.rs): records (call-site->target, r3, lr); env-gated XENIA_DISPATCH_REC, filters XENIA_DISPATCH_REC_TARGETS/_SITES; deterministic, observe-only.
- Repaired static analyzer (vtables.rs): vtable extraction silently fragmented vtables with non-function head slots (missed the XMV engine vtable). Fixed via vptr-write-anchoring -> engine fully typed (vtables 722->1150 on rebuild).
- Fixed probe HEISENBUG (main.rs run_superblock): --audit-pc-probe-hex/--mem-watch no longer disable superblock chaining; probes fire inside the chain loop -> scheduling identical armed-vs-unarmed, movie subsystem now observable. Fixed a --quiet bug swallowing armed trace reports.

VIDEO still doesn't play (B, guest-side): the XMV engine never issues begin-playback (sub_825076F0, vtable 0x8200a1e8 slot21) -> never primes -> 2000ms timeout. Narrowed to the ARM2 engine-setup wrappers; no honest our-side gate-fix (masking forbidden).

See HANDOFF-iterate-4A-milestone2.md for new-machine setup (incl. the FFmpeg apt deps + sylpheed.db regeneration) and continuation pointers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
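The recorder's dump is plain text, so the "who calls `<target>`" question can be answered offline. The following is a minimal sketch, assuming the line layout produced by `dump()` in dispatch_rec.rs (`site target count r3=<hex> lr=<hex>`, with a `#` header line). `DumpRow`, `hex`, `parse_row`, and `callers_of` are hypothetical helper names that are not part of this commit, and the sample `r3`/`lr` values are made up for illustration.

```rust
/// One row of the recorder dump (struct and field names are illustrative).
#[derive(Debug, PartialEq)]
struct DumpRow {
    site: u32,
    target: u32,
    count: u64,
    r3: u64,
    lr: u64,
}

/// Parse a hex literal with an optional `0x` prefix.
fn hex(s: &str) -> Option<u64> {
    u64::from_str_radix(s.strip_prefix("0x").unwrap_or(s), 16).ok()
}

/// Parse one dump line. Returns `None` for the `#` header, blank lines,
/// and rows that do not match the assumed layout.
fn parse_row(line: &str) -> Option<DumpRow> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut it = line.split_whitespace();
    let site = u32::try_from(hex(it.next()?)?).ok()?;
    let target = u32::try_from(hex(it.next()?)?).ok()?;
    let count = it.next()?.parse().ok()?;
    let r3 = hex(it.next()?.strip_prefix("r3=")?)?;
    let lr = hex(it.next()?.strip_prefix("lr=")?)?;
    Some(DumpRow { site, target, count, r3, lr })
}

/// All recorded edges whose resolved target is `target`, in file order.
fn callers_of(dump: &str, target: u32) -> Vec<DumpRow> {
    dump.lines()
        .filter_map(parse_row)
        .filter(|r| r.target == target)
        .collect()
}

fn main() {
    // Made-up sample row; only the layout mirrors the recorder's output.
    let sample = "# callsite_pc target_pc count r3 lr\n\
                  0x825078d8 0x82505c08 3 r3=0x0000000040001000 lr=0x00000000825078dc\n";
    for row in callers_of(sample, 0x82505c08) {
        println!(
            "{:#010x} -> {:#010x} x{} r3={:#x} lr={:#x}",
            row.site, row.target, row.count, row.r3, row.lr
        );
    }
}
```

Setting `XENIA_DISPATCH_REC_TARGETS` at record time gives the same narrowing without post-processing; a parser like this is only useful when the run was recorded unfiltered.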
217
crates/xenia-cpu/src/dispatch_rec.rs
Normal file
@@ -0,0 +1,217 @@
//! Runtime indirect-dispatch recorder.
//!
//! A reusable, env-gated facility that captures every indirect call performed
//! through CTR (`bcctr`/`bcctrl`/`bctr`) as a unique `(call_site_pc ->
//! target_pc)` pair, together with the object register `r3` and the link
//! register `lr` last seen at the call, and a hit count. It exists to provide
//! GROUND-TRUTH indirect-dispatch resolution for reverse-engineering vtable
//! dispatch that the static analyzer fails to resolve (e.g. the Sylpheed
//! movie engine vtable `0x8200a908`).
//!
//! ## Gating & overhead
//! Recording is OFF by default. It is enabled only when the environment
//! variable `XENIA_DISPATCH_REC` is set to a non-empty, non-`0` value at
//! process start. When OFF, [`record`] is a single relaxed atomic-bool load
//! followed by an early return: no allocation, no locking, no behavior
//! change. The recorder is pure: it never reads the clock, never touches
//! scheduling, and never mutates guest/CPU state, so enabling it does not
//! perturb deterministic runs (only adds a HashMap insert behind a mutex).
//!
//! ## Focus filters (optional)
//! Two env vars narrow what is recorded (both default to "record everything"):
//! - `XENIA_DISPATCH_REC_TARGETS=0x82505c08,...`: only edges whose resolved
//!   target is in the list. Answers "who calls `<target>`": every recorded
//!   edge then carries the caller `site` and `lr`.
//! - `XENIA_DISPATCH_REC_SITES=0x825078d8,...`: only edges from the listed
//!   call-site PCs.
//! When both are set, an edge must satisfy BOTH. These keep a long focused
//! run (e.g. the intro-movie trace) producing a small, relevant table instead
//! of the whole program-wide dispatch set. Pure observe-only: filtering only
//! affects which edges are stored, never guest/CPU state.
//!
//! ## Output
//! On [`dump`] (call at end-of-run) the table is written to the path in
//! `XENIA_DISPATCH_REC_OUT` (default `/tmp/dispatch_rec.txt`), sorted by
//! descending hit count, one record per line after a `#` header line:
//! `callsite_pc target_pc count r3=<obj> lr=<ret>` (addresses in hex, count
//! in decimal).

use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::sync::OnceLock;

/// Enabled flag, resolved once from the environment at first touch.
static ENABLED: OnceLock<bool> = OnceLock::new();
/// Fast-path mirror of `ENABLED` so the hot path is a single relaxed load
/// (avoids the `OnceLock` get + deref on every indirect branch when OFF).
static ENABLED_FAST: AtomicBool = AtomicBool::new(false);

/// One observed indirect-dispatch edge.
#[derive(Default, Clone, Copy)]
struct Edge {
    count: u64,
    /// Last-seen object register (`r3`) at this (site,target) edge. Stable for
    /// a vtable dispatch where the same call site always dispatches on the
    /// same kind of object.
    last_r3: u64,
    /// Last-seen link register (return address) for the call.
    last_lr: u64,
}

/// (call_site_pc, target_pc) -> Edge
static TABLE: OnceLock<Mutex<HashMap<(u32, u32), Edge>>> = OnceLock::new();

/// Optional focus filters, resolved once from the environment. When either is
/// non-empty, an edge is recorded only if its `target` is in `TARGET_FILTER`
/// (when that set is non-empty) AND its `site` is in `SITE_FILTER` (when that
/// set is non-empty). Empty sets mean "no constraint on that axis". This lets
/// a long focused run (e.g. the intro-movie trace) record ONLY the dispatch
/// edges relevant to a target-set under investigation, instead of the whole
/// (large) program-wide dispatch table. For example, "every indirect call
/// whose target is the XMV submit `sub_82505C08`" answers the milestone-2
/// "who calls submit on the engine" question with the caller `lr`.
static TARGET_FILTER: OnceLock<Vec<u32>> = OnceLock::new();
static SITE_FILTER: OnceLock<Vec<u32>> = OnceLock::new();

/// Parse a comma-separated list of hex PCs (`0x` prefix optional) into a
/// sorted, deduped Vec. Empty/garbage tokens are skipped.
fn parse_pc_list_str(s: &str) -> Vec<u32> {
    let mut v: Vec<u32> = s
        .split(',')
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .filter_map(|t| {
            let hex = t.strip_prefix("0x").or_else(|| t.strip_prefix("0X")).unwrap_or(t);
            u32::from_str_radix(hex, 16).ok()
        })
        .collect();
    v.sort_unstable();
    v.dedup();
    v
}

/// Parse a PC list from an env var. Missing var → empty Vec (no constraint).
fn parse_pc_list(var: &str) -> Vec<u32> {
    match std::env::var(var) {
        Ok(s) => parse_pc_list_str(&s),
        Err(_) => Vec::new(),
    }
}

/// Resolve the enabled flag (and focus filters) from the environment exactly
/// once.
fn init_enabled() -> bool {
    let on = match std::env::var("XENIA_DISPATCH_REC") {
        Ok(v) => !v.is_empty() && v != "0",
        Err(_) => false,
    };
    ENABLED_FAST.store(on, Ordering::Relaxed);
    let _ = TARGET_FILTER.set(parse_pc_list("XENIA_DISPATCH_REC_TARGETS"));
    let _ = SITE_FILTER.set(parse_pc_list("XENIA_DISPATCH_REC_SITES"));
    on
}

/// Whether recording is enabled. Always a single relaxed load; reports
/// `false` until [`install`] has run.
#[inline(always)]
pub fn enabled() -> bool {
    // Hot path: relaxed atomic load. ENABLED_FAST is set by `init_enabled`
    // (above), which `install` runs exactly once; until then it is `false`,
    // which is also the correct default. Startup is expected to call
    // `install` eagerly.
    ENABLED_FAST.load(Ordering::Relaxed)
}

/// Force the env resolution (call once early in startup). Idempotent.
pub fn install() {
    let _ = ENABLED.get_or_init(init_enabled);
}

/// Record one indirect (CTR) call edge. No-op when disabled.
///
/// `site` = PC of the `bcctr`/`bctr` instruction, `target` = resolved CTR
/// target, `r3` = object register at the call, `lr` = link register.
#[inline(always)]
pub fn record(site: u32, target: u32, r3: u64, lr: u64) {
    // Single predictable branch when OFF.
    if !ENABLED_FAST.load(Ordering::Relaxed) {
        return;
    }
    // Focus filters (only consulted when recording is ON, i.e. rare). An empty
    // filter set imposes no constraint on its axis.
    if let Some(targets) = TARGET_FILTER.get()
        && !targets.is_empty()
        && targets.binary_search(&target).is_err()
    {
        return;
    }
    if let Some(sites) = SITE_FILTER.get()
        && !sites.is_empty()
        && sites.binary_search(&site).is_err()
    {
        return;
    }
    let table = TABLE.get_or_init(|| Mutex::new(HashMap::new()));
    if let Ok(mut t) = table.lock() {
        let e = t.entry((site, target)).or_default();
        e.count += 1;
        e.last_r3 = r3;
        e.last_lr = lr;
    }
}

/// Dump the recorded table to the output file. No-op when disabled or empty.
pub fn dump() {
    if !enabled() {
        return;
    }
    let path = std::env::var("XENIA_DISPATCH_REC_OUT")
        .unwrap_or_else(|_| "/tmp/dispatch_rec.txt".to_string());
    let table = match TABLE.get() {
        Some(t) => t,
        None => return,
    };
    let guard = match table.lock() {
        Ok(g) => g,
        Err(_) => return,
    };
    let mut rows: Vec<((u32, u32), Edge)> =
        guard.iter().map(|(k, v)| (*k, *v)).collect();
    // Deterministic order: count desc, then site, then target.
    rows.sort_by(|a, b| {
        b.1.count
            .cmp(&a.1.count)
            .then(a.0 .0.cmp(&b.0 .0))
            .then(a.0 .1.cmp(&b.0 .1))
    });
    let mut out = String::with_capacity(rows.len() * 48);
    out.push_str("# callsite_pc target_pc count r3 lr\n");
    for ((site, target), e) in rows {
        out.push_str(&format!(
            "{:#010x} {:#010x} {} r3={:#018x} lr={:#018x}\n",
            site, target, e.count, e.last_r3, e.last_lr
        ));
    }
    if let Err(err) = std::fs::write(&path, out) {
        eprintln!("dispatch_rec: failed to write {}: {}", path, err);
    } else {
        eprintln!("dispatch_rec: wrote {} edges to {}", guard.len(), path);
    }
}

#[cfg(test)]
mod tests {
    use super::parse_pc_list_str;

    #[test]
    fn parse_pc_list_handles_prefixes_whitespace_and_dedup() {
        // Mixed 0x / bare hex, surrounding whitespace, an empty token, and a
        // duplicate. Result is sorted + deduped; garbage tokens are dropped.
        let got = parse_pc_list_str(" 0x82505c08 , 825078d8,, 82505c08 , zzz ");
        assert_eq!(got, vec![0x82505c08, 0x825078d8]);
    }

    #[test]
    fn parse_pc_list_empty_is_no_constraint() {
        assert!(parse_pc_list_str("").is_empty());
        assert!(parse_pc_list_str(" , , ").is_empty());
    }
}
@@ -1012,7 +1012,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
 
             if cond_ok {
                 let next_pc = ctx.pc + 4;
-                ctx.pc = (ctx.ctr as u32) & !3;
+                let target = (ctx.ctr as u32) & !3;
+                // Ground-truth indirect-dispatch recording (env-gated, off by
+                // default; pure record-only, no scheduling/state change).
+                if crate::dispatch_rec::enabled() {
+                    crate::dispatch_rec::record(ctx.pc, target, ctx.gpr[3], ctx.lr);
+                }
+                ctx.pc = target;
                 if instr.lk() {
                     ctx.lr = next_pc as u64;
                 }
@@ -1,6 +1,7 @@
 pub mod block_cache;
 pub mod context;
 pub mod decoder;
+pub mod dispatch_rec;
 pub mod disasm;
 pub mod fpscr;
 pub mod interpreter;
@@ -205,6 +205,21 @@ pub enum BlockReason {
     CriticalSection(u32),
 }
 
+/// Floor of the **synthetic park-handle** range. Handles at or above this
+/// value are deliberately OUTSIDE the kernel object allocator (which starts
+/// at `0x1000`); they are used to park threads that must NEVER be woken by
+/// the normal signal/wait machinery: currently the dedicated audio-worker
+/// threads (`xenia_kernel::xaudio::XAUDIO_SYNTHETIC_HANDLE_BASE = 0xF000_0000`),
+/// which are only ever un-parked by audio-callback injection. The deadlock
+/// force-wake ([`Scheduler::unblock_on_deadlock`]) must skip waiters parked
+/// solely on such handles: they are not deadlock participants (the guest
+/// genuinely blocked on its own objects), and waking one runs its thread
+/// entry to the `LR_HALT` sentinel → premature exit, which then drops every
+/// subsequent injection. Kept in `xenia-cpu` (not imported from
+/// `xenia-kernel`, which depends on this crate); the kernel const must stay
+/// within `[SYNTHETIC_PARK_HANDLE_FLOOR, u32::MAX]`.
+pub const SYNTHETIC_PARK_HANDLE_FLOOR: u32 = 0xF000_0000;
+
 /// Sink for PCR+0x2C writes: the scheduler writes the guest-visible
 /// current-processor-id here at spawn and Axis 4 rewrites on affinity
 /// migration. Implemented by `xenia-kernel` for `GuestMemory`; keeping it
@@ -1399,6 +1414,27 @@ impl Scheduler {
         let mut woken = Vec::new();
         for (hw_id, slot) in self.slots.iter_mut().enumerate() {
             for (idx, t) in slot.runqueue.iter_mut().enumerate() {
+                // Skip threads parked SOLELY on synthetic park-handles
+                // (audio workers). They are not deadlock participants (the
+                // guest blocked on its own objects), and waking one runs its
+                // thread entry to the LR_HALT sentinel, exiting it and
+                // dropping every subsequent audio-callback injection. Only
+                // audio-callback injection may un-park them. A wait whose
+                // handle set mixes synthetic and real handles is still
+                // eligible (the real handle makes it a genuine waiter).
+                let synthetic_park = match &t.state {
+                    HwState::Blocked(BlockReason::WaitAny { handles, .. })
+                    | HwState::Blocked(BlockReason::WaitAll { handles, .. }) => {
+                        !handles.is_empty()
+                            && handles
+                                .iter()
+                                .all(|&h| h >= SYNTHETIC_PARK_HANDLE_FLOOR)
+                    }
+                    _ => false,
+                };
+                if synthetic_park {
+                    continue;
+                }
                 if matches!(
                     t.state,
                     HwState::Blocked(BlockReason::WaitAny { .. })
@@ -1485,6 +1521,41 @@ mod tests {
         }
     }
 
+    #[test]
+    fn unblock_on_deadlock_skips_synthetic_park_waiters() {
+        // The audio worker parks on a synthetic handle (>= FLOOR) and must
+        // survive the deadlock force-wake; a peer parked on a real handle
+        // must be woken. Regression for the milestone-2 stall where the
+        // force-wake destroyed the audio worker → all callbacks dropped.
+        let mut s = mk_scheduler_with_initial();
+        s.spawn(worker_spawn_params(2, 0x2000), &mut NullPcr).unwrap();
+        s.spawn(worker_spawn_params(3, 0x2010), &mut NullPcr).unwrap();
+        let audio = ThreadRef { hw_id: 1, idx: 0, generation: 0 };
+        let real = ThreadRef { hw_id: 2, idx: 0, generation: 0 };
+        s.thread_mut(audio).state = HwState::Blocked(BlockReason::WaitAny {
+            handles: vec![SYNTHETIC_PARK_HANDLE_FLOOR],
+            deadline: None,
+        });
+        s.thread_mut(real).state = HwState::Blocked(BlockReason::WaitAny {
+            handles: vec![0x1234],
+            deadline: None,
+        });
+        let woken = s.unblock_on_deadlock();
+        assert!(
+            woken.contains(&real),
+            "real-handle waiter must be force-woken"
+        );
+        assert!(
+            !woken.contains(&audio),
+            "synthetic-park audio worker must NOT be force-woken"
+        );
+        assert!(matches!(
+            s.thread(audio).state,
+            HwState::Blocked(BlockReason::WaitAny { .. })
+        ));
+        assert_eq!(s.thread(real).state, HwState::Ready);
+    }
+
     // ---- preserved from pre-Axis-1 (updated names and params) ----
 
     #[test]