[iterate-4A] Milestone-2: XMA audio decoder + RE tooling (dispatch recorder, analyzer vtable-fix, non-perturbing probes)

Milestone-2 (intro video dat/movie/ADV.wmv) audio path + major RE tooling.

XMA AUDIO (built, working, deterministic, tested):
- APU MMIO 0x7FEA0000 + 320x64B register-mapped context array; real XMACreateContext/Release
  (xma.rs); real FFmpeg xma2 decoder XMA_CONTEXT_DATA->S16BE PCM (xma_decode.rs, xma2_codec.rs,
  ffmpeg-sys-next). Decode runs synchronously on the CPU thread (deterministic, no host thread).
- Audio-worker scheduler fix (main.rs LR_HALT restore + scheduler.rs): the XAudio render-callback
  worker was wrongly exited after ~2 deliveries; now survives -> guest drives XMA decode (70 kicks).
- XAudioSubmitRenderDriverFrame made faithful. Golden sylpheed_n50m re-baselined; tests pass.

RE TOOLING:
- Runtime indirect-dispatch recorder (dispatch_rec.rs): records (call-site->target, r3, lr);
  env-gated XENIA_DISPATCH_REC, filters XENIA_DISPATCH_REC_TARGETS/_SITES; deterministic, observe-only.
- Repaired static analyzer (vtables.rs): vtable extraction silently fragmented vtables with
  non-function head slots (missed the XMV engine vtable). Fixed via vptr-write-anchoring -> engine
  fully typed (vtables 722->1150 on rebuild).
- Fixed probe HEISENBUG (main.rs run_superblock): --audit-pc-probe-hex/--mem-watch no longer disable
  superblock chaining; probes fire inside the chain loop -> scheduling identical armed-vs-unarmed,
  movie subsystem now observable. Fixed a --quiet bug swallowing armed trace reports.

VIDEO still doesn't play (B, guest-side): the XMV engine never issues begin-playback (sub_825076F0,
vtable 0x8200a1e8 slot21) -> never primes -> 2000ms timeout. Narrowed to the ARM2 engine-setup
wrappers; no honest our-side gate-fix (masking forbidden). See HANDOFF-iterate-4A-milestone2.md for
new-machine setup (incl. the FFmpeg apt deps + sylpheed.db regeneration) and continuation pointers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
MechaCat02
2026-06-21 21:38:19 +02:00
parent acb29db444
commit 23189b95af
19 changed files with 3106 additions and 46 deletions

@@ -0,0 +1,217 @@
//! Runtime indirect-dispatch recorder.
//!
//! A reusable, env-gated facility that captures every indirect call performed
//! through CTR (`bcctr`/`bcctrl`/`bctr`) as a unique `(call_site_pc ->
//! target_pc)` pair, together with the last-seen object register `r3` and
//! link register `lr` at the call, plus a hit count. It exists to provide
//! GROUND-TRUTH indirect-dispatch
//! resolution for reverse-engineering vtable dispatch that the static
//! analyzer fails to resolve (e.g. the Sylpheed movie engine vtable
//! `0x8200a908`).
//!
//! ## Gating & overhead
//! Recording is OFF by default. It is enabled only when the environment
//! variable `XENIA_DISPATCH_REC` is set to a non-empty, non-`0` value at
//! process start. When OFF, [`record`] is a single relaxed atomic-bool load
//! followed by an early return — no allocation, no locking, no behavior
//! change. The recorder is pure: it never reads the clock, never touches
//! scheduling, and never mutates guest/CPU state, so enabling it does not
//! perturb deterministic runs (only adds a HashMap insert behind a mutex).
//!
//! ## Focus filters (optional)
//! Two env vars narrow what is recorded (both default to "record everything"):
//! - `XENIA_DISPATCH_REC_TARGETS=0x82505c08,...` — only edges whose resolved
//! target is in the list. Answers "who calls `<target>`": every recorded
//! edge then carries the caller `site` and `lr`.
//! - `XENIA_DISPATCH_REC_SITES=0x825078d8,...` — only edges from the listed
//! call-site PCs.
//! When both are set, an edge must satisfy BOTH. These keep a long focused
//! run (e.g. the intro-movie trace) producing a small, relevant table instead
//! of the whole program-wide dispatch set. Pure observe-only — filtering only
//! affects which edges are stored, never guest/CPU state.
//!
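//! ## Output
//! On [`dump`] (call at end-of-run) the table is written to the path in
//! `XENIA_DISPATCH_REC_OUT` (default `/tmp/dispatch_rec.txt`), sorted by
//! descending hit count (ties broken by call site, then target). After a `#`
//! header line, one record per line:
//! `callsite_pc target_pc count r3=<obj> lr=<lr>` (PCs, `r3`, and `lr` in
//! `0x`-prefixed hex; `count` in decimal).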
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::sync::OnceLock;
/// Enabled flag, resolved once from the environment at first touch.
static ENABLED: OnceLock<bool> = OnceLock::new();
/// Fast-path mirror of `ENABLED` so the hot path is a single relaxed load
/// (avoids the `OnceLock` get + deref on every indirect branch when OFF).
static ENABLED_FAST: AtomicBool = AtomicBool::new(false);
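/// One observed indirect-dispatch edge.
#[derive(Default, Clone, Copy)]
struct Edge {
    count: u64,
    /// Last-seen object register (`r3`) at this (site,target) edge. Stable for
    /// a vtable dispatch where the same call site always dispatches on the
    /// same kind of object.
    last_r3: u64,
    /// Last-seen link register at the call site, as passed to [`record`]. The
    /// interpreter samples it before a linking branch (`bcctrl`) overwrites
    /// LR, so it is the value LR held going into the call, not the return
    /// address that this call itself sets.
    last_lr: u64,
}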
/// (call_site_pc, target_pc) -> Edge
static TABLE: OnceLock<Mutex<HashMap<(u32, u32), Edge>>> = OnceLock::new();
/// Optional focus filters, resolved once from the environment. When either is
/// non-empty, an edge is recorded only if its `target` is in `TARGET_FILTER`
/// (when that set is non-empty) AND its `site` is in `SITE_FILTER` (when that
/// set is non-empty). Empty sets mean "no constraint on that axis". This lets
/// a long focused run (e.g. the intro-movie trace) record ONLY the dispatch
/// edges relevant to a target-set under investigation — for example "every
/// indirect call whose target is the XMV submit `sub_82505C08`", which answers
/// the milestone-2 "who calls submit on the engine" question with the caller
/// `lr` — instead of the whole (large) program-wide dispatch table.
static TARGET_FILTER: OnceLock<Vec<u32>> = OnceLock::new();
static SITE_FILTER: OnceLock<Vec<u32>> = OnceLock::new();
/// Parse a comma-separated list of hex PCs (`0x` prefix optional) into a
/// sorted, deduped Vec. Empty/garbage tokens are skipped.
fn parse_pc_list_str(s: &str) -> Vec<u32> {
    let mut v: Vec<u32> = s
        .split(',')
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .filter_map(|t| {
            let hex = t.strip_prefix("0x").or_else(|| t.strip_prefix("0X")).unwrap_or(t);
            u32::from_str_radix(hex, 16).ok()
        })
        .collect();
    v.sort_unstable();
    v.dedup();
    v
}

/// Parse a PC list from an env var. Missing var → empty Vec (no constraint).
fn parse_pc_list(var: &str) -> Vec<u32> {
    match std::env::var(var) {
        Ok(s) => parse_pc_list_str(&s),
        Err(_) => Vec::new(),
    }
}

/// Resolve the enabled flag (and focus filters) from the environment exactly
/// once.
fn init_enabled() -> bool {
    let on = match std::env::var("XENIA_DISPATCH_REC") {
        Ok(v) => !v.is_empty() && v != "0",
        Err(_) => false,
    };
    ENABLED_FAST.store(on, Ordering::Relaxed);
    let _ = TARGET_FILTER.set(parse_pc_list("XENIA_DISPATCH_REC_TARGETS"));
    let _ = SITE_FILTER.set(parse_pc_list("XENIA_DISPATCH_REC_SITES"));
    on
}
/// Whether recording is enabled. Always a single relaxed load; returns
/// `false` until [`install`] has run.
#[inline(always)]
pub fn enabled() -> bool {
    // Hot path: relaxed atomic load. ENABLED_FAST is set by `init_enabled`
    // (above), which runs once via `install`; until then it is `false`,
    // which is also the correct default. Call `install` early in startup.
    ENABLED_FAST.load(Ordering::Relaxed)
}
/// Force the env resolution (call once early in startup). Idempotent.
pub fn install() {
    let _ = ENABLED.get_or_init(init_enabled);
}

/// Record one indirect (CTR) call edge. No-op when disabled.
///
/// `site` = PC of the `bcctr`/`bctr` instruction, `target` = resolved CTR
/// target, `r3` = object register at the call, `lr` = link register.
#[inline(always)]
pub fn record(site: u32, target: u32, r3: u64, lr: u64) {
    // Single predictable branch when OFF.
    if !ENABLED_FAST.load(Ordering::Relaxed) {
        return;
    }
    // Focus filters (only consulted when recording is ON, i.e. rare). An empty
    // filter set imposes no constraint on its axis.
    if let Some(targets) = TARGET_FILTER.get()
        && !targets.is_empty()
        && targets.binary_search(&target).is_err()
    {
        return;
    }
    if let Some(sites) = SITE_FILTER.get()
        && !sites.is_empty()
        && sites.binary_search(&site).is_err()
    {
        return;
    }
    let table = TABLE.get_or_init(|| Mutex::new(HashMap::new()));
    if let Ok(mut t) = table.lock() {
        let e = t.entry((site, target)).or_default();
        e.count += 1;
        e.last_r3 = r3;
        e.last_lr = lr;
    }
}
/// Dump the recorded table to the output file. No-op when disabled or empty.
pub fn dump() {
    if !enabled() {
        return;
    }
    let path = std::env::var("XENIA_DISPATCH_REC_OUT")
        .unwrap_or_else(|_| "/tmp/dispatch_rec.txt".to_string());
    let table = match TABLE.get() {
        Some(t) => t,
        None => return,
    };
    let guard = match table.lock() {
        Ok(g) => g,
        Err(_) => return,
    };
    let mut rows: Vec<((u32, u32), Edge)> =
        guard.iter().map(|(k, v)| (*k, *v)).collect();
    // Deterministic order: count desc, then site, then target.
    rows.sort_by(|a, b| {
        b.1.count
            .cmp(&a.1.count)
            .then(a.0 .0.cmp(&b.0 .0))
            .then(a.0 .1.cmp(&b.0 .1))
    });
    let mut out = String::with_capacity(rows.len() * 48);
    out.push_str("# callsite_pc target_pc count r3 lr\n");
    for ((site, target), e) in rows {
        out.push_str(&format!(
            "{:#010x} {:#010x} {} r3={:#018x} lr={:#018x}\n",
            site, target, e.count, e.last_r3, e.last_lr
        ));
    }
    if let Err(err) = std::fs::write(&path, out) {
        eprintln!("dispatch_rec: failed to write {}: {}", path, err);
    } else {
        eprintln!("dispatch_rec: wrote {} edges to {}", guard.len(), path);
    }
}
#[cfg(test)]
mod tests {
    use super::parse_pc_list_str;

    #[test]
    fn parse_pc_list_handles_prefixes_whitespace_and_dedup() {
        // Mixed 0x / bare hex, surrounding whitespace, an empty token, and a
        // duplicate. Result is sorted + deduped; garbage tokens are dropped.
        let got = parse_pc_list_str(" 0x82505c08 , 825078d8,, 82505c08 , zzz ");
        assert_eq!(got, vec![0x82505c08, 0x825078d8]);
    }

    #[test]
    fn parse_pc_list_empty_is_no_constraint() {
        assert!(parse_pc_list_str("").is_empty());
        assert!(parse_pc_list_str(" , , ").is_empty());
    }
}
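
The dump format written by `dump` above (`{:#010x} {:#010x} {} r3={:#018x} lr={:#018x}`) is easy to read back when post-processing a trace. The following is a minimal consumer-side sketch, not part of the commit: `DumpRow`, `hex_u32`, `hex_u64`, and `parse_dump_line` are hypothetical names introduced here for illustration, and the parser assumes the exact field order shown in the format string.

```rust
/// One row of the recorder dump (hypothetical consumer-side type, not in the commit).
#[derive(Debug, PartialEq, Eq)]
pub struct DumpRow {
    pub site: u32,
    pub target: u32,
    pub count: u64,
    pub r3: u64,
    pub lr: u64,
}

/// Parse a `0x`-prefixed hex token as a 32-bit PC.
fn hex_u32(tok: &str) -> Option<u32> {
    u32::from_str_radix(tok.strip_prefix("0x")?, 16).ok()
}

/// Parse a `0x`-prefixed hex token as a 64-bit register value.
fn hex_u64(tok: &str) -> Option<u64> {
    u64::from_str_radix(tok.strip_prefix("0x")?, 16).ok()
}

/// Parse one dump line. Returns `None` for the `#` header, blank lines,
/// and rows that do not match the expected five-field layout.
pub fn parse_dump_line(line: &str) -> Option<DumpRow> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut it = line.split_whitespace();
    let site = hex_u32(it.next()?)?;
    let target = hex_u32(it.next()?)?;
    // The hit count is written with `{}`, i.e. in decimal.
    let count = it.next()?.parse::<u64>().ok()?;
    let r3 = hex_u64(it.next()?.strip_prefix("r3=")?)?;
    let lr = hex_u64(it.next()?.strip_prefix("lr=")?)?;
    Some(DumpRow { site, target, count, r3, lr })
}
```

Feeding each line of the output file through `parse_dump_line` and discarding the `None` results yields the edges in the order the recorder wrote them, that is, by descending count.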

@@ -1012,7 +1012,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
             if cond_ok {
                 let next_pc = ctx.pc + 4;
-                ctx.pc = (ctx.ctr as u32) & !3;
+                let target = (ctx.ctr as u32) & !3;
+                // Ground-truth indirect-dispatch recording (env-gated, off by
+                // default; pure record-only, no scheduling/state change).
+                if crate::dispatch_rec::enabled() {
+                    crate::dispatch_rec::record(ctx.pc, target, ctx.gpr[3], ctx.lr);
+                }
+                ctx.pc = target;
                 if instr.lk() {
                     ctx.lr = next_pc as u64;
                 }
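
The ordering in this hunk matters: the call-site PC and LR have to be sampled before the branch overwrites them. Below is a minimal standalone sketch of that ordering, assuming a toy context; `MiniCtx`, `Observed`, and `exec_bcctr` are hypothetical stand-ins and not the real `PpcContext` or interpreter code.

```rust
/// Toy stand-in for the interpreter context (hypothetical, not `PpcContext`).
pub struct MiniCtx {
    pub pc: u32,
    pub ctr: u64,
    pub lr: u64,
    pub r3: u64,
}

/// What a recorder would be handed for one indirect branch.
#[derive(Debug, PartialEq, Eq)]
pub struct Observed {
    pub site: u32,
    pub target: u32,
    pub r3: u64,
    pub lr: u64,
}

/// Model of a taken `bcctr`/`bcctrl`: resolve the target, observe, then mutate.
pub fn exec_bcctr(ctx: &mut MiniCtx, link: bool) -> Observed {
    let next_pc = ctx.pc.wrapping_add(4);
    // CTR's low two bits are ignored for the branch target.
    let target = (ctx.ctr as u32) & !3;
    // Sample while `pc` still names the branch instruction and `lr` still
    // holds its pre-branch value.
    let seen = Observed { site: ctx.pc, target, r3: ctx.r3, lr: ctx.lr };
    ctx.pc = target;
    if link {
        ctx.lr = next_pc as u64;
    }
    seen
}
```

If the observation were taken after `ctx.pc = target`, every edge would record the target as its own call site, which is why the hunk introduces the `target` local instead of assigning `ctx.pc` first.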

@@ -1,6 +1,7 @@
pub mod block_cache;
pub mod context;
pub mod decoder;
pub mod dispatch_rec;
pub mod disasm;
pub mod fpscr;
pub mod interpreter;

@@ -205,6 +205,21 @@ pub enum BlockReason {
CriticalSection(u32),
}
/// Floor of the **synthetic park-handle** range. Handles at or above this
/// value are deliberately OUTSIDE the kernel object allocator (which starts
/// at `0x1000`); they are used to park threads that must NEVER be woken by
/// the normal signal/wait machinery — currently the dedicated audio-worker
/// threads (`xenia_kernel::xaudio::XAUDIO_SYNTHETIC_HANDLE_BASE = 0xF000_0000`),
/// which are only ever un-parked by audio-callback injection. The deadlock
/// force-wake ([`Scheduler::unblock_on_deadlock`]) must skip waiters parked
/// solely on such handles: they are not deadlock participants (the guest
/// genuinely blocked on its own objects), and waking one runs its thread
/// entry to the `LR_HALT` sentinel → premature exit, which then drops every
/// subsequent injection. Kept in `xenia-cpu` (not imported from
/// `xenia-kernel`, which depends on this crate); the kernel const must stay
/// within `[SYNTHETIC_PARK_HANDLE_FLOOR, u32::MAX]`.
pub const SYNTHETIC_PARK_HANDLE_FLOOR: u32 = 0xF000_0000;
/// Sink for PCR+0x2C writes — the scheduler writes the guest-visible
/// current-processor-id here at spawn and Axis 4 rewrites on affinity
/// migration. Implemented by `xenia-kernel` for `GuestMemory`; keeping it
@@ -1399,6 +1414,27 @@ impl Scheduler {
        let mut woken = Vec::new();
        for (hw_id, slot) in self.slots.iter_mut().enumerate() {
            for (idx, t) in slot.runqueue.iter_mut().enumerate() {
                // Skip threads parked SOLELY on synthetic park-handles
                // (audio workers). They are not deadlock participants — the
                // guest blocked on its own objects — and waking one runs its
                // thread entry to the LR_HALT sentinel, exiting it and
                // dropping every subsequent audio-callback injection. Only
                // audio-callback injection may un-park them. A wait whose
                // handle set mixes synthetic and real handles is still
                // eligible (the real handle makes it a genuine waiter).
                let synthetic_park = match &t.state {
                    HwState::Blocked(BlockReason::WaitAny { handles, .. })
                    | HwState::Blocked(BlockReason::WaitAll { handles, .. }) => {
                        !handles.is_empty()
                            && handles
                                .iter()
                                .all(|&h| h >= SYNTHETIC_PARK_HANDLE_FLOOR)
                    }
                    _ => false,
                };
                if synthetic_park {
                    continue;
                }
                if matches!(
                    t.state,
                    HwState::Blocked(BlockReason::WaitAny { .. })
@@ -1485,6 +1521,41 @@ mod tests {
}
}
    #[test]
    fn unblock_on_deadlock_skips_synthetic_park_waiters() {
        // The audio worker parks on a synthetic handle (>= FLOOR) and must
        // survive the deadlock force-wake; a peer parked on a real handle
        // must be woken. Regression for the milestone-2 stall where the
        // force-wake destroyed the audio worker → all callbacks dropped.
        let mut s = mk_scheduler_with_initial();
        s.spawn(worker_spawn_params(2, 0x2000), &mut NullPcr).unwrap();
        s.spawn(worker_spawn_params(3, 0x2010), &mut NullPcr).unwrap();
        let audio = ThreadRef { hw_id: 1, idx: 0, generation: 0 };
        let real = ThreadRef { hw_id: 2, idx: 0, generation: 0 };
        s.thread_mut(audio).state = HwState::Blocked(BlockReason::WaitAny {
            handles: vec![SYNTHETIC_PARK_HANDLE_FLOOR],
            deadline: None,
        });
        s.thread_mut(real).state = HwState::Blocked(BlockReason::WaitAny {
            handles: vec![0x1234],
            deadline: None,
        });
        let woken = s.unblock_on_deadlock();
        assert!(
            woken.contains(&real),
            "real-handle waiter must be force-woken"
        );
        assert!(
            !woken.contains(&audio),
            "synthetic-park audio worker must NOT be force-woken"
        );
        assert!(matches!(
            s.thread(audio).state,
            HwState::Blocked(BlockReason::WaitAny { .. })
        ));
        assert_eq!(s.thread(real).state, HwState::Ready);
    }
// ---- preserved from pre-Axis-1 (updated names and params) ----
#[test]
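
The "parked solely on synthetic handles" rule used by the force-wake skip can be isolated as a small predicate. This is a sketch under stated assumptions: `is_synthetic_park` is a hypothetical helper name, and it takes a plain slice of handles rather than the scheduler's `BlockReason`; only the constant's value is taken from the diff.

```rust
/// Same value as the constant added in this commit.
pub const SYNTHETIC_PARK_HANDLE_FLOOR: u32 = 0xF000_0000;

/// True when a wait is parked solely on synthetic park-handles
/// (hypothetical helper; the commit inlines this logic in the scheduler).
pub fn is_synthetic_park(handles: &[u32]) -> bool {
    // `Iterator::all` is vacuously true for an empty iterator, so the
    // explicit non-empty check keeps an empty handle set from being
    // classified as a synthetic park and skipped by the force-wake.
    !handles.is_empty() && handles.iter().all(|&h| h >= SYNTHETIC_PARK_HANDLE_FLOOR)
}
```

Under this predicate a wait on `[0xF000_0000]` is skipped by the force-wake, while a mixed set such as `[0xF000_0000, 0x1234]` remains eligible because the real handle makes it a genuine waiter.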