[iterate-4A] Milestone-2: XMA audio decoder + RE tooling (dispatch recorder, analyzer vtable-fix, non-perturbing probes)
Milestone-2 (intro video dat/movie/ADV.wmv) audio path + major RE tooling.

XMA AUDIO (built, working, deterministic, tested):
- APU MMIO 0x7FEA0000 + 320x64B register-mapped context array; real
  XMACreateContext/Release (xma.rs); real FFmpeg xma2 decoder
  XMA_CONTEXT_DATA->S16BE PCM (xma_decode.rs, xma2_codec.rs,
  ffmpeg-sys-next). Decode runs synchronously on the CPU thread
  (deterministic, no host thread).
- Audio-worker scheduler fix (main.rs LR_HALT restore + scheduler.rs): the
  XAudio render-callback worker was wrongly exited after ~2 deliveries; now
  survives -> guest drives XMA decode (70 kicks).
- XAudioSubmitRenderDriverFrame made faithful. Golden sylpheed_n50m
  re-baselined; tests pass.

RE TOOLING:
- Runtime indirect-dispatch recorder (dispatch_rec.rs): records
  (call-site->target, r3, lr); env-gated XENIA_DISPATCH_REC, filters
  XENIA_DISPATCH_REC_TARGETS/_SITES; deterministic, observe-only.
- Repaired static analyzer (vtables.rs): vtable extraction silently
  fragmented vtables with non-function head slots (missed the XMV engine
  vtable). Fixed via vptr-write-anchoring -> engine fully typed (vtables
  722->1150 on rebuild).
- Fixed probe HEISENBUG (main.rs run_superblock):
  --audit-pc-probe-hex/--mem-watch no longer disable superblock chaining;
  probes fire inside the chain loop -> scheduling identical
  armed-vs-unarmed, movie subsystem now observable. Fixed a --quiet bug
  swallowing armed trace reports.

VIDEO still doesn't play (B, guest-side): the XMV engine never issues
begin-playback (sub_825076F0, vtable 0x8200a1e8 slot21) -> never primes ->
2000ms timeout. Narrowed to the ARM2 engine-setup wrappers; no honest
our-side gate-fix (masking forbidden).

See HANDOFF-iterate-4A-milestone2.md for new-machine setup (incl. the FFmpeg
apt deps + sylpheed.db regeneration) and continuation pointers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -415,6 +415,18 @@ fn main() -> Result<()> {
     // metrics summary.
     let _obs = observability::init(&config)?;
 
+    // Env-gated indirect-dispatch recorder (off by default). Resolve the env
+    // once here; a scope guard dumps the recorded (call_site -> target) table
+    // at end-of-run no matter how the run terminates.
+    xenia_cpu::dispatch_rec::install();
+    struct DispatchRecGuard;
+    impl Drop for DispatchRecGuard {
+        fn drop(&mut self) {
+            xenia_cpu::dispatch_rec::dump();
+        }
+    }
+    let _dispatch_rec_guard = DispatchRecGuard;
+
     let result = match cli.command {
         Commands::Disasm { path, count, at } => cmd_disasm(&path, count, at),
         Commands::Exec {
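The scope-guard pattern used in the hunk above can be shown in isolation. This is a minimal sketch, not the project's code: `DumpGuard`, `run`, and the shared `log` are hypothetical stand-ins for `DispatchRecGuard` and `dispatch_rec::dump()`. It illustrates that a guard's `Drop` runs on both the normal path and an early-return error path.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the recorder's end-of-run dump: appends a marker
/// to a shared log when the guard is dropped.
struct DumpGuard {
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for DumpGuard {
    fn drop(&mut self) {
        self.log.borrow_mut().push("dump");
    }
}

/// Runs a fallible "command" with the guard armed. The guard is dropped when
/// the function returns, whether through `Ok` or through the early `Err`.
fn run(log: Rc<RefCell<Vec<&'static str>>>, fail: bool) -> Result<(), String> {
    let _guard = DumpGuard { log: log.clone() };
    if fail {
        return Err("command failed".to_string());
    }
    log.borrow_mut().push("work");
    Ok(())
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    let _ = run(log.clone(), false);
    let _ = run(log.clone(), true);
    println!("{:?}", log.borrow());
}
```

The binding has to be named (`_guard`, as with `_dispatch_rec_guard` in the diff); `let _ = ...` would drop the value immediately. `Drop` also runs during panic unwinding, but not after `std::process::exit` or an abort, so those exits would skip the dump.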
@@ -1437,6 +1449,45 @@ fn cmd_exec_inner(
     // atoms that live inside `kernel.gpu.mmio`.
     mem.add_mmio_region(xenia_gpu::build_mmio_region(kernel.gpu.mmio()));
 
+    // apu stage 1 — reserve the 320-entry XMA context array and install the
+    // `0x7FEA0000` register aperture (mirrors canary's `XmaDecoder::Setup`).
+    //
+    // Physical placement: canary stores a *physical* address in
+    // `ContextArrayAddress` (reg 0x600) — `PhysicalHeap::GetPhysicalAddress`
+    // returns `va - heap_base` (== `va & 0x1FFFFFFF` for the physical heaps).
+    // Our memory model is FLAT: `translate_virtual` is a raw `membase + addr`
+    // with no separate physical-window mirror, and `translate_physical` masks
+    // `& 0x1FFFFFFF` — so the two only coincide for low (`< 0x2000_0000`) VAs.
+    // `heap_alloc` returns a `0x40000000`-region VA, so `va & 0x1FFFFFFF` would
+    // be 0 (disagreeing with the context pointers `XMACreateContext` hands out
+    // at `va + i*64`). The guest reads `ContextArrayAddress` and indexes it as
+    // `base + i*64`; for that to equal the pointers it dereferences, the base
+    // MUST equal the VA. So we advertise `va` itself — self-consistent in the
+    // flat model (the guest reaches every context through the same VA space).
+    // Stage 3's decoder will read the context structs via this VA directly
+    // (not via `translate_physical`). The 20480-byte buffer is page-committed
+    // by `heap_alloc`, so the guest never faults writing the 64-byte structs.
+    {
+        let array_size =
+            (xenia_apu::XMA_CONTEXT_COUNT as u32) * xenia_apu::XMA_CONTEXT_SIZE; // 320 * 64
+        match kernel.heap_alloc(array_size, &mem) {
+            Some(va) => {
+                let phys = va; // flat model: array base == VA (see note above)
+                kernel.xma.lock().unwrap().init(va, phys);
+                mem.add_mmio_region(xenia_apu::build_mmio_region(kernel.xma.clone()));
+                tracing::info!(
+                    va = format_args!("{va:#010x}"),
+                    phys = format_args!("{phys:#010x}"),
+                    size = format_args!("{array_size:#x}"),
+                    "xma: context array reserved + 0x7FEA0000 aperture installed"
+                );
+            }
+            None => {
+                tracing::error!("xma: failed to reserve context array (heap exhausted)");
+            }
+        }
+    }
+
     // Install the initial guest thread on HW slot 0. The thread handle we
     // hand the scheduler isn't visible to any guest API yet, but joiners
     // (XThreadWait-style) will see it via `find_by_tid`.
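The address argument in the comment above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constants come from the comment (320 contexts, 64 bytes each, the `0x1FFFFFFF` mask), while `context_addr`, `context_index`, and the sample VA `0x4000_0000` are assumptions made for the example rather than names from the codebase.

```rust
/// Values taken from the comment: 320 contexts of 64 bytes each.
const XMA_CONTEXT_COUNT: u32 = 320;
const XMA_CONTEXT_SIZE: u32 = 64;
/// Mask the comment attributes to `translate_physical`.
const PHYS_MASK: u32 = 0x1FFF_FFFF;

/// Address of context `i` when the array starts at `base`.
fn context_addr(base: u32, i: u32) -> u32 {
    base + i * XMA_CONTEXT_SIZE
}

/// Index a guest would recover from a context pointer, given the base it read
/// from the (hypothetical) ContextArrayAddress register. `None` means the
/// pointer does not land on a valid slot of the advertised array.
fn context_index(advertised_base: u32, ptr: u32) -> Option<u32> {
    let offset = ptr.checked_sub(advertised_base)?;
    if offset % XMA_CONTEXT_SIZE != 0 {
        return None;
    }
    let index = offset / XMA_CONTEXT_SIZE;
    (index < XMA_CONTEXT_COUNT).then_some(index)
}

fn main() {
    // Assumed sample VA in the 0x4000_0000 region.
    let va: u32 = 0x4000_0000;
    let array_size = XMA_CONTEXT_COUNT * XMA_CONTEXT_SIZE;
    let ptr = context_addr(va, 5);

    println!("array size        = {array_size} ({array_size:#x})");
    println!("context 5 pointer = {ptr:#010x}");
    println!("base = va         -> {:?}", context_index(va, ptr));
    println!("base = va & mask  -> {:?}", context_index(va & PHYS_MASK, ptr));
}
```

With the VA advertised as the base, the pointer for context 5 maps back to index 5. With the masked value advertised instead, the same pointer falls far outside the 320-slot array, which is the disagreement the comment describes.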
@@ -2354,6 +2405,14 @@ fn coord_post_round(
         let _ = gpu_runs;
     }
 
+    // APU stage 3 — pump the XMA decoder on the CPU thread, same cadence as the
+    // inline GPU. Deterministic (no host thread / clock): for each context with
+    // a pending kick it runs one Work() pass, decoding the guest's XMA packets
+    // into PCM and writing it back into the output ring + context struct.
+    if let Ok(mut xma) = kernel.xma.try_lock() {
+        xma.decode_pending(mem);
+    }
+
     if kernel.gpu.has_pending_interrupts() {
         for pi in kernel.gpu.take_pending_interrupts() {
             // Canary `ExecutePacketType3_INTERRUPT` dispatches the callback
@@ -2445,7 +2504,7 @@ fn worker_prologue(
     stats: &mut ExecStats,
 ) -> PrologueOutcome {
     use xenia_cpu::interpreter::{step_cached, StepResult};
-    use xenia_cpu::scheduler::{HwState, INITIAL_GUEST_TID};
+    use xenia_cpu::scheduler::{BlockReason, HwState, INITIAL_GUEST_TID};
     use xenia_cpu::PpcOpcode;
     const LR_HALT: u32 = xenia_cpu::context::LR_HALT_SENTINEL as u32;
 
@@ -2492,12 +2551,26 @@ fn worker_prologue(
 
         // 1) Halt-sentinel check (per HW thread).
         if pc == LR_HALT {
+            // iterate-4A: the async audio-callback injection (`try_inject_audio_callback`)
+            // sets `interrupts.saved`/`injected_ref` to the dedicated audio
+            // worker and runs REAL guest code (`sub_824D29F0`, which calls
+            // blocking kernel APIs) across MANY scheduler rounds before
+            // returning to `LR_HALT_SENTINEL`. The restore must fire only when
+            // the thread that *actually* reached the sentinel is the injected
+            // worker itself — i.e. the FULL `ThreadRef` (hw_id AND idx), which
+            // `scheduler.current` holds after `begin_slot_visit`. Matching on
+            // `hw_id` alone let ANY OTHER thread sharing that HW slot reach
+            // `LR_HALT` and consume the audio worker's `saved` slot; when the
+            // worker later truly returned, `saved` was already `None`, the
+            // guard failed, and control fell through to "marking exited" — the
+            // worker was removed and every subsequent audio callback dropped
+            // (`find_by_handle` skips Exited threads). The graphics ISR path is
+            // fully synchronous (`dispatch_graphics_interrupts` restores inline
+            // and never leaves `interrupts.saved` set across rounds), so this
+            // restore lifecycle is exclusive to audio and graphics is
+            // unaffected.
             let injected_here = kernel.interrupts.saved.is_some()
-                && kernel
-                    .interrupts
-                    .injected_ref
-                    .map(|r| r.hw_id == hw_id)
-                    == Some(true);
+                && kernel.interrupts.injected_ref == kernel.scheduler.current;
             if injected_here
                 && let Some(saved) = kernel.interrupts.saved.take()
             {
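The difference between the old and new guard in this hunk comes down to what is compared. The following sketch assumes a `ThreadRef` with just the two fields the comment names (`hw_id` and `idx`); the field types and the helper functions `matches_hw_only` and `matches_full_ref` are hypothetical and only model the two predicates.

```rust
/// Assumed shape of the scheduler's thread reference: a hardware slot plus an
/// index within that slot. Only the field names come from the diff comment.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ThreadRef {
    hw_id: u8,
    idx: u32,
}

/// Old guard (modelled): true for any thread on the injected worker's slot.
fn matches_hw_only(injected: Option<ThreadRef>, current: Option<ThreadRef>) -> bool {
    match (injected, current) {
        (Some(i), Some(c)) => i.hw_id == c.hw_id,
        _ => false,
    }
}

/// New guard (modelled): true only when the current thread is the injected
/// worker itself, i.e. both `hw_id` and `idx` agree.
fn matches_full_ref(injected: Option<ThreadRef>, current: Option<ThreadRef>) -> bool {
    injected.is_some() && injected == current
}

fn main() {
    let worker = ThreadRef { hw_id: 2, idx: 1 };
    let neighbour = ThreadRef { hw_id: 2, idx: 0 }; // same HW slot, different thread

    // A neighbour reaching the sentinel would have consumed the worker's
    // saved state under the hw_id-only comparison.
    println!("hw-only,  neighbour: {}", matches_hw_only(Some(worker), Some(neighbour)));
    println!("full-ref, neighbour: {}", matches_full_ref(Some(worker), Some(neighbour)));
    println!("full-ref, worker:    {}", matches_full_ref(Some(worker), Some(worker)));
}
```

In the real code the `saved.is_some()` check plays the role that `injected.is_some()` plays here, so a `None == None` comparison cannot trigger a restore.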
@@ -2509,17 +2582,64 @@ fn worker_prologue(
                 kernel.interrupts.delivered += 1;
                 let source = saved.source;
                 let mut restore_outcome = "ready";
-                let current = kernel.scheduler.thread(target_ref).state.clone();
-                if let HwState::ServicingIrq(reason) = current {
-                    kernel.scheduler.thread_mut(target_ref).state =
-                        HwState::Blocked(reason);
-                    restore_outcome = "reblocked";
+
+                // iterate-4A: the dedicated audio worker's canonical resting
+                // state is "parked on its synthetic handle, awaiting the next
+                // callback injection". The callback (`sub_824D29F0`) runs real
+                // guest code that can be flipped `ServicingIrq -> Ready` by an
+                // intervening `wake_ref` (a `KeSetEvent`/timeout targeting the
+                // worker as a waiter mid-callback). The old re-block heuristic
+                // only re-parked when the state was *still* `ServicingIrq`, so
+                // such a wake left the worker `Ready` — it then ran its thread
+                // entry to the `LR_HALT` sentinel, EXITED, and every subsequent
+                // callback dropped (`find_by_handle` skips Exited workers),
+                // wedging the intro-video audio→XMA pipeline. When this restore
+                // is an audio callback (`source == INTERRUPT_SOURCE_AUDIO`),
+                // re-park the worker UNCONDITIONALLY onto its synthetic
+                // park-handle so it survives to receive the next fire. (Graphics
+                // restores keep the `ServicingIrq`-only re-block: a graphics
+                // victim is a borrowed real thread, not a parked worker, and the
+                // old behavior there must stay byte-identical.)
+                if source == xenia_kernel::INTERRUPT_SOURCE_AUDIO {
+                    let worker_handle =
+                        kernel.scheduler.thread(target_ref).thread_handle;
+                    let index = worker_handle.and_then(|h| {
+                        kernel
+                            .xaudio
+                            .worker_handles
+                            .iter()
+                            .position(|wh| *wh == Some(h))
+                    });
+                    if let Some(index) = index {
+                        let park = xenia_kernel::xaudio::synthetic_park_handle(index);
+                        kernel.scheduler.thread_mut(target_ref).state =
+                            HwState::Blocked(BlockReason::WaitAny {
+                                handles: vec![park],
+                                deadline: None,
+                            });
+                        restore_outcome = "reparked";
+                    } else if let HwState::ServicingIrq(reason) =
+                        kernel.scheduler.thread(target_ref).state.clone()
+                    {
+                        // Fallback (handle unresolved): preserve the legacy
+                        // ServicingIrq-only re-block rather than leak the worker.
+                        kernel.scheduler.thread_mut(target_ref).state =
+                            HwState::Blocked(reason);
+                        restore_outcome = "reblocked";
+                    }
+                } else {
+                    let current = kernel.scheduler.thread(target_ref).state.clone();
+                    if let HwState::ServicingIrq(reason) = current {
+                        kernel.scheduler.thread_mut(target_ref).state =
+                            HwState::Blocked(reason);
+                        restore_outcome = "reblocked";
+                    }
                 }
                 tracing::debug!(
                     source,
                     hw_id,
                     outcome = restore_outcome,
-                    "graphics interrupt: callback returned"
+                    "interrupt: callback returned"
                 );
                 return PrologueOutcome::Continue;
             }
@@ -2905,12 +3025,55 @@ fn run_superblock(
 
     let budget = superblock_budget();
 
-    // Probe / mem-watch / debugger-hook modes need per-block-entry
-    // observability; in those modes never chain (run exactly one block,
-    // identical to the pre-superblock behaviour). The block-cache fast
-    // path is only entered when hooks/DB are off anyway, but a probe or
-    // mem-watch can be armed alongside it.
-    let chain_allowed = !kernel.any_probe_active() && !mem.has_mem_watch();
+    // Heisenbug fix (toolkit audit, 2026-06-21): probes and mem-watch are
+    // OBSERVE-ONLY diagnostics and must NOT change guest scheduling. The
+    // previous implementation disabled superblock chaining whenever any
+    // probe / mem-watch was armed (so the per-block-entry observation in
+    // `worker_prologue` was reached for every block). But chaining is what
+    // determines thread interleaving, so arming a probe perturbed the
+    // schedule — it starved the movie/XMV subsystem so it never reached the
+    // video state, making the probe useless on exactly the code we most
+    // needed to observe (`XENIA_SUPERBLOCK_BUDGET=1` reproduces the same
+    // starvation, confirming chaining is the lever).
+    //
+    // The fix fires the SAME per-block-entry observation INSIDE the chain
+    // loop, at every chained block's entry PC (see `fire_block_entry_probes`
+    // below), so chaining — and therefore scheduling — is byte-identical
+    // whether or not a probe is armed. `chain_allowed` no longer depends on
+    // the probe/mem-watch state.
+    //
+    // `wants_hooks()` (the interactive debugger / breakpoint path) still
+    // forces the per-instruction path in `worker_prologue` and never reaches
+    // `run_superblock`, so the only remaining reason to never chain here is
+    // the explicit budget==1 reproduction request.
+    let chain_allowed = budget > 1;
+
+    // Per-block-entry diagnostic observation, replicating exactly what
+    // `worker_prologue` does at the first block of a slot visit:
+    //   1. the four `fire_*_if_match` probe helpers (read-only; each
+    //      re-checks its own armed set against the live ctx PC), and
+    //   2. the mem-watch writer-context publish, so a watched store that
+    //      fires mid-block is attributed to the CORRECT chained block's
+    //      entry PC / LR (matching the single-block reporting granularity)
+    //      instead of the stale superblock-entry PC.
+    // The closure is a pure function of the live scheduler context; the
+    // caller must ensure `ctx.pc` equals the block-entry PC before calling.
+    let probe_hw_id = wc.hw_id;
+    let fire_block_entry_probes =
+        |kernel: &mut xenia_kernel::KernelState, mem: &xenia_memory::GuestMemory| {
+            let hw_id = probe_hw_id;
+            if kernel.any_probe_active() {
+                kernel.fire_ctor_probe_if_match(hw_id, mem);
+                kernel.fire_branch_probe_if_match(hw_id);
+                kernel.fire_audit_pc_probe_if_match(hw_id, mem);
+                kernel.fire_lr_trace_if_match(hw_id);
+            }
+            if mem.has_mem_watch() {
+                let ctx = kernel.scheduler.ctx(hw_id);
+                let tid_w = kernel.scheduler.tid(hw_id).unwrap_or(0);
+                xenia_memory::set_writer_ctx(tid_w, ctx.pc, ctx.lr as u32);
+            }
+        };
 
     let mut block_ptr = first_block_ptr;
     let mut pc_before = first_pc_before;
@@ -2955,11 +3118,20 @@ fn run_superblock(
             break (result, block_ptr, pc_before);
         }
 
-        // Chain: build/fetch the next block. Re-borrows `wc.block_cache`,
-        // which invalidates the previous `block_ptr` — but we've already
-        // finished using it (only `sync_sensitive`/diagnostics were read,
-        // above), so the raw-pointer aliasing rule is respected.
+        // Chain into the next block. `ctx.pc` now equals `next_pc` (the
+        // chained block's entry), so fire the per-block-entry observation
+        // BEFORE stepping it — identical to what `worker_prologue` did at
+        // the first block. This keeps the probe firing at EVERY armed
+        // block-entry while leaving the chaining decision (and thus the
+        // schedule) untouched. The first block was already observed by the
+        // prologue, so we only observe the newly-chained blocks here.
         pc_before = next_pc;
+        fire_block_entry_probes(kernel, mem);
+
+        // Build/fetch the next block. Re-borrows `wc.block_cache`, which
+        // invalidates the previous `block_ptr` — but we've already finished
+        // using it (only `sync_sensitive`/diagnostics were read, above), so
+        // the raw-pointer aliasing rule is respected.
         block_ptr = wc.block_cache.lookup_or_build(next_pc, mem) as *const _;
     };
 
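The property the two `run_superblock` hunks aim for is that arming a probe changes what is reported but not which blocks run. A toy model makes that checkable. Everything below is an assumption made for illustration: `slot_visit`, its `legacy` switch, and the fixed "next PC is PC + 4" successor rule are not from the codebase, and the model folds the prologue's first-block observation and the in-loop observation into one place.

```rust
/// One simulated slot visit. Returns (block entry PCs executed, probe hits).
///
/// `legacy = true` models the old rule (an armed probe disables chaining);
/// `legacy = false` models the new rule (only the budget decides).
fn slot_visit(start_pc: u32, budget: usize, armed: &[u32], legacy: bool) -> (Vec<u32>, Vec<u32>) {
    let chain_allowed = if legacy {
        armed.is_empty() && budget > 1
    } else {
        budget > 1
    };
    let limit = if chain_allowed { budget } else { 1 };

    let mut executed = Vec::new();
    let mut hits = Vec::new();
    let mut pc = start_pc;
    for _ in 0..limit {
        // Observe-only: record a hit, never alter control flow.
        if armed.contains(&pc) {
            hits.push(pc);
        }
        executed.push(pc);
        pc += 4; // toy successor rule, stands in for real block chaining
    }
    (executed, hits)
}

fn main() {
    let (unarmed, _) = slot_visit(0x100, 4, &[], false);
    let (armed_new, hits_new) = slot_visit(0x100, 4, &[0x108], false);
    let (armed_old, hits_old) = slot_visit(0x100, 4, &[0x108], true);

    println!("unarmed:       {unarmed:x?}");
    println!("armed (new):   {armed_new:x?} hits {hits_new:x?}");
    println!("armed (legacy): {armed_old:x?} hits {hits_old:x?}");
}
```

Under the new rule the executed sequence is the same armed or unarmed, and the probe at the third block fires. Under the legacy rule the armed visit stops after one block, so the probed block is never reached in that visit, which mirrors the starvation the comment describes.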
@@ -2993,6 +3165,15 @@ fn run_execution(
     let mut stats = ExecStats::default();
     let _ = quiet; // retained for future per-kind suppression
 
+    // APU stage 3 — give the XMA decoder a stable pointer to the guest memory
+    // mapping `run_execution` runs against, so the kick MMIO write can run
+    // Work() synchronously (canary `!use_dedicated_xma_thread` semantics: the
+    // game observes the updated context the instant its kick store retires).
+    // `mem` outlives this call for both the headless and UI paths.
+    if let Ok(mut xma) = kernel.xma.lock() {
+        xma.set_memory(mem);
+    }
+
     // `--halt-on-deadlock` CLI flag OR `XENIA_HALT_ON_DEADLOCK=1|true` env var:
     // when the scheduler next hits a hard deadlock (every live HW thread
     // blocked on a handle wait with no pending timer) we bail out with a
@@ -4093,10 +4274,18 @@ fn dump_thread_diagnostic(
             ),
         }
     }
-    if quiet {
-        return;
-    }
     use xenia_kernel::objects::KernelObject;
+
+    // Toolkit-audit fix (2026-06-21): only the ALWAYS-ON thread/waiter table
+    // is suppressed by `--quiet`. The explicitly-armed diagnostics below
+    // (`--trace-handles`, `--trace-handles-focus`, `--dump-addr`) are
+    // requested output — arming the flag IS the user asking for it — and
+    // were previously swallowed by the blanket `if quiet { return; }`, which
+    // made the documented headless `--quiet` invocation silently drop every
+    // handle/focus/dump report. They are each self-gated below (on
+    // `audit.enabled` / `!audit.focus.is_empty()` / `!dump_addrs.is_empty()`)
+    // so they only print when actually armed.
+    if !quiet {
     println!("\n=== Thread diagnostics ===");
     for (hw_id, slot) in kernel.scheduler.slots.iter().enumerate() {
         if slot.runqueue.is_empty() {
@@ -4193,6 +4382,7 @@ fn dump_thread_diagnostic(
             println!(" cs={:#010x} waiters(tid)={:?}", cs_ptr, tids);
         }
     }
+    } // end `if !quiet` (always-on thread/waiter table)
 
     // Audit trails (only when --trace-handles flipped the flag). For each
     // tracked handle, emit a compact block: kind, creator, and the bounded
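The `--quiet` change in the two `dump_thread_diagnostic` hunks replaces an early return with a scoped gate, so only the always-on table is suppressed. A reduced sketch of that control flow follows; `Opts`, `diagnostic_lines`, and the placeholder report strings are hypothetical, and only the idea of "quiet gates the default table, armed reports gate themselves" is taken from the diff.

```rust
/// Hypothetical option set: `quiet` suppresses default output, while
/// `trace_handles` stands for any explicitly armed diagnostic.
struct Opts {
    quiet: bool,
    trace_handles: bool,
}

/// Collects the lines that would be printed, so the gating is easy to inspect.
fn diagnostic_lines(opts: &Opts) -> Vec<String> {
    let mut out = Vec::new();

    // Always-on table: the only part that `quiet` suppresses.
    if !opts.quiet {
        out.push("=== Thread diagnostics ===".to_string());
    }

    // Explicitly requested report: self-gated on its own flag, independent
    // of `quiet`.
    if opts.trace_handles {
        out.push("handle audit trail".to_string());
    }

    out
}

fn main() {
    let headless = Opts { quiet: true, trace_handles: true };
    for line in diagnostic_lines(&headless) {
        println!("{line}");
    }
}
```

With the previous blanket `if quiet { return; }`, the headless case above would have produced no output at all, even though the handle trace was requested.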
@@ -4868,8 +5058,23 @@ fn cmd_dis(
     // pointer-validity oracle; runs over .rdata + .data.
     let function_starts: std::collections::BTreeSet<u32> =
         func_analysis.functions.keys().copied().collect();
-    let vtables = xenia_analysis::vtables::analyze(
-        &pe_image, base, &sections, &function_starts,
+    // Anchor discovery: recover vtable bases from constructor vptr-write
+    // stores so a vtable with non-function head words (null / pure-virtual /
+    // unrecognised thunk slots) isn't fragmented away by the contiguity
+    // heuristic. (Fixes e.g. the XMV engine vtable 0x8200a908.)
+    let vptr_anchor_funcs: std::collections::BTreeMap<u32, (u32, bool)> = func_analysis
+        .functions
+        .iter()
+        .map(|(&s, fi)| (s, (fi.end, fi.is_saverestore)))
+        .collect();
+    let vptr_block_boundaries: std::collections::HashSet<u32> =
+        xref_result.labels.keys().copied().collect();
+    let vtable_anchors = xenia_analysis::vtables::scan_vptr_write_constants(
+        &pe_image, base, &vptr_anchor_funcs, &sections, &vptr_block_boundaries,
+    );
+    info!(vtable_anchors = vtable_anchors.len(), "vptr-write anchor scan complete");
+    let vtables = xenia_analysis::vtables::analyze_with_anchors(
+        &pe_image, base, &sections, &function_starts, &vtable_anchors,
     );
     let rtti_count = vtables.iter().filter(|v| v.rtti_present).count();
     info!(
@@ -1,9 +1,9 @@
 {
-  "instructions": 50000110,
-  "imports": 243387,
+  "instructions": 50000200,
+  "imports": 189264,
   "unimpl": 0,
-  "draws": 1279,
-  "swaps": 260,
+  "draws": 768,
+  "swaps": 157,
   "unique_render_targets": 2,
   "shader_blobs_live": 6,
   "texture_cache_entries": 1