[iterate-4A] Milestone-2: XMA audio decoder + RE tooling (dispatch recorder, analyzer vtable-fix, non-perturbing probes)

Milestone-2 (intro video dat/movie/ADV.wmv) audio path + major RE tooling.

XMA AUDIO (built, working, deterministic, tested):
- APU MMIO 0x7FEA0000 + 320x64B register-mapped context array; real XMACreateContext/Release
  (xma.rs); real FFmpeg xma2 decoder XMA_CONTEXT_DATA->S16BE PCM (xma_decode.rs, xma2_codec.rs,
  ffmpeg-sys-next). Decode runs synchronously on the CPU thread (deterministic, no host thread).
- Audio-worker scheduler fix (main.rs LR_HALT restore + scheduler.rs): the XAudio render-callback
  worker was wrongly exited after ~2 deliveries; now survives -> guest drives XMA decode (70 kicks).
- XAudioSubmitRenderDriverFrame made faithful. Golden sylpheed_n50m re-baselined; tests pass.
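
Sketch of the final PCM packing step, for orientation only. It assumes the decoder hands back
interleaved f32 samples nominally in [-1.0, 1.0]; `f32_to_s16be` is a hypothetical helper name and
is not taken from xma_decode.rs.

```rust
/// Pack interleaved f32 samples into big-endian signed 16-bit PCM bytes.
/// Hypothetical helper: the name and the f32 input format are assumptions.
fn f32_to_s16be(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        // Clamp first so out-of-range decoder output saturates instead of wrapping.
        let scaled = (s.clamp(-1.0, 1.0) * 32767.0).round() as i16;
        // Big-endian byte order, matching the guest's expectation of S16BE.
        out.extend_from_slice(&scaled.to_be_bytes());
    }
    out
}
```

Pure integer/float arithmetic with no host-thread or clock dependency, so the same input always
yields the same bytes, in keeping with the determinism requirement above.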

RE TOOLING:
- Runtime indirect-dispatch recorder (dispatch_rec.rs): records (call-site->target, r3, lr);
  env-gated XENIA_DISPATCH_REC, filters XENIA_DISPATCH_REC_TARGETS/_SITES; deterministic, observe-only.
- Repaired static analyzer (vtables.rs): vtable extraction silently fragmented vtables with
  non-function head slots (missed the XMV engine vtable). Fixed via vptr-write-anchoring -> engine
  fully typed (vtables 722->1150 on rebuild).
- Fixed probe HEISENBUG (main.rs run_superblock): --audit-pc-probe-hex/--mem-watch no longer disable
  superblock chaining; probes fire inside the chain loop -> scheduling identical armed-vs-unarmed,
  movie subsystem now observable. Fixed a --quiet bug swallowing armed trace reports.
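
Sketch of the vptr-write-anchoring idea from the vtables.rs item above, reduced to its core. It
assumes the slot words are already decoded into a slice; `read_anchored_run` and its parameters are
illustrative names, not the function in the diff.

```rust
use std::collections::BTreeSet;

/// Read a function-pointer run starting at an anchored vtable base, tolerating
/// up to `max_gap` consecutive non-function words (null / pure-virtual / thunk).
/// Illustrative only: `words` stands in for the section bytes read in vtables.rs.
fn read_anchored_run(words: &[u32], fns: &BTreeSet<u32>, max_gap: usize) -> Vec<u32> {
    let mut slots = Vec::new();
    let mut gap = 0usize;
    for &w in words {
        if fns.contains(&w) {
            gap = 0;
        } else {
            gap += 1;
            // Too many non-function words in a row: we have run past the table.
            if gap > max_gap {
                break;
            }
        }
        // Keep gap slots too, so slot indices stay aligned with the real table.
        slots.push(w);
    }
    // The table ends at its last real method: drop trailing non-function slots.
    while slots.last().is_some_and(|w| !fns.contains(w)) {
        slots.pop();
    }
    slots
}
```

Because the base comes from the constructor's store rather than from the table contents, a head of
non-function words no longer hides the table; the gap budget only decides where it ends.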

VIDEO still doesn't play (B, guest-side): the XMV engine never issues begin-playback (sub_825076F0,
vtable 0x8200a1e8 slot21), so playback never primes and the 2000ms timeout fires. Narrowed to the
ARM2 engine-setup wrappers; there is no legitimate emulator-side gate fix (masking the failure is
forbidden). See HANDOFF-iterate-4A-milestone2.md for new-machine setup (incl. the FFmpeg apt deps +
sylpheed.db regeneration) and continuation pointers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
MechaCat02
2026-06-21 21:38:19 +02:00
parent acb29db444
commit 23189b95af
19 changed files with 3106 additions and 46 deletions


@@ -26,6 +26,14 @@ use xenia_xex::pe::PeSection;
use crate::demangle;
/// Maximum number of consecutive non-function slots tolerated inside an
/// anchor-recovered vtable before the run is considered terminated. MSVC
/// vtables can carry null / pure-virtual / unrecognised-thunk slots in their
/// head or interior; a small budget lets those through without merging two
/// physically-adjacent vtables. Kept small to avoid bridging the gap between
/// distinct tables.
const MAX_ANCHOR_GAP: usize = 2;
/// One detected vtable.
#[derive(Debug, Clone)]
pub struct Vtable {
@@ -56,6 +64,35 @@ pub fn analyze(
image_base: u32,
sections: &[PeSection],
function_starts: &std::collections::BTreeSet<u32>,
) -> Vec<Vtable> {
analyze_with_anchors(pe, image_base, sections, function_starts, &std::collections::BTreeSet::new())
}
/// Like [`analyze`], but additionally recovers vtables whose base address is
/// known a priori from a constructor vptr-write store (an "anchor"). The
/// contiguity heuristic in pass 1 fragments any vtable whose head region
/// contains words that don't resolve to recognised function entries (null /
/// pure-virtual / unrecognised thunk slots); those vtables are never emitted
/// and the downstream typed-dispatch resolver can't type objects of that
/// class. An anchor is a *content-independent* vtable signal — the ctor
/// literally installs `vtable_base` into `this+0` via
/// `addis/addi (or lis/ori) → stw rX, 0(rThis)` — so for every anchor not
/// already covered by a pass-1 run we synthesise a vtable starting at that
/// base, reading the fnptr-array run while *tolerating* up to
/// [`MAX_ANCHOR_GAP`] consecutive non-function slots before terminating.
///
/// `anchors` are absolute VAs of vtable bases (from
/// [`scan_vptr_write_constants`]). Existing pass-1 vtables are kept unchanged
/// (no regression): an anchor that already coincides with a detected vtable
/// base is skipped, and an anchor that lands *inside* an existing run is also
/// skipped (it's a sub-object pointer, not a fresh table).
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base)))]
pub fn analyze_with_anchors(
pe: &[u8],
image_base: u32,
sections: &[PeSection],
function_starts: &std::collections::BTreeSet<u32>,
anchors: &std::collections::BTreeSet<u32>,
) -> Vec<Vtable> {
let started = std::time::Instant::now();
// Sections we'll scan for vtable bodies.
@@ -117,6 +154,120 @@ pub fn analyze(
let _ = (va_start, va_end);
}
// --- Anchor-driven recovery (vptr-write-anchored vtables) ---
//
// Build a coverage interval set from pass-1 runs so we don't re-emit a
// table for an anchor that already lies within an extracted vtable.
let mut covered: Vec<(u32, u32)> = candidates
.iter()
.map(|v| (v.address, v.address + v.length * 4))
.collect();
covered.sort_unstable();
let is_covered = |addr: u32, covered: &[(u32, u32)]| -> bool {
covered.iter().any(|&(s, e)| addr >= s && addr < e)
};
// Section lookup for "which scan target contains this VA?"
let scan_targets_va: Vec<(u32, u32, usize, usize)> = sections
.iter()
.filter(|s| matches!(s.name.as_str(), ".rdata" | ".data"))
.map(|s| {
let va = image_base + s.virtual_address;
(
va,
va + s.virtual_size,
s.virtual_address as usize,
(s.virtual_address + s.virtual_size) as usize,
)
})
.collect();
// Cap a recovered run at the *next anchor* so two physically-adjacent
// anchored vtables don't merge. We deliberately do NOT cap at pass-1
// fragments: a fragment is a sub-run the contiguity scan carved out of a
// larger table, and the anchor legitimately re-absorbs it (subsumed
// fragments are removed afterwards).
let anchor_bases: std::collections::BTreeSet<u32> = anchors.iter().copied().collect();
let mut recovered = 0usize;
let mut newly: Vec<Vtable> = Vec::new();
for &anchor in anchors {
if is_covered(anchor, &covered) { continue; }
// Locate the containing .rdata/.data section.
let Some(&(va_lo, va_hi, raw_lo, raw_hi)) =
scan_targets_va.iter().find(|&&(lo, hi, _, _)| anchor >= lo && anchor < hi)
else { continue };
if anchor % 4 != 0 { continue; }
let raw_hi = raw_hi.min(pe.len());
// Read the fnptr-array run starting at the anchor. Tolerate small
// gaps of non-function slots (null / pure-virtual / unrecognised),
// but require the run to actually contain at least one real function
// (otherwise it's just data, not a vtable).
let next_base = anchor_bases.range((anchor + 4)..).next().copied();
let mut methods: Vec<u32> = Vec::new();
let mut gap = 0usize;
let mut real_fns = 0usize;
let mut off = (anchor - va_lo) as usize + raw_lo;
let mut va = anchor;
while off + 4 <= raw_hi && va < va_hi {
if let Some(nb) = next_base && va >= nb { break; }
let val = u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]);
if function_starts.contains(&val) {
methods.push(val);
real_fns += 1;
gap = 0;
} else {
// A non-function slot. Keep the slot (so downstream slot
// indexing stays aligned) but count toward the gap budget.
gap += 1;
if gap > MAX_ANCHOR_GAP {
// Drop the trailing gap slots — they belong past the
// table's end.
methods.truncate(methods.len().saturating_sub(gap - 1));
break;
}
methods.push(val);
}
off += 4;
va += 4;
}
// Trim any trailing non-function slots (the table ends at its last
// real method).
while methods.last().is_some_and(|&m| !function_starts.contains(&m)) {
methods.pop();
}
if real_fns == 0 || methods.is_empty() { continue; }
let length = methods.len() as u32;
newly.push(Vtable {
address: anchor,
length,
col_address: None,
class_name: synth_anon_name(&methods),
rtti_present: false,
base_classes_json: None,
methods,
});
recovered += 1;
}
if recovered > 0 {
// Drop pass-1 fragments fully subsumed by a recovered (anchored)
// vtable — the anchor base is authoritative and the fragment was a
// contiguity-scan artifact of the same table. Keep fragments that
// only partially overlap (defensive; shouldn't happen for true
// sub-runs) so we never lose method coverage.
let recovered_spans: Vec<(u32, u32)> =
newly.iter().map(|v| (v.address, v.address + v.length * 4)).collect();
candidates.retain(|v| {
!recovered_spans
.iter()
.any(|&(s, e)| v.address >= s && v.address + v.length * 4 <= e)
});
candidates.extend(newly);
tracing::info!(recovered, "vtables recovered from vptr-write anchors");
}
let _ = &covered;
// RTTI walk: for each candidate, look at vtable[-1].
let pe_image_base = image_base;
for v in &mut candidates {
@@ -268,6 +419,98 @@ fn read_class_hierarchy(
serde_json::to_string(&names).ok()
}
/// Pre-pass: discover candidate vtable *bases* from constructor vptr-write
/// stores, independent of the static contiguity heuristic. A vptr install is
/// the canonical `addis/addi` (or `lis/ori`) immediate build of a constant
/// pointing into `.rdata` / `.data`, followed by `stw rX, 0(rThis)` — i.e. the
/// ctor writing the vtable pointer to `this+0`. We return the set of such
/// constants; these are fed to [`analyze_with_anchors`] so a vtable with
/// non-function head words isn't lost.
///
/// We only consider stores at displacement 0 (the primary vptr; secondary
/// MI vptrs land at non-zero offsets and are handled by the existing
/// contiguity scan / typed-dispatch resolver well enough). The register
/// tracker mirrors the lis+addi propagation used elsewhere and is reset at
/// every basic-block boundary (`block_boundaries`).
pub fn scan_vptr_write_constants(
pe: &[u8],
image_base: u32,
functions: &std::collections::BTreeMap<u32, (u32, bool)>, // start -> (end, is_saverestore)
sections: &[PeSection],
block_boundaries: &std::collections::HashSet<u32>,
) -> std::collections::BTreeSet<u32> {
// Ranges that a vtable base may legitimately live in.
let data_ranges: Vec<(u32, u32)> = sections
.iter()
.filter(|s| matches!(s.name.as_str(), ".rdata" | ".data"))
.map(|s| (image_base + s.virtual_address, image_base + s.virtual_address + s.virtual_size))
.collect();
let in_data = |a: u32| data_ranges.iter().any(|&(s, e)| a >= s && a < e);
const OP_ADDI: u32 = 14;
const OP_ADDIS: u32 = 15;
const OP_ORI: u32 = 24;
const OP_STW: u32 = 36;
const OP_X_FORM: u32 = 31;
let read = |addr: u32| -> Option<u32> {
let off = addr.wrapping_sub(image_base) as usize;
if off + 4 > pe.len() { return None; }
Some(u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]))
};
let mut anchors: std::collections::BTreeSet<u32> = std::collections::BTreeSet::new();
for (&fn_start, &(fn_end, is_saverestore)) in functions {
if is_saverestore { continue; }
let mut reg: [Option<u32>; 32] = [None; 32];
let mut pc = fn_start;
while pc < fn_end {
if pc != fn_start && block_boundaries.contains(&pc) {
reg = [None; 32];
}
let Some(instr) = read(pc) else { break };
let op = instr >> 26;
let rd = ((instr >> 21) & 0x1F) as usize;
let ra = ((instr >> 16) & 0x1F) as usize;
let simm = ((instr & 0xFFFF) as i16) as i32;
let uimm = instr & 0xFFFF;
match op {
OP_ADDIS if ra == 0 => reg[rd] = Some(uimm << 16),
OP_ADDIS => reg[rd] = reg[ra].map(|b| b.wrapping_add(uimm << 16)),
OP_ADDI if ra != 0 => reg[rd] = reg[ra].map(|b| b.wrapping_add(simm as u32)),
OP_ADDI => reg[rd] = Some(simm as u32),
OP_ORI => {
let rs = rd;
reg[ra] = reg[rs].map(|b| b | uimm);
}
OP_STW => {
// `stw rS, off(rA)` with displacement 0 = primary vptr install.
if ra != 0
&& simm == 0
&& let Some(val) = reg[rd]
&& in_data(val)
{
anchors.insert(val);
}
}
32..=35 | 40..=43 | 48..=51 => reg[rd] = None,
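OP_X_FORM => {
let xo = (instr >> 1) & 0x3FF;
match xo {
// `or rA,rS,rB` writes rA; the field decoded as `rd` is the source rS.
// `mr` (rS == rB) copies the tracked constant, any other `or` clobbers rA.
444 => {
let rb = ((instr >> 11) & 0x1F) as usize;
reg[ra] = if rd == rb { reg[rd] } else { None };
}
// `mtspr` writes no GPR, so tracking is unaffected.
467 => {}
_ => reg[rd] = None,
}
}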
18 | 16 => {
if (instr & 1) != 0 {
for r in 0..=12 { reg[r] = None; }
}
}
_ => {}
}
pc = pc.wrapping_add(4);
}
}
anchors
}
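
The scanner above rebuilds constants from `addis`/`addi` pairs. The following sketch shows the arithmetic involved: `addi` sign-extends its 16-bit immediate, so when the low half has bit 15 set the compiler emits a high half one larger than the address's upper 16 bits. `split_hi_lo` and `rebuild` are hypothetical helper names used only for illustration.

```rust
/// Split a 32-bit address into the (addis, addi) immediates that rebuild it.
/// Hypothetical helper, not part of vtables.rs.
fn split_hi_lo(addr: u32) -> (u16, i16) {
    // Low half, reinterpreted as the signed immediate `addi` will see.
    let lo = (addr & 0xFFFF) as u16 as i16;
    // Subtracting the sign-extended low half leaves a multiple of 0x10000.
    let hi = (addr.wrapping_sub(lo as i32 as u32) >> 16) as u16;
    (hi, lo)
}

/// Mirror of what the register tracker computes: (hi << 16) + sign_extend(lo).
fn rebuild(hi: u16, lo: i16) -> u32 {
    ((hi as u32) << 16).wrapping_add(lo as i32 as u32)
}
```

For the vtable base used in the ctor-store test, `split_hi_lo(0x8200A908)` gives `(0x8201, -22264)`, which is why that test encodes `addis` with `0x8201` rather than `0x8200`.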
/// Synthetic name for an RTTI-stripped vtable, derived from a stable hash of
/// the sorted method-PC list. Two vtables with identical method ordering
/// collapse to the same anonymous name.
@@ -385,6 +628,112 @@ mod tests {
assert!(!vtables[0].rtti_present);
}
#[test]
fn anchor_recovers_vtable_with_nonfn_head() {
// A vtable whose head has a null + an unrecognised word, so the
// contiguity scan (≥3 contiguous known fns) fragments it. The anchor
// (from a ctor vptr-write) must recover the whole table from its base.
let image_base = 0x82000000u32;
let rdata_va = 0x1000u32;
let text_va = 0x2000u32;
let rdata_size = 0x40u32;
let text_size = 0x100u32;
let total = (text_va + text_size) as usize;
let mut pe = vec![0u8; total];
let f0 = image_base + text_va;
let f1 = image_base + text_va + 0x10;
let f2 = image_base + text_va + 0x20;
// Slots: [null, NONFN(0xDEADBEEF), f0, f1, f2]
let slots: [u32; 5] = [0, 0xDEADBEEF, f0, f1, f2];
for (i, val) in slots.iter().enumerate() {
pe[rdata_va as usize + i * 4..rdata_va as usize + (i + 1) * 4]
.copy_from_slice(&val.to_be_bytes());
}
let sections = vec![
PeSection {
name: ".rdata".into(),
virtual_address: rdata_va,
virtual_size: rdata_size,
raw_offset: rdata_va,
raw_size: rdata_size,
flags: 0x4000_0040,
},
PeSection {
name: ".text".into(),
virtual_address: text_va,
virtual_size: text_size,
raw_offset: text_va,
raw_size: text_size,
flags: 0x6000_0020,
},
];
let mut function_starts = std::collections::BTreeSet::new();
for &pc in &[f0, f1, f2] { function_starts.insert(pc); }
// Without an anchor: the head gap (null + nonfn = 2 slots) means the
// contiguous run is only [f0,f1,f2]=3 starting at +0x08, so pass-1
// still finds it but at the WRONG base (0x...1008), not the true base.
let no_anchor = analyze(&pe, image_base, &sections, &function_starts);
assert!(
!no_anchor.iter().any(|v| v.address == image_base + rdata_va),
"without anchor the table is not recovered at its true base"
);
// With the anchor at the true base:
let mut anchors = std::collections::BTreeSet::new();
anchors.insert(image_base + rdata_va);
let with_anchor =
analyze_with_anchors(&pe, image_base, &sections, &function_starts, &anchors);
let v = with_anchor
.iter()
.find(|v| v.address == image_base + rdata_va)
.expect("anchor must recover vtable at its true base");
// length spans through f2 (slot 4): 5 slots.
assert_eq!(v.length, 5, "table spans null/nonfn head through last fn");
assert_eq!(v.methods[2], f0);
assert_eq!(v.methods[4], f2);
}
#[test]
fn scan_vptr_write_constants_finds_ctor_store() {
// Encode a ctor: addis r11,r0,0x8201; addi r11,r11,lo; stw r11,0(r31)
// installing vtable base 0x8200A908 into this+0.
let image_base = 0x82000000u32;
let ctor = 0x82001000u32;
let mut pe = vec![0u8; 0x4000];
// Lay out a tiny .rdata at 0x...A900 so the constant lands in-range.
let vt_base = 0x8200A908u32; // 0x82010000 - 22264
let addis = (15u32 << 26) | (11 << 21) | (0 << 16) | 0x8201;
let lo = (vt_base & 0xFFFF) as i16; // -22264
// addi r11,r11,lo: chain onto the addis result. (addi with rA=r0 would
// load the sign-extended immediate instead of adding it.)
let addi = (14u32 << 26) | (11 << 21) | (11 << 16) | ((lo as u16) as u32);
let stw = (36u32 << 26) | (11 << 21) | (31 << 16) | 0; // stw r11,0(r31)
let at = (ctor - image_base) as usize;
pe[at..at + 4].copy_from_slice(&addis.to_be_bytes());
pe[at + 4..at + 8].copy_from_slice(&addi.to_be_bytes());
pe[at + 8..at + 12].copy_from_slice(&stw.to_be_bytes());
let sections = vec![PeSection {
name: ".rdata".into(),
virtual_address: 0xA900,
virtual_size: 0x200,
raw_offset: 0xA900,
raw_size: 0x200,
flags: 0x4000_0040,
}];
let mut funcs: std::collections::BTreeMap<u32, (u32, bool)> = std::collections::BTreeMap::new();
funcs.insert(ctor, (ctor + 0x40, false));
let anchors = scan_vptr_write_constants(
&pe, image_base, &funcs, &sections, &std::collections::HashSet::new(),
);
assert!(anchors.contains(&vt_base), "ctor vptr store must yield anchor {vt_base:#x}, got {anchors:?}");
}
#[test]
fn rejects_2_method_run() {
let image_base = 0x82000000u32;