[iterate-4A] Milestone-2: XMA audio decoder + RE tooling (dispatch recorder, analyzer vtable-fix, non-perturbing probes)

Milestone-2 (intro video dat/movie/ADV.wmv) audio path + major RE tooling.

XMA AUDIO (built, working, deterministic, tested):
- APU MMIO 0x7FEA0000 + 320x64B register-mapped context array; real XMACreateContext/Release
  (xma.rs); real FFmpeg xma2 decoder XMA_CONTEXT_DATA->S16BE PCM (xma_decode.rs, xma2_codec.rs,
  ffmpeg-sys-next). Decode runs synchronously on the CPU thread (deterministic, no host thread).
- Audio-worker scheduler fix (main.rs LR_HALT restore + scheduler.rs): the XAudio render-callback
  worker was wrongly exited after ~2 deliveries; now survives -> guest drives XMA decode (70 kicks).
- XAudioSubmitRenderDriverFrame made faithful. Golden sylpheed_n50m re-baselined; tests pass.
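
A minimal sketch of the kick path described above, assuming only the constants
shown in xma.rs (register 0x650 = Context0Kick, 10 registers x 32 contexts). The
helper name `kicked_contexts` is hypothetical; the real logic lives inline in
`XmaDecoder::write_register`. It maps an aperture address plus the raw
(byte-reversed) bus value to the context ids being kicked:

```rust
/// Dword register index of `Context0Kick` (value taken from xma.rs).
const REG_CONTEXT_KICK_BASE: u32 = 0x650;
/// Ten kick registers cover 320 contexts, 32 per register.
const CONTEXT_GROUP_LEN: u32 = 10;
const XMA_CONTEXT_COUNT: u32 = 320;

/// Hypothetical helper: which contexts does this MMIO write kick?
/// `raw_value` is the value as it arrives on the bus, i.e. byte-reversed.
fn kicked_contexts(addr: u32, raw_value: u32) -> Vec<u32> {
    // Byte offset inside the 64 KB aperture -> dword register index.
    let reg_index = (addr & 0xFFFF) / 4;
    let kick_regs = REG_CONTEXT_KICK_BASE..REG_CONTEXT_KICK_BASE + CONTEXT_GROUP_LEN;
    if !kick_regs.contains(&reg_index) {
        return Vec::new();
    }
    let base = (reg_index - REG_CONTEXT_KICK_BASE) * 32;
    // Undo the guest's `stwbrx` so bit N really means context `base + N`.
    let mut bits = raw_value.swap_bytes();
    let mut ids = Vec::new();
    while bits != 0 {
        let id = base + bits.trailing_zeros();
        bits &= bits - 1; // clear the lowest set bit
        if id < XMA_CONTEXT_COUNT {
            ids.push(id);
        }
    }
    ids
}
```

For example, a write of raw `0x05000000` to `0x7FEA1940` (byte offset 0x1940,
register 0x650) kicks contexts 0 and 2; without the swap it would be misread as
contexts 24 and 26.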

RE TOOLING:
- Runtime indirect-dispatch recorder (dispatch_rec.rs): records (call-site->target, r3, lr);
  env-gated XENIA_DISPATCH_REC, filters XENIA_DISPATCH_REC_TARGETS/_SITES; deterministic, observe-only.
- Repaired static analyzer (vtables.rs): vtable extraction silently fragmented vtables with
  non-function head slots (missed the XMV engine vtable). Fixed via vptr-write-anchoring -> engine
  fully typed (vtables 722->1150 on rebuild).
- Fixed probe HEISENBUG (main.rs run_superblock): --audit-pc-probe-hex/--mem-watch no longer disable
  superblock chaining; probes fire inside the chain loop -> scheduling identical armed-vs-unarmed,
  movie subsystem now observable. Fixed a --quiet bug swallowing armed trace reports.
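
A rough sketch of how an env-gated, observe-only recorder with target/site
filters could be shaped. This is not the code in dispatch_rec.rs: the struct and
function names are hypothetical, and the comma-separated hex list format for
XENIA_DISPATCH_REC_TARGETS/_SITES is an assumption. Only the three variable names
and the recorded tuple (call-site, target, r3, lr) come from the notes above.

```rust
use std::collections::BTreeSet;

/// One observed indirect call (field names are illustrative).
#[derive(Debug, Clone, PartialEq, Eq)]
struct DispatchEvent {
    site: u32,
    target: u32,
    r3: u32,
    lr: u32,
}

/// Parse a comma-separated hex list such as "825076F0, 0x8200a1e8".
/// ASSUMPTION: the real filter syntax may differ. Invalid entries are skipped.
fn parse_hex_list(s: &str) -> BTreeSet<u32> {
    s.split(',')
        .map(str::trim)
        .filter_map(|t| {
            let digits = t
                .strip_prefix("0x")
                .or_else(|| t.strip_prefix("0X"))
                .unwrap_or(t);
            u32::from_str_radix(digits, 16).ok()
        })
        .collect()
}

struct DispatchRecorder {
    targets: Option<BTreeSet<u32>>, // None = record every target
    sites: Option<BTreeSet<u32>>,   // None = record every call site
    events: Vec<DispatchEvent>,
}

impl DispatchRecorder {
    fn new(targets: Option<&str>, sites: Option<&str>) -> Self {
        Self {
            targets: targets.map(parse_hex_list),
            sites: sites.map(parse_hex_list),
            events: Vec::new(),
        }
    }

    /// Returns `None` (recording off) unless XENIA_DISPATCH_REC is set.
    /// ASSUMPTION: any value enables recording.
    fn from_env() -> Option<Self> {
        std::env::var_os("XENIA_DISPATCH_REC")?;
        let targets = std::env::var("XENIA_DISPATCH_REC_TARGETS").ok();
        let sites = std::env::var("XENIA_DISPATCH_REC_SITES").ok();
        Some(Self::new(targets.as_deref(), sites.as_deref()))
    }

    /// Observe-only: takes values by copy and never touches guest state.
    fn record(&mut self, site: u32, target: u32, r3: u32, lr: u32) {
        let wanted = |filter: &Option<BTreeSet<u32>>, addr: u32| {
            filter.as_ref().map_or(true, |set| set.contains(&addr))
        };
        if wanted(&self.targets, target) && wanted(&self.sites, site) {
            self.events.push(DispatchEvent { site, target, r3, lr });
        }
    }
}
```

Appending events to a `Vec` in call order keeps the output a pure function of
guest execution order, which is what makes a recorder of this shape
deterministic.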

VIDEO still doesn't play (B, guest-side): the XMV engine never issues begin-playback (sub_825076F0,
vtable 0x8200a1e8 slot21) -> never primes -> 2000ms timeout. Narrowed to the ARM2 engine-setup
wrappers; no honest our-side gate-fix (masking forbidden). See HANDOFF-iterate-4A-milestone2.md for
new-machine setup (incl. the FFmpeg apt deps + sylpheed.db regeneration) and continuation pointers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
MechaCat02
2026-06-21 21:38:19 +02:00
parent acb29db444
commit 23189b95af
19 changed files with 3106 additions and 46 deletions


@@ -6,5 +6,12 @@ license.workspace = true
[dependencies]
xenia-types = { workspace = true }
xenia-memory = { workspace = true }
tracing = { workspace = true }
thiserror = { workspace = true }
# Raw FFmpeg FFI for the XMA2 audio decoder (stage 3). The system libs are
# FFmpeg 6.1 (libavcodec 60), so we pin the matching `6.1` series. The crate's
# build script generates bindings via bindgen against the installed headers, so
# the FFI matches the distro FFmpeg exactly. We only need avcodec + avutil; the
# `build` feature (which compiles FFmpeg from source) stays off.
ffmpeg-sys-next = { version = "6.1", default-features = false, features = ["avcodec"] }
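
The pin above only helps if the installed libavcodec really is major version 60.
A minimal sketch of a startup sanity check follows, assuming the version integer
comes from libavcodec's `avcodec_version()` (called through the FFI, which is not
done here so the snippet stays self-contained). The function names and the
error text are hypothetical.

```rust
/// libavcodec major version the manifest comment expects (FFmpeg 6.1).
const EXPECTED_AVCODEC_MAJOR: u32 = 60;

/// Split an FFmpeg version integer into (major, minor, micro).
/// FFmpeg packs versions as `major << 16 | minor << 8 | micro`.
fn unpack_av_version(v: u32) -> (u32, u32, u32) {
    (v >> 16, (v >> 8) & 0xFF, v & 0xFF)
}

/// Hypothetical check: reject a runtime library whose major version does not
/// match the headers the bindings were generated against.
fn check_avcodec_version(v: u32) -> Result<(), String> {
    let (major, minor, micro) = unpack_av_version(v);
    if major == EXPECTED_AVCODEC_MAJOR {
        Ok(())
    } else {
        Err(format!(
            "libavcodec {major}.{minor}.{micro} found, expected major {EXPECTED_AVCODEC_MAJOR}"
        ))
    }
}
```

In the decoder crate the argument would be the result of the FFI call rather
than a literal; a mismatch is better reported once at init than discovered as a
struct-layout bug inside the decode loop.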


@@ -1,3 +1,9 @@
pub mod xma;
pub mod xma2_codec;
pub mod xma_decode;
pub use xma::{build_mmio_region, XmaDecoder, XMA_CONTEXT_COUNT, XMA_CONTEXT_SIZE};
/// Audio processing unit stub. Logging only for now.
pub struct AudioSystem {
pub enabled: bool,
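
The two re-exported constants fix the pointer arithmetic behind
XMACreateContext/XMAReleaseContext: slot `i` lives at `base + i * 64`, and a
320-bit bitmap tracks which slots are taken. A minimal standalone sketch of that
scheme follows; `ContextAllocator` is a hypothetical name (the real code is
`XmaDecoder::allocate_context` / `release_context`), and unlike the real
`release_context` this sketch rejects pointers that are not 64-byte aligned.

```rust
const XMA_CONTEXT_SIZE: u32 = 64;
const XMA_CONTEXT_COUNT: usize = 320;

/// Hypothetical standalone version of the context-slot allocator.
struct ContextAllocator {
    base_va: u32,
    /// One bit per context; a set bit means "allocated".
    bitmap: [u64; (XMA_CONTEXT_COUNT + 63) / 64],
}

impl ContextAllocator {
    fn new(base_va: u32) -> Self {
        Self { base_va, bitmap: [0; (XMA_CONTEXT_COUNT + 63) / 64] }
    }

    /// First-fit scan; returns the guest pointer of the claimed slot.
    fn allocate(&mut self) -> Option<u32> {
        for i in 0..XMA_CONTEXT_COUNT {
            let (word, bit) = (i >> 6, 1u64 << (i & 63));
            if self.bitmap[word] & bit == 0 {
                self.bitmap[word] |= bit;
                return Some(self.base_va + i as u32 * XMA_CONTEXT_SIZE);
            }
        }
        None
    }

    /// Returns `true` if `ptr` named a slot inside the array and was freed.
    fn release(&mut self, ptr: u32) -> bool {
        let Some(offset) = ptr.checked_sub(self.base_va) else {
            return false;
        };
        let i = (offset / XMA_CONTEXT_SIZE) as usize;
        if i >= XMA_CONTEXT_COUNT || offset % XMA_CONTEXT_SIZE != 0 {
            return false;
        }
        self.bitmap[i >> 6] &= !(1u64 << (i & 63));
        true
    }
}
```

Because the scan is first-fit, a released slot is the next one handed out, which
matches the behaviour the unit tests in xma.rs assert on.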

crates/xenia-apu/src/xma.rs (normal file, 932 lines added)

@@ -0,0 +1,932 @@
//! Register-mapped XMA context system: a port of xenia-canary's
//! `apu/xma_decoder.cc` context-array + MMIO machinery. The stage-3 audio
//! decode itself lives in `xma_decode.rs` / `xma2_codec.rs` and is driven from
//! here.
//!
//! The guest allocates XMA contexts via `XMACreateContext` (which hands back a
//! pointer into our 320-entry context array in physical guest memory), writes
//! the 64-byte `XMA_CONTEXT_DATA` struct, then *kicks* decode by writing the
//! per-context bit into the `0x7FEA0000` register aperture. This module
//! satisfies all of that without faulting, records which contexts the guest
//! kicked, and runs one `Work()` pass per kick to produce PCM: synchronously
//! inside the MMIO write once `set_memory` has been called, otherwise from
//! `decode_pending`.
//!
//! ## Byte order
//! The guest accesses the aperture byte-reversed (`stwbrx`/`lwbrx`), so the raw
//! `u32` our MMIO boundary delivers is byte-swapped relative to the logical
//! register value — exactly the situation canary handles with `xe::byte_swap`.
//! So `write_register` swaps the incoming value before decoding and the
//! register file holds host-order values; `read_register` swaps on the way out.
//! This was proven empirically: the guest's Clear writes arrive as
//! `0x01000000`/`0x02000000`/`0x04000000`, i.e. byte-reversed `1`/`2`/`4`,
//! targeting contexts 0/1/2 (which it had just allocated) — NOT 24/25/26. The
//! register-index math (`(addr & 0xFFFF) / 4`) is the same as canary's.
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Mutex};
use xenia_memory::access::MemoryAccess;
use xenia_memory::{GuestMemory, MmioRegion};
use crate::xma_decode::{self, ContextDecodeState, XmaContextData};
/// Size in bytes of an `XMA_CONTEXT_DATA` struct (canary `xma_context.h`).
/// This module only needs the stride; the fields are decoded in `xma_decode`.
pub const XMA_CONTEXT_SIZE: u32 = 64;
/// Number of XMA contexts the hardware exposes (canary `kContextCount`).
pub const XMA_CONTEXT_COUNT: usize = 320;
/// Register aperture base (guest physical). Canary maps the XMA decoder at
/// `0x7FEA0000` in `XmaDecoder::Setup`.
pub const APERTURE_BASE: u32 = 0x7FEA_0000;
/// Mask used by `MmioRegion::contains` so any `0x7FEAxxxx` address hits.
pub const APERTURE_MASK: u32 = 0xFFFF_0000;
/// Total aperture size in bytes (the low 16-bit register window).
pub const APERTURE_SIZE: u32 = 0x0001_0000;
// ----- Register indices (canary `XmaRegister` enum / xma_register_table.inc).
// Indices are dword indices: byte offset = index * 4.
/// `ContextArrayAddress` — physical base of the context array. byte 0x1800.
const REG_CONTEXT_ARRAY_ADDRESS: u32 = 0x600;
/// `CurrentContextIndex` — the context the HW is currently servicing. byte
/// 0x1818. Polled by the guest; we rotate it so a poll never sticks.
const REG_CURRENT_CONTEXT_INDEX: u32 = 0x606;
/// First of the 10 `ContextKKick` registers (`Context0Kick`..`Context9Kick`).
/// byte 0x1940. Bit N of `ContextKKick` kicks context `K*32 + N`.
const REG_CONTEXT_KICK_BASE: u32 = 0x650;
/// First of the 10 `ContextNLock` registers. byte 0x1A40.
const REG_CONTEXT_LOCK_BASE: u32 = 0x690;
/// First of the 10 `ContextNClear` registers. byte 0x1A80.
const REG_CONTEXT_CLEAR_BASE: u32 = 0x6A0;
/// Each group spans 10 registers (320 contexts / 32-per-register).
const CONTEXT_GROUP_LEN: u32 = 10;
/// Number of 32-bit words backing the register file. The highest index we
/// touch is `0x6A9`; round up generously so any in-aperture index is in range
/// (64 KB aperture / 4).
const REGISTER_FILE_WORDS: usize = 0x4000;
/// Register-mapped XMA context array. Owns the allocation bitmap, the register
/// file, and the per-context kick/enable bookkeeping that stage 3 consumes.
pub struct XmaDecoder {
/// Guest virtual address of the context array (handed back by
/// `allocate_context`).
context_array_guest_va: u32,
/// Physical address stored into `ContextArrayAddress` (reg 0x600).
context_array_phys: u32,
/// 320-slot allocation bitmap, one bit per context (`bitmap[i>>6]` bit
/// `i & 63`). A set bit means *allocated*.
bitmap: [u64; (XMA_CONTEXT_COUNT + 63) / 64],
/// Flat register file, host-native values. Indexed by dword register index.
registers: Vec<u32>,
/// Per-context "decode requested" flag, set on Kick, cleared on Clear.
/// Stage 3 drains this to produce PCM.
pending: [bool; XMA_CONTEXT_COUNT],
/// Per-context enable flag. A Lock disables; a Kick (re-)enables. Mirrors
/// canary's "is_enabled" notion loosely — exact decode semantics are
/// stage 3.
enabled: [bool; XMA_CONTEXT_COUNT],
/// Total kicks observed (diagnostic; lets headless logs show progress).
kick_count: u64,
/// Rotating value served for `CurrentContextIndex` reads so a guest poll
/// can't spin forever on a fixed value. Atomic so the read path can stay
/// `&self`.
current_context_index: AtomicU32,
/// Per-context stage-3 decode state (FFmpeg codec, staged PCM frame, ring
/// bookkeeping). Lazily populated as contexts are decoded.
decode_state: Vec<ContextDecodeState>,
/// Total PCM bytes written to guest output buffers (diagnostic).
pcm_bytes_total: u64,
/// Stable pointer to the guest memory mapping, captured at init. Used to run
/// `Work()` SYNCHRONOUSLY inside the kick MMIO write — exactly as canary's
/// default `!use_dedicated_xma_thread` path does (`context.Work()` right in
/// `WriteRegister`), so the game sees the updated context the instant its
/// kick store retires. The mapping lives for the whole run; decode is
/// deterministic and happens on the CPU thread, so this is determinism-safe.
mem_ptr: *const GuestMemory,
}
// The decoder is owned behind an `Arc<Mutex<..>>` and only ever touched from the
// CPU scheduler thread (kick MMIO writes + the per-round pump). The raw `mem_ptr`
// is a stable whole-run mapping; access is single-threaded.
unsafe impl Send for XmaDecoder {}
impl XmaDecoder {
/// Construct an un-initialized decoder. Call [`Self::init`] once the
/// context-array memory has been reserved.
pub fn new() -> Self {
Self {
context_array_guest_va: 0,
context_array_phys: 0,
bitmap: [0; (XMA_CONTEXT_COUNT + 63) / 64],
registers: vec![0; REGISTER_FILE_WORDS],
pending: [false; XMA_CONTEXT_COUNT],
enabled: [false; XMA_CONTEXT_COUNT],
kick_count: 0,
current_context_index: AtomicU32::new(0),
decode_state: (0..XMA_CONTEXT_COUNT).map(|_| ContextDecodeState::new()).collect(),
pcm_bytes_total: 0,
mem_ptr: std::ptr::null(),
}
}
/// Capture the stable guest-memory mapping so the kick MMIO path can run
/// `Work()` synchronously (canary semantics). Call once at boot, after the
/// final `mem` is in its long-lived location.
pub fn set_memory(&mut self, mem: &GuestMemory) {
self.mem_ptr = mem as *const GuestMemory;
}
/// Wire in the context-array addresses (after the app reserves the buffer)
/// and publish the physical base into `ContextArrayAddress` (reg 0x600),
/// exactly as canary's `XmaDecoder::Setup` does.
pub fn init(&mut self, context_array_guest_va: u32, context_array_phys: u32) {
self.context_array_guest_va = context_array_guest_va;
self.context_array_phys = context_array_phys;
self.registers[REG_CONTEXT_ARRAY_ADDRESS as usize] = context_array_phys;
tracing::info!(
va = format_args!("{context_array_guest_va:#010x}"),
phys = format_args!("{context_array_phys:#010x}"),
"xma: context array initialized"
);
}
/// Acquire a free context slot and return its guest pointer
/// (`context_array_guest_va + i*64`), or 0 if all 320 slots are in use.
/// Mirrors canary's `XmaDecoder::AllocateContext`.
pub fn allocate_context(&mut self) -> u32 {
for i in 0..XMA_CONTEXT_COUNT {
let word = i >> 6;
let bit = 1u64 << (i & 63);
if self.bitmap[word] & bit == 0 {
self.bitmap[word] |= bit;
let ptr = self.context_array_guest_va + (i as u32) * XMA_CONTEXT_SIZE;
tracing::info!(
index = i,
ptr = format_args!("{ptr:#010x}"),
"xma: allocate_context"
);
return ptr;
}
}
tracing::warn!("xma: allocate_context — all {} slots in use", XMA_CONTEXT_COUNT);
0
}
/// Free the slot backing `guest_ptr`. Mirrors canary's
/// `XmaDecoder::ReleaseContext`. Pointers below the array or past its end are
/// ignored (the guest never faults); a pointer into the middle of a slot is
/// rounded down and frees that slot.
pub fn release_context(&mut self, guest_ptr: u32) {
if guest_ptr < self.context_array_guest_va {
return;
}
let offset = guest_ptr - self.context_array_guest_va;
let i = (offset / XMA_CONTEXT_SIZE) as usize;
if i >= XMA_CONTEXT_COUNT {
return;
}
let word = i >> 6;
let bit = 1u64 << (i & 63);
self.bitmap[word] &= !bit;
self.pending[i] = false;
self.enabled[i] = false;
tracing::info!(index = i, ptr = format_args!("{guest_ptr:#010x}"), "xma: release_context");
}
/// Read a register. Returns the stored value, except `CurrentContextIndex`
/// (0x606) which rotates `0..XMA_CONTEXT_COUNT` per read so a polling guest
/// always sees forward progress. Out-of-range indices read 0.
pub fn read_register(&self, reg_index: u32) -> u32 {
// The guest accesses the aperture byte-reversed (`lwbrx`), so the
// register file holds host-order values and we swap on the way out —
// exactly as canary's `ReadRegister` returns `xe::byte_swap(reg)`.
let host = if reg_index == REG_CURRENT_CONTEXT_INDEX {
// Rotate mod context count on each read so a poll never sticks.
let prev = self.current_context_index.fetch_add(1, Ordering::Relaxed);
prev % XMA_CONTEXT_COUNT as u32
} else {
self.registers.get(reg_index as usize).copied().unwrap_or(0)
};
host.swap_bytes()
}
/// Write a register, then apply the side-effect of the Kick / Lock / Clear
/// register groups. Each register in a group covers 32 contexts; bit N maps
/// to `context_id = (reg_index - group_base) * 32 + N`. We iterate set bits
/// with `trailing_zeros` + clear-lowest-bit, mirroring canary's
/// `std::countr_zero` loop. The incoming value is byte-swapped first (see
/// below).
pub fn write_register(&mut self, reg_index: u32, value: u32) {
// The guest writes the aperture byte-reversed (`stwbrx`); undo it so the
// register file holds host-order values, mirroring canary's
// `WriteRegister` which does `value = xe::byte_swap(value)` first. Proven
// by the guest's Clear writes (`0x01000000` == context 0, not 24).
let value = value.swap_bytes();
if let Some(slot) = self.registers.get_mut(reg_index as usize) {
*slot = value;
}
if (REG_CONTEXT_KICK_BASE..REG_CONTEXT_KICK_BASE + CONTEXT_GROUP_LEN).contains(&reg_index) {
let base = (reg_index - REG_CONTEXT_KICK_BASE) * 32;
let mut bits = value;
while bits != 0 {
let b = bits.trailing_zeros();
bits &= bits - 1;
let context_id = (base + b) as usize;
if context_id < XMA_CONTEXT_COUNT {
self.pending[context_id] = true;
self.enabled[context_id] = true;
self.kick_count += 1;
tracing::debug!(
context_id,
kick_count = self.kick_count,
"xma: kick (decode requested)"
);
// Canary `!use_dedicated_xma_thread`: run Work() right here so
// the game observes the updated context when its kick store
// retires. Safe — `mem_ptr` is a stable whole-run mapping and
// we're on the CPU thread.
if !self.mem_ptr.is_null() {
let mem: &GuestMemory = unsafe { &*self.mem_ptr };
self.enabled[context_id] = false;
self.work_one(mem, context_id);
}
}
}
} else if (REG_CONTEXT_LOCK_BASE..REG_CONTEXT_LOCK_BASE + CONTEXT_GROUP_LEN)
.contains(&reg_index)
{
let base = (reg_index - REG_CONTEXT_LOCK_BASE) * 32;
let mut bits = value;
while bits != 0 {
let b = bits.trailing_zeros();
bits &= bits - 1;
let context_id = (base + b) as usize;
if context_id < XMA_CONTEXT_COUNT {
self.enabled[context_id] = false;
tracing::debug!(context_id, "xma: lock (context disabled)");
}
}
} else if (REG_CONTEXT_CLEAR_BASE..REG_CONTEXT_CLEAR_BASE + CONTEXT_GROUP_LEN)
.contains(&reg_index)
{
let base = (reg_index - REG_CONTEXT_CLEAR_BASE) * 32;
let mut bits = value;
while bits != 0 {
let b = bits.trailing_zeros();
bits &= bits - 1;
let context_id = (base + b) as usize;
if context_id < XMA_CONTEXT_COUNT {
self.pending[context_id] = false;
self.enabled[context_id] = false;
tracing::debug!(context_id, "xma: clear (context state reset)");
}
}
}
}
/// Total kicks observed so far (diagnostic).
pub fn kick_count(&self) -> u64 {
self.kick_count
}
/// Whether context `i` has a pending (un-serviced) kick. Stage-3 hook.
pub fn is_pending(&self, i: usize) -> bool {
self.pending.get(i).copied().unwrap_or(false)
}
/// Total PCM bytes the decoder has written to guest output buffers.
pub fn pcm_bytes_total(&self) -> u64 {
self.pcm_bytes_total
}
/// Stage-3 entry point. Called once per scheduler round from the CPU
/// thread's per-round coordinator. For each context with a pending kick,
/// run one `Work()` pass (canary `XmaContextNew::Work`): read the context,
/// decode available input into PCM, drain into the output ring, and write
/// the decoder-owned fields back. Deterministic — no host thread, no clock.
pub fn decode_pending(&mut self, mem: &GuestMemory) {
if self.context_array_guest_va == 0 {
return;
}
for i in 0..XMA_CONTEXT_COUNT {
if !self.pending[i] || !self.enabled[i] {
continue;
}
// Canary `Work` clears is_enabled at entry; a fresh kick re-enables.
self.enabled[i] = false;
self.work_one(mem, i);
}
}
/// One `Work()` pass for context `i`. Faithful to canary's orchestration but
/// uses the mainline xma2 decoder (whole-packet driven) for the actual
/// frame decode in place of canary's per-frame `Decode()`.
fn work_one(&mut self, mem: &GuestMemory, i: usize) {
let ctx_va = self.context_array_guest_va + (i as u32) * XMA_CONTEXT_SIZE;
let mut data = XmaContextData::read(mem, ctx_va);
// Snapshot of the context as read, passed along for the merged write-back.
let initial = data;
if data.output_buffer_valid == 0 {
return;
}
self.decode_into_output(mem, i, ctx_va, &mut data, &initial);
}
/// Decode available input packets into PCM and drain into the output ring.
fn decode_into_output(
&mut self,
mem: &GuestMemory,
i: usize,
ctx_va: u32,
data: &mut XmaContextData,
initial: &XmaContextData,
) {
use xma_decode::*;
let output_capacity = data.output_buffer_block_count * OUTPUT_BYTES_PER_BLOCK;
if output_capacity == 0 {
return;
}
let out_backing = xma_phys_to_backing(data.output_buffer_ptr);
let mut write_off = data.output_buffer_write_offset * OUTPUT_BYTES_PER_BLOCK;
let read_off = data.output_buffer_read_offset * OUTPUT_BYTES_PER_BLOCK;
// write_count: free space in the ring from write to read.
let free_bytes = ring_write_count(read_off, write_off, output_capacity);
self.decode_state[i].remaining_subframe_blocks_in_output =
(free_bytes / OUTPUT_BYTES_PER_BLOCK) as i32;
let effective_sdc = data.subframe_decode_count.max(1);
let min_blocks = effective_sdc as i32 + data.output_buffer_padding as i32;
if min_blocks > self.decode_state[i].remaining_subframe_blocks_in_output {
// No room — write back unchanged and wait for the game to drain.
store_merged_pub(mem, ctx_va, data, initial);
return;
}
let mut produced_any = false;
// Ensure codec configured for current rate/channels.
let rate = sample_rate_hz(data.sample_rate);
let channels = if data.is_stereo != 0 { 2 } else { 1 };
self.ensure_codec(i, rate, channels);
// Main decode loop: while there's output ring room and valid input.
loop {
if self.decode_state[i].remaining_subframe_blocks_in_output < min_blocks {
break;
}
// If we still have undrained subframes from a prior decode, consume
// them first (canary Consume before next Decode).
if self.decode_state[i].current_frame_remaining_subframes == 0 {
// Need a fresh decoded frame. Pull from the codec, feeding input
// packets as required.
if !self.produce_frame(mem, i, data) {
break;
}
}
// Consume: write up to `effective_sdc` subframes (256B blocks) of
// the staged raw_frame into the output ring.
let total_subframes =
((BYTES_PER_FRAME_CHANNEL / OUTPUT_BYTES_PER_BLOCK) << data.is_stereo) as u8;
let remaining = self.decode_state[i].current_frame_remaining_subframes;
let to_write = remaining.min(effective_sdc as u8);
let frame_read_off = (total_subframes - remaining) as usize * OUTPUT_BYTES_PER_BLOCK as usize;
let nbytes = to_write as u32 * OUTPUT_BYTES_PER_BLOCK;
// Write into the output ring (handle wrap).
let raw = &self.decode_state[i].raw_frame;
write_off = ring_write(
mem,
out_backing,
output_capacity,
write_off,
&raw[frame_read_off..frame_read_off + nbytes as usize],
);
self.pcm_bytes_total += nbytes as u64;
produced_any = true;
let headroom = if remaining - to_write == 0 {
data.output_buffer_padding as i32
} else {
0
};
self.decode_state[i].remaining_subframe_blocks_in_output -=
to_write as i32 + headroom;
self.decode_state[i].current_frame_remaining_subframes -= to_write;
}
// Writeback offsets.
data.output_buffer_write_offset = write_off / OUTPUT_BYTES_PER_BLOCK;
if self.decode_state[i].remaining_subframe_blocks_in_output == 0
&& write_off == read_off
{
data.output_buffer_valid = 0;
}
if !produced_any && !data.is_any_input_buffer_valid() {
data.output_buffer_valid = 0;
}
store_merged_pub(mem, ctx_va, data, initial);
}
/// Configure (or reconfigure) the FFmpeg xma2 codec for this context.
fn ensure_codec(&mut self, i: usize, rate: u32, channels: u32) {
let st = &mut self.decode_state[i];
if st.codec.is_some() && st.codec_rate == rate && st.codec_channels == channels {
return;
}
match crate::xma2_codec::Xma2Codec::new(rate, channels) {
Ok(c) => {
st.codec = Some(c);
st.codec_rate = rate;
st.codec_channels = channels;
tracing::info!(ctx = i, rate, channels, "xma: xma2 codec configured");
}
Err(e) => {
tracing::error!(ctx = i, rate, channels, error = %e, "xma: xma2 codec init failed");
st.codec = None;
}
}
}
/// Produce one decoded 512-sample frame into `raw_frame` (interleaved S16BE).
///
/// Input-consumption model (faithful to canary's packet/buffer contract).
///
/// The mainline xma2 decoder consumes whole 2 KB packets via `send_packet`
/// and emits frames in bursts (internal FIFO + lookahead), so its intake
/// position can't be read per-frame. We therefore keep TWO cursors:
///
/// 1. A private FFmpeg *feed* cursor (`feed_buffer`/`feed_packet_index`)
/// that hands raw packets to FFmpeg only far enough ahead to keep the
/// PCM queue stocked. This follows the same buffer ping-pong as the
/// guest but is NOT what the guest observes.
/// 2. The guest-visible `input_buffer_read_offset`, advanced by exactly
/// ONE compressed frame each time we emit a 512-sample frame to the
/// guest — via `advance_read_offset_one_frame`, a faithful port of the
/// offset arithmetic in canary's `Decode()`. This crosses packet and
/// buffer boundaries (and fires SwapInputBuffer, clearing the drained
/// buffer's valid bit) at canary's true per-frame cadence, which is
/// what the WMV demuxer polls to refill ADV.wmv.
///
/// Decoupling the two means FFmpeg's whole-packet burst framing no longer
/// freezes the guest-visible offset: the offset now tracks emitted output,
/// so the input buffer is consumed and swapped as the movie actually plays.
fn produce_frame(&mut self, mem: &GuestMemory, i: usize, data: &mut XmaContextData) -> bool {
use xma_decode::*;
let channels = if data.is_stereo != 0 { 2u32 } else { 1u32 };
let frame_bytes = (BYTES_PER_FRAME_CHANNEL * channels) as usize;
// Top up FFmpeg's internal FIFO (and our queue) just enough to satisfy
// one frame, feeding raw packets via the private feed cursor.
if self.decode_state[i].pcm_queue.len() < frame_bytes {
self.feed_codec(mem, i, data);
}
// Pop exactly one 512-sample frame from the queue into raw_frame.
if self.decode_state[i].pcm_queue.len() < frame_bytes {
return false;
}
{
let st = &mut self.decode_state[i];
st.raw_frame.fill(0);
// The length check above guarantees `frame_bytes` queued bytes, so the
// `unwrap` below cannot fail.
for b in st.raw_frame[..frame_bytes].iter_mut() {
*b = st.pcm_queue.pop_front().unwrap();
}
st.current_frame_remaining_subframes = (4u8) << data.is_stereo;
}
// We just emitted one frame to the guest — advance its visible read
// offset by one compressed frame at canary's cadence (may swap buffer).
self.advance_read_offset_one_frame(mem, data);
true
}
/// Feed raw 2 KB packets to FFmpeg from the private feed cursor until the
/// PCM queue holds at least one frame or the codec stops accepting input.
/// The feed cursor follows the guest's `current_buffer` ping-pong but keeps
/// its own packet index (`feed_packet_index`), so feeding ahead of the
/// guest-visible read offset is fine — the offset advances separately per
/// emitted frame.
fn feed_codec(&mut self, mem: &GuestMemory, i: usize, data: &XmaContextData) {
use xma_decode::*;
let channels = if data.is_stereo != 0 { 2u32 } else { 1u32 };
let frame_bytes = (BYTES_PER_FRAME_CHANNEL * channels) as usize;
// Re-sync the feed buffer to the guest's current buffer if the guest has
// swapped past us (the buffer we were feeding was consumed).
if self.decode_state[i].feed_buffer != data.current_buffer
&& !data.is_input_buffer_valid(self.decode_state[i].feed_buffer)
{
self.decode_state[i].feed_buffer = data.current_buffer;
self.decode_state[i].feed_packet_index = 0;
}
const MAX_FEED: u32 = 8;
let mut fed = 0u32;
while self.decode_state[i].pcm_queue.len() < frame_bytes && fed < MAX_FEED {
let fb = self.decode_state[i].feed_buffer;
if !data.is_input_buffer_valid(fb) {
// Nothing to feed from this buffer; try the other if valid.
let other = fb ^ 1;
if data.is_input_buffer_valid(other) {
self.decode_state[i].feed_buffer = other;
self.decode_state[i].feed_packet_index = 0;
continue;
}
break;
}
let pkt_count = data.input_buffer_packet_count(fb);
let pidx = self.decode_state[i].feed_packet_index;
if pidx >= pkt_count {
// Exhausted this buffer's packets at the feed cursor; advance to
// the other buffer if it's valid (it was refilled), else wait.
let other = fb ^ 1;
if data.is_input_buffer_valid(other) {
self.decode_state[i].feed_buffer = other;
self.decode_state[i].feed_packet_index = 0;
continue;
}
break;
}
let backing = xma_phys_to_backing(data.input_buffer_address(fb));
let pkt_va = backing + pidx * BYTES_PER_PACKET;
let mut packet = vec![0u8; BYTES_PER_PACKET as usize];
mem.read_bytes(pkt_va, &mut packet);
let send_res = match self.decode_state[i].codec.as_mut() {
Some(codec) => codec.send_packet(&packet),
None => break,
};
match send_res {
Ok(()) => {
self.decode_state[i].feed_packet_index += 1;
fed += 1;
self.drain_codec_frames(i);
}
// Decoder full — drain what it has and stop; re-offer this same
// packet next time (don't advance the feed cursor).
Err(ref e) if e == "EAGAIN" => {
self.drain_codec_frames(i);
break;
}
Err(e) => {
tracing::warn!(ctx = i, error = %e, "xma: send_packet failed");
break;
}
}
}
}
/// Pull all currently-available decoded frames from the codec and append
/// their interleaved S16BE PCM to the context's queue.
fn drain_codec_frames(&mut self, i: usize) {
loop {
let out = match self.decode_state[i].codec.as_mut() {
Some(c) => c.receive_frame(),
None => None,
};
let Some((nb, bytes)) = out else { break };
let st = &mut self.decode_state[i];
st.frames_decoded += 1;
if !st.first_frame_logged {
st.first_frame_logged = true;
tracing::info!(
ctx = i,
samples = nb,
pcm_bytes = bytes.len(),
"xma: first PCM frame decoded"
);
}
st.pcm_queue.extend(bytes);
}
}
/// Advance `input_buffer_read_offset` by exactly ONE compressed frame,
/// faithfully mirroring the offset arithmetic in canary's
/// `XmaContextNew::Decode` (frame-size parse + packet-boundary handling +
/// SwapInputBuffer when the buffer's packets are exhausted). Called once per
/// 512-sample frame we emit to the guest, so the guest-visible read offset
/// crosses packet/buffer boundaries at canary's true cadence — independent
/// of the mainline xma2 decoder's whole-packet burst framing. This is what
/// lets `input_buffer_0_valid` toggle and the WMV demuxer refill ADV.wmv.
fn advance_read_offset_one_frame(&mut self, mem: &GuestMemory, data: &mut XmaContextData) {
use xma_decode::*;
if !data.is_any_input_buffer_valid() {
return;
}
if !data.is_current_input_buffer_valid() {
self.swap_input_buffer(data);
if !data.is_current_input_buffer_valid() {
return;
}
}
// Clamp a header-region offset (canary's Dirt-2 guard).
if data.input_buffer_read_offset < BITS_PER_PACKET_HEADER {
data.input_buffer_read_offset = BITS_PER_PACKET_HEADER;
}
let pkt_count = data.current_input_buffer_packet_count();
let input_size = pkt_count * BYTES_PER_PACKET;
let Some(packet_index) = packet_number(input_size, data.input_buffer_read_offset) else {
return;
};
let buf_backing = xma_phys_to_backing(data.current_input_buffer_address());
let pkt_va = buf_backing + packet_index * BYTES_PER_PACKET;
let mut packet = vec![0u8; BYTES_PER_PACKET as usize];
mem.read_bytes(pkt_va, &mut packet);
let first_frame_offset = packet_frame_offset(&packet);
let mut relative_offset = data.input_buffer_read_offset % BITS_PER_PACKET;
if relative_offset < first_frame_offset {
// Tail of a split frame — skip to this packet's first frame.
data.input_buffer_read_offset =
packet_index * BITS_PER_PACKET + first_frame_offset;
relative_offset = first_frame_offset;
}
let skip_count = packet_skip_count(&packet);
// Full-packet skip (0xFF): no frames begin here — advance to the next
// packet that does, swapping the buffer if exhausted.
if skip_count == 0xFF {
let next_packet_index = packet_index + 1;
let next_off =
self.next_packet_read_offset(mem, data, next_packet_index, pkt_count);
if next_packet_index >= pkt_count || next_off == BITS_PER_PACKET_HEADER {
self.swap_input_buffer(data);
}
data.input_buffer_read_offset = next_off;
return;
}
let info = get_packet_info(&packet, relative_offset);
let packet_to_skip = (skip_count as u32) + 1;
let next_packet_index = packet_index + packet_to_skip;
// Frame size: clamp to the bits remaining in the packet stream (canary
// GetAmountOfBitsToRead over the (packet_index+1)*kBitsPerPacket stream).
let stream_remaining =
((packet_index + 1) * BITS_PER_PACKET).saturating_sub(data.input_buffer_read_offset);
let frame_size = if info.current_frame_size == 0 {
// Split header we can't resolve from this packet alone; fall back to
// advancing past the rest of this packet so we don't stall.
stream_remaining
} else {
info.current_frame_size
};
let bits_to_copy = amount_of_bits_to_read(stream_remaining, frame_size);
if !info.is_last_frame_in_packet() {
let next_frame_offset =
(data.input_buffer_read_offset + bits_to_copy) % BITS_PER_PACKET;
data.input_buffer_read_offset =
packet_index * BITS_PER_PACKET + next_frame_offset;
return;
}
// Last frame in this packet: move to the next packet's first frame, or
// swap the input buffer if the packets are exhausted (canary's
// `next_packet_index >= current_input_packet_count`).
let mut next_off =
self.next_packet_read_offset(mem, data, next_packet_index, pkt_count);
if next_packet_index >= pkt_count || next_off == BITS_PER_PACKET_HEADER {
self.swap_input_buffer(data);
}
if next_off == BITS_PER_PACKET_HEADER && data.is_any_input_buffer_valid() {
// At the start of the next buffer: jump to its first frame offset.
let nb_backing = xma_phys_to_backing(data.current_input_buffer_address());
let mut hdr = [0u8; 4];
mem.read_bytes(nb_backing, &mut hdr);
let fo = packet_frame_offset(&hdr);
if fo <= MAX_FRAME_SIZE_IN_BITS {
next_off = fo;
}
}
data.input_buffer_read_offset = next_off;
}
/// Scan forward from `next_packet_index` (possibly into the *next* buffer)
/// for the next packet that begins a frame and return its bit offset, or
/// `BITS_PER_PACKET_HEADER` if none (canary `GetNextPacketReadOffset`).
fn next_packet_read_offset(
&self,
mem: &GuestMemory,
data: &XmaContextData,
next_packet_index: u32,
current_input_packet_count: u32,
) -> u32 {
use xma_decode::*;
// Resolve which buffer the packet lives in (current or the other).
let (buffer_index, mut pidx) = if next_packet_index >= current_input_packet_count {
(data.current_buffer ^ 1, next_packet_index - current_input_packet_count)
} else {
(data.current_buffer, next_packet_index)
};
if !data.is_input_buffer_valid(buffer_index) {
return BITS_PER_PACKET_HEADER;
}
let addr = data.input_buffer_address(buffer_index);
if addr == 0 {
return BITS_PER_PACKET_HEADER;
}
let pkt_count = data.input_buffer_packet_count(buffer_index);
let backing = xma_phys_to_backing(addr);
while pidx < pkt_count {
let mut hdr = [0u8; 4];
mem.read_bytes(backing + pidx * BYTES_PER_PACKET, &mut hdr);
let fo = packet_frame_offset(&hdr);
if fo <= MAX_FRAME_SIZE_IN_BITS {
return pidx * BITS_PER_PACKET + fo;
}
pidx += 1;
}
BITS_PER_PACKET_HEADER
}
fn swap_input_buffer(&mut self, data: &mut XmaContextData) {
use xma_decode::*;
tracing::debug!(
from = data.current_buffer,
to = data.current_buffer ^ 1,
"xma: SwapInputBuffer (input buffer consumed)"
);
if data.current_buffer == 0 {
data.input_buffer_0_valid = 0;
} else {
data.input_buffer_1_valid = 0;
}
data.current_buffer ^= 1;
data.input_buffer_read_offset = BITS_PER_PACKET_HEADER;
}
}
impl Default for XmaDecoder {
fn default() -> Self {
Self::new()
}
}
/// Build the [`MmioRegion`] for the XMA register aperture at `0x7FEA0000`.
/// Mirrors the GPU's `build_region`: the closures lock the shared decoder,
/// compute the dword register index, and dispatch to `read`/`write_register`.
pub fn build_mmio_region(dec: Arc<Mutex<XmaDecoder>>) -> MmioRegion {
let read_dec = dec.clone();
let write_dec = dec;
MmioRegion {
base_address: APERTURE_BASE,
mask: APERTURE_MASK,
size: APERTURE_SIZE,
read_callback: Box::new(move |addr: u32| {
let reg_index = (addr & 0xFFFF) / 4;
read_dec.lock().unwrap().read_register(reg_index)
}),
write_callback: Box::new(move |addr: u32, value: u32| {
let reg_index = (addr & 0xFFFF) / 4;
write_dec.lock().unwrap().write_register(reg_index, value);
}),
}
}
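/*
Illustrative sketch, not part of the original commit: the address decode that
the two closures in `build_mmio_region` perform, pulled out as standalone
functions. `sketch_reg_index` and `sketch_wire` are hypothetical names; the
0x7FEA0000 base and the 0x600 / 0x651 register numbers are taken from the
comments in this file, and the byte reversal is assumed to match the `wire`
helper used by the tests.

```rust
/// Map a guest byte address inside the aperture to a dword register index.
fn sketch_reg_index(addr: u32) -> u32 {
    // Keep the low 16 bits (byte offset into the aperture), then divide by
    // the 4-byte register width.
    (addr & 0xFFFF) / 4
}

/// Convert between a host-order value and the raw bus value the guest uses.
/// The swap is its own inverse, so one function covers both directions.
fn sketch_wire(v: u32) -> u32 {
    v.swap_bytes()
}
```

Under these assumptions a guest access to byte address 0x7FEA1800 resolves to
register 0x600, and a host-order kick mask of 0b101 travels on the bus as
0x05000000.
*/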
#[cfg(test)]
mod tests {
use super::*;
fn inited() -> XmaDecoder {
let mut d = XmaDecoder::new();
// Pick a plausible physical-window VA/phys pair.
d.init(0xA010_0000, 0x0010_0000);
d
}
/// The guest writes/reads the aperture byte-reversed; `wire(v)` is the raw
/// bus value the guest sends to mean host-order `v` (and what a read of a
/// host-order `v` returns). Equivalent to `lwbrx`/`stwbrx` semantics.
fn wire(v: u32) -> u32 {
v.swap_bytes()
}
/// (a) `allocate_context` hands back distinct, increasing pointers spaced by
/// the 64-byte stride, exhausts at 320, and `release_context` frees the slot.
#[test]
fn allocate_distinct_then_exhaust_then_release() {
let mut d = inited();
let first = d.allocate_context();
let second = d.allocate_context();
assert_eq!(first, 0xA010_0000);
assert_eq!(second, 0xA010_0000 + XMA_CONTEXT_SIZE);
assert!(second > first);
// Drain the remaining slots (2 already taken).
for _ in 0..(XMA_CONTEXT_COUNT - 2) {
assert_ne!(d.allocate_context(), 0);
}
// 321st allocation fails.
assert_eq!(d.allocate_context(), 0);
// Free the first slot and re-acquire it.
d.release_context(first);
assert_eq!(d.allocate_context(), first);
}
/// (b) A Kick to `Context0Kick` with host value `0b101` marks contexts 0
/// and 2. The guest sends it byte-reversed (`wire`).
#[test]
fn kick_context0_marks_correct_contexts() {
let mut d = inited();
d.write_register(REG_CONTEXT_KICK_BASE, wire(0b101));
assert!(d.is_pending(0));
assert!(!d.is_pending(1));
assert!(d.is_pending(2));
assert_eq!(d.kick_count(), 2);
}
/// (c) A Kick to `Context1Kick` (0x651) bit 0 maps to context_id 32.
#[test]
fn kick_context1_bit0_is_context_32() {
let mut d = inited();
d.write_register(REG_CONTEXT_KICK_BASE + 1, wire(0b1));
assert!(d.is_pending(32));
assert!(!d.is_pending(0));
assert_eq!(d.kick_count(), 1);
}
/// Regression for the byte-order fix: the guest's real Clear writes were
/// `0x01000000`/`0x02000000`/`0x04000000` (byte-reversed `1`/`2`/`4`),
/// meaning contexts 0/1/2, NOT 24/25/26. Verify the raw bus values decode
/// to the low contexts.
#[test]
fn byte_reversed_clear_targets_low_contexts() {
let mut d = inited();
for i in 0..3 {
d.write_register(REG_CONTEXT_KICK_BASE, wire(1 << i));
}
assert!(d.is_pending(0) && d.is_pending(1) && d.is_pending(2));
// The exact bus values observed from the guest.
d.write_register(REG_CONTEXT_CLEAR_BASE, 0x0100_0000);
d.write_register(REG_CONTEXT_CLEAR_BASE, 0x0200_0000);
d.write_register(REG_CONTEXT_CLEAR_BASE, 0x0400_0000);
assert!(!d.is_pending(0) && !d.is_pending(1) && !d.is_pending(2));
}
/// (d) `read_register(0x600)` returns the base byte-reversed (the guest
/// `lwbrx`-reverses it back to the host-order base on its side).
#[test]
fn context_array_address_reads_phys() {
let d = inited();
assert_eq!(
d.read_register(REG_CONTEXT_ARRAY_ADDRESS),
wire(0x0010_0000)
);
}
/// (e) `CurrentContextIndex` rotates on each read and wraps at the count
/// (values returned byte-reversed).
#[test]
fn current_context_index_rotates() {
let d = inited();
assert_eq!(d.read_register(REG_CURRENT_CONTEXT_INDEX), wire(0));
assert_eq!(d.read_register(REG_CURRENT_CONTEXT_INDEX), wire(1));
assert_eq!(d.read_register(REG_CURRENT_CONTEXT_INDEX), wire(2));
// Advance to the wrap boundary.
for _ in 3..XMA_CONTEXT_COUNT as u32 {
d.read_register(REG_CURRENT_CONTEXT_INDEX);
}
// Next read wraps back to 0.
assert_eq!(d.read_register(REG_CURRENT_CONTEXT_INDEX), wire(0));
}
/// Clear must drop a previously-kicked pending flag.
#[test]
fn clear_resets_pending() {
let mut d = inited();
d.write_register(REG_CONTEXT_KICK_BASE, wire(0b1));
assert!(d.is_pending(0));
d.write_register(REG_CONTEXT_CLEAR_BASE, wire(0b1));
assert!(!d.is_pending(0));
}
/// The MMIO region routes a guest access at `APERTURE_BASE + reg * 4` to
/// register `reg`: a Kick write reaches `Context0Kick`, and a
/// `ContextArrayAddress` read returns the base byte-reversed.
#[test]
fn mmio_region_round_trips_register() {
let dec = Arc::new(Mutex::new(inited()));
let region = build_mmio_region(dec.clone());
let kick_byte = APERTURE_BASE + REG_CONTEXT_KICK_BASE * 4;
(region.write_callback)(kick_byte, wire(0b1));
assert!(dec.lock().unwrap().is_pending(0));
// ContextArrayAddress read-back via the bus (byte-reversed).
let addr_byte = APERTURE_BASE + REG_CONTEXT_ARRAY_ADDRESS * 4;
assert_eq!((region.read_callback)(addr_byte), wire(0x0010_0000));
}
}

@@ -0,0 +1,217 @@
//! Thin unsafe wrapper around the mainline FFmpeg `AV_CODEC_ID_XMA2` decoder.
//!
//! Unlike canary's vendored `XMAFRAMES` codec (one bit-extracted frame per
//! `AVPacket`, prefixed by a custom padding header), the distro xma2 decoder
//! consumes whole 2 KB XMA2 packets (`block_align == 2048`), needs `extradata`
//! declaring the channel/stream layout, and buffers samples internally across
//! packets. We feed it the guest's raw 2 KB packets, pull whatever 512-sample
//! float-planar frames it emits, and return them as interleaved S16 big-endian
//! PCM (matching canary `ConvertFrame`).
use std::os::raw::c_int;
use std::ptr;
use ffmpeg_sys_next as ff;
/// One xma2 decoder instance, configured for a fixed (sample_rate, channels).
pub struct Xma2Codec {
codec: *const ff::AVCodec,
ctx: *mut ff::AVCodecContext,
frame: *mut ff::AVFrame,
packet: *mut ff::AVPacket,
extradata: Vec<u8>,
channels: u32,
}
// SAFETY: the raw FFmpeg pointers make this type `!Send` by default. The decoder
// is only ever touched on the CPU scheduler thread (decode_pending), so moving
// it to that thread is sound for our use.
unsafe impl Send for Xma2Codec {}
impl Xma2Codec {
/// Build XMA2WAVEFORMATEX extradata (34 bytes) for a single XMA2 stream.
/// Layout (little-endian, per FFmpeg `xma_decode_init` / xma2defs.h):
/// [0..2] NumStreams (u16) = 1
/// [2..6] ChannelMask (u32) = mono/stereo mask
/// [6..34] remaining XMA2WAVEFORMATEX fields (unused by the decoder)
fn build_extradata(channels: u32) -> Vec<u8> {
let mut e = vec![0u8; 34];
// NumStreams = 1
e[0..2].copy_from_slice(&1u16.to_le_bytes());
// ChannelMask: 0x3 (FL|FR) for stereo, 0x4 (FC) for mono.
let mask: u32 = if channels >= 2 { 0x3 } else { 0x4 };
e[2..6].copy_from_slice(&mask.to_le_bytes());
e
}
pub fn new(sample_rate: u32, channels: u32) -> Result<Self, String> {
unsafe {
let codec = ff::avcodec_find_decoder(ff::AVCodecID::AV_CODEC_ID_XMA2);
if codec.is_null() {
return Err("xma2 decoder not found in libavcodec".into());
}
let ctx = ff::avcodec_alloc_context3(codec);
if ctx.is_null() {
return Err("avcodec_alloc_context3 failed".into());
}
let mut extradata = Self::build_extradata(channels);
// FFmpeg requires extradata to be allocated with av_malloc and
// padded; copy our bytes into an av_malloc'd buffer.
let pad = ff::AV_INPUT_BUFFER_PADDING_SIZE as usize;
let raw = ff::av_mallocz(extradata.len() + pad) as *mut u8;
if raw.is_null() {
ff::avcodec_free_context(&mut (ctx as *mut _));
return Err("av_mallocz extradata failed".into());
}
ptr::copy_nonoverlapping(extradata.as_ptr(), raw, extradata.len());
(*ctx).extradata = raw;
(*ctx).extradata_size = extradata.len() as c_int;
(*ctx).sample_rate = sample_rate as c_int;
(*ctx).block_align = 2048;
ff::av_channel_layout_default(&mut (*ctx).ch_layout, channels as c_int);
let ret = ff::avcodec_open2(ctx, codec, ptr::null_mut());
if ret < 0 {
let mut ctxm = ctx;
ff::avcodec_free_context(&mut ctxm);
return Err(format!("avcodec_open2 failed: {}", av_err(ret)));
}
let frame = ff::av_frame_alloc();
let packet = ff::av_packet_alloc();
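if frame.is_null() || packet.is_null() {
// av_frame_free/av_packet_free are no-ops on a null pointer, so this
// releases whichever allocation succeeded instead of leaking it.
let (mut frame, mut packet, mut ctxm) = (frame, packet, ctx);
ff::av_frame_free(&mut frame);
ff::av_packet_free(&mut packet);
ff::avcodec_free_context(&mut ctxm);
return Err("av_frame_alloc/av_packet_alloc failed".into());
}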
// keep our Vec alive as the source of truth for length
extradata.shrink_to_fit();
Ok(Self {
codec,
ctx,
frame,
packet,
extradata,
channels,
})
}
}
pub fn channels(&self) -> u32 {
self.channels
}
/// Feed one raw 2 KB XMA2 packet (header + data) to the decoder. Returns
/// `Ok(())` once the decoder has accepted the packet, `Err("EAGAIN")` when the
/// decoder is full and frames must be drained first, or another `Err` on
/// failure. Decoded frames are pulled via [`Self::receive_frame`].
pub fn send_packet(&mut self, packet: &[u8]) -> Result<(), String> {
unsafe {
// av_packet_from_data takes ownership of an av_malloc'd buffer, so copy
// the caller's bytes into a zero-padded av_malloc allocation and hand
// that over.
let pad = ff::AV_INPUT_BUFFER_PADDING_SIZE as usize;
let buf = ff::av_malloc(packet.len() + pad) as *mut u8;
if buf.is_null() {
return Err("av_malloc packet failed".into());
}
ptr::copy_nonoverlapping(packet.as_ptr(), buf, packet.len());
ptr::write_bytes(buf.add(packet.len()), 0, pad);
ff::av_packet_unref(self.packet);
// Wrap buf so FFmpeg frees it.
let ret = ff::av_packet_from_data(self.packet, buf, packet.len() as c_int);
if ret < 0 {
ff::av_free(buf as *mut _);
return Err(format!("av_packet_from_data failed: {}", av_err(ret)));
}
let ret = ff::avcodec_send_packet(self.ctx, self.packet);
if ret == ff::AVERROR(ff::EAGAIN) {
// Decoder full — caller should drain frames first then retry.
return Err("EAGAIN".into());
}
if ret < 0 {
return Err(format!("avcodec_send_packet failed: {}", av_err(ret)));
}
Ok(())
}
}
/// Signal end-of-stream so the decoder flushes its internal FIFO.
pub fn send_eof(&mut self) {
unsafe {
let _ = ff::avcodec_send_packet(self.ctx, ptr::null());
}
}
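/// Pull one decoded frame as interleaved S16 big-endian PCM. Returns None when
/// the decoder needs more input (EAGAIN), is drained (EOF), or reports any
/// other error (every negative return code is treated the same way), and also
/// for an empty frame. On success returns
/// (samples_per_channel, interleaved_s16be_bytes).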
pub fn receive_frame(&mut self) -> Option<(u32, Vec<u8>)> {
unsafe {
let ret = ff::avcodec_receive_frame(self.ctx, self.frame);
if ret < 0 {
return None;
}
let nb = (*self.frame).nb_samples as u32;
if nb == 0 {
return None;
}
let ch = (*self.frame).ch_layout.nb_channels.max(1) as u32;
let out = convert_frame_planar_to_s16be(self.frame, ch, nb);
Some((nb, out))
}
}
}
impl Drop for Xma2Codec {
fn drop(&mut self) {
unsafe {
if !self.frame.is_null() {
ff::av_frame_free(&mut self.frame);
}
if !self.packet.is_null() {
ff::av_packet_free(&mut self.packet);
}
if !self.ctx.is_null() {
ff::avcodec_free_context(&mut self.ctx);
}
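// `codec` points at a static registry entry and `extradata` is an
// ordinary Vec (FFmpeg holds its own av_malloc'd copy), so neither needs
// explicit cleanup; these reads only mark the fields as used.
let _ = &self.codec;
let _ = &self.extradata;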
}
}
}
/// Convert FFmpeg planar-float output to interleaved S16 big-endian PCM
/// (faithful to canary `XmaContext::ConvertFrame`: saturate to [-1,1], scale by
/// 2^15-1, byte-swap each sample). `channels` planes of `nb_samples` floats.
unsafe fn convert_frame_planar_to_s16be(
frame: *mut ff::AVFrame,
channels: u32,
nb_samples: u32,
) -> Vec<u8> {
const SCALE: f32 = ((1i32 << 15) - 1) as f32;
let mut out = Vec::with_capacity((nb_samples * channels * 2) as usize);
unsafe {
// extended_data[ch] points to a plane of f32 (AV_SAMPLE_FMT_FLTP).
let ext = (*frame).extended_data;
for i in 0..nb_samples as isize {
for ch in 0..channels as isize {
let plane = *ext.offset(ch) as *const f32;
let s = if plane.is_null() { 0.0 } else { *plane.offset(i) };
let clamped = s.clamp(-1.0, 1.0) * SCALE;
let v = clamped as i16;
out.extend_from_slice(&v.to_be_bytes());
}
}
}
out
}
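/*
Illustrative sketch, not part of the original commit: the per-sample arithmetic
of `convert_frame_planar_to_s16be`, isolated from FFmpeg so it can be checked
by hand. `sketch_f32_to_s16be` is a hypothetical name. Note that a Rust
float-to-int `as` cast truncates toward zero, so 0.5 maps to 16383, not 16384.

```rust
/// Convert one float sample to a big-endian signed 16-bit sample.
fn sketch_f32_to_s16be(sample: f32) -> [u8; 2] {
    // 2^15 - 1, the same scale factor as SCALE in the conversion routine.
    const SCALE: f32 = 32767.0;
    // Saturate to [-1, 1] first so out-of-range input cannot wrap.
    let scaled = sample.clamp(-1.0, 1.0) * SCALE;
    // `as i16` truncates toward zero; to_be_bytes gives the guest byte order.
    (scaled as i16).to_be_bytes()
}
```

For example, full-scale positive input yields the bytes 7F FF and full-scale
negative input yields 80 01 (that is, -32767; the value -32768 is never produced).
*/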
fn av_err(code: c_int) -> String {
unsafe {
// c_char is i8 on some targets and u8 on others, so don't hard-code i8.
let mut buf = [0 as std::os::raw::c_char; ff::AV_ERROR_MAX_STRING_SIZE as usize];
ff::av_strerror(code, buf.as_mut_ptr(), buf.len());
let cstr = std::ffi::CStr::from_ptr(buf.as_ptr());
cstr.to_string_lossy().into_owned()
}
}

@@ -0,0 +1,690 @@
//! Stage 3 — the real XMA2→PCM decoder.
//!
//! A faithful port of xenia-canary's `apu/xma_context_new.cc` decode pipeline
//! (`Work`/`Decode`/`Consume`/`StoreContextMerged`), adapted to the *mainline*
//! distro FFmpeg `AV_CODEC_ID_XMA2` decoder rather than canary's vendored
//! `AV_CODEC_ID_XMAFRAMES`.
//!
//! ## Determinism
//! There is no host decoder thread. [`super::xma::XmaDecoder::decode_pending`]
//! is invoked from the CPU scheduler's per-round coordinator
//! (`coord_post_round` in xenia-app). FFmpeg decode is itself deterministic
//! (same input bytes → same PCM), so the lockstep golden stays reproducible.
//!
//! ## FFmpeg framing — why this differs from canary
//! Canary feeds FFmpeg one *frame* at a time: it bit-extracts a single
//! 512-sample frame from the guest packet stream and hands it to the vendored
//! `XMAFRAMES` codec with a custom 1-byte padding header. Mainline FFmpeg has
//! no `XMAFRAMES` codec; its `xma2` decoder instead consumes whole 2 KB XMA2
//! *packets* (`block_align == 2048`), needs `extradata` declaring the
//! stream/channel layout, and handles frame splitting and a per-stream sample
//! FIFO internally. So this module keeps canary's *guest-facing* contract
//! (the `XMA_CONTEXT_DATA` packet/frame bookkeeping, the 256-byte-block output
//! ring buffer, the field writeback) but replaces canary's per-frame
//! `Decode()` body with: feed the current 2 KB packet to the xma2 decoder,
//! pull any 512-sample PCM frames it emits, convert them to interleaved S16BE,
//! and stage them as the "raw frame" that `Consume()` drains into the output
//! ring.
//!
//! See `xma2_codec.rs` for the unsafe FFmpeg wrapper.
use std::collections::VecDeque;
use xenia_memory::access::MemoryAccess;
use xenia_memory::GuestMemory;
use crate::xma2_codec::Xma2Codec;
// ---- Constants (canary `XmaContext` / `XmaContextNew`).
pub const BYTES_PER_PACKET: u32 = 2048;
pub const BYTES_PER_PACKET_HEADER: u32 = 4;
pub const BYTES_PER_PACKET_DATA: u32 = BYTES_PER_PACKET - BYTES_PER_PACKET_HEADER;
pub const BITS_PER_PACKET: u32 = BYTES_PER_PACKET * 8;
/// Canary `kBitsPerPacketHeader` (in the *new* context) is 32.
pub const BITS_PER_PACKET_HEADER: u32 = 32;
pub const BITS_PER_FRAME_HEADER: u32 = 15;
pub const SAMPLES_PER_FRAME: u32 = 512;
pub const BYTES_PER_SAMPLE: u32 = 2;
pub const BYTES_PER_FRAME_CHANNEL: u32 = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE; // 1024
pub const OUTPUT_BYTES_PER_BLOCK: u32 = 256;
pub const OUTPUT_MAX_SIZE_BYTES: u32 = 31 * OUTPUT_BYTES_PER_BLOCK;
pub const MAX_FRAME_LENGTH: u32 = 0x7FFF;
pub const MAX_FRAME_SIZE_IN_BITS: u32 = 0x4000 - BITS_PER_PACKET_HEADER;
const ID_TO_SAMPLE_RATE: [u32; 4] = [24000, 32000, 44100, 48000];
/// Project a bare-physical XMA buffer pointer (`0x0xxxxxxx`) to the host-backed
/// guest VA used by the rest of the emulator. Identical formula to
/// `xenia_gpu::physical_to_backing` for the physical window; the input/output
/// buffer pointers in the context are always in the low physical window.
#[inline]
pub fn xma_phys_to_backing(p: u32) -> u32 {
0x4000_0000 | (p & 0x1FFF_FFFF)
}
// ---- XMA_CONTEXT_DATA (canary `xma_context.h`, 64 bytes, 16 dwords).
//
// Stored big-endian in guest memory. We load all 16 dwords (BE) and unpack the
// bitfields exactly per the canary layout (bitfields pack LSB-first within each
// host-order dword). All fields below are kept as plain integers.
#[derive(Clone, Copy, Debug, Default)]
pub struct XmaContextData {
// DWORD 0
pub input_buffer_0_packet_count: u32, // :12
pub loop_count: u32, // :8
pub input_buffer_0_valid: u32, // :1
pub input_buffer_1_valid: u32, // :1
pub output_buffer_block_count: u32, // :5
pub output_buffer_write_offset: u32, // :5
// DWORD 1
pub input_buffer_1_packet_count: u32, // :12
pub loop_subframe_start: u32, // :2
pub loop_subframe_end: u32, // :3
pub loop_subframe_skip: u32, // :3
pub subframe_decode_count: u32, // :4
pub output_buffer_padding: u32, // :3
pub sample_rate: u32, // :2
pub is_stereo: u32, // :1
pub unk_dword_1_c: u32, // :1
pub output_buffer_valid: u32, // :1
// DWORD 2
pub input_buffer_read_offset: u32, // :26
pub error_status: u32, // :5
pub error_set: u32, // :1
// DWORD 3
pub loop_start: u32, // :26
pub parser_error_status: u32, // :5
pub parser_error_set: u32, // :1
// DWORD 4
pub loop_end: u32, // :26
pub packet_metadata: u32, // :5
pub current_buffer: u32, // :1
// DWORD 5..8
pub input_buffer_0_ptr: u32,
pub input_buffer_1_ptr: u32,
pub output_buffer_ptr: u32,
pub work_buffer_ptr: u32,
// DWORD 9
pub output_buffer_read_offset: u32, // :5
pub stop_when_done: u32, // :1 (bit 30)
pub interrupt_when_done: u32, // :1 (bit 31)
}
#[inline]
fn bits(v: u32, shift: u32, width: u32) -> u32 {
(v >> shift) & ((1u32 << width) - 1)
}
impl XmaContextData {
/// Read the 64-byte context struct from guest VA `ctx_va` (already a VA,
/// not a physical ptr). Each dword is read big-endian via `read_u32`.
pub fn read(mem: &GuestMemory, ctx_va: u32) -> Self {
let mut d = [0u32; 16];
for (i, w) in d.iter_mut().enumerate() {
*w = mem.read_u32(ctx_va + (i as u32) * 4);
}
let mut c = Self::default();
// DWORD 0
c.input_buffer_0_packet_count = bits(d[0], 0, 12);
c.loop_count = bits(d[0], 12, 8);
c.input_buffer_0_valid = bits(d[0], 20, 1);
c.input_buffer_1_valid = bits(d[0], 21, 1);
c.output_buffer_block_count = bits(d[0], 22, 5);
c.output_buffer_write_offset = bits(d[0], 27, 5);
// DWORD 1
c.input_buffer_1_packet_count = bits(d[1], 0, 12);
c.loop_subframe_start = bits(d[1], 12, 2);
c.loop_subframe_end = bits(d[1], 14, 3);
c.loop_subframe_skip = bits(d[1], 17, 3);
c.subframe_decode_count = bits(d[1], 20, 4);
c.output_buffer_padding = bits(d[1], 24, 3);
c.sample_rate = bits(d[1], 27, 2);
c.is_stereo = bits(d[1], 29, 1);
c.unk_dword_1_c = bits(d[1], 30, 1);
c.output_buffer_valid = bits(d[1], 31, 1);
// DWORD 2
c.input_buffer_read_offset = bits(d[2], 0, 26);
c.error_status = bits(d[2], 26, 5);
c.error_set = bits(d[2], 31, 1);
// DWORD 3
c.loop_start = bits(d[3], 0, 26);
c.parser_error_status = bits(d[3], 26, 5);
c.parser_error_set = bits(d[3], 31, 1);
// DWORD 4
c.loop_end = bits(d[4], 0, 26);
c.packet_metadata = bits(d[4], 26, 5);
c.current_buffer = bits(d[4], 31, 1);
// DWORD 5..8
c.input_buffer_0_ptr = d[5];
c.input_buffer_1_ptr = d[6];
c.output_buffer_ptr = d[7];
c.work_buffer_ptr = d[8];
// DWORD 9
c.output_buffer_read_offset = bits(d[9], 0, 5);
c.stop_when_done = bits(d[9], 30, 1);
c.interrupt_when_done = bits(d[9], 31, 1);
c
}
/// Repack the bitfields back into the 16 dwords (host order). Only the
/// decoder-owned fields differ from what was read; callers use
/// [`store_merged`] to write back without clobbering game-owned fields.
fn pack(&self) -> [u32; 16] {
let mut d = [0u32; 16];
d[0] = (self.input_buffer_0_packet_count & 0xFFF)
| ((self.loop_count & 0xFF) << 12)
| ((self.input_buffer_0_valid & 1) << 20)
| ((self.input_buffer_1_valid & 1) << 21)
| ((self.output_buffer_block_count & 0x1F) << 22)
| ((self.output_buffer_write_offset & 0x1F) << 27);
d[1] = (self.input_buffer_1_packet_count & 0xFFF)
| ((self.loop_subframe_start & 0x3) << 12)
| ((self.loop_subframe_end & 0x7) << 14)
| ((self.loop_subframe_skip & 0x7) << 17)
| ((self.subframe_decode_count & 0xF) << 20)
| ((self.output_buffer_padding & 0x7) << 24)
| ((self.sample_rate & 0x3) << 27)
| ((self.is_stereo & 1) << 29)
| ((self.unk_dword_1_c & 1) << 30)
| ((self.output_buffer_valid & 1) << 31);
d[2] = (self.input_buffer_read_offset & 0x3FF_FFFF)
| ((self.error_status & 0x1F) << 26)
| ((self.error_set & 1) << 31);
d[3] = (self.loop_start & 0x3FF_FFFF)
| ((self.parser_error_status & 0x1F) << 26)
| ((self.parser_error_set & 1) << 31);
d[4] = (self.loop_end & 0x3FF_FFFF)
| ((self.packet_metadata & 0x1F) << 26)
| ((self.current_buffer & 1) << 31);
d[5] = self.input_buffer_0_ptr;
d[6] = self.input_buffer_1_ptr;
d[7] = self.output_buffer_ptr;
d[8] = self.work_buffer_ptr;
d[9] = (self.output_buffer_read_offset & 0x1F)
| ((self.stop_when_done & 1) << 30)
| ((self.interrupt_when_done & 1) << 31);
d
}
pub fn is_input_buffer_valid(&self, idx: u32) -> bool {
if idx == 0 {
self.input_buffer_0_valid != 0
} else {
self.input_buffer_1_valid != 0
}
}
pub fn is_current_input_buffer_valid(&self) -> bool {
self.is_input_buffer_valid(self.current_buffer)
}
pub fn is_any_input_buffer_valid(&self) -> bool {
self.input_buffer_0_valid != 0 || self.input_buffer_1_valid != 0
}
pub fn input_buffer_address(&self, idx: u32) -> u32 {
if idx == 0 {
self.input_buffer_0_ptr
} else {
self.input_buffer_1_ptr
}
}
pub fn current_input_buffer_address(&self) -> u32 {
self.input_buffer_address(self.current_buffer)
}
pub fn input_buffer_packet_count(&self, idx: u32) -> u32 {
if idx == 0 {
self.input_buffer_0_packet_count
} else {
self.input_buffer_1_packet_count
}
}
pub fn current_input_buffer_packet_count(&self) -> u32 {
self.input_buffer_packet_count(self.current_buffer)
}
}
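/// Merge decoder-owned fields back into guest memory (canary `StoreContextMerged`).
/// Re-reads the current context (the game may have raced an update), overwrites
/// only the fields the decoder owns (the valid flags are only ever cleared,
/// never set), and writes all 16 dwords back BE. Note that bits 5..=29 of
/// dword 9 and dwords 10..=15 are not modeled by `XmaContextData`, so `pack`
/// writes them as zero.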
fn store_merged(
mem: &GuestMemory,
ctx_va: u32,
data: &XmaContextData,
initial: &XmaContextData,
) {
let mut fresh = XmaContextData::read(mem, ctx_va);
// DWORD 0
fresh.loop_count = data.loop_count;
fresh.output_buffer_write_offset = data.output_buffer_write_offset;
if initial.input_buffer_0_valid != 0 && data.input_buffer_0_valid == 0 {
fresh.input_buffer_0_valid = 0;
}
if initial.input_buffer_1_valid != 0 && data.input_buffer_1_valid == 0 {
fresh.input_buffer_1_valid = 0;
}
// DWORD 1
if initial.output_buffer_valid != 0 && data.output_buffer_valid == 0 {
fresh.output_buffer_valid = 0;
}
// DWORD 2
fresh.input_buffer_read_offset = data.input_buffer_read_offset;
fresh.error_status = data.error_status;
// DWORD 4
fresh.current_buffer = data.current_buffer;
// DWORD 9
fresh.output_buffer_read_offset = data.output_buffer_read_offset;
let d = fresh.pack();
for (i, w) in d.iter().enumerate() {
mem.write_u32(ctx_va + (i as u32) * 4, *w);
}
}
/// Public wrapper for [`store_merged`] (called from the orchestrator in xma.rs).
pub fn store_merged_pub(
mem: &GuestMemory,
ctx_va: u32,
data: &XmaContextData,
initial: &XmaContextData,
) {
store_merged(mem, ctx_va, data, initial);
}
/// Free byte count in a ring buffer from `write_off` to `read_off`
/// (canary `RingBuffer::write_count`).
pub fn ring_write_count(read_off: u32, write_off: u32, capacity: u32) -> u32 {
if read_off == write_off {
capacity
} else if write_off < read_off {
read_off - write_off
} else {
(capacity - write_off) + read_off
}
}
/// Write `bytes` into the guest ring buffer at `backing + write_off`, wrapping
/// at `capacity`. Returns the new write offset (canary `RingBuffer::Write`).
pub fn ring_write(
mem: &GuestMemory,
backing: u32,
capacity: u32,
write_off: u32,
bytes: &[u8],
) -> u32 {
let count = (bytes.len() as u32).min(capacity);
if count == 0 {
return write_off;
}
if write_off + count < capacity {
mem.write_bytes(backing + write_off, &bytes[..count as usize]);
write_off + count
} else {
let left = capacity - write_off;
mem.write_bytes(backing + write_off, &bytes[..left as usize]);
let right = count - left;
mem.write_bytes(backing, &bytes[left as usize..(left + right) as usize]);
right
}
}
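/*
Illustrative sketch, not part of the original commit: the same wrap-around
write as `ring_write`, but over a plain byte slice instead of guest memory, so
the wrap case is easy to trace. `sketch_ring_write` is a hypothetical name, the
slice length stands in for `capacity`, and the sketch assumes
`write_off <= ring.len()`.

```rust
/// Write `bytes` into `ring` starting at `write_off`, wrapping at the end.
/// Returns the new write offset.
fn sketch_ring_write(ring: &mut [u8], write_off: usize, bytes: &[u8]) -> usize {
    let capacity = ring.len();
    // Never write more than one full lap of the ring.
    let count = bytes.len().min(capacity);
    if count == 0 {
        return write_off;
    }
    if write_off + count < capacity {
        // Fits before the end: one contiguous copy.
        ring[write_off..write_off + count].copy_from_slice(&bytes[..count]);
        write_off + count
    } else {
        // Split: fill up to the end, then continue from index 0.
        let left = capacity - write_off;
        ring[write_off..].copy_from_slice(&bytes[..left]);
        let right = count - left;
        ring[..right].copy_from_slice(&bytes[left..count]);
        right
    }
}
```

Writing four bytes at offset 6 of an 8-byte ring puts two bytes at the end and
two at the start, and the returned offset is 2. A write that ends exactly at
the end of the ring returns 0, mirroring the `<` comparison in `ring_write`.
*/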
// ---- BitStream (port of canary `base/bit_stream.cc`). Big-endian source.
pub struct BitStream<'a> {
buf: &'a [u8],
offset_bits: usize,
size_bits: usize,
}
impl<'a> BitStream<'a> {
pub fn new(buf: &'a [u8], size_bits: usize) -> Self {
Self { buf, offset_bits: 0, size_bits }
}
pub fn offset_bits(&self) -> usize {
self.offset_bits
}
pub fn set_offset(&mut self, off: usize) {
self.offset_bits = off.min(self.size_bits);
}
pub fn advance(&mut self, n: usize) {
self.set_offset(self.offset_bits + n);
}
pub fn bits_remaining(&self) -> usize {
self.size_bits - self.offset_bits
}
/// Peek up to 57 bits (canary contract). Reads 8 bytes BE then shifts.
pub fn peek(&self, num_bits: usize) -> u64 {
debug_assert!(num_bits <= 57);
// offset_bytes = min(offset>>3, (size-64)>>3), matching canary so an
// 8-byte load near the buffer end stays in range.
let max_byte = if self.size_bits >= 64 {
(self.size_bits - 64) >> 3
} else {
0
};
let offset_bytes = (self.offset_bits >> 3).min(max_byte);
let rel = self.offset_bits - (offset_bytes << 3);
let mut tmp = [0u8; 8];
let avail = self.buf.len().saturating_sub(offset_bytes).min(8);
tmp[..avail].copy_from_slice(&self.buf[offset_bytes..offset_bytes + avail]);
let mut value = u64::from_be_bytes(tmp);
value >>= 64 - (rel + num_bits);
value &= (1u64 << num_bits) - 1;
value
}
pub fn read(&mut self, num_bits: usize) -> u64 {
let v = self.peek(num_bits);
self.advance(num_bits);
v
}
/// Copy `num_bits` from the stream into `dest` (bit-packed, MSB-first within
/// each byte). Returns the starting bit offset within the first byte
/// (canary returns `rel_offset_bits` — the frame's intra-byte alignment).
pub fn copy(&mut self, dest: &mut [u8], num_bits: usize) -> usize {
let offset_bytes = self.offset_bits >> 3;
let rel = self.offset_bits - (offset_bytes << 3);
let mut bits_left = num_bits;
let mut out = 0usize;
if rel != 0 {
let bits = self.peek(8 - rel) as u8;
let clear_mask = !(((1u8 << rel) - 1)) as u8;
dest[out] &= clear_mask;
dest[out] |= bits;
bits_left -= 8 - rel;
self.advance(8 - rel);
out += 1;
}
if bits_left >= 8 {
let nbytes = bits_left / 8;
let src_off = (self.offset_bits >> 3).min(self.buf.len());
let copy = nbytes.min(self.buf.len().saturating_sub(src_off));
dest[out..out + copy]
.copy_from_slice(&self.buf[src_off..src_off + copy]);
out += nbytes;
self.advance(nbytes * 8);
bits_left -= nbytes * 8;
}
if bits_left != 0 {
let mut b = self.peek(bits_left) as u8;
b <<= 8 - bits_left;
let clear_mask = ((1u16 << bits_left) - 1) as u8;
dest[out] &= clear_mask;
dest[out] |= b;
self.advance(bits_left);
}
rel
}
}
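/*
Illustrative sketch, not part of the original commit: a bit-at-a-time reference
for the MSB-first bit order that `BitStream::peek` reads in. It is slow, but
simple enough to use as a cross-check for in-range reads. `sketch_peek_bits` is
a hypothetical name; it assumes `num_bits <= 64` and treats bytes past the end
of the buffer as zero.

```rust
/// Read `num_bits` bits starting at `offset_bits`, most significant bit first.
fn sketch_peek_bits(buf: &[u8], offset_bits: usize, num_bits: usize) -> u64 {
    let mut value = 0u64;
    for i in 0..num_bits {
        let bit_index = offset_bits + i;
        // Out-of-range bytes read as zero in this sketch.
        let byte = buf.get(bit_index / 8).copied().unwrap_or(0);
        // Bit 0 of the stream is the top bit (0x80) of byte 0.
        let bit = (byte >> (7 - (bit_index % 8))) & 1;
        value = (value << 1) | bit as u64;
    }
    value
}
```

As a usage example, the 15-bit first-frame-offset field of a packet header
starts at bit 6, so for the header bytes 08 00 08 00 this returns 1, which is
the value `packet_frame_offset` computes before it adds the 32 header bits.
*/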
// ---- XMA packet header helpers (canary `xma_helpers.h`).
#[inline]
pub fn packet_frame_count(packet: &[u8]) -> u8 {
packet[0] >> 2
}
#[inline]
pub fn packet_metadata(packet: &[u8]) -> u8 {
packet[2] & 0x7
}
#[inline]
pub fn is_packet_xma2(packet: &[u8]) -> bool {
packet_metadata(packet) == 1
}
#[inline]
pub fn packet_skip_count(packet: &[u8]) -> u8 {
packet[3]
}
/// First frame offset in bits (canary `GetPacketFrameOffset`): a 15-bit value
/// spanning bytes 0 through 2 (low 2 bits of byte 0, all of byte 1, high 5
/// bits of byte 2), plus the 32-bit header.
#[inline]
pub fn packet_frame_offset(packet: &[u8]) -> u32 {
let val = (((packet[0] as u32 & 0x3) << 13)
| ((packet[1] as u32) << 5)
| ((packet[2] as u32) >> 3))
& 0xFFFF;
val + 32
}
/// Sample-rate id → Hz.
pub fn sample_rate_hz(id: u32) -> u32 {
ID_TO_SAMPLE_RATE[id.min(3) as usize]
}
// ---- Packet-walk for faithful input-offset advance (canary `GetPacketInfo`,
// `GetNextPacketReadOffset`, and the offset arithmetic at the tail of
// `XmaContextNew::Decode`). These let us advance `input_buffer_read_offset` one
// *frame* at a time at canary's exact cadence — independent of the mainline
// xma2 decoder's whole-packet/burst framing — so the offset crosses packet and
// buffer boundaries (and triggers SwapInputBuffer) at the true input-drain
// rate the guest's WMV demuxer polls.
/// Info about the frame at a given bit offset within a packet (canary
/// `kPacketInfo` / `GetPacketInfo`). `frame_count` is the number of frames
/// that begin in the packet; `current_frame` is the index of the frame at
/// `frame_offset`; `current_frame_size` is that frame's compressed size in
/// bits (0 if it can't be resolved within this packet, i.e. a split header).
#[derive(Default, Clone, Copy)]
pub struct PacketInfo {
pub frame_count: u32,
pub current_frame: u32,
pub current_frame_size: u32,
}
impl PacketInfo {
pub fn is_last_frame_in_packet(&self) -> bool {
self.current_frame + 1 == self.frame_count
}
}
/// Faithful port of canary `XmaContextNew::GetPacketInfo`.
pub fn get_packet_info(packet: &[u8], frame_offset: u32) -> PacketInfo {
let mut info = PacketInfo::default();
let first_frame_offset = packet_frame_offset(packet);
let mut stream = BitStream::new(packet, BITS_PER_PACKET as usize);
stream.set_offset(first_frame_offset as usize);
// Split frame from previous packet.
if frame_offset < first_frame_offset {
info.current_frame = 0;
info.current_frame_size = first_frame_offset - frame_offset;
}
loop {
if stream.bits_remaining() < BITS_PER_FRAME_HEADER as usize {
break;
}
let frame_size = stream.peek(BITS_PER_FRAME_HEADER as usize) as u32;
if frame_size == 0 || frame_size == MAX_FRAME_LENGTH {
break;
}
if stream.offset_bits() == frame_offset as usize {
info.current_frame = info.frame_count;
info.current_frame_size = frame_size;
}
info.frame_count += 1;
if frame_size as usize > stream.bits_remaining() {
// Last frame.
break;
}
stream.advance((frame_size - 1) as usize);
// Trailing continuation bit.
if stream.read(1) == 0 {
break;
}
}
if is_packet_xma2(packet) {
let xma2_frame_count = packet_frame_count(packet) as u32;
if xma2_frame_count > info.frame_count {
if info.current_frame_size == 0 {
info.current_frame = info.frame_count;
}
info.frame_count = xma2_frame_count;
}
}
info
}
/// Packet number for a bit offset (canary `GetPacketNumber`). Returns None when
/// the offset is in the header or past the buffer.
pub fn packet_number(size_bytes: u32, bit_offset: u32) -> Option<u32> {
if bit_offset < BITS_PER_PACKET_HEADER {
return None;
}
if bit_offset >= size_bytes * 8 {
return None;
}
Some((bit_offset >> 3) / BYTES_PER_PACKET)
}
/// min(remaining_stream_bits, frame_size) (canary `GetAmountOfBitsToRead`).
pub fn amount_of_bits_to_read(remaining_stream_bits: u32, frame_size: u32) -> u32 {
remaining_stream_bits.min(frame_size)
}
// ---- Per-context decode state (lives in the XmaDecoder, one per ctx).
#[derive(Default)]
pub struct ContextDecodeState {
/// FFmpeg xma2 codec for this context (lazily created / reconfigured).
pub codec: Option<Xma2Codec>,
pub codec_rate: u32,
pub codec_channels: u32,
/// Staged interleaved S16BE PCM for the current decoded frame
/// (`raw_frame_`), drained by Consume in 256-byte blocks.
pub raw_frame: Vec<u8>,
/// Decoded interleaved S16BE PCM not yet split into per-frame `raw_frame`s.
/// The mainline xma2 decoder emits bursts of many 512-sample frames at once
/// (internal FIFO + 4096-sample lookahead); we queue the bytes here and
/// hand the guest exactly one 512-sample frame per `produce_frame`.
pub pcm_queue: VecDeque<u8>,
pub current_frame_remaining_subframes: u8,
pub remaining_subframe_blocks_in_output: i32,
/// Total 512-sample frames decoded for this context (diagnostic).
pub frames_decoded: u64,
/// Whether a "first frame" diagnostic has been emitted.
pub first_frame_logged: bool,
/// FFmpeg feed cursor: the next packet index (within the *current* input
/// buffer at feed time) we will hand to FFmpeg. This is the decoder's
/// internal intake position and is intentionally decoupled from the
/// guest-visible `input_buffer_read_offset` (which advances per *emitted*
/// frame via the faithful packet-walk). We feed ahead so FFmpeg always has
/// enough buffered input to satisfy the guest's drain, while the guest sees
/// the read offset move at canary's true per-frame cadence.
pub feed_packet_index: u32,
/// `current_buffer` the feed cursor is reading from; reset on swap so the
/// feed follows the same ping-pong as the guest-visible buffer.
pub feed_buffer: u32,
}
#[cfg(test)]
mod tests {
use super::*;
/// The bitfield unpack/pack must round-trip every decoder-relevant field at
/// the exact canary offsets (regression against a shifted bit).
#[test]
fn context_bitfields_round_trip() {
let mut c = XmaContextData::default();
c.input_buffer_0_packet_count = 632;
c.loop_count = 0;
c.input_buffer_0_valid = 1;
c.input_buffer_1_valid = 0;
c.output_buffer_block_count = 30;
c.output_buffer_write_offset = 5;
c.subframe_decode_count = 8;
c.output_buffer_padding = 1;
c.sample_rate = 3;
c.is_stereo = 1;
c.output_buffer_valid = 1;
c.input_buffer_read_offset = 16416;
c.error_status = 4;
c.current_buffer = 1;
c.input_buffer_0_ptr = 0x0b9f_d000;
c.output_buffer_ptr = 0x01f6_6e00;
c.output_buffer_read_offset = 7;
c.interrupt_when_done = 1;
// pack → words → re-read via the same word layout.
let d = c.pack();
// Simulate read() decode from the packed words.
let mut c2 = XmaContextData::default();
c2.input_buffer_0_packet_count = bits(d[0], 0, 12);
c2.input_buffer_0_valid = bits(d[0], 20, 1);
c2.output_buffer_block_count = bits(d[0], 22, 5);
c2.output_buffer_write_offset = bits(d[0], 27, 5);
c2.subframe_decode_count = bits(d[1], 20, 4);
c2.output_buffer_padding = bits(d[1], 24, 3);
c2.sample_rate = bits(d[1], 27, 2);
c2.is_stereo = bits(d[1], 29, 1);
c2.output_buffer_valid = bits(d[1], 31, 1);
c2.input_buffer_read_offset = bits(d[2], 0, 26);
c2.error_status = bits(d[2], 26, 5);
c2.current_buffer = bits(d[4], 31, 1);
c2.output_buffer_read_offset = bits(d[9], 0, 5);
c2.interrupt_when_done = bits(d[9], 31, 1);
assert_eq!(c2.input_buffer_0_packet_count, 632);
assert_eq!(c2.input_buffer_0_valid, 1);
assert_eq!(c2.output_buffer_block_count, 30);
assert_eq!(c2.output_buffer_write_offset, 5);
assert_eq!(c2.subframe_decode_count, 8);
assert_eq!(c2.output_buffer_padding, 1);
assert_eq!(c2.sample_rate, 3);
assert_eq!(c2.is_stereo, 1);
assert_eq!(c2.output_buffer_valid, 1);
assert_eq!(c2.input_buffer_read_offset, 16416);
assert_eq!(c2.error_status, 4);
assert_eq!(c2.current_buffer, 1);
assert_eq!(c2.output_buffer_read_offset, 7);
assert_eq!(c2.interrupt_when_done, 1);
}
#[test]
fn phys_to_backing_projects_physical_window() {
assert_eq!(xma_phys_to_backing(0x0b9f_d000), 0x4b9f_d000);
assert_eq!(xma_phys_to_backing(0x01f6_6e00), 0x41f6_6e00);
}
#[test]
fn ring_write_count_matches_canary() {
// Empty ring (read == write) → full capacity is writable.
assert_eq!(ring_write_count(0, 0, 7680), 7680);
// Write ahead of read → capacity minus the bytes still pending.
assert_eq!(ring_write_count(0, 256, 7680), 7680 - 256);
// Write wrapped behind read → only the gap up to read is writable.
assert_eq!(ring_write_count(512, 256, 7680), 256);
}
#[test]
fn packet_header_helpers() {
// Matches the observed first packet word 0x08000000: byte0=0x08.
let pkt = [0x08u8, 0x00, 0x00, 0x00];
assert_eq!(packet_frame_count(&pkt), 2); // 0x08>>2 = 2
// Frame offset: ((0x08&3)<<13 | 0<<5 | 0x00>>3) + 32 = 32.
assert_eq!(packet_frame_offset(&pkt), 32);
// A non-zero byte2 shifts the offset: byte2 = 0x08 adds 0x08>>3 = 1 → 33.
let pkt2 = [0x08u8, 0x00, 0x08, 0x00];
assert_eq!(packet_frame_offset(&pkt2), 33);
}
}
impl ContextDecodeState {
pub fn new() -> Self {
Self {
codec: None,
codec_rate: 0,
codec_channels: 0,
raw_frame: vec![0u8; (BYTES_PER_FRAME_CHANNEL * 2) as usize],
pcm_queue: VecDeque::new(),
current_frame_remaining_subframes: 0,
remaining_subframe_blocks_in_output: 0,
frames_decoded: 0,
first_frame_logged: false,
feed_packet_index: 0,
feed_buffer: 0,
}
}
}