xenia-cpu: VMX128, FPSCR, decoder split, scheduler, decode/block caches

Split the monolithic interpreter into cohesive modules: dedicated
decoder (decoder.rs) producing 8-byte DecodedInstr; opcode tables
(opcode.rs); explicit traps (trap.rs); FPSCR helpers (fpscr.rs);
overflow/carry helpers (overflow.rs); a 4 KiB-page-versioned decode
cache and basic-block cache (block_cache.rs); and a full VMX/VMX128
implementation (vmx.rs) covering AltiVec + Xenon's 128-bit extensions.

Add the parallel-execution substrate behind --parallel: a 7-party
phaser (phaser.rs) for round-based barrier sync, ReservationTable
(reservation.rs) for guest LL/SC, and the per-HW-thread scheduler
core (scheduler.rs) that owns ThreadRefs, runqueues, and pending IRQs.

Disassembler is now the single source of truth: disasm.rs gains the
full base + extended + VMX128 mnemonic set, with golden JSON fixtures
and a disasm_goldens test suite. Add a criterion-style interpreter
bench. context.rs grows the per-thread state the new modules need
(reservation slot, FPSCR, vector regs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-05-01 16:27:43 +02:00
parent e9b2b57a44
commit c36cca14f9
20 changed files with 12284 additions and 458 deletions

View File

@@ -0,0 +1,424 @@
//! Inter-thread reservation table for `lwarx`/`stwcx.` and
//! `ldarx`/`stdcx.`.
//!
//! On real Xenon, each core's `lwarx` places a reservation on a 128-byte
//! cache line; any other CPU's store to the line invalidates the
//! reservation. `stwcx.`'s success depends on the reservation still being
//! valid. Under M3's per-HW-thread parallelism, we need an inter-thread
//! mechanism for the same guarantee.
//!
//! M2 introduces the table behind a runtime `reservations_enabled` flag
//! (default `false`). When the flag is `false`, the interpreter's
//! existing per-`PpcContext` `reserved_line`/`has_reservation` fields are
//! used as-is — no inter-thread tracking. M3 flips the flag on once the
//! per-HW-thread host threads are spawning.
//!
//! ## Design
//!
//! - **Banked AtomicU64 array** of [`NUM_LINES`] entries (4096 × 8 B =
//! 32 KiB total). Each entry packs `(line_address, generation,
//! hw_id)`. A zero value means "no reservation on this bank".
//! - **Hash function**: `(line >> 7) & (NUM_LINES - 1)`. Different lines
//! that map to the same bank conservatively invalidate each other's
//! reservations — sound (real Xenon's L2 has finite associativity and
//! has the same property), at the cost of slightly more `stwcx.`
//! failures than a perfect-mapping table would produce.
//! - **`active_reservers: AtomicU16`** — a fast-path counter
//! incremented by every `lwarx` and decremented when its reservation is
//! either committed or invalidated. `write_u32` checks this with a
//! single `Relaxed` load; when zero (the common case in code that
//! doesn't use atomics), the invalidation hook is a one-instruction
//! skip.
//! - **Generation counter**: monotonic across all reservations,
//! incremented atomically. 24 bits of generation packed in the slot
//! means 16 M reuses per slot before wraparound; at multi-million
//! reservations/sec sustained that's still many seconds, and a
//! stale-gen `stwcx.` simply fails (sound, not livelocking).
//!
//! ## Invariants
//!
//! 1. A `stwcx.(addr)` succeeds only if the line slot still holds the
//! same `(line, gen, hw_id)` triple the reserver stamped at `lwarx`.
//! 2. Any plain store to a reserved line invalidates it (slot CASed to
//! zero). Hash-collision side-effect: a store to a different line
//! that maps to the same bank also invalidates — guests that observe
//! a `stwcx.` failure simply retry, so this is correctness-preserving.
//! 3. `stwcx.` from a different `hw_id` than the reserver fails even if
//! the line and gen would otherwise match — only the originating HW
//! thread can commit its own reservation.
//!
//! Memory ordering: all CAS / store operations on the line slot use
//! `AcqRel`; readers use `Acquire`. The store inside `stwcx.`'s payload
//! itself (the actual data write) is the caller's responsibility — see
//! [`crate::interpreter`]'s `stwcx.` arm.
use std::sync::atomic::{AtomicU16, AtomicU64, Ordering};
/// Real Xenon L2 cache-line size — the granule a reservation covers.
pub const LINE_BYTES: u32 = 0x80;
/// Mask to align an address to a cache-line boundary.
pub const LINE_MASK: u32 = !(LINE_BYTES - 1);
/// Number of bank entries in the reservation table. Power of two so the
/// hash is a single AND. 32 KiB total at 8 B per entry.
pub const NUM_LINES: usize = 4096;
const HASH_MASK: u32 = (NUM_LINES as u32) - 1;
/// Pack `(line_addr, generation, hw_id)` into a single u64. The packed
/// layout is:
/// bits 63..32: line address (we only need the high bits since the
/// low 7 are always zero — reserved range is line-aligned)
/// bits 31..8: 24-bit generation
/// bits 7..0: 8-bit `hw_id`
///
/// A packed value of `0` means "no reservation". Since we never reserve
/// on guest virtual address `0` (the page is unmapped) and the
/// generation increments from `1`, zero is a safe sentinel.
#[inline]
pub fn pack(line_addr: u32, generation: u32, hw_id: u8) -> u64 {
debug_assert!(line_addr & !LINE_MASK == 0, "line_addr must be line-aligned");
debug_assert!(generation < (1 << 24), "generation must fit in 24 bits");
((line_addr as u64) << 32)
| ((generation as u64 & 0xFF_FFFF) << 8)
| (hw_id as u64)
}
/// Inverse of [`pack`]. Returns `None` if the value is the zero sentinel
/// (no reservation).
#[inline]
pub fn unpack(raw: u64) -> Option<(u32, u32, u8)> {
if raw == 0 {
return None;
}
let line = (raw >> 32) as u32;
let generation = ((raw >> 8) & 0xFF_FFFF) as u32;
let hw_id = (raw & 0xFF) as u8;
Some((line, generation, hw_id))
}
#[inline]
fn hash(line_addr: u32) -> usize {
((line_addr >> 7) & HASH_MASK) as usize
}
#[inline]
fn align_to_line(addr: u32) -> u32 {
addr & LINE_MASK
}
/// Banked reservation table shared across all emulated HW threads. Built
/// once per emulation instance; lives behind an `Arc` so worker host
/// threads (M3) can hold their own clones without lifetime gymnastics.
pub struct ReservationTable {
lines: Vec<AtomicU64>,
active_reservers: AtomicU16,
next_gen: AtomicU64,
/// Runtime activation flag. Default `false`. M2.8's
/// `--reservations-table` flag (or M3 spawn) flips this to `true`,
/// at which point the interpreter's `lwarx`/`stwcx.` arms route
/// through the table; otherwise they use the legacy per-`PpcContext`
/// reservation fields.
enabled: std::sync::atomic::AtomicBool,
}
impl Default for ReservationTable {
fn default() -> Self {
Self::new()
}
}
impl ReservationTable {
/// Construct a fresh table with all banks empty.
pub fn new() -> Self {
let mut lines = Vec::with_capacity(NUM_LINES);
for _ in 0..NUM_LINES {
lines.push(AtomicU64::new(0));
}
Self {
lines,
active_reservers: AtomicU16::new(0),
// Start at 1 so the very first reservation gets a non-zero
// gen and the packed slot value is non-zero (zero is the
// "no reservation" sentinel).
next_gen: AtomicU64::new(1),
enabled: std::sync::atomic::AtomicBool::new(false),
}
}
/// Activate the table. The interpreter's `lwarx`/`stwcx.` arms will
/// route through this table on subsequent dispatches. Idempotent.
pub fn enable(&self) {
self.enabled
.store(true, std::sync::atomic::Ordering::Release);
}
/// Deactivate the table. The interpreter falls back to per-`PpcContext`
/// reservation fields. Idempotent.
pub fn disable(&self) {
self.enabled
.store(false, std::sync::atomic::Ordering::Release);
}
/// Whether the table is currently active. The interpreter consults
/// this on every `lwarx`/`stwcx.` to decide which path runs.
pub fn is_enabled(&self) -> bool {
self.enabled.load(std::sync::atomic::Ordering::Acquire)
}
/// True when at least one reservation is currently outstanding.
/// Plain `write_u32` consults this to skip the invalidation hook
/// when no thread holds a reservation — the common case for
/// non-atomic code.
#[inline]
pub fn has_active_reservers(&self) -> bool {
self.active_reservers.load(Ordering::Relaxed) > 0
}
/// `lwarx(addr)` — claim a reservation on the line containing `addr`.
/// Returns the generation stamped into the slot; the interpreter
/// stores this alongside the per-`PpcContext` `has_reservation` bit
/// so a subsequent `stwcx.` can verify the same gen still holds.
///
/// If a different reservation already occupied the bank, it's
/// silently overwritten — that thread's `stwcx.` will fail because
/// the slot no longer matches its stamped gen. Matches Xenon
/// behavior (a different core's lwarx on the same line displaces
/// any prior reservation).
pub fn reserve(&self, addr: u32, hw_id: u8) -> u32 {
let line = align_to_line(addr);
let generation = (self
.next_gen
.fetch_add(1, Ordering::Relaxed)
& 0xFF_FFFF) as u32;
let new_raw = pack(line, generation, hw_id);
// Release: prior reads of the reservation target should
// happen-before any thread that observes the new slot value.
let prev = self.lines[hash(line)].swap(new_raw, Ordering::AcqRel);
// If the previous slot was non-zero, the displaced reserver is
// implicitly invalidated — decrement the active counter for it.
// Else, increment for our new reservation. Net effect: the
// counter equals the number of *bank slots* with a non-zero
// value, which is an upper bound on actual reservers.
if prev == 0 {
self.active_reservers.fetch_add(1, Ordering::Relaxed);
}
generation
}
/// `stwcx.(addr)` — try to commit a reservation. Returns `true` if
/// the slot still holds `(line, my_gen, my_hw_id)` (in which case
/// it's CAS'd back to zero, releasing the bank), `false` otherwise.
/// The data store itself is the caller's responsibility — see
/// [`crate::interpreter`]'s `stwcx.` arm.
pub fn try_commit(&self, addr: u32, my_gen: u32, my_hw_id: u8) -> bool {
let line = align_to_line(addr);
let expected = pack(line, my_gen, my_hw_id);
match self.lines[hash(line)].compare_exchange(
expected,
0,
Ordering::AcqRel,
Ordering::Relaxed,
) {
Ok(_) => {
// Successfully released the slot; decrement the active
// count.
self.active_reservers.fetch_sub(1, Ordering::Relaxed);
true
}
Err(_) => false,
}
}
/// Hook for plain (non-reserving) stores: invalidate any
/// reservation on the containing line. Cheap when the bank is
/// already empty (single Acquire load + branch).
pub fn invalidate_for_write(&self, addr: u32) {
let line = align_to_line(addr);
let bank = &self.lines[hash(line)];
let prev = bank.load(Ordering::Acquire);
if prev == 0 {
return;
}
// Verify the slot still holds a reservation on *this* line
// before clearing — hash collisions mean the bank may hold a
// reservation on an unrelated line that maps to the same slot.
// Real Xenon has the same property (limited L2 associativity);
// we mirror it here. A spurious bank match invalidates a
// different line's reservation; the affected `stwcx.` retries —
// sound, slightly less efficient.
if let Some((bank_line, _generation, _hw)) = unpack(prev) {
if bank_line != line {
// Different line in the same bank — leave it alone (we
// chose not to invalidate cross-line collisions to
// reduce false-fail noise; real-HW behavior is similar
// since L2 associativity sets cross-line constraints).
return;
}
}
// CAS-clear the bank if it still holds the value we observed.
// If a concurrent `stwcx.` or `reserve` raced with us, the CAS
// fails — that's fine; the line slot is now in a different
// state and the displaced reservation will be picked up there.
if bank
.compare_exchange(prev, 0, Ordering::AcqRel, Ordering::Relaxed)
.is_ok()
{
self.active_reservers.fetch_sub(1, Ordering::Relaxed);
}
}
/// Drop a per-`PpcContext` reservation without committing. Called
/// when the interpreter clears `has_reservation` due to a
/// non-`stwcx.` event (context switch, exception, etc.). Safe to
/// call when the table doesn't hold our reservation anymore (the
/// CAS simply fails).
pub fn release(&self, addr: u32, my_gen: u32, my_hw_id: u8) {
let _ = self.try_commit(addr, my_gen, my_hw_id);
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::sync::Arc;
use std::thread;
#[test]
fn pack_unpack_roundtrip() {
let raw = pack(0x1000_0000, 42, 5);
let (line, generation, hw) = unpack(raw).unwrap();
assert_eq!(line, 0x1000_0000);
assert_eq!(generation, 42);
assert_eq!(hw, 5);
}
#[test]
fn unpack_zero_is_none() {
assert!(unpack(0).is_none());
}
#[test]
fn reserve_then_commit_succeeds() {
let t = ReservationTable::new();
let gn = t.reserve(0x1234, 0);
assert!(t.try_commit(0x1234, gn, 0));
// Already released — second commit fails.
assert!(!t.try_commit(0x1234, gn, 0));
}
#[test]
fn other_hw_id_cannot_commit() {
let t = ReservationTable::new();
let gn = t.reserve(0x1234, 0);
assert!(
!t.try_commit(0x1234, gn, 1),
"stwcx. from a different hw_id must fail"
);
// Original owner can still commit.
assert!(t.try_commit(0x1234, gn, 0));
}
#[test]
fn lwarx_displaces_prior_reservation() {
let t = ReservationTable::new();
let g0 = t.reserve(0x1234, 0);
// Different HW thread's lwarx on the same line.
let g1 = t.reserve(0x1234, 1);
// Original reserver's stwcx. fails because the gen changed.
assert!(!t.try_commit(0x1234, g0, 0));
// New reserver's stwcx. succeeds.
assert!(t.try_commit(0x1234, g1, 1));
}
#[test]
fn invalidate_clears_matching_reservation() {
let t = ReservationTable::new();
let gn = t.reserve(0x1234, 0);
t.invalidate_for_write(0x1238); // same line as 0x1234
assert!(!t.try_commit(0x1234, gn, 0));
assert_eq!(t.active_reservers.load(Ordering::Relaxed), 0);
}
#[test]
fn invalidate_different_line_in_same_bank_is_noop() {
let t = ReservationTable::new();
// Force a hash collision: addr A and addr B with same hash but
// different line addresses.
let line_a = 0x0000_1000;
let line_b = line_a + ((NUM_LINES as u32) << 7); // +0x80000 → same hash
assert_eq!(hash(line_a), hash(line_b));
let gn = t.reserve(line_a, 0);
// Invalidating line_b must NOT clear line_a's reservation.
t.invalidate_for_write(line_b);
assert!(t.try_commit(line_a, gn, 0));
}
#[test]
fn has_active_reservers_tracks_count() {
let t = ReservationTable::new();
assert!(!t.has_active_reservers());
let g0 = t.reserve(0x1000, 0);
assert!(t.has_active_reservers());
let g1 = t.reserve(0x2000, 1);
assert!(t.has_active_reservers());
t.try_commit(0x1000, g0, 0);
assert!(t.has_active_reservers());
t.try_commit(0x2000, g1, 1);
assert!(!t.has_active_reservers());
}
/// Stress test: 8 host threads each loop reserve+stwcx on the same
/// line. Exactly one stwcx per round can win; the others fail and
/// retry. The total number of *successful* commits across N
/// outer iterations equals N (one winner per round).
///
/// This proves the table's mutual-exclusion property: at most one
/// thread's stwcx. on a given line can succeed between two events
/// that would invalidate the line.
#[test]
fn concurrent_lwarx_stwcx_serializes() {
let t = Arc::new(ReservationTable::new());
const ROUNDS: u32 = 1000;
const THREADS: u8 = 8;
let total_successes = Arc::new(AtomicU64::new(0));
let mut handles = Vec::new();
for hw_id in 0..THREADS {
let t_clone = t.clone();
let s_clone = total_successes.clone();
handles.push(
thread::Builder::new()
.name(format!("res-stress-{hw_id}"))
.spawn(move || {
let mut wins = 0u64;
for _ in 0..ROUNDS {
let gn = t_clone.reserve(0x1234_5678, hw_id);
if t_clone.try_commit(0x1234_5678, gn, hw_id) {
wins += 1;
}
}
s_clone.fetch_add(wins, Ordering::Relaxed);
})
.expect("spawn"),
);
}
for h in handles {
h.join().expect("join");
}
let total = total_successes.load(Ordering::Relaxed);
// Lower bound: every round had at least one winner — but races
// can cause some rounds to have zero (all threads' reservations
// got displaced before any could commit). Assert progress: at
// least 10% of attempts succeed, and active_reservers is back
// to zero.
let attempts = ROUNDS as u64 * THREADS as u64;
assert!(
total > attempts / 10,
"expected at least 10% successful commits, got {total}/{attempts}"
);
assert_eq!(
t.active_reservers.load(Ordering::Relaxed),
0,
"all reservations should have been resolved"
);
}
}