Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:
- claude-memory/: ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
  (103 files, 1.1 MB; MEMORY.md plus every project_xenia_rs_*.md from
  audits addis_signext through audit-058)
- project-root/dot-claude/: <project-root>/.claude/settings.json
  (Stop hook + permissions)
- project-root/ppc-manual/: <project-root>/ppc-manual/
  (PowerPC reference docs, 397 files, 3.7 MB)
- project-root/run-canary.sh: <project-root>/run-canary.sh
- README.md: human-readable setup checklist
- setup.sh: idempotent installer (also reclones xenia-canary at pinned
  HEAD 6de80dffe)
- MANIFEST.md: per-file mapping + per-file-not-bundled restoration recipe
Excluded from bundle (not shippable via git):
- Sylpheed ISO (7.8 GB; copyright; manual copy required)
- sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
- target/ build artifacts (rebuild on target)
- audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
- audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
- xenia-canary checkout (setup.sh reclones from
git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| name | description | type | originSessionId |
|---|---|---|---|
| xenia-rs concurrency rollout — M2 substantively complete (2026-04-26) | M2.1–M2.5 + M2.8 landed; M2.6 (KernelStateInner) + M2.7 (per-slot Mutex<HwSlot>) deferred to M3 because they only matter when host threads exist. Page versions atomic, ReservationTable built (with stress test), ThreadRef carries generation, bump allocators atomic, per-slot pending_local_irq[6] AtomicU8 wired, --reservations-table CLI flag flips runtime atomic. 405 tests pass, sylpheed -n 2M golden matches under all flag combos. | project | af90c866-579c-4506-af85-cd5a5030af85 |
What landed
M2.1 — atomic page versions + Acquire/Release for cache invalidation
Already complete from M1.4 — page_table: Vec<AtomicU64>, writes_total: AtomicU64, page_versions: Vec<AtomicU64> in crates/xenia-memory/src/heap.rs. Block cache and texture cache call mem.page_version(...) which is Acquire load on the live GuestMemory impl.
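As a rough illustration of the Acquire/Release pairing described above, the sketch below shows a writer bumping a per-page version and a cache checking it. The `PageVersions` and `CachedBlock` types, the `bump` method, and the 4 KiB page size are assumptions made for the example; only the `page_version` name and the `Vec<AtomicU64>` storage come from the notes.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed page size for the sketch (4 KiB); the real heap may differ.
const PAGE_SHIFT: u32 = 12;

/// Hypothetical stand-in for the per-page version counters in heap.rs.
pub struct PageVersions {
    versions: Vec<AtomicU64>,
}

impl PageVersions {
    pub fn new(pages: usize) -> Self {
        Self { versions: (0..pages).map(|_| AtomicU64::new(0)).collect() }
    }

    /// Writer side: bump the version after the guest store so that a reader
    /// that observes the new version also observes the stored bytes.
    pub fn bump(&self, addr: u32) {
        self.versions[(addr >> PAGE_SHIFT) as usize].fetch_add(1, Ordering::Release);
    }

    /// Reader side: Acquire load, pairing with the Release bump above.
    pub fn page_version(&self, addr: u32) -> u64 {
        self.versions[(addr >> PAGE_SHIFT) as usize].load(Ordering::Acquire)
    }
}

/// Hypothetical cache entry that remembers the version it was built against.
pub struct CachedBlock {
    pub addr: u32,
    pub seen_version: u64,
}

impl CachedBlock {
    /// The entry is stale once the page has been written since it was cached.
    pub fn is_stale(&self, mem: &PageVersions) -> bool {
        mem.page_version(self.addr) != self.seen_version
    }
}
```

A cache entry built at version `v` stays valid until any write to its page bumps the counter; the comparison is cheap enough to run on every lookup.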
M2.2 — ReservationTable for lwarx/stwcx
New module: crates/xenia-cpu/src/reservation.rs. Banked Vec<AtomicU64> (4096 banks × 8 B = 32 KiB), (line_addr, generation, hw_id) packed per slot. Hash collisions invalidate conservatively (matches Xenon L2 associativity). Memory ordering: AcqRel on the line CAS / swap; Relaxed on the active-reserver counter.
API:
- reserve(addr, hw_id) -> u32: claim a slot; returns the stamped generation.
- try_commit(addr, my_gen, my_hw_id) -> bool: CAS-clear the slot if it still matches.
- invalidate_for_write(addr): plain-store hook that invalidates the line.
- has_active_reservers() -> bool: fast-path skip on writes when zero.
9 unit tests, including an 8-thread stress test (concurrent_lwarx_stwcx_serializes) that verifies only one stwcx. can win per round. The table sits behind the --reservations-table flag (M2.8); the interpreter's lwarx/stwcx. arms still use the legacy per-PpcContext fields. M3 will hook the table into the interpreter when host threads spawn.
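The four calls listed above can be sketched as follows. This is a simplified model, not the code in reservation.rs: the bit layout of the packed word, the valid bit, the 128-byte reservation granule, the global generation counter, and the bank hash are all assumptions chosen for the example.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

const BANKS: usize = 4096;        // 4096 banks x 8 B = 32 KiB, as in the notes
const LINE_MASK: u32 = !0x7F;     // assumed 128-byte reservation granule
const VALID: u64 = 1 << 63;       // assumed "slot in use" bit
const GEN_MASK: u32 = 0x7F_FFFF;  // assumed 23-bit generation field

pub struct ReservationTable {
    banks: Vec<AtomicU64>,
    next_gen: AtomicU32,
    active: AtomicU32,
}

/// Assumed layout: valid bit | generation (23 bits) | hw_id (8 bits) | line address (32 bits).
fn pack(line: u32, gen: u32, hw_id: u8) -> u64 {
    VALID | (((gen & GEN_MASK) as u64) << 40) | ((hw_id as u64) << 32) | (line as u64)
}

fn bank_of(addr: u32) -> usize {
    ((addr & LINE_MASK) >> 7) as usize % BANKS
}

impl ReservationTable {
    pub fn new() -> Self {
        Self {
            banks: (0..BANKS).map(|_| AtomicU64::new(0)).collect(),
            next_gen: AtomicU32::new(0),
            active: AtomicU32::new(0),
        }
    }

    /// lwarx side: overwrite the bank, evicting any colliding reservation.
    pub fn reserve(&self, addr: u32, hw_id: u8) -> u32 {
        let gen = self.next_gen.fetch_add(1, Ordering::Relaxed) & GEN_MASK;
        // Count first so the counter never under-reports a live slot.
        self.active.fetch_add(1, Ordering::Relaxed);
        let old = self.banks[bank_of(addr)].swap(pack(addr & LINE_MASK, gen, hw_id), Ordering::AcqRel);
        if old & VALID != 0 {
            self.active.fetch_sub(1, Ordering::Relaxed); // replaced a live slot
        }
        gen
    }

    /// stwcx. side: succeeds only if the slot still holds exactly our reservation.
    pub fn try_commit(&self, addr: u32, my_gen: u32, my_hw_id: u8) -> bool {
        let expected = pack(addr & LINE_MASK, my_gen, my_hw_id);
        let won = self.banks[bank_of(addr)]
            .compare_exchange(expected, 0, Ordering::AcqRel, Ordering::Acquire)
            .is_ok();
        if won {
            self.active.fetch_sub(1, Ordering::Relaxed);
        }
        won
    }

    /// Plain-store hook: clears the whole bank, so hash collisions invalidate conservatively.
    pub fn invalidate_for_write(&self, addr: u32) {
        if !self.has_active_reservers() {
            return; // fast path: nothing to invalidate
        }
        let old = self.banks[bank_of(addr)].swap(0, Ordering::AcqRel);
        if old & VALID != 0 {
            self.active.fetch_sub(1, Ordering::Relaxed);
        }
    }

    pub fn has_active_reservers(&self) -> bool {
        self.active.load(Ordering::Relaxed) != 0
    }
}
```

Because a commit is a single compare-and-swap against the exact packed word, two threads holding reservations on the same line cannot both succeed: whichever reserved last owns the slot, and the other's expected value no longer matches.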
M2.3 — ThreadRef generation packing
crates/xenia-cpu/src/scheduler.rs:
pub struct ThreadRef {
pub hw_id: u8,
pub generation: u8,
pub idx: u16,
}
Total 4 bytes, no padding. 256 reuses per slot before wraparound; PRUNE_DEPTH_THRESHOLD = 4 keeps slots shallow so this is plenty. M2 leaves generation at 0 on every spawn — no concurrent swap_remove happens before M3 so ABA can't occur. M3's migration-fixup site will bump generations.
Constructors ThreadRef::new(hw_id, idx) and ThreadRef::with_generation(hw_id, idx, generation). ~30 existing literal sites adapted (several converted to ThreadRef::new(...), others got an explicit generation: 0 field).
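A minimal sketch of the two constructors and the 4-byte size claim follows. The field layout and constructor names are taken from the notes; the derives, the compile-time size check, and the `bumped` helper are illustrative additions, not necessarily what scheduler.rs contains.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ThreadRef {
    pub hw_id: u8,
    pub generation: u8,
    pub idx: u16,
}

impl ThreadRef {
    /// M2 behavior: every spawn starts at generation 0.
    pub const fn new(hw_id: u8, idx: u16) -> Self {
        Self { hw_id, generation: 0, idx }
    }

    pub const fn with_generation(hw_id: u8, idx: u16, generation: u8) -> Self {
        Self { hw_id, generation, idx }
    }

    /// Hypothetical helper: the reference a slot would hand out after one
    /// reuse. Wraps after 256 reuses, matching the u8 generation field.
    pub const fn bumped(self) -> Self {
        Self { generation: self.generation.wrapping_add(1), ..self }
    }
}

// Compile-time check of the "4 bytes, no padding" claim.
const _: () = assert!(std::mem::size_of::<ThreadRef>() == 4);
```

Two references to the same `(hw_id, idx)` pair compare unequal once the generation differs, which is what lets a stale reference be detected after a slot is reused.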
M2.4 — bump allocators to atomics
In crates/xenia-kernel/src/state.rs:
- next_handle: AtomicU32 (start 0x1000, fetch_add(4, Relaxed))
- next_thread_id: AtomicU32
- next_tls_index: AtomicU32
- heap_cursor: AtomicU32
- stack_cursor: AtomicU32
heap_alloc / stack_alloc use fetch_add(size, Relaxed) then verify post-bump invariants. A failed alloc near the limit leaves the cursor advanced (matches pre-M2 behavior — game-over either way). New unit test concurrent_alloc_handle_distinct (10 threads × 100 allocations → 1000 distinct handles).
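The bump-then-verify pattern might look like the sketch below. The `Allocators` struct, its `heap_limit` field, the `Option` return type, and the `distinct_handles` helper are assumptions for illustration; the handle start value, the stride of 4, and the Relaxed `fetch_add` come from the notes.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Hypothetical subset of the kernel state's bump allocators.
pub struct Allocators {
    next_handle: AtomicU32,
    heap_cursor: AtomicU32,
    heap_limit: u32,
}

impl Allocators {
    pub fn new(heap_base: u32, heap_limit: u32) -> Self {
        Self {
            next_handle: AtomicU32::new(0x1000),
            heap_cursor: AtomicU32::new(heap_base),
            heap_limit,
        }
    }

    /// Each caller gets a unique handle; no ordering with other memory is needed.
    pub fn alloc_handle(&self) -> u32 {
        self.next_handle.fetch_add(4, Ordering::Relaxed)
    }

    /// Bump first, then check the post-bump invariant. On failure the cursor
    /// stays advanced, as described above.
    pub fn heap_alloc(&self, size: u32) -> Option<u32> {
        let base = self.heap_cursor.fetch_add(size, Ordering::Relaxed);
        match base.checked_add(size) {
            Some(end) if end <= self.heap_limit => Some(base),
            _ => None,
        }
    }
}

/// Mirrors the shape of the concurrent handle test: returns how many distinct
/// handles `threads` workers obtain when each allocates `per_thread` times.
pub fn distinct_handles(threads: usize, per_thread: usize) -> usize {
    let alloc = Allocators::new(0x4000_0000, 0x5000_0000);
    let mut all = HashSet::new();
    thread::scope(|s| {
        let mut workers = Vec::new();
        for _ in 0..threads {
            workers.push(s.spawn(|| {
                (0..per_thread).map(|_| alloc.alloc_handle()).collect::<Vec<u32>>()
            }));
        }
        for w in workers {
            all.extend(w.join().unwrap());
        }
    });
    all.len()
}
```

Relaxed ordering is sufficient here because the only requirement is that each `fetch_add` returns a distinct value; the allocation itself publishes no other data.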
M2.5 — per-slot pending_local_irq (preview)
In crates/xenia-kernel/src/interrupts.rs:
pub type PendingLocalIrq = [AtomicU8; HW_THREAD_COUNT];
pub struct InterruptState {
// ... existing fields ...
pub pending_local_irq: PendingLocalIrq,
}
The field exists and Default::default() initializes it to all zeros. It is unused in M2's lockstep path; M3 will set bits with Release ordering on the target slot's atomic, and the target T_cpu_i will Acquire-load them at the quantum boundary.
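One way the planned M3 handoff could look is sketched here. The `post_irq` and `take_pending` functions are hypothetical names, and draining with an Acquire `swap` rather than a plain Acquire load is a choice made for the example so that each pending bit is consumed exactly once.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

pub const HW_THREAD_COUNT: usize = 6;
pub type PendingLocalIrq = [AtomicU8; HW_THREAD_COUNT];

/// Sender side (hypothetical): set one pending bit on the target slot.
/// Release makes writes done before the post visible to the receiver.
pub fn post_irq(pending: &PendingLocalIrq, target: usize, bit: u8) {
    pending[target].fetch_or(1u8 << bit, Ordering::Release);
}

/// Receiver side (hypothetical): called by the owning thread at a quantum
/// boundary. Returns the pending bitmask and clears it in one step.
pub fn take_pending(pending: &PendingLocalIrq, me: usize) -> u8 {
    pending[me].swap(0, Ordering::Acquire)
}
```

Since each slot has its own atomic, posting to one hardware thread never contends with another thread draining its own mask.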
M2.8 — reservation table activation flag
--reservations-table CLI flag on Exec and Check, with an XENIA_RESERVATIONS_TABLE=1 env var fallback. When set, kernel.reservations_enabled is flipped to true (Release). kernel.reservations: Arc<ReservationTable> is always allocated (every kernel has one; it's free until used).
Interpreter wiring is M3 work — for now the flag is observable but doesn't change lwarx/stwcx. semantics. Verified: golden matches under --reservations-table, --gpu-inline --reservations-table, etc.
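The flag-or-env-var decision and the Release store can be sketched as below. How main.rs actually parses its arguments is not shown in these notes, so the example takes the already-parsed flag as a `bool`; `reservations_requested`, `KernelFlags`, `apply`, and `use_table` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// True when either the CLI flag was passed or the env var is exactly "1".
pub fn reservations_requested(cli_flag: bool, env_value: Option<&str>) -> bool {
    cli_flag || env_value == Some("1")
}

/// Hypothetical holder for the runtime switch.
#[derive(Default)]
pub struct KernelFlags {
    pub reservations_enabled: AtomicBool,
}

impl KernelFlags {
    /// Flip the switch once at startup; Release pairs with the Acquire load below.
    pub fn apply(&self, requested: bool) {
        if requested {
            self.reservations_enabled.store(true, Ordering::Release);
        }
    }

    /// What an lwarx/stwcx. arm would consult once M3 wires the table in.
    pub fn use_table(&self) -> bool {
        self.reservations_enabled.load(Ordering::Acquire)
    }
}
```

At startup the caller would pass something like `std::env::var("XENIA_RESERVATIONS_TABLE").ok().as_deref()` as the second argument.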
Deferred to M3 (with rationale)
M2.6 — KernelStateInner + Arc<Mutex<...>>
The plan calls this "the big mechanical step" — change ~98 export signatures from &mut KernelState to &mut KernelStateInner. Deferred to M3 because:
- Under M2's single-threaded execution, the lock would never contend — the refactor delivers zero observable benefit.
- The locking discipline (lock per HLE call vs. lock per round) only becomes load-bearing once multiple host threads exist. Designing it without those callers risks designing the wrong API.
- The M3 spawn work is the natural integration point: spawning per-HW-thread workers and granting them concurrent kernel access are inseparable.
Bundled with the M3 work in the next session.
M2.7 — per-slot Mutex<HwSlot> + SchedulerTopology RwLock
Same rationale: the per-slot mutex is invisible until multiple T_cpu_i exist. The lock-ordering proof from the plan (ascending hw_id, topology RwLock above per-slot Mutex) only becomes verifiable under genuine parallelism. Bundled with M3.
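The lock-ordering rule (topology lock first, then per-slot mutexes in ascending hw_id) could be exercised by a cross-slot move like the one sketched below. The `Scheduler` and `HwSlot` shapes, the `runnable` list, and the choice of a write lock for cross-slot operations and a read lock for single-slot ones are assumptions for the example, not the planned M3 design.

```rust
use std::sync::{Mutex, RwLock};

#[derive(Default)]
pub struct HwSlot {
    pub runnable: Vec<u16>,
}

pub struct Scheduler {
    topology: RwLock<()>,        // taken before any per-slot mutex
    slots: [Mutex<HwSlot>; 6],
}

impl Scheduler {
    pub fn new() -> Self {
        Self { topology: RwLock::new(()), slots: Default::default() }
    }

    /// Single-slot operation: topology read lock, then one slot mutex.
    pub fn push(&self, hw_id: usize, idx: u16) {
        let _topo = self.topology.read().unwrap();
        self.slots[hw_id].lock().unwrap().runnable.push(idx);
    }

    pub fn len(&self, hw_id: usize) -> usize {
        let _topo = self.topology.read().unwrap();
        self.slots[hw_id].lock().unwrap().runnable.len()
    }

    /// Cross-slot operation: topology write lock, then both slot mutexes in
    /// ascending hw_id order regardless of the direction of the move.
    pub fn migrate(&self, from: usize, to: usize, idx: u16) -> bool {
        if from == to {
            return false;
        }
        let _topo = self.topology.write().unwrap();
        let (lo, hi) = (from.min(to), from.max(to));
        let lo_guard = self.slots[lo].lock().unwrap();
        let hi_guard = self.slots[hi].lock().unwrap();
        let (mut src, mut dst) = if from < to { (lo_guard, hi_guard) } else { (hi_guard, lo_guard) };
        let pos = src.runnable.iter().position(|&t| t == idx);
        if let Some(pos) = pos {
            src.runnable.swap_remove(pos);
            dst.runnable.push(idx);
            true
        } else {
            false
        }
    }
}
```

Acquiring in a fixed global order means two concurrent migrations in opposite directions cannot each hold one slot while waiting for the other.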
Verification (all green)
| Check | Result |
|---|---|
| cargo build --workspace | clean |
| cargo test --workspace | 405 passed, 0 failed |
| xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json (default = threaded) | matches |
| Same with --gpu-inline (rollback) | matches |
| Same with --reservations-table | matches |
| Same with --gpu-inline --reservations-table | matches |
| concurrent_lwarx_stwcx_serializes (8 threads × 1000 rounds, ReservationTable stress) | passes |
| concurrent_alloc_handle_distinct (10 threads × 100 allocs, AtomicU32 next_handle) | passes |
| write_u32_fence_publishes_prior_writes (M1.8 fence test, hardened with AtomicU32 storage) | passes |
Tests grew from 395 (post-M1) to 405 (+10 new substep tests).
Files of note
- crates/xenia-cpu/src/reservation.rs: banked ReservationTable with stress tests
- crates/xenia-cpu/src/scheduler.rs: ThreadRef gen-packed, ThreadRef::new constructor
- crates/xenia-kernel/src/state.rs: atomic bump allocators, reservations: Arc<ReservationTable>, reservations_enabled: AtomicBool
- crates/xenia-kernel/src/interrupts.rs: pending_local_irq: [AtomicU8; 6]
- crates/xenia-app/src/main.rs: --reservations-table flag wiring, kernel construction
- crates/xenia-gpu/src/handle.rs: fence test fixture rewritten to use AtomicU32 slots (was flaky Cell<u8>)
Next milestone (M3)
The M3 spawn work bundles:
- KernelStateInner split (carryover from M2.6). Arc<Mutex<KernelStateInner>>. ~98 export sigs.
- Per-slot Mutex<HwSlot> (carryover from M2.7). Lock order ascending by hw_id. SchedulerTopology RwLock for cross-slot ops.
- Phaser primitive for quantum-based barrier sync.
- 6 HwHostThreads spawned. Wakeup-on-signal via slot_wake[6] + unpark().
- IRQ injection routed through pending_local_irq[6]; target T_cpu_i self-injects.
- Reservation table activation in lwarx/stwcx. arms.
- Sylpheed parallel boot verification + 100x stress test.
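To give a feel for quantum-based barrier sync, the sketch below runs a fixed number of worker threads that all rendezvous at the end of every quantum. It uses the standard library's `std::sync::Barrier` as a stand-in; the planned Phaser primitive is not specified in these notes and may behave differently. The `run_quanta` function is hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Barrier;
use std::thread;

/// Runs `workers` threads for `quanta` rounds each and returns the total
/// number of quanta executed. No worker starts round n+1 until every worker
/// has finished round n.
pub fn run_quanta(workers: usize, quanta: u64) -> u64 {
    let barrier = Barrier::new(workers);
    let executed = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| {
                for _ in 0..quanta {
                    // Placeholder for "run guest code for one quantum".
                    executed.fetch_add(1, Ordering::Relaxed);
                    // Quantum boundary: wait for all other workers.
                    barrier.wait();
                }
            });
        }
    });
    executed.load(Ordering::Relaxed)
}
```

With six workers, `run_quanta(6, 100)` executes 600 quanta in total, and the barrier keeps every worker within one quantum of the others.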
The plan's verification matrix for M3 completion: the golden matches under --lockstep (single host thread; M2 behavior still matches). Parallel mode reaches VdSwap=2 with deadlock_halts == 0; its digest will differ from lockstep, which is expected and documented in the existing thread-interleaving-divergence note in the perf memory.