Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:
- claude-memory/ ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
(103 files, 1.1 MB - MEMORY.md + every
project_xenia_rs_*.md from audits
addis_signext through audit-058)
- project-root/dot-claude/ <project-root>/.claude/settings.json
(Stop hook + permissions)
- project-root/ppc-manual/ <project-root>/ppc-manual/
(PowerPC reference docs, 397 files, 3.7 MB)
- project-root/run-canary.sh <project-root>/run-canary.sh
- README.md Human-readable setup checklist
- setup.sh Idempotent installer (also reclones
xenia-canary at pinned HEAD 6de80dffe)
- MANIFEST.md Per-file mapping + per-file-not-bundled
restoration recipe
Excluded from bundle (not shippable via git):
- Sylpheed ISO (7.8 GB; copyright; manual copy required)
- sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
- target/ build artifacts (rebuild on target)
- audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
- audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
- xenia-canary checkout (setup.sh reclones from
git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| name | description | type | originSessionId |
|---|---|---|---|
| xenia-rs concurrency rollout — M3.1 + per-thread block-cache substrate landed (2026-04-26); M3.2–M3.8 deferred | Phaser primitive + per-HW-slot block caches landed (M3.1, M3.2a). The remaining seven substeps (per-slot Mutex<HwSlot>, KernelStateInner split, host-thread spawn, slot wakeups, IRQ routing, reservation interpreter wiring, parallel stress test) are interdependent and require focused dedicated sessions to land safely with per-step verification. Deferred work is precisely scoped below for the follow-up. | project | af90c866-579c-4506-af85-cd5a5030af85 |
What landed this session
M3.1 — Phaser primitive
crates/xenia-cpu/src/phaser.rs. Custom barrier-with-skip; 6 unit tests pass:
- n_arrivers_all_advance: basic barrier semantics
- skip_counts_toward_advance: skipping participants count toward advance
- shutdown_wakes_arrivers: clean tear-down via Phaser::shutdown()
- timeout_fires_when_peer_hangs: defensive timeout returns PhaserOutcome::Timeout
- multi_phase_progress: 6 threads × 1000 phases, no deadlock, generation counter consistent
- mixed_skip_and_arrive_random: pseudo-random skip/arrive across 200 phases
Memory ordering: phase counter is Release/Acquire. Participant count under Mutex + Condvar. The skip-counts-toward-advance design lets idle slots park on their own wake mechanism without stalling the phaser.
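
The barrier-with-skip idea described above can be sketched as follows. This is a minimal illustration, not the code in phaser.rs: the type name SketchPhaser and its fields are hypothetical, timeout and shutdown are omitted, and it assumes each participant checks in exactly once per phase.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Condvar, Mutex};

/// Hypothetical stand-in for the real Phaser; names are illustrative only.
pub struct SketchPhaser {
    parties: usize,
    /// Arrivals plus skips seen so far in the current phase.
    checked_in: Mutex<usize>,
    advanced: Condvar,
    /// Phase counter, published with Release and observed with Acquire.
    phase: AtomicU64,
}

impl SketchPhaser {
    pub fn new(parties: usize) -> Self {
        Self {
            parties,
            checked_in: Mutex::new(0),
            advanced: Condvar::new(),
            phase: AtomicU64::new(0),
        }
    }

    pub fn current_phase(&self) -> u64 {
        self.phase.load(Ordering::Acquire)
    }

    fn check_in(&self, wait: bool) {
        let mut count = self.checked_in.lock().unwrap();
        let my_phase = self.phase.load(Ordering::Acquire);
        *count += 1;
        if *count == self.parties {
            // Last participant of the phase: reset the count and advance.
            *count = 0;
            self.phase.store(my_phase + 1, Ordering::Release);
            self.advanced.notify_all();
        } else if wait {
            // Block until the phase counter moves past the one we joined.
            let _guard = self
                .advanced
                .wait_while(count, |_| self.phase.load(Ordering::Acquire) == my_phase)
                .unwrap();
        }
    }

    /// Counts toward the phase and blocks until the phase advances.
    pub fn arrive_and_wait(&self) {
        self.check_in(true)
    }

    /// Counts toward the phase but returns immediately.
    pub fn skip(&self) {
        self.check_in(false)
    }
}
```

Because a skip is counted like an arrival, an idle slot can check in and go park elsewhere while the running slots still see the phase advance.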
M3.2a — Per-HW-slot block caches
crates/xenia-app/src/main.rs:1228:
let mut block_caches: [BlockCache; HW_THREAD_COUNT] =
std::array::from_fn(|_| BlockCache::new());
Dispatch site at main.rs:1651 routes through block_caches[hw_id as usize]. Bit-identical correctness in single-threaded mode (it's just 6 independent caches on one thread); eliminates cross-slot races for the eventual host-thread spawn.
Lockstep golden at -n 2M: matches.
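
A rough sketch of why per-slot caches stay independent is shown below. The BlockCache here is a hypothetical stand-in (a map from guest PC to a dummy block length), and make_caches and dispatch are illustrative helper names, not functions from main.rs.

```rust
use std::collections::HashMap;

const HW_THREAD_COUNT: usize = 6;

/// Stand-in cache: guest PC -> dummy translated-block length.
#[derive(Default)]
pub struct BlockCache {
    blocks: HashMap<u32, u32>,
}

impl BlockCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns true on a hit; on a miss, records a placeholder block.
    pub fn lookup_or_translate(&mut self, pc: u32) -> bool {
        let hit = self.blocks.contains_key(&pc);
        self.blocks.entry(pc).or_insert(1);
        hit
    }
}

/// One cache per hardware slot, built the same way as in the snippet above.
pub fn make_caches() -> [BlockCache; HW_THREAD_COUNT] {
    std::array::from_fn(|_| BlockCache::new())
}

/// Route a lookup to the cache owned by `hw_id`; other slots are untouched.
pub fn dispatch(caches: &mut [BlockCache; HW_THREAD_COUNT], hw_id: u8, pc: u32) -> bool {
    caches[hw_id as usize].lookup_or_translate(pc)
}
```

A block translated by slot 0 is a miss for slot 1 at the same PC, which is the property that removes cross-slot sharing before host threads exist.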
Verification
cargo build --workspace: cleancargo test --workspace: 411 passed, 0 failed (was 405 post-M2; +6 from phaser tests)xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json(default = threaded GPU): matches- Same with
--gpu-inline,--reservations-table,--gpu-inline --reservations-table: all match
Why M3.2b–M3.8 are deferred
The remaining substeps are individually invasive and interdependent — none of them deliver observable end-to-end value without the others. Splitting them across separate sessions with focused verification is more responsible than racing through them in a single pass.
| Substep | Why it's a focused session of its own |
|---|---|
| M3.2b Per-slot Mutex<HwSlot> | The scheduler holds slots: [HwSlot; 6]; many internal accesses are &mut self patterns that don't compose with MutexGuard lifetimes. Refactor touches ~30 callsites in scheduler.rs + several external accessors that hold borrows across method boundaries. |
| M3.3 Arc<Mutex<KernelState>> wrap | Either wrap the whole struct (~98 export sigs unchanged, but every callsite needs guard threading) or split into KernelStateShared + KernelStateInner (the plan's design: ~98 export sig changes, mechanical but workspace-wide). Either path is a substantial single-purpose session. |
| M3.4 Spawn 6 host threads | Requires M3.2b + M3.3 as substrate. The spawn body itself is a 200–400 line replacement of the per-round portion of run_execution. |
| M3.5 Idle-slot wakeups | Requires M3.4. Adds slot_wake[6]: AtomicBool + Thread handles + unpark() calls at every KeSetEvent/KeReleaseSemaphore site. |
| M3.6 IRQ via pending_local_irq | Requires M3.4. M2.5 already wired the AtomicU8 array; M3.6 changes the producer side (T_main / GPU thread sets bits) and consumer side (T_cpu_i checks bits at quantum boundary, self-injects). |
| M3.7 Activate reservations in interpreter | Requires threading hw_id + Arc<ReservationTable> reference into the interpreter dispatch. PpcContext doesn't currently carry hw_id, and step/step_cached/step_block don't take a table. Each path needs a parameter, and there are many test callers. |
| M3.8 100× parallel stress test | Requires M3.4–M3.7. |
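
The producer/consumer split described for M3.6 could look roughly like this. It is a sketch under assumptions: the PendingIrqs wrapper and the raise/take method names are hypothetical, and the meaning of each bit is not specified by this note.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const HW_THREAD_COUNT: usize = 6;

/// Hypothetical wrapper around the per-slot pending bits.
pub struct PendingIrqs {
    pending_local_irq: [AtomicU8; HW_THREAD_COUNT],
}

impl PendingIrqs {
    pub fn new() -> Self {
        Self {
            pending_local_irq: std::array::from_fn(|_| AtomicU8::new(0)),
        }
    }

    /// Producer side (e.g. main or GPU thread): set one pending bit.
    /// `bit` must be in 0..8.
    pub fn raise(&self, hw_id: usize, bit: u8) {
        self.pending_local_irq[hw_id].fetch_or(1u8 << bit, Ordering::Release);
    }

    /// Consumer side (the slot's own thread, at a quantum boundary):
    /// atomically take and clear every pending bit for this slot.
    pub fn take(&self, hw_id: usize) -> u8 {
        self.pending_local_irq[hw_id].swap(0, Ordering::AcqRel)
    }
}
```

Using swap(0) on the consumer side means a bit raised between two quantum boundaries is delivered exactly once and never lost to a separate load-then-clear race.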
What's already in place from M1+M2 that M3 will use
- Page versions atomic (M1.4/M2.1): Vec<AtomicU64>, Release/Acquire on per-page slots.
- Page table atomic (M1.4): Vec<AtomicU64>, lock-free alloc(&self).
- MemoryAccess::write_u32_fence / read_u32_fence (M1.8): Release/Acquire fence helpers used by EVENT_WRITE_SHD / RPTR writeback.
- GPU on its own host thread (M1.4–M1.10): Arc<Mutex<GpuDigestSnapshot>> + parker via Arc<AtomicBool> wake_pending + unpark() from MMIO callback.
- ReservationTable (M2.2): banked AtomicU64, 4096 banks, (line, generation, hw_id). Stress-tested with 8 concurrent host threads. Lives at kernel.reservations: Arc<ReservationTable>. Activation flag at kernel.reservations_enabled: AtomicBool (settable via --reservations-table or XENIA_RESERVATIONS_TABLE=1).
- ThreadRef generation packing (M2.3): 1+1+2 = 4 bytes; ::new(hw_id, idx) constructor.
- Atomic bump allocators (M2.4): next_handle, next_thread_id, next_tls_index, heap_cursor, stack_cursor all AtomicU32.
- pending_local_irq: [AtomicU8; 6] (M2.5): wired in InterruptState; M3.6 will start using bits.
- Phaser primitive (M3.1): arrive_and_wait / skip / shutdown / arrive_and_wait_timeout API.
- Per-HW-slot block caches (M3.2a): [BlockCache; 6] indexed by hw_id.
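
The parker pattern mentioned for the GPU thread (an AtomicBool wake flag plus unpark()) can be illustrated with the standard library alone. This is a generic sketch, not the emulator's code: spawn_parked_worker and wake are hypothetical names, and the worker simply counts one wakeup and exits.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle, Thread};

/// Spawn a worker that sleeps until `wake_pending` is set, then returns
/// the number of wakeups it consumed (always 1 in this sketch).
pub fn spawn_parked_worker(wake_pending: Arc<AtomicBool>) -> JoinHandle<u32> {
    thread::spawn(move || {
        // swap(false) consumes the wake request. The loop re-checks the
        // flag because park() may return spuriously.
        while !wake_pending.swap(false, Ordering::Acquire) {
            thread::park();
        }
        1
    })
}

/// Producer side: publish the flag first, then unpark the worker.
pub fn wake(wake_pending: &AtomicBool, worker: &Thread) {
    wake_pending.store(true, Ordering::Release);
    worker.unpark();
}
```

Setting the flag before calling unpark() matters: if the worker has not parked yet, the unpark token makes its next park() return immediately, and the flag check then succeeds.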
Next session resumes at M3.2b
The natural ordering for the deferred work:
- M3.2b Per-slot Mutex<HwSlot>. The scheduler internally locks per slot; external API stays method-based (no guard leakage). Verify lockstep golden bit-identical.
- M3.3 Arc<Mutex<KernelState>> wrap (start coarse: a single mutex around the whole struct; refine later if needed). Verify lockstep golden.
- M3.4 Spawn N host threads under --parallel flag. Each acquires the kernel lock for HLE, drops for instruction execution, syncs at the phaser. Verify sylpheed boots; halts==0.
- M3.5 Slot wakeup primitives. M3.4 will park on idle; M3.5 unparks on signal.
- M3.6 IRQ routing per slot.
- M3.7 Reservation interpreter wiring. PpcContext gets hw_id field + Option<Arc<ReservationTable>>.
- M3.8 100× sylpheed stress run.
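
The per-round shape planned for M3.4 (take the kernel lock for HLE, release it for instruction execution, then sync) might look something like the sketch below. Everything here is an assumption for illustration: KernelState is reduced to a counter, std::sync::Barrier stands in for the phaser, and run_rounds is a hypothetical helper rather than the planned run_execution replacement.

```rust
use std::sync::{Barrier, Mutex};
use std::thread;

/// Stand-in for the real kernel state: just counts HLE calls.
pub struct KernelState {
    pub hle_calls: u64,
}

/// Run `rounds` lockstep rounds on `n_threads` host threads and return
/// the total number of HLE calls made under the kernel lock.
pub fn run_rounds(n_threads: usize, rounds: usize) -> u64 {
    let kernel = Mutex::new(KernelState { hle_calls: 0 });
    // Barrier used here in place of the phaser primitive.
    let barrier = Barrier::new(n_threads);

    thread::scope(|s| {
        for _hw_id in 0..n_threads {
            s.spawn(|| {
                for _ in 0..rounds {
                    {
                        // HLE section: hold the coarse kernel lock briefly.
                        let mut k = kernel.lock().unwrap();
                        k.hle_calls += 1;
                    } // lock dropped before guest code runs

                    // Guest instruction execution would run here, unlocked.

                    // End of round: wait for every slot to finish.
                    barrier.wait();
                }
            });
        }
    });

    kernel.into_inner().unwrap().hle_calls
}
```

For example, run_rounds(6, 10) performs 60 HLE calls in total. The scoped block around the lock is the point of the design: the guard never lives across guest execution or the end-of-round sync.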
Files of note
- crates/xenia-cpu/src/phaser.rs: phaser primitive (M3.1)
- crates/xenia-app/src/main.rs: per-HW-slot block caches (M3.2a) at lines 1228 and 1651
- crates/xenia-cpu/src/lib.rs: re-exports Phaser, PhaserOutcome