chore: add migration/ bundle for cross-machine setup

Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:

- claude-memory/               ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                               (103 files, 1.1 MB - MEMORY.md + every
                               project_xenia_rs_*.md from audits
                               addis_signext through audit-058)
- project-root/dot-claude/     <project-root>/.claude/settings.json
                               (Stop hook + permissions)
- project-root/ppc-manual/     <project-root>/ppc-manual/
                               (PowerPC reference docs, 397 files, 3.7 MB)
- project-root/run-canary.sh   <project-root>/run-canary.sh
- README.md                    Human-readable setup checklist
- setup.sh                     Idempotent installer (also reclones
                               xenia-canary at pinned HEAD 6de80dffe)
- MANIFEST.md                  Per-file mapping + per-file-not-bundled
                               restoration recipe

Excluded from bundle (not shippable via git):

- Sylpheed ISO (7.8 GB; copyright; manual copy required)
- sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
- target/ build artifacts (rebuild on target)
- audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
- audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
- xenia-canary checkout (setup.sh reclones from
  git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,84 @@
---
name: xenia-rs concurrency rollout - M3.1 + per-HW-slot block-cache substrate landed (2026-04-26); M3.2b–M3.8 deferred
description: Phaser primitive + per-HW-slot block caches landed (M3.1, M3.2a). The remaining seven substeps (per-slot Mutex<HwSlot>, KernelStateInner split, host-thread spawn, slot wakeups, IRQ routing, reservation interpreter wiring, parallel stress test) are interdependent and require focused dedicated sessions to land safely with per-step verification. Deferred work is precisely scoped below for the follow-up.
type: project
originSessionId: af90c866-579c-4506-af85-cd5a5030af85
---
## What landed this session

### M3.1: Phaser primitive

[crates/xenia-cpu/src/phaser.rs](xenia-rs/crates/xenia-cpu/src/phaser.rs). Custom barrier-with-skip; 6 unit tests pass:

- `n_arrivers_all_advance`: basic barrier semantics
- `skip_counts_toward_advance`: skipping participants count toward advance
- `shutdown_wakes_arrivers`: clean tear-down via `Phaser::shutdown()`
- `timeout_fires_when_peer_hangs`: defensive timeout returns `PhaserOutcome::Timeout`
- `multi_phase_progress`: 6 threads × 1000 phases, no deadlock, generation counter consistent
- `mixed_skip_and_arrive_random`: pseudo-random skip/arrive across 200 phases

Memory ordering: the phase counter is written with `Release` and read with `Acquire`; the participant count lives under a `Mutex` + `Condvar`. Because skips count toward the advance, idle slots can park on their own wake mechanism without stalling the phaser.
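The barrier-with-skip idea can be sketched as follows. This is a minimal illustration, not the code in `phaser.rs`: the field names, the `Advanced` and `Shutdown` variants, the `phase()` accessor and the `demo` function are assumptions made for the example. Only `skip`, `shutdown`, `arrive_and_wait_timeout` and `PhaserOutcome::Timeout` are named in these notes.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhaserOutcome { Advanced, Shutdown, Timeout }

struct State { arrived: usize, shutdown: bool }

/// Illustrative barrier-with-skip; layout is assumed, not taken from phaser.rs.
pub struct Phaser {
    parties: usize,
    phase: AtomicU64,
    state: Mutex<State>,
    cv: Condvar,
}

impl Phaser {
    pub fn new(parties: usize) -> Self {
        let state = Mutex::new(State { arrived: 0, shutdown: false });
        Phaser { parties, phase: AtomicU64::new(0), state, cv: Condvar::new() }
    }

    pub fn phase(&self) -> u64 {
        self.phase.load(Ordering::Acquire)
    }

    /// Counts one participant; returns true if that completed the phase.
    fn arrive_locked(&self, st: &mut State) -> bool {
        st.arrived += 1;
        if st.arrived < self.parties {
            return false;
        }
        st.arrived = 0;
        self.phase.fetch_add(1, Ordering::Release);
        self.cv.notify_all();
        true
    }

    /// An idle participant counts toward the advance without blocking.
    /// Contract: each participant arrives or skips exactly once per phase.
    pub fn skip(&self) {
        let mut st = self.state.lock().unwrap();
        self.arrive_locked(&mut st);
    }

    pub fn arrive_and_wait_timeout(&self, timeout: Duration) -> PhaserOutcome {
        let deadline = Instant::now() + timeout;
        let mut st = self.state.lock().unwrap();
        if st.shutdown {
            return PhaserOutcome::Shutdown;
        }
        let start = self.phase.load(Ordering::Acquire);
        if self.arrive_locked(&mut st) {
            return PhaserOutcome::Advanced;
        }
        loop {
            if self.phase.load(Ordering::Acquire) != start {
                return PhaserOutcome::Advanced;
            }
            if st.shutdown {
                return PhaserOutcome::Shutdown;
            }
            let now = Instant::now();
            if now >= deadline {
                st.arrived -= 1; // withdraw our arrival so the count stays honest
                return PhaserOutcome::Timeout;
            }
            st = self.cv.wait_timeout(st, deadline - now).unwrap().0;
        }
    }

    pub fn shutdown(&self) {
        self.state.lock().unwrap().shutdown = true;
        self.cv.notify_all();
    }
}

/// Two workers arrive while a third, idle participant skips; then a lone
/// arriver whose peer never shows up hits the defensive timeout.
pub fn demo() -> (u64, PhaserOutcome) {
    let phaser = Arc::new(Phaser::new(3));
    let workers: Vec<_> = (0..2)
        .map(|_| {
            let p = Arc::clone(&phaser);
            thread::spawn(move || p.arrive_and_wait_timeout(Duration::from_secs(5)))
        })
        .collect();
    phaser.skip();
    for w in workers {
        assert_eq!(w.join().unwrap(), PhaserOutcome::Advanced);
    }
    let lonely = Phaser::new(2);
    (phaser.phase(), lonely.arrive_and_wait_timeout(Duration::from_millis(20)))
}
```

In this sketch the phase counter only changes while the mutex is held, so waiters can compare it against the value they saw on arrival to tell a real advance from a spurious wakeup.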
### M3.2a: Per-HW-slot block caches

[crates/xenia-app/src/main.rs:1228](xenia-rs/crates/xenia-app/src/main.rs#L1228):

```rust
// One independent block cache per hardware-thread slot.
let mut block_caches: [BlockCache; HW_THREAD_COUNT] =
    std::array::from_fn(|_| BlockCache::new());
```

The dispatch site at [main.rs:1651](xenia-rs/crates/xenia-app/src/main.rs#L1651) routes through `block_caches[hw_id as usize]`. Behavior is bit-identical in single-threaded mode (it is just 6 independent caches on one thread), and the split removes cross-slot cache races ahead of the eventual host-thread spawn.

Lockstep golden at -n 2M: matches.
## Verification

- `cargo build --workspace`: clean
- `cargo test --workspace`: 411 passed, 0 failed (was 405 post-M2; +6 from phaser tests)
- `xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json` (default = threaded GPU): matches
- Same with `--gpu-inline`, `--reservations-table`, `--gpu-inline --reservations-table`: all match
## Why M3.2b–M3.8 are deferred

The remaining substeps are individually invasive and **interdependent**: none of them delivers observable end-to-end value without the others. Splitting them across separate sessions with focused verification is more responsible than racing through them in a single pass.

| Substep | Why it's a focused session of its own |
|---|---|
| **M3.2b** Per-slot `Mutex<HwSlot>` | The scheduler holds `slots: [HwSlot; 6]`; many internal accesses are `&mut self` patterns that don't compose with `MutexGuard` lifetimes. Refactor touches ~30 callsites in `scheduler.rs` + several external accessors that hold borrows across method boundaries. |
| **M3.3** `Arc<Mutex<KernelState>>` wrap | Either wrap the whole struct (~98 export sigs unchanged but every callsite needs guard threading) or split into `KernelStateShared` + `KernelStateInner` (the plan's design: ~98 export sig changes, mechanical but workspace-wide). Either path is a substantial single-purpose session. |
| **M3.4** Spawn 6 host threads | Requires M3.2b + M3.3 as substrate. The spawn body itself is a 200–400 line replacement of the per-round portion of `run_execution`. |
| **M3.5** Idle-slot wakeups | Requires M3.4. Adds `slot_wake[6]: AtomicBool` + Thread handles + `unpark()` calls at every `KeSetEvent`/`KeReleaseSemaphore` site. |
| **M3.6** IRQ via `pending_local_irq` | Requires M3.4. M2.5 already wired the AtomicU8 array; M3.6 changes the producer side (T_main / GPU thread sets bits) and consumer side (T_cpu_i checks bits at quantum boundary, self-injects). |
| **M3.7** Activate reservations in interpreter | Requires threading `hw_id` + `Arc<ReservationTable>` reference into the interpreter dispatch. PpcContext doesn't currently carry `hw_id`, and `step`/`step_cached`/`step_block` don't take a table. Each path needs a parameter, and there are many test callers. |
| **M3.8** 100× parallel stress test | Requires M3.4–M3.7. |
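One way to sidestep the `MutexGuard` lifetime problem described for M3.2b is to keep every lock acquisition inside a scheduler method and hand callers a closure scope instead of a guard. The sketch below assumes that approach; the `HwSlot` fields and the `with_slot`, `enqueue`, `pick_next` and `demo` names are hypothetical and not taken from `scheduler.rs`.

```rust
use std::sync::Mutex;
use std::thread;

const HW_THREAD_COUNT: usize = 6;

/// Hypothetical stand-in; the real `HwSlot` fields are not shown in these notes.
#[derive(Default)]
struct HwSlot {
    ready: Vec<u32>,
    current: Option<u32>,
}

struct Scheduler {
    slots: [Mutex<HwSlot>; HW_THREAD_COUNT],
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { slots: std::array::from_fn(|_| Mutex::new(HwSlot::default())) }
    }

    /// The only place a guard exists; it is dropped before this method
    /// returns, so no borrow can escape across method boundaries.
    fn with_slot<R>(&self, hw_id: usize, f: impl FnOnce(&mut HwSlot) -> R) -> R {
        let mut guard = self.slots[hw_id].lock().unwrap();
        f(&mut *guard)
    }

    fn enqueue(&self, hw_id: usize, thread_id: u32) {
        self.with_slot(hw_id, |slot| slot.ready.push(thread_id));
    }

    fn pick_next(&self, hw_id: usize) -> Option<u32> {
        self.with_slot(hw_id, |slot| {
            slot.current = if slot.ready.is_empty() {
                None
            } else {
                Some(slot.ready.remove(0))
            };
            slot.current
        })
    }
}

/// Six host threads each enqueue into their own slot through `&self`.
fn demo() -> Vec<Option<u32>> {
    let sched = Scheduler::new();
    thread::scope(|s| {
        for hw_id in 0..HW_THREAD_COUNT {
            let sched = &sched;
            s.spawn(move || sched.enqueue(hw_id, 100 + hw_id as u32));
        }
    });
    (0..HW_THREAD_COUNT).map(|hw_id| sched.pick_next(hw_id)).collect()
}
```

Methods take `&self` rather than `&mut self`, which is what lets several host threads share one scheduler; the cost is that any operation spanning two slots has to pick a lock order explicitly.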
## What's already in place from M1+M2 that M3 will use

- **Page versions atomic** (M1.4/M2.1): `Vec<AtomicU64>`, Release/Acquire on per-page slots.
- **Page table atomic** (M1.4): `Vec<AtomicU64>`, lock-free `alloc(&self)`.
- **`MemoryAccess::write_u32_fence` / `read_u32_fence`** (M1.8): Release/Acquire fence helpers used by EVENT_WRITE_SHD / RPTR writeback.
- **GPU on its own host thread** (M1.4–M1.10): `Arc<Mutex<GpuDigestSnapshot>>` + parker via `Arc<AtomicBool>` `wake_pending` + `unpark()` from MMIO callback.
- **`ReservationTable`** (M2.2): banked AtomicU64, 4096 banks, `(line, generation, hw_id)`. Stress-tested with 8 concurrent host threads. Lives at `kernel.reservations: Arc<ReservationTable>`. Activation flag at `kernel.reservations_enabled: AtomicBool` (settable via `--reservations-table` or `XENIA_RESERVATIONS_TABLE=1`).
- **`ThreadRef` generation packing** (M2.3): 1+1+2 = 4 bytes; `::new(hw_id, idx)` constructor.
- **Atomic bump allocators** (M2.4): `next_handle`, `next_thread_id`, `next_tls_index`, `heap_cursor`, `stack_cursor` all `AtomicU32`.
- **`pending_local_irq: [AtomicU8; 6]`** (M2.5): wired in `InterruptState`; M3.6 will start using bits.
- **Phaser primitive** (M3.1): `arrive_and_wait` / `skip` / `shutdown` / `arrive_and_wait_timeout` API.
- **Per-HW-slot block caches** (M3.2a): `[BlockCache; 6]` indexed by `hw_id`.
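The `wake_pending` + `unpark()` parker listed above for the GPU thread is also the shape M3.5 needs for idle slots. A minimal sketch of that pattern using only `std` follows; the `queued` and `stop` flags and the `demo` function are assumptions added for the example, and the real MMIO callback is replaced by a plain loop.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Producer "kicks" a parked worker `kicks` times; returns how many units
/// of work the worker observed. Illustrative only.
pub fn demo(kicks: u32) -> u32 {
    let wake_pending = Arc::new(AtomicBool::new(false));
    let stop = Arc::new(AtomicBool::new(false));
    let queued = Arc::new(AtomicU32::new(0));

    let worker = {
        let (wake_pending, stop, queued) =
            (Arc::clone(&wake_pending), Arc::clone(&stop), Arc::clone(&queued));
        thread::spawn(move || {
            let mut processed = 0u32;
            loop {
                // Consume the flag with a swap so a wake that races with
                // the check is never lost.
                if wake_pending.swap(false, Ordering::Acquire) {
                    processed += queued.swap(0, Ordering::Acquire);
                    continue;
                }
                if stop.load(Ordering::Acquire) {
                    break;
                }
                // park() may return spuriously; the loop re-checks the flags.
                thread::park();
            }
            // Final drain: a kick may have landed between the last flag
            // check and the stop check.
            processed + queued.swap(0, Ordering::Acquire)
        })
    };

    for _ in 0..kicks {
        queued.fetch_add(1, Ordering::Release); // publish work first
        wake_pending.store(true, Ordering::Release); // then raise the flag
        worker.thread().unpark(); // then wake the worker
    }
    stop.store(true, Ordering::Release);
    worker.thread().unpark();
    worker.join().unwrap()
}
```

An `unpark()` issued before the worker reaches `park()` is remembered by the thread's token, so publishing the work, setting the flag and then unparking cannot strand the worker asleep.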
## Next session resumes at M3.2b

The natural ordering for the deferred work:

1. **M3.2b** Per-slot `Mutex<HwSlot>`. The scheduler internally locks per slot; external API stays method-based (no guard leakage). Verify lockstep golden bit-identical.
2. **M3.3** `Arc<Mutex<KernelState>>` wrap (start coarse: a single mutex around the whole struct; refine later if needed). Verify lockstep golden.
3. **M3.4** Spawn N host threads under `--parallel` flag. Each acquires the kernel lock for HLE, drops for instruction execution, syncs at the phaser. Verify sylpheed boots; halts==0.
4. **M3.5** Slot wakeup primitives. M3.4 will park on idle; M3.5 unparks on signal.
5. **M3.6** IRQ routing per slot.
6. **M3.7** Reservation interpreter wiring. PpcContext gets `hw_id` field + `Option<Arc<ReservationTable>>`.
7. **M3.8** 100× sylpheed stress run.
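The per-round shape planned for M3.4 (take the kernel lock for HLE, drop it for instruction execution, sync at the end of the round) might look roughly like the sketch below. It is an assumption-laden illustration: `std::sync::Barrier` stands in for the phaser, and `KernelState`, `run_quantum` and `run_parallel` are placeholders rather than the real `run_execution` code.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

const HW_THREAD_COUNT: usize = 6;

/// Placeholder for the coarse-locked kernel state of M3.3.
#[derive(Default)]
struct KernelState {
    hle_calls: u64,
}

/// Placeholder for guest instruction execution; returns "instructions retired".
fn run_quantum(hw_id: usize, round: u64) -> u64 {
    (hw_id as u64 + 1) * (round + 1)
}

pub fn run_parallel(rounds: u64) -> (u64, u64) {
    let kernel = Arc::new(Mutex::new(KernelState::default()));
    // Stand-in for the phaser: a plain barrier has no skip or timeout.
    let barrier = Arc::new(Barrier::new(HW_THREAD_COUNT));

    let handles: Vec<_> = (0..HW_THREAD_COUNT)
        .map(|hw_id| {
            let kernel = Arc::clone(&kernel);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                let mut retired = 0u64;
                for round in 0..rounds {
                    {
                        // HLE work runs under the coarse kernel lock.
                        let mut k = kernel.lock().unwrap();
                        k.hle_calls += 1;
                    } // guard dropped here, before guest code runs
                    retired += run_quantum(hw_id, round); // no lock held
                    barrier.wait(); // end-of-round sync point
                }
                retired
            })
        })
        .collect();

    let retired: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    let hle_calls = kernel.lock().unwrap().hle_calls;
    (retired, hle_calls)
}
```

The inner block scope is the point of the sketch: the kernel guard must be released before both the quantum and the sync point, otherwise one slot waiting at the barrier would hold the lock every other slot needs for HLE.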
## Files of note

- [crates/xenia-cpu/src/phaser.rs](xenia-rs/crates/xenia-cpu/src/phaser.rs): phaser primitive (M3.1)
- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): per-HW-slot block caches (M3.2a) at lines 1228 and 1651
- [crates/xenia-cpu/src/lib.rs](xenia-rs/crates/xenia-cpu/src/lib.rs): re-exports `Phaser`, `PhaserOutcome`