chore: add migration/ bundle for cross-machine setup

Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:

- claude-memory/              ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                              (103 files, 1.1 MB: MEMORY.md + every
                              project_xenia_rs_*.md from audits
                              addis_signext through audit-058)
- project-root/dot-claude/    <project-root>/.claude/settings.json
                              (Stop hook + permissions)
- project-root/ppc-manual/    <project-root>/ppc-manual/
                              (PowerPC reference docs, 397 files, 3.7 MB)
- project-root/run-canary.sh  <project-root>/run-canary.sh
- README.md                   Human-readable setup checklist
- setup.sh                    Idempotent installer (also reclones
                              xenia-canary at pinned HEAD 6de80dffe)
- MANIFEST.md                 Per-file mapping + per-file-not-bundled
                              restoration recipe

Excluded from bundle (not shippable via git):
- Sylpheed ISO (7.8 GB; copyright; manual copy required)
- sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
- target/ build artifacts (rebuild on target)
- audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
- audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
- xenia-canary checkout (setup.sh reclones from
  git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,123 @@

---
name: xenia-rs concurrency rollout — M2 substantively complete (2026-04-26)
description: M2.1–M2.5 + M2.8 landed; M2.6 (KernelStateInner) + M2.7 (per-slot Mutex<HwSlot>) deferred to M3 because they only matter when host threads exist. Page versions atomic, ReservationTable built (with stress test), ThreadRef carries generation, bump allocators atomic, per-slot pending_local_irq[6] AtomicU8 wired, --reservations-table CLI flag flips runtime atomic. 405 tests pass, sylpheed -n 2M golden matches under all flag combos.
type: project
originSessionId: af90c866-579c-4506-af85-cd5a5030af85
---
## What landed

### M2.1 — atomic page versions + Acquire/Release for cache invalidation

Already complete as of M1.4: `page_table: Vec<AtomicU64>`, `writes_total: AtomicU64`, and `page_versions: Vec<AtomicU64>` live in [crates/xenia-memory/src/heap.rs](xenia-rs/crates/xenia-memory/src/heap.rs). The block cache and texture cache call `mem.page_version(...)`, which is an `Acquire` load on the live `GuestMemory` impl.
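The note states the reader side (`Acquire`) but not the writer side. A minimal sketch of how such a version check usually pairs up, assuming writers bump a page's version with a `Release` RMW after modifying the page. `PageVersions`, `bump`, and `CachedBlock` are hypothetical names; only `page_version` echoes the call named above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the per-page version array in heap.rs.
pub struct PageVersions {
    versions: Vec<AtomicU64>,
}

impl PageVersions {
    pub fn new(pages: usize) -> Self {
        Self { versions: (0..pages).map(|_| AtomicU64::new(0)).collect() }
    }

    /// Writer side (assumed): call after the guest bytes on `page` were written.
    /// `Release` orders those writes before the version bump.
    pub fn bump(&self, page: usize) {
        self.versions[page].fetch_add(1, Ordering::Release);
    }

    /// Reader side: `Acquire` pairs with the `Release` in `bump`.
    pub fn page_version(&self, page: usize) -> u64 {
        self.versions[page].load(Ordering::Acquire)
    }
}

/// Hypothetical cache entry that remembers the version it was built from.
pub struct CachedBlock {
    pub page: usize,
    pub built_at: u64,
}

impl CachedBlock {
    pub fn build(pv: &PageVersions, page: usize) -> Self {
        Self { page, built_at: pv.page_version(page) }
    }

    /// Stale once the page has been written since the entry was built.
    pub fn is_stale(&self, pv: &PageVersions) -> bool {
        pv.page_version(self.page) != self.built_at
    }
}
```

A write to an unrelated page leaves the entry valid; only a bump of its own page invalidates it.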

### M2.2 — `ReservationTable` for lwarx/stwcx

New module: [crates/xenia-cpu/src/reservation.rs](xenia-rs/crates/xenia-cpu/src/reservation.rs). It is a banked `Vec<AtomicU64>` (4096 banks × 8 B = 32 KiB) with `(line_addr, generation, hw_id)` packed into each slot. Hash collisions invalidate conservatively (matching Xenon L2 associativity). Memory ordering: `AcqRel` on the line CAS/swap; `Relaxed` on the active-reserver counter.

API:
- `reserve(addr, hw_id) -> u32` — claim a slot; returns the stamped generation.
- `try_commit(addr, my_gen, my_hw_id) -> bool` — CAS-clear the slot if it still matches.
- `invalidate_for_write(addr)` — hook for plain stores; invalidates the line.
- `has_active_reservers() -> bool` — fast path: writes skip the table when this returns false.

Nine unit tests, including an 8-thread stress test (`concurrent_lwarx_stwcx_serializes`) that checks only one `stwcx.` can win per round. The table sits behind the `--reservations-table` flag (M2.8); the interpreter's `lwarx`/`stwcx.` arms still use the legacy per-`PpcContext` fields. M3 will hook the table into the interpreter once host threads are spawned.
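A simplified sketch of a table with this API, assuming 32-bit guest addresses, a 128-byte reservation granule, a `[line:32][generation:24][hw_id:8]` slot layout, and one global generation counter. Those choices are illustrative and not taken from reservation.rs; only the method names, the 4096-bank size, and the orderings follow the note. A colliding `reserve` or any plain store to the same bank displaces the current reservation, which is the conservative behavior described above.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicUsize, Ordering};

const BANKS: usize = 4096; // 4096 × 8 B = 32 KiB
const LINE_SHIFT: u32 = 7; // assumption: 128-byte reservation granule

/// Assumed layout [line:32][generation:24][hw_id:8]; a packed value of 0 means "empty".
fn pack(line: u32, generation: u32, hw_id: u8) -> u64 {
    ((line as u64) << 32) | (((generation & 0xFF_FFFF) as u64) << 8) | hw_id as u64
}

pub struct ReservationTable {
    banks: Vec<AtomicU64>,
    next_gen: AtomicU32,
    active: AtomicUsize, // hint only: never lower than the number of occupied banks
}

impl ReservationTable {
    pub fn new() -> Self {
        Self {
            banks: (0..BANKS).map(|_| AtomicU64::new(0)).collect(),
            next_gen: AtomicU32::new(0),
            active: AtomicUsize::new(0),
        }
    }

    fn bank(addr: u32) -> (usize, u32) {
        let line = addr >> LINE_SHIFT;
        (line as usize % BANKS, line)
    }

    /// lwarx: claim the bank, displacing whatever reservation was there.
    pub fn reserve(&self, addr: u32, hw_id: u8) -> u32 {
        let (idx, line) = Self::bank(addr);
        // Generations cycle through 1..=0xFF_FFFF, so a packed slot is never 0.
        let generation = self.next_gen.fetch_add(1, Ordering::Relaxed) % 0xFF_FFFF + 1;
        // Increment before installing so the matching decrement can never run first.
        self.active.fetch_add(1, Ordering::Relaxed);
        let prev = self.banks[idx].swap(pack(line, generation, hw_id), Ordering::AcqRel);
        if prev != 0 {
            self.active.fetch_sub(1, Ordering::Relaxed); // replaced, not added
        }
        generation
    }

    /// stwcx.: succeeds only if this exact reservation is still in the bank.
    pub fn try_commit(&self, addr: u32, my_gen: u32, my_hw_id: u8) -> bool {
        let (idx, line) = Self::bank(addr);
        let mine = pack(line, my_gen, my_hw_id);
        let won = self.banks[idx]
            .compare_exchange(mine, 0, Ordering::AcqRel, Ordering::Acquire)
            .is_ok();
        if won {
            self.active.fetch_sub(1, Ordering::Relaxed);
        }
        won
    }

    /// Plain store: clear the whole bank, even if a different line hashed there.
    pub fn invalidate_for_write(&self, addr: u32) {
        let (idx, _) = Self::bank(addr);
        if self.banks[idx].swap(0, Ordering::AcqRel) != 0 {
            self.active.fetch_sub(1, Ordering::Relaxed);
        }
    }

    pub fn has_active_reservers(&self) -> bool {
        self.active.load(Ordering::Relaxed) != 0
    }
}
```

With a barrier between the reserve phase and the commit phase of each round, exactly one thread's packed value survives in the bank, so exactly one `try_commit` wins. `has_active_reservers` is treated as a hint here; the sketch does not try to order it against the guest data store itself.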

### M2.3 — `ThreadRef` generation packing

[crates/xenia-cpu/src/scheduler.rs](xenia-rs/crates/xenia-cpu/src/scheduler.rs:52-79):
```rust
pub struct ThreadRef {
    pub hw_id: u8,      // hardware-thread slot
    pub generation: u8, // always 0 in M2; bumped by M3's migration fixup
    pub idx: u16,       // index within that slot
}
```

The struct is 4 bytes with no padding. An 8-bit generation allows 256 reuses per slot before wraparound; `PRUNE_DEPTH_THRESHOLD = 4` keeps slots shallow, so this is plenty. M2 leaves `generation` at `0` on every spawn: no concurrent `swap_remove` happens before M3, so ABA cannot occur. M3's migration-fixup site will bump generations.

Two constructors: `ThreadRef::new(hw_id, idx)` and `ThreadRef::with_generation(hw_id, idx, generation)`. About 30 existing literal sites were adapted: several were converted to `ThreadRef::new(...)`, and the rest got an explicit `generation: 0` field.
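To show what the generation byte buys once `swap_remove` runs concurrently in M3, here is a sketch of a generation-checked lookup. It restates the `ThreadRef` layout and the two documented constructors (with derives added for the example). `SlotThreads` and its policy of bumping the generation of every index a `swap_remove` touches are hypothetical; they stand in for M3's migration-fixup logic, which this note does not describe.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ThreadRef {
    pub hw_id: u8,
    pub generation: u8,
    pub idx: u16,
}

impl ThreadRef {
    pub fn new(hw_id: u8, idx: u16) -> Self {
        Self::with_generation(hw_id, idx, 0)
    }

    pub fn with_generation(hw_id: u8, idx: u16, generation: u8) -> Self {
        Self { hw_id, generation, idx }
    }
}

/// Hypothetical per-slot thread list; `gens` outlives removals so stale refs are detectable.
pub struct SlotThreads<T> {
    hw_id: u8,
    items: Vec<T>,
    gens: Vec<u8>,
}

impl<T> SlotThreads<T> {
    pub fn new(hw_id: u8) -> Self {
        Self { hw_id, items: Vec::new(), gens: Vec::new() }
    }

    pub fn push(&mut self, value: T) -> ThreadRef {
        let idx = self.items.len();
        if self.gens.len() <= idx {
            self.gens.push(0);
        }
        self.items.push(value);
        ThreadRef::with_generation(self.hw_id, idx as u16, self.gens[idx])
    }

    /// Resolves only if the index still carries the generation the ref was issued with.
    pub fn get(&self, r: ThreadRef) -> Option<&T> {
        if r.hw_id != self.hw_id || self.gens.get(r.idx as usize) != Some(&r.generation) {
            return None;
        }
        self.items.get(r.idx as usize)
    }

    /// `swap_remove` changes what two indices mean, so both generations are bumped.
    /// Refs to the moved element must be re-issued (the migration-fixup step).
    pub fn swap_remove(&mut self, r: ThreadRef) -> Option<T> {
        self.get(r)?;
        let idx = r.idx as usize;
        let last = self.items.len() - 1;
        self.gens[idx] = self.gens[idx].wrapping_add(1);
        if last != idx {
            self.gens[last] = self.gens[last].wrapping_add(1);
        }
        Some(self.items.swap_remove(idx))
    }
}
```

Without the generation check, the stale ref to the removed element would silently resolve to whichever thread `swap_remove` moved into its index: the ABA case the note rules out for M2.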

### M2.4 — bump allocators to atomics

In [crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs):
- `next_handle: AtomicU32` (starts at `0x1000`, advanced with `fetch_add(4, Relaxed)`)
- `next_thread_id: AtomicU32`
- `next_tls_index: AtomicU32`
- `heap_cursor: AtomicU32`
- `stack_cursor: AtomicU32`

`heap_alloc` and `stack_alloc` use `fetch_add(size, Relaxed)` and then verify post-bump invariants. An allocation that fails near the limit leaves the cursor advanced; this matches pre-M2 behavior, and the run is game over either way. New unit test `concurrent_alloc_handle_distinct`: 10 threads × 100 allocations yield 1000 distinct handles.
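A sketch of the bump-then-check pattern, assuming a single heap region with an exclusive upper limit. `Allocators`, `alloc_handle`, and the `Option` return of `heap_alloc` are illustrative, not the signatures in state.rs.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical subset of the kernel state's bump allocators.
pub struct Allocators {
    next_handle: AtomicU32,
    heap_cursor: AtomicU32,
    heap_limit: u32, // exclusive end of the heap region (assumed)
}

impl Allocators {
    pub fn new(heap_base: u32, heap_limit: u32) -> Self {
        Self {
            next_handle: AtomicU32::new(0x1000),
            heap_cursor: AtomicU32::new(heap_base),
            heap_limit,
        }
    }

    /// `fetch_add` is a single atomic RMW, so concurrent callers never share a handle.
    /// `Relaxed` suffices: only uniqueness matters, not ordering with other memory.
    pub fn alloc_handle(&self) -> u32 {
        self.next_handle.fetch_add(4, Ordering::Relaxed)
    }

    /// Bump first, then check the post-bump invariant. On failure the cursor
    /// stays advanced, as described above.
    pub fn heap_alloc(&self, size: u32) -> Option<u32> {
        let start = self.heap_cursor.fetch_add(size, Ordering::Relaxed);
        let end = start.checked_add(size)?;
        (end <= self.heap_limit).then_some(start)
    }
}
```

Checking after the bump avoids a compare-exchange retry loop; the price is the leaked tail on failure, which the note accepts.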

### M2.5 — per-slot `pending_local_irq` (preview)

In [crates/xenia-kernel/src/interrupts.rs](xenia-rs/crates/xenia-kernel/src/interrupts.rs):
```rust
use std::sync::atomic::AtomicU8;

// HW_THREAD_COUNT is 6 (Xenon: 3 cores × 2 hardware threads); one atomic per slot.
pub type PendingLocalIrq = [AtomicU8; HW_THREAD_COUNT];

pub struct InterruptState {
    // ... existing fields ...
    pub pending_local_irq: PendingLocalIrq,
}
```

The field exists, and `Default::default()` initializes it to all zeros. It is unused in M2's lockstep path; in M3 the sender will set bits on the target slot's atomic with `Release`, and the target T_cpu_i will `Acquire`-load them at each quantum boundary.
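The planned M3 handshake can be sketched with two helpers, assuming each IRQ is one bit of the slot's `u8`, senders set it with `fetch_or`, and the target drains all bits with `swap(0)`. The note only fixes the `Release`/`Acquire` pairing, so `post_local_irq` and `drain_local_irq` are hypothetical.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const HW_THREAD_COUNT: usize = 6;

pub type PendingLocalIrq = [AtomicU8; HW_THREAD_COUNT];

/// Sender side (hypothetical): set `bit` (< 8) on the target slot.
/// `Release` publishes whatever the sender wrote before raising the IRQ.
pub fn post_local_irq(pending: &PendingLocalIrq, target: usize, bit: u8) {
    pending[target].fetch_or(1u8 << bit, Ordering::Release);
}

/// Target side (hypothetical), run at a quantum boundary: take and clear
/// all pending bits in one RMW. `Acquire` pairs with the sender's `Release`.
pub fn drain_local_irq(pending: &PendingLocalIrq, me: usize) -> u8 {
    pending[me].swap(0, Ordering::Acquire)
}
```

Using a single `swap(0)` rather than a load followed by a store means a bit posted between the two cannot be lost.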

### M2.8 — reservation table activation flag

`Exec` and `Check` gained a `--reservations-table` CLI flag, with `XENIA_RESERVATIONS_TABLE=1` as an environment-variable fallback. When either is set, `kernel.reservations_enabled` is flipped to `true` with a `Release` store. `kernel.reservations: Arc<ReservationTable>` is always allocated (every kernel has one); it costs nothing at runtime until used.

Interpreter wiring is M3 work: for now the flag is observable but does not change `lwarx`/`stwcx.` semantics. Verified: the golden run matches under `--reservations-table`, `--gpu-inline --reservations-table`, and the other flag combinations.
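A sketch of the flag resolution, assuming the CLI flag and the environment variable are simply OR-ed and that only the literal value `1` enables the fallback. `KernelFlags`, `reservations_requested`, `apply`, and `reservations_active` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical slice of the kernel: only the activation flag.
#[derive(Default)]
pub struct KernelFlags {
    pub reservations_enabled: AtomicBool,
}

/// CLI flag wins; otherwise fall back to `XENIA_RESERVATIONS_TABLE=1` (value passed in).
pub fn reservations_requested(cli_flag: bool, env_value: Option<&str>) -> bool {
    cli_flag || env_value == Some("1")
}

/// Flip the flag with a `Release` store, as described above.
pub fn apply(kernel: &KernelFlags, requested: bool) {
    if requested {
        kernel.reservations_enabled.store(true, Ordering::Release);
    }
}

/// Readers pair with the `Release` store.
pub fn reservations_active(kernel: &KernelFlags) -> bool {
    kernel.reservations_enabled.load(Ordering::Acquire)
}
```

In the binary, the environment value could come from `std::env::var("XENIA_RESERVATIONS_TABLE").ok().as_deref()`; passing it in keeps the logic testable without mutating the process environment.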

## Deferred to M3 (with rationale)

### M2.6 — `KernelStateInner` + `Arc<Mutex<...>>`

The plan calls this "the big mechanical step": changing ~98 export signatures from `&mut KernelState` to `&mut KernelStateInner`. **Deferred to M3 because:**

1. Under M2's single-threaded execution the lock would never be contended, so the refactor delivers no observable benefit.
2. The locking discipline (lock per HLE call vs. lock per round) only becomes load-bearing once multiple host threads exist. Designing it without those callers risks designing the wrong API.
3. The M3 spawn work *is* the natural integration point: spawning per-HW-thread workers and granting them concurrent kernel access are inseparable.

Bundled with the M3 work in the next session.
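The plan leaves the locking discipline open. Purely for illustration, the sketch below picks one candidate, lock per HLE call: every export receives `&mut KernelStateInner` through a guard held only for that call. `export_create_handle`, `KernelState::call`, and the fields of `KernelStateInner` are hypothetical, not any of the ~98 real exports.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical inner state; the real split would move the mutable kernel fields here.
#[derive(Default)]
pub struct KernelStateInner {
    pub open_handles: Vec<u32>,
    pub next_handle: u32,
}

/// Exports take `&mut KernelStateInner` instead of `&mut KernelState`.
pub fn export_create_handle(inner: &mut KernelStateInner) -> u32 {
    let h = 0x1000 + inner.next_handle * 4;
    inner.next_handle += 1;
    inner.open_handles.push(h);
    h
}

/// Cheap to clone: every host thread shares the same `Arc<Mutex<KernelStateInner>>`.
#[derive(Clone)]
pub struct KernelState {
    inner: Arc<Mutex<KernelStateInner>>,
}

impl KernelState {
    pub fn new() -> Self {
        Self { inner: Arc::new(Mutex::new(KernelStateInner::default())) }
    }

    /// Lock once per HLE call; the guard is released when the export returns.
    pub fn call<R>(&self, export: impl FnOnce(&mut KernelStateInner) -> R) -> R {
        let mut guard = self.inner.lock().expect("kernel mutex poisoned");
        export(&mut *guard)
    }
}
```

Lock-per-round would instead hold the guard across a whole quantum, giving fewer lock operations at the cost of longer hold times; choosing between the two is the open question in point 2.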

### M2.7 — per-slot `Mutex<HwSlot>` + `SchedulerTopology` RwLock

Same rationale: the per-slot mutex has no observable effect until multiple T_cpu_i exist. The plan's lock-ordering proof (per-slot mutexes taken in ascending `hw_id` order, with the topology RwLock ranked above every per-slot Mutex) only becomes verifiable under genuine parallelism. Bundled with M3.
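One way to read that ordering rule, sketched under the assumption that cross-slot operations take the topology lock in read mode (topology changes would take it in write mode) and then lock the two slots lowest `hw_id` first. `HwSlot`'s contents, `Scheduler`, and `migrate` are hypothetical; `RwLock<()>` stands in for `SchedulerTopology`.

```rust
use std::sync::{Mutex, RwLock};

/// Hypothetical per-slot state: just a run queue of thread ids.
#[derive(Default)]
pub struct HwSlot {
    pub run_queue: Vec<u32>,
}

pub struct Scheduler {
    pub topology: RwLock<()>, // stand-in for SchedulerTopology
    pub slots: Vec<Mutex<HwSlot>>,
}

impl Scheduler {
    pub fn new(n: usize) -> Self {
        Self {
            topology: RwLock::new(()),
            slots: (0..n).map(|_| Mutex::new(HwSlot::default())).collect(),
        }
    }

    /// Cross-slot op: move `tid` from `from` to `to`.
    /// Order: topology (read) first, then slot mutexes in ascending hw_id.
    pub fn migrate(&self, from: usize, to: usize, tid: u32) -> bool {
        assert_ne!(from, to);
        let _topo = self.topology.read().unwrap();
        let (lo, hi) = if from < to { (from, to) } else { (to, from) };
        let mut lo_g = self.slots[lo].lock().unwrap();
        let mut hi_g = self.slots[hi].lock().unwrap();
        let (src, dst) = if from < to { (&mut *lo_g, &mut *hi_g) } else { (&mut *hi_g, &mut *lo_g) };
        match src.run_queue.iter().position(|&t| t == tid) {
            Some(pos) => {
                let t = src.run_queue.swap_remove(pos);
                dst.run_queue.push(t);
                true
            }
            None => false,
        }
    }
}
```

Because every cross-slot operation acquires slot mutexes in the same global order, two opposite migrations can never each hold one mutex while waiting for the other.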

## Verification (all green)

| Check | Result |
|---|---|
| `cargo build --workspace` | clean |
| `cargo test --workspace` | 405 passed, 0 failed |
| `xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json` (default = threaded) | matches |
| Same with `--gpu-inline` (rollback) | matches |
| Same with `--reservations-table` | matches |
| Same with `--gpu-inline --reservations-table` | matches |
| `concurrent_lwarx_stwcx_serializes` (8 threads × 1000 rounds, ReservationTable stress) | passes |
| `concurrent_alloc_handle_distinct` (10 threads × 100 allocs, AtomicU32 next_handle) | passes |
| `write_u32_fence_publishes_prior_writes` (M1.8 fence test, hardened with AtomicU32 storage) | passes |

Tests grew from 395 (post-M1) to 405 (+10 new substep tests).

## Files of note

- [crates/xenia-cpu/src/reservation.rs](xenia-rs/crates/xenia-cpu/src/reservation.rs) — banked `ReservationTable` with stress tests
- [crates/xenia-cpu/src/scheduler.rs](xenia-rs/crates/xenia-cpu/src/scheduler.rs) — `ThreadRef` gen-packed, `ThreadRef::new` constructor
- [crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) — atomic bump allocators, `reservations: Arc<ReservationTable>`, `reservations_enabled: AtomicBool`
- [crates/xenia-kernel/src/interrupts.rs](xenia-rs/crates/xenia-kernel/src/interrupts.rs) — `pending_local_irq: [AtomicU8; 6]`
- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) — `--reservations-table` flag wiring, kernel construction
- [crates/xenia-gpu/src/handle.rs](xenia-rs/crates/xenia-gpu/src/handle.rs) — fence test fixture rewritten to use `AtomicU32` slots (was flaky `Cell<u8>`)

## Next milestone (M3)

The M3 spawn work bundles:
1. **`KernelStateInner` split** (carryover from M2.6): `Arc<Mutex<KernelStateInner>>`, ~98 export signatures.
2. **Per-slot `Mutex<HwSlot>`** (carryover from M2.7). Lock order ascending by `hw_id`. `SchedulerTopology` RwLock for cross-slot ops.
3. **Phaser primitive** for quantum-based barrier sync.
4. **6 `HwHostThread`s** spawned. Wakeup-on-signal via `slot_wake[6]` + `unpark()`.
5. **IRQ injection routed through `pending_local_irq[6]`**: the target T_cpu_i injects into itself.
6. **Reservation table activation** in the `lwarx`/`stwcx.` arms.
7. **Sylpheed parallel boot** verification + 100x stress test.

The plan's verification matrix for M3 completion: the golden run matches under `--lockstep` (single host thread; M2 still matches). Parallel mode reaches VdSwap=2 with `deadlock_halts == 0`; its digest *will* differ from lockstep, which is expected and documented in the existing thread-interleaving-divergence note in the perf memory.
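Item 3's phaser is not specified in this note. A minimal sketch of one, assuming a Mutex/Condvar barrier that returns the completed phase number and whose party count can shrink when a hardware thread leaves (an assumed use case, for example a T_cpu_i with nothing left to run). That shrinking count is something `std::sync::Barrier`, with its fixed party count, cannot express. `Phaser`, `arrive_and_wait`, and `deregister` are hypothetical names.

```rust
use std::sync::{Condvar, Mutex};

struct PhaseState {
    parties: usize,
    arrived: usize,
    phase: u64,
}

/// Reusable quantum barrier with a shrinkable party count.
pub struct Phaser {
    state: Mutex<PhaseState>,
    cv: Condvar,
}

impl Phaser {
    pub fn new(parties: usize) -> Self {
        Self {
            state: Mutex::new(PhaseState { parties, arrived: 0, phase: 0 }),
            cv: Condvar::new(),
        }
    }

    /// Block until every registered party finishes the current quantum.
    /// Returns the phase number that just completed.
    pub fn arrive_and_wait(&self) -> u64 {
        let mut s = self.state.lock().unwrap();
        let phase = s.phase;
        s.arrived += 1;
        if s.arrived == s.parties {
            Self::advance(&mut *s);
            self.cv.notify_all();
        } else {
            // Loop guards against spurious wakeups.
            while s.phase == phase {
                s = self.cv.wait(s).unwrap();
            }
        }
        phase
    }

    /// Leave permanently; may complete the current phase for the others.
    pub fn deregister(&self) {
        let mut s = self.state.lock().unwrap();
        s.parties -= 1;
        if s.parties > 0 && s.arrived == s.parties {
            Self::advance(&mut *s);
            self.cv.notify_all();
        }
    }

    fn advance(s: &mut PhaseState) {
        s.arrived = 0;
        s.phase += 1;
    }
}
```

Each T_cpu_i would run one quantum, then call `arrive_and_wait`; a thread that deregisters stops holding the others back from the next phase onward.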