xenia-rs/migration/claude-memory/project_xenia_rs_concurrency_m3_progress.md
MechaCat02 e6d43a23ac chore: add migration/ bundle for cross-machine setup
Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:

  - claude-memory/             ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                               (103 files, 1.1 MB - MEMORY.md + every
                                project_xenia_rs_*.md from audits
                                addis_signext through audit-058)
  - project-root/dot-claude/   <project-root>/.claude/settings.json
                               (Stop hook + permissions)
  - project-root/ppc-manual/   <project-root>/ppc-manual/
                               (PowerPC reference docs, 397 files, 3.7 MB)
  - project-root/run-canary.sh <project-root>/run-canary.sh
  - README.md                  Human-readable setup checklist
  - setup.sh                   Idempotent installer (also reclones
                               xenia-canary at pinned HEAD 6de80dffe)
  - MANIFEST.md                Per-file mapping + per-file-not-bundled
                               restoration recipe

Excluded from bundle (not shippable via git):
  - Sylpheed ISO (7.8 GB; copyright; manual copy required)
  - sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
  - target/ build artifacts (rebuild on target)
  - audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
  - audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
  - xenia-canary checkout (setup.sh reclones from
    git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:38:38 +02:00

6.8 KiB

name: xenia-rs concurrency rollout — M3.1 + per-thread block-cache substrate landed (2026-04-26); M3.2–M3.8 deferred
description: Phaser primitive + per-HW-slot block caches landed (M3.1, M3.2a). The remaining seven substeps (per-slot Mutex<HwSlot>, KernelStateInner split, host-thread spawn, slot wakeups, IRQ routing, reservation interpreter wiring, parallel stress test) are interdependent and require focused dedicated sessions to land safely with per-step verification. Deferred work is precisely scoped below for the follow-up.
type: project
originSessionId: af90c866-579c-4506-af85-cd5a5030af85

What landed this session

M3.1 — Phaser primitive

crates/xenia-cpu/src/phaser.rs. Custom barrier-with-skip; 6 unit tests pass:

  • n_arrivers_all_advance — basic barrier semantics
  • skip_counts_toward_advance — skipping participants count toward advance
  • shutdown_wakes_arrivers — clean tear-down via Phaser::shutdown()
  • timeout_fires_when_peer_hangs — defensive timeout returns PhaserOutcome::Timeout
  • multi_phase_progress — 6 threads × 1000 phases, no deadlock, generation counter consistent
  • mixed_skip_and_arrive_random — pseudo-random skip/arrive across 200 phases

Memory ordering: phase counter is Release/Acquire. Participant count under Mutex + Condvar. The skip-counts-toward-advance design lets idle slots park on their own wake mechanism without stalling the phaser.
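
To make the skip-counts-toward-advance idea concrete, here is a minimal sketch of a barrier-with-skip. It is not the code in phaser.rs: the names SketchPhaser, SketchOutcome, and demo_phaser are hypothetical, the timeout variant is left out, and the sketch assumes every participant calls exactly one of arrive_and_wait or skip per phase.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical outcome type; the real PhaserOutcome also has a Timeout variant.
#[derive(Debug, PartialEq, Eq)]
pub enum SketchOutcome {
    Advanced,
    Shutdown,
}

struct State {
    checked_in: usize, // arrivals + skips seen in the current phase
    shutdown: bool,
}

pub struct SketchPhaser {
    parties: usize,
    phase: AtomicU64, // published with Release, read with Acquire
    state: Mutex<State>,
    cv: Condvar,
}

impl SketchPhaser {
    pub fn new(parties: usize) -> Self {
        SketchPhaser {
            parties,
            phase: AtomicU64::new(0),
            state: Mutex::new(State { checked_in: 0, shutdown: false }),
            cv: Condvar::new(),
        }
    }

    pub fn phase(&self) -> u64 {
        self.phase.load(Ordering::Acquire)
    }

    /// Counts one participant; the last one in advances the phase and wakes waiters.
    fn check_in(&self, st: &mut State) -> bool {
        st.checked_in += 1;
        if st.checked_in < self.parties {
            return false;
        }
        st.checked_in = 0;
        self.phase.fetch_add(1, Ordering::Release);
        self.cv.notify_all();
        true
    }

    /// An idle participant counts toward the advance without blocking.
    pub fn skip(&self) {
        let mut st = self.state.lock().unwrap();
        if !st.shutdown {
            self.check_in(&mut st);
        }
    }

    pub fn arrive_and_wait(&self) -> SketchOutcome {
        let mut st = self.state.lock().unwrap();
        if st.shutdown {
            return SketchOutcome::Shutdown;
        }
        let my_phase = self.phase.load(Ordering::Acquire);
        if self.check_in(&mut st) {
            return SketchOutcome::Advanced;
        }
        // Loop guards against spurious wakeups.
        while !st.shutdown && self.phase.load(Ordering::Acquire) == my_phase {
            st = self.cv.wait(st).unwrap();
        }
        if self.phase.load(Ordering::Acquire) != my_phase {
            SketchOutcome::Advanced
        } else {
            SketchOutcome::Shutdown
        }
    }

    pub fn shutdown(&self) {
        let mut st = self.state.lock().unwrap();
        st.shutdown = true;
        self.cv.notify_all();
    }
}

/// Two parties: one worker arrives, the caller skips, so the phase advances once.
pub fn demo_phaser() -> (SketchOutcome, u64) {
    let ph = Arc::new(SketchPhaser::new(2));
    let worker = {
        let ph = Arc::clone(&ph);
        thread::spawn(move || ph.arrive_and_wait())
    };
    ph.skip();
    let outcome = worker.join().unwrap();
    (outcome, ph.phase())
}
```

Because the phase counter only changes while the mutex is held, the atomics here mainly let other code read the phase without locking; whether the real primitive relies on that is not stated in this note.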

M3.2a — Per-HW-slot block caches

crates/xenia-app/src/main.rs:1228:

let mut block_caches: [BlockCache; HW_THREAD_COUNT] =
    std::array::from_fn(|_| BlockCache::new());

Dispatch site at main.rs:1651 routes through block_caches[hw_id as usize]. Bit-identical correctness in single-threaded mode (it's just 6 independent caches on one thread); eliminates cross-slot races for the eventual host-thread spawn.

Lockstep golden at -n 2M: matches.
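
The routing through block_caches[hw_id as usize] can be illustrated with a small stand-in. This is a sketch under assumptions: the BlockCache below is a hypothetical HashMap wrapper (the real type's fields and methods are not shown in this note), and the trace of (hw_id, guest pc) pairs is invented for the example.

```rust
use std::array;
use std::collections::HashMap;

const HW_THREAD_COUNT: usize = 6;

/// Hypothetical stand-in: maps a guest PC to a decoded block length.
#[derive(Default)]
struct BlockCache {
    blocks: HashMap<u32, usize>,
}

impl BlockCache {
    fn new() -> Self {
        BlockCache::default()
    }

    /// Returns the cached entry, decoding only on a miss.
    fn get_or_insert_with<F: FnOnce() -> usize>(&mut self, pc: u32, decode: F) -> usize {
        *self.blocks.entry(pc).or_insert_with(decode)
    }

    fn len(&self) -> usize {
        self.blocks.len()
    }
}

/// Runs an invented trace and reports how many blocks each slot cached.
fn demo_block_caches() -> [usize; HW_THREAD_COUNT] {
    let mut block_caches: [BlockCache; HW_THREAD_COUNT] =
        array::from_fn(|_| BlockCache::new());

    // (hw_id, guest pc) pairs; slot 0 hits the same block twice.
    let trace: [(u8, u32); 5] = [
        (0, 0x8200_0000),
        (0, 0x8200_0000),
        (1, 0x8200_0000),
        (1, 0x8200_0040),
        (5, 0x8200_0080),
    ];
    for (hw_id, pc) in trace {
        // Each slot only ever touches its own cache, so no entry is shared.
        block_caches[hw_id as usize].get_or_insert_with(pc, || 4);
    }
    array::from_fn(|i| block_caches[i].len())
}
```

Note that slots 0 and 1 both decode the block at 0x8200_0000 independently. That duplication is the price of removing cross-slot sharing.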

Verification

  • cargo build --workspace: clean
  • cargo test --workspace: 411 passed, 0 failed (was 405 post-M2; +6 from phaser tests)
  • xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json (default = threaded GPU): matches
  • Same with --gpu-inline, --reservations-table, --gpu-inline --reservations-table: all match

Why M3.2b–M3.8 are deferred

The remaining substeps are individually invasive and interdependent — none of them deliver observable end-to-end value without the others. Splitting them across separate sessions with focused verification is more responsible than racing through them in a single pass.

| Substep | Why it's a focused session of its own |
| --- | --- |
| M3.2b Per-slot Mutex<HwSlot> | The scheduler holds slots: [HwSlot; 6]; many internal accesses are &mut self patterns that don't compose with MutexGuard lifetimes. Refactor touches ~30 callsites in scheduler.rs + several external accessors that hold borrows across method boundaries. |
| M3.3 Arc<Mutex<KernelState>> wrap | Either wrap the whole struct (~98 export sigs unchanged but every callsite needs guard threading) or split into KernelStateShared + KernelStateInner (the plan's design: ~98 export sig changes, mechanical but workspace-wide). Either path is a substantial single-purpose session. |
| M3.4 Spawn 6 host threads | Requires M3.2b + M3.3 as substrate. The spawn body itself is a 200–400 line replacement of the per-round portion of run_execution. |
| M3.5 Idle-slot wakeups | Requires M3.4. Adds slot_wake[6]: AtomicBool + Thread handles + unpark() calls at every KeSetEvent/KeReleaseSemaphore site. |
| M3.6 IRQ via pending_local_irq | Requires M3.4. M2.5 already wired the AtomicU8 array; M3.6 changes the producer side (T_main / GPU thread sets bits) and consumer side (T_cpu_i checks bits at quantum boundary, self-injects). |
| M3.7 Activate reservations in interpreter | Requires threading hw_id + Arc<ReservationTable> reference into the interpreter dispatch. PpcContext doesn't currently carry hw_id, and step/step_cached/step_block don't take a table. Each path needs a parameter, and there are many test callers. |
| M3.8 100× parallel stress test | Requires M3.4–M3.7. |
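
The producer/consumer split described for M3.6 might look roughly like the following. This is a sketch, not the planned implementation: SketchInterruptState, raise, take_pending, and the meaning of individual bits are assumptions; only the [AtomicU8; 6] shape comes from this note.

```rust
use std::array;
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

const HW_THREAD_COUNT: usize = 6;

/// Hypothetical holder for the per-slot pending-IRQ bits.
pub struct SketchInterruptState {
    pending_local_irq: [AtomicU8; HW_THREAD_COUNT],
}

impl SketchInterruptState {
    pub fn new() -> Self {
        SketchInterruptState {
            pending_local_irq: array::from_fn(|_| AtomicU8::new(0)),
        }
    }

    /// Producer side (e.g. main or GPU thread): set one bit for a slot.
    pub fn raise(&self, hw_id: usize, irq_bit: u8) {
        debug_assert!(irq_bit < 8);
        self.pending_local_irq[hw_id].fetch_or(1u8 << irq_bit, Ordering::Release);
    }

    /// Consumer side (the slot's own thread, at a quantum boundary):
    /// atomically take all pending bits and clear them.
    pub fn take_pending(&self, hw_id: usize) -> u8 {
        self.pending_local_irq[hw_id].swap(0, Ordering::Acquire)
    }
}

/// A producer thread raises bits 0 and 3 on slot 2; the caller then drains.
pub fn demo_irq() -> (u8, u8, u8) {
    let irqs = Arc::new(SketchInterruptState::new());
    let producer = {
        let irqs = Arc::clone(&irqs);
        thread::spawn(move || {
            irqs.raise(2, 0);
            irqs.raise(2, 3);
        })
    };
    producer.join().unwrap();
    (irqs.take_pending(2), irqs.take_pending(2), irqs.take_pending(0))
}
```

Using swap rather than a load followed by a store means a bit raised between the two steps cannot be lost.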

What's already in place from M1+M2 that M3 will use

  • Page versions atomic (M1.4/M2.1): Vec<AtomicU64>, Release/Acquire on per-page slots.
  • Page table atomic (M1.4): Vec<AtomicU64>, lock-free alloc(&self).
  • MemoryAccess::write_u32_fence / read_u32_fence (M1.8): Release/Acquire fence helpers used by EVENT_WRITE_SHD / RPTR writeback.
  • GPU on its own host thread (M1.4–M1.10): Arc<Mutex<GpuDigestSnapshot>> + parker via Arc<AtomicBool> wake_pending + unpark() from MMIO callback.
  • ReservationTable (M2.2): banked AtomicU64, 4096 banks, (line, generation, hw_id). Stress-tested with 8 concurrent host threads. Lives at kernel.reservations: Arc<ReservationTable>. Activation flag at kernel.reservations_enabled: AtomicBool (settable via --reservations-table or XENIA_RESERVATIONS_TABLE=1).
  • ThreadRef generation packing (M2.3): 1+1+2 = 4 bytes; ::new(hw_id, idx) constructor.
  • Atomic bump allocators (M2.4): next_handle, next_thread_id, next_tls_index, heap_cursor, stack_cursor all AtomicU32.
  • pending_local_irq: [AtomicU8; 6] (M2.5): wired in InterruptState; M3.6 will start using bits.
  • Phaser primitive (M3.1): arrive_and_wait / skip / shutdown / arrive_and_wait_timeout API.
  • Per-HW-slot block caches (M3.2a): [BlockCache; 6] indexed by hw_id.
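
As an illustration of the banked reservation idea in the ReservationTable bullet, the sketch below packs (line, generation, hw_id) into one AtomicU64 per bank. Everything beyond "4096 banks of AtomicU64" is assumed for the example: the bit layout, the 128-byte line size, the NO_OWNER marker, and the names SketchReservations, reserve, and try_commit are hypothetical and may differ from the real table.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const BANKS: usize = 4096;
const LINE_SHIFT: u32 = 7; // assumed 128-byte reservation granule
const NO_OWNER: u8 = 0xFF; // assumed "nobody holds this bank" marker
const GEN_MASK: u32 = 0x00FF_FFFF;

/// Assumed layout: line in bits 32..64, generation in 8..32, hw_id in 0..8.
fn pack(line: u32, generation: u32, hw_id: u8) -> u64 {
    ((line as u64) << 32) | (((generation & GEN_MASK) as u64) << 8) | (hw_id as u64)
}

fn generation_of(word: u64) -> u32 {
    ((word >> 8) as u32) & GEN_MASK
}

pub struct SketchReservations {
    banks: Vec<AtomicU64>,
}

impl SketchReservations {
    pub fn new() -> Self {
        let banks = (0..BANKS)
            .map(|_| AtomicU64::new(pack(0, 0, NO_OWNER)))
            .collect();
        SketchReservations { banks }
    }

    fn bank(&self, addr: u32) -> &AtomicU64 {
        &self.banks[((addr >> LINE_SHIFT) as usize) % BANKS]
    }

    /// lwarx-like: claim the bank for hw_id and return a token.
    /// Any later claim on the same bank bumps the generation and
    /// therefore invalidates this token.
    pub fn reserve(&self, hw_id: u8, addr: u32) -> u64 {
        let bank = self.bank(addr);
        let mut cur = bank.load(Ordering::Acquire);
        loop {
            let next = pack(addr >> LINE_SHIFT, generation_of(cur).wrapping_add(1), hw_id);
            match bank.compare_exchange_weak(cur, next, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return next,
                Err(seen) => cur = seen,
            }
        }
    }

    /// stwcx.-like: succeeds only if the bank still holds our token.
    pub fn try_commit(&self, addr: u32, token: u64) -> bool {
        let released = pack(0, generation_of(token).wrapping_add(1), NO_OWNER);
        self.bank(addr)
            .compare_exchange(token, released, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}

/// Slot 1 reserves the same 128-byte line after slot 0, so slot 0's commit fails.
pub fn demo_reservations() -> (bool, bool, bool) {
    let table = SketchReservations::new();
    let a = table.reserve(0, 0x1000);
    let b = table.reserve(1, 0x1040);
    (
        table.try_commit(0x1000, a),
        table.try_commit(0x1040, b),
        table.try_commit(0x1040, b),
    )
}
```

Banking means two unrelated lines that hash to the same bank can knock out each other's reservation. That only causes a spurious store-conditional failure, which guest code must already tolerate.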

Next session resumes at M3.2b

The natural ordering for the deferred work:

  1. M3.2b Per-slot Mutex<HwSlot>. The scheduler internally locks per slot; external API stays method-based (no guard leakage). Verify lockstep golden bit-identical.
  2. M3.3 Arc<Mutex<KernelState>> wrap (start coarse — single mutex around the whole struct; refine later if needed). Verify lockstep golden.
  3. M3.4 Spawn N host threads under --parallel flag. Each acquires the kernel lock for HLE, drops for instruction execution, syncs at the phaser. Verify sylpheed boots; halts==0.
  4. M3.5 Slot wakeup primitives. M3.4 will park on idle; M3.5 unparks on signal.
  5. M3.6 IRQ routing per slot.
  6. M3.7 Reservation interpreter wiring. PpcContext gets hw_id field + Option<Arc<ReservationTable>>.
  7. M3.8 100× sylpheed stress run.
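
The per-round shape described in step 3 (lock the kernel for HLE, release it for instruction execution, sync at the end of the round) can be sketched as below. The sketch uses std::sync::Barrier as a stand-in for the Phaser, a one-field KernelState, and invented per-round work; run_parallel and those details are hypothetical, not the planned run_execution replacement.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

const HW_THREAD_COUNT: usize = 6;

/// Hypothetical stand-in for the coarse-locked kernel state of M3.3.
#[derive(Default)]
struct KernelState {
    hle_calls: u64,
}

/// Spawns one host thread per HW slot and runs `rounds` lockstep rounds.
/// Returns the total HLE calls and the per-slot retired-instruction counts.
fn run_parallel(rounds: u64) -> (u64, Vec<u64>) {
    let kernel = Arc::new(Mutex::new(KernelState::default()));
    // Stand-in for the Phaser: every slot arrives every round, nobody skips.
    let barrier = Arc::new(Barrier::new(HW_THREAD_COUNT));

    let handles: Vec<_> = (0..HW_THREAD_COUNT)
        .map(|hw_id| {
            let kernel = Arc::clone(&kernel);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                let mut retired = 0u64;
                for _ in 0..rounds {
                    // Guest instruction execution: no kernel lock held.
                    retired += 100 + hw_id as u64;
                    {
                        // HLE call: short critical section under the kernel lock.
                        let mut k = kernel.lock().unwrap();
                        k.hle_calls += 1;
                    } // guard dropped here, before the round barrier
                    barrier.wait();
                }
                retired
            })
        })
        .collect();

    let retired: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let calls = kernel.lock().unwrap().hle_calls;
    (calls, retired)
}
```

Dropping the guard before the barrier matters: a thread that waited at the barrier while holding the kernel lock would block every other slot's HLE call and deadlock the round.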

Files of note