---
name: xenia-rs concurrency rollout — M1 complete (2026-04-26)
description: All 10 M1 sub-steps landed. Default GPU backend is now threaded (worker thread on its own); `--gpu-inline` is the rollback. 395 workspace tests pass; sylpheed -n 2M golden matches in both modes; VdSwap=1/=2 fire end-to-end under threaded mode.
type: project
originSessionId: af90c866-579c-4506-af85-cd5a5030af85
---

## What's landed (M1.1–M1.10)

All 10 M1 sub-steps complete. Default GPU backend at runtime is **threaded** (`GpuBackend::Threaded`); `--gpu-inline` (or `--ui`, or `XENIA_GPU_INLINE=1`) selects the legacy synchronous path.

### Key types and modules

- **`xenia_gpu::GpuBackend`** — enum `Inline(GpuSystem) | Threaded(GpuHandle)`. Forwarding methods: `mmio()`, `as_inline[_mut]()`, `initialize_ring_buffer`, `enable_rptr_writeback`, `extend_write_ptr_by`, `drain_to_current_wptr`, `notify_xe_swap`, `has_pending_interrupts`, `take_pending_interrupts`, `digest_snapshot`. ([crates/xenia-gpu/src/handle.rs](xenia-rs/crates/xenia-gpu/src/handle.rs))
- **`GpuCommand`** — `InitializeRing`, `EnableRptrWriteback`, `DrainFence{target_wptr, reply_tx}`, `NotifyXeSwap{frontbuffer_phys, width, height}`, `Shutdown`.
- **`GpuHandle::send_cmd(cmd)`** wraps the raw `cmd_tx.send` with M1.7 parker discipline (set `wake_pending=true` Release + `unpark()` worker thread).
- **`GpuWorker::run(Arc)`** — registers self as wake target, drains commands, syncs MMIO + executes packets in batches of 64, refreshes `Arc>` for the CPU-side digest, drains `pending_interrupts → int_tx`, parks via `park_timeout(16ms)` when idle.
- **`spawn_gpu_worker(worker, Arc) -> JoinHandle`** spawns the worker; `shutdown_and_join_with_timeout` joins with 1 s defensive timeout.

### Memory model

- **`GuestMemory.page_table: Vec`** with per-page Acquire/Release. `alloc`, `is_mapped`, `page_entry`, `write_bulk`, `translate_virtual_mut` all `&self`.
- **`GuestMemory.writes_total: AtomicU64`** + **`page_versions: Vec`** with Release on bump, Acquire on read.
- **`MemoryAccess::write_u32_fence` / `read_u32_fence`** (M1.8) — Release fence before the write / Acquire fence after the read. Migrated `EVENT_WRITE_SHD` and `writeback_read_ptr` to use the fenced variants.
- **All `MemoryAccess` writes take `&self`** post the M1.4(b) handoff. ~140 `&mut GuestMemory` callsites swept across 10 files. `GuestMemoryPcr<'_>` callsites use `&mut` because `PcrWriter::write_pcr_id(&mut self, ...)`.

### Concurrency primitives (live in production)

- **MMIO mailboxes** (`Arc` × 5): `cp_rb_wptr`, `cp_rb_rptr`, `cp_int_status`, `cp_int_ack`, `d1mode_vblank_vline_status`. Release on writer / Acquire on reader.
- **`GpuMmio.wake_pending: Arc`** + **`worker_thread: Arc>>`**. WPTR write callback sets+`unpark()`s; worker swaps→park.
- **`crossbeam_channel::unbounded`** for cmd_tx/cmd_rx and int_tx/int_rx.
- **`bounded(1)`** reply channels for `DrainFence` (CPU's `recv_timeout(1s)` + worker's `Instant`-based 900 ms internal deadline).
- **`Arc>`** refreshed once per worker iteration; CPU reads via `digest_snapshot()`.
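
The mailbox and parker pattern described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the code in `handle.rs`: the `Mailboxes` struct, the `shutdown` flag (a stand-in for the real `Shutdown` command), and the functions `write_wptr`, `worker_loop`, and `run_demo` are hypothetical names, and the atomic types are assumptions. Only the field names `cp_rb_wptr` and `wake_pending`, the Release/Acquire pairing, and the 16 ms `park_timeout` come from the notes.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the shared MMIO state. Field names follow the
// notes above; the concrete types are assumptions.
struct Mailboxes {
    cp_rb_wptr: AtomicU32,
    wake_pending: AtomicBool,
    shutdown: AtomicBool, // stand-in for the real `Shutdown` command
}

// CPU side: publish the new write pointer, then flag and unpark the worker.
fn write_wptr(mb: &Mailboxes, worker: &thread::Thread, wptr: u32) {
    mb.cp_rb_wptr.store(wptr, Ordering::Release);
    mb.wake_pending.store(true, Ordering::Release);
    worker.unpark();
}

// Worker side: returns the last write pointer it caught up to.
fn worker_loop(mb: &Mailboxes) -> u32 {
    let mut rptr = 0u32;
    loop {
        // Clear the flag before looking for work, so a wake that arrives
        // during the checks below is picked up on the next iteration.
        let woken = mb.wake_pending.swap(false, Ordering::AcqRel);
        // Read `shutdown` before `cp_rb_wptr`: once the stop flag is visible,
        // every write-pointer update made before it is visible too.
        let stopping = mb.shutdown.load(Ordering::Acquire);
        let wptr = mb.cp_rb_wptr.load(Ordering::Acquire);
        let had_work = wptr != rptr;
        rptr = wptr; // "execute" everything up to wptr
        if stopping {
            return rptr;
        }
        if !woken && !had_work {
            // Idle: sleep until unparked, or re-check after 16 ms.
            thread::park_timeout(Duration::from_millis(16));
        }
    }
}

fn run_demo() -> u32 {
    let mb = Arc::new(Mailboxes {
        cp_rb_wptr: AtomicU32::new(0),
        wake_pending: AtomicBool::new(false),
        shutdown: AtomicBool::new(false),
    });
    let worker_mb = Arc::clone(&mb);
    let handle = thread::spawn(move || worker_loop(&worker_mb));
    for wptr in [8, 16, 24] {
        write_wptr(&mb, handle.thread(), wptr);
    }
    // Request shutdown with the same flag-then-unpark discipline.
    mb.shutdown.store(true, Ordering::Release);
    mb.wake_pending.store(true, Ordering::Release);
    handle.thread().unpark();
    handle.join().expect("worker panicked")
}
```

Calling `run_demo()` from a `main` function returns the final write pointer the worker observed. Swapping `wake_pending` to `false` before checking for work means a wake that races with the check is not lost: either the next swap sees `true`, or the pending `unpark()` token makes `park_timeout` return immediately.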

### CLI / env defaults

```
default                              → threaded
--gpu-inline (or XENIA_GPU_INLINE=1) → inline
--gpu-thread (or XENIA_GPU_THREAD=1) → threaded (explicit)
--ui                                 → forces inline (UI worker not yet shared-mem-aware)
```

### Verification (all green)

| Check | Result |
|---|---|
| `cargo build --workspace` | clean |
| `cargo test --workspace` | 395 passed, 0 failed |
| `xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json` (default = threaded) | matches |
| Same with `--gpu-inline` | matches |
| `xenia-rs exec sylpheed.iso -n 30_000_000 --halt-on-deadlock` (default = threaded) | exit 0 |
| VdSwap=1 + VdSwap=2 under threaded mode | both fire (~18M + ~28M cycles) |
| GPU worker shutdown clean within 1 s | yes |

Beyond ~50M instructions both threaded and inline modes hit the same `RtlRaiseException` pre-existing bug (unrelated to concurrency rollout).

### Known limitations / deferred

- **`--ui` + threaded backend**: `cmd_exec_inner` panics if both are set; `--ui` auto-forces inline. Rationale: `run_with_ui` consumes `GuestMemory` by value; migrating it to `Arc` is a separate work item.
- **Inline path retained**: kept as the rollback rail and the `--ui` path. M1.10 cleanup deferred to post-M3 per plan.
- **Beyond ~50M instructions**: both modes hit a pre-existing `RtlRaiseException`. Not a regression.

### Next milestone (M2)

`KernelStateInner + Arc>` refactor, per-slot `Mutex`, `ThreadRef` generation packing, `ReservationTable` for `lwarx`/`stwcx.`. Some M2 work was pulled forward by M1.4 (page_table atomization) — that's already complete.

### Files of note

- [crates/xenia-gpu/src/handle.rs](xenia-rs/crates/xenia-gpu/src/handle.rs) — `GpuBackend`, `GpuCommand`, `GpuHandle::send_cmd`, `GpuWorker::run`, `GpuDigestSnapshot`, parker
- [crates/xenia-gpu/src/gpu_system.rs](xenia-rs/crates/xenia-gpu/src/gpu_system.rs) — `GpuMmio` with `wake_pending` + `worker_thread`; `EVENT_WRITE_SHD` / `writeback_read_ptr` use fenced writes
- [crates/xenia-gpu/src/mmio_region.rs](xenia-rs/crates/xenia-gpu/src/mmio_region.rs) — `CP_RB_WPTR` write callback sets `wake_pending` + `unpark()`s worker
- [crates/xenia-memory/src/heap.rs](xenia-rs/crates/xenia-memory/src/heap.rs) — `Vec` page table, `&self` writes
- [crates/xenia-memory/src/access.rs](xenia-rs/crates/xenia-memory/src/access.rs) — `write_u32_fence` / `read_u32_fence`
- [crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) — `KernelState::with_gpu(GpuBackend)`
- [crates/xenia-kernel/src/exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs) — `vd_swap` rewritten to use `GpuBackend` accessors; UI publish gated on `as_inline_mut()`
- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) — backend selection, worker spawn+join, `Arc` wrap
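
For reference, the backend selection noted for `main.rs` follows the rules under "CLI / env defaults". A rough sketch of those rules as a pure function is given here. `BackendChoice`, `GpuFlags`, `env_enabled`, and `select_backend` are hypothetical names, not the real types in `xenia-app`. The notes do not say what happens when `--gpu-inline` and `--gpu-thread` are both given, so this sketch assumes inline wins.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendChoice {
    Inline,
    Threaded,
}

// Hypothetical parsed CLI flags; the real argument struct is not shown in
// these notes.
#[derive(Debug, Clone, Copy, Default)]
struct GpuFlags {
    gpu_inline: bool,
    gpu_thread: bool,
    ui: bool,
}

// Treats only the literal value "1" as set, matching `XENIA_GPU_INLINE=1`.
fn env_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

fn select_backend(
    flags: GpuFlags,
    env_inline: Option<&str>,
    env_thread: Option<&str>,
) -> BackendChoice {
    let want_inline = flags.gpu_inline || env_enabled(env_inline);
    let want_thread = flags.gpu_thread || env_enabled(env_thread);
    match (flags.ui, want_inline, want_thread) {
        // `--ui` forces inline regardless of other flags.
        (true, _, _) => BackendChoice::Inline,
        // Rollback requested. ASSUMPTION: this wins over `--gpu-thread`.
        (false, true, _) => BackendChoice::Inline,
        // Explicit threaded request and the default agree.
        (false, false, _) => BackendChoice::Threaded,
    }
}
```

In a binary, the environment arguments could come from `std::env::var("XENIA_GPU_INLINE").ok()` with `.as_deref()` applied. Keeping the function free of direct environment access makes the precedence easy to unit-test.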