---
name: xenia-rs concurrency rollout — M1 complete (2026-04-26)
description: All 10 M1 sub-steps landed. Default GPU backend is now threaded (the GPU runs on its own worker thread); `--gpu-inline` is the rollback. 395 workspace tests pass; sylpheed -n 2M golden matches in both modes; VdSwap=1/=2 fire end-to-end under threaded mode.
type: project
originSessionId: af90c866-579c-4506-af85-cd5a5030af85
---
## What's landed (M1.1–M1.10)

All 10 M1 sub-steps complete. Default GPU backend at runtime is **threaded** (`GpuBackend::Threaded`); `--gpu-inline` (or `--ui`, or `XENIA_GPU_INLINE=1`) selects the legacy synchronous path.

### Key types and modules

- **`xenia_gpu::GpuBackend`** — enum `Inline(GpuSystem) | Threaded(GpuHandle)`. Forwarding methods: `mmio()`, `as_inline[_mut]()`, `initialize_ring_buffer`, `enable_rptr_writeback`, `extend_write_ptr_by`, `drain_to_current_wptr`, `notify_xe_swap`, `has_pending_interrupts`, `take_pending_interrupts`, `digest_snapshot`. ([crates/xenia-gpu/src/handle.rs](xenia-rs/crates/xenia-gpu/src/handle.rs))

- **`GpuCommand`** — `InitializeRing`, `EnableRptrWriteback`, `DrainFence{target_wptr, reply_tx}`, `NotifyXeSwap{frontbuffer_phys, width, height}`, `Shutdown`.

- **`GpuHandle::send_cmd(cmd)`** wraps the raw `cmd_tx.send` with M1.7 parker discipline (store `wake_pending = true` with Release ordering, then `unpark()` the worker thread).

- **`GpuWorker::run(Arc<GuestMemory>)`** — registers self as wake target, drains commands, syncs MMIO + executes packets in batches of 64, refreshes `Arc<Mutex<GpuDigestSnapshot>>` for the CPU-side digest, drains `pending_interrupts → int_tx`, parks via `park_timeout(16ms)` when idle.

- **`spawn_gpu_worker(worker, Arc<GuestMemory>) -> JoinHandle`** spawns the worker; `shutdown_and_join_with_timeout` joins it with a defensive 1 s timeout.
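
The parker discipline described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `handle.rs`: `Cmd`, `Handle`, and `spawn_worker` are hypothetical names, `std::sync::mpsc` stands in for the crossbeam channel, and the worker just sums payloads instead of executing packets.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle, Thread};
use std::time::Duration;

/// Hypothetical, trimmed-down command set (not the real `GpuCommand`).
enum Cmd {
    Work(u32),
    Shutdown,
}

/// Producer-side handle: command channel, wake flag, and wake target.
struct Handle {
    cmd_tx: Sender<Cmd>,
    wake_pending: Arc<AtomicBool>,
    worker_thread: Arc<Mutex<Option<Thread>>>,
}

impl Handle {
    /// Enqueue, publish the wake flag with Release, then unpark the worker.
    fn send_cmd(&self, cmd: Cmd) {
        // A send error only means the worker has already exited.
        let _ = self.cmd_tx.send(cmd);
        self.wake_pending.store(true, Ordering::Release);
        if let Some(t) = self.worker_thread.lock().unwrap().as_ref() {
            t.unpark();
        }
    }
}

/// Spawn a worker that returns the sum of all `Work` payloads on shutdown.
fn spawn_worker() -> (Handle, JoinHandle<u32>) {
    let (cmd_tx, cmd_rx): (Sender<Cmd>, Receiver<Cmd>) = channel();
    let wake_pending = Arc::new(AtomicBool::new(false));
    let worker_thread: Arc<Mutex<Option<Thread>>> = Arc::new(Mutex::new(None));
    let (wp, wt) = (wake_pending.clone(), worker_thread.clone());
    let join = thread::spawn(move || {
        // Register self as the wake target before entering the loop.
        *wt.lock().unwrap() = Some(thread::current());
        let mut sum = 0u32;
        loop {
            // Drain everything currently queued.
            while let Ok(cmd) = cmd_rx.try_recv() {
                match cmd {
                    Cmd::Work(n) => sum += n,
                    Cmd::Shutdown => return sum,
                }
            }
            // Consume the flag; park only if no wake arrived since the drain.
            if !wp.swap(false, Ordering::Acquire) {
                thread::park_timeout(Duration::from_millis(16));
            }
        }
    });
    (Handle { cmd_tx, wake_pending, worker_thread }, join)
}
```

The swap-before-park step is what closes the lost-wakeup window: a command sent between the drain and the swap leaves the flag set, so the worker loops again instead of parking. The 16 ms timeout is only a backstop.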

### Memory model

- **`GuestMemory.page_table: Vec<AtomicU64>`** with per-page Acquire/Release. `alloc`, `is_mapped`, `page_entry`, `write_bulk`, `translate_virtual_mut` all `&self`.
- **`GuestMemory.writes_total: AtomicU64`** + **`page_versions: Vec<AtomicU64>`** with Release on bump, Acquire on read.
- **`MemoryAccess::write_u32_fence` / `read_u32_fence`** (M1.8) — Release fence before the write / Acquire fence after the read. Migrated `EVENT_WRITE_SHD` and `writeback_read_ptr` to use the fenced variants.
- **All `MemoryAccess` writes take `&self`** after the M1.4(b) handoff. ~140 `&mut GuestMemory` callsites swept across 10 files. `GuestMemoryPcr<'_>` callsites still use `&mut` because `PcrWriter::write_pcr_id` takes `&mut self`.
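
A rough sketch of the fence placement in the `write_u32_fence` / `read_u32_fence` pair follows. It assumes a toy `Mem` type backed by `AtomicU32` words, which is not how `GuestMemory` stores bytes; only the ordering pattern (Release fence before the write, Acquire fence after the read, writes through `&self`) is taken from the notes above.

```rust
use std::sync::atomic::{fence, AtomicU32, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for guest memory: a few words plus a write counter.
struct Mem {
    words: Vec<AtomicU32>,
    writes_total: AtomicU64,
}

impl Mem {
    fn new(len: usize) -> Self {
        Mem {
            words: (0..len).map(|_| AtomicU32::new(0)).collect(),
            writes_total: AtomicU64::new(0),
        }
    }

    /// Plain write through `&self`; the word store itself is Relaxed.
    fn write_u32(&self, idx: usize, v: u32) {
        self.words[idx].store(v, Ordering::Relaxed);
        self.writes_total.fetch_add(1, Ordering::Release);
    }

    fn read_u32(&self, idx: usize) -> u32 {
        self.words[idx].load(Ordering::Relaxed)
    }

    /// Release fence before the write: every earlier write is ordered
    /// before this one for any reader that pairs it with an Acquire.
    fn write_u32_fence(&self, idx: usize, v: u32) {
        fence(Ordering::Release);
        self.write_u32(idx, v);
    }

    /// Acquire fence after the read: later reads cannot move above it.
    fn read_u32_fence(&self, idx: usize) -> u32 {
        let v = self.read_u32(idx);
        fence(Ordering::Acquire);
        v
    }
}

/// Producer writes a payload, then publishes a flag with the fenced write.
/// The consumer spins on the fenced read and then reads the payload.
fn publish_and_observe() -> u32 {
    let mem = Arc::new(Mem::new(2));
    let producer = {
        let mem = mem.clone();
        thread::spawn(move || {
            mem.write_u32(0, 0xCAFE); // payload
            mem.write_u32_fence(1, 1); // flag
        })
    };
    while mem.read_u32_fence(1) == 0 {
        std::hint::spin_loop();
    }
    let payload = mem.read_u32(0);
    producer.join().unwrap();
    payload
}
```

This is the shape `EVENT_WRITE_SHD` needs: the flag write must not become visible before the data it signals.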

### Concurrency primitives (live in production)

- **MMIO mailboxes** (`Arc<AtomicU32>` × 5): `cp_rb_wptr`, `cp_rb_rptr`, `cp_int_status`, `cp_int_ack`, `d1mode_vblank_vline_status`. Release on writer / Acquire on reader.
- **`GpuMmio.wake_pending: Arc<AtomicBool>`** + **`worker_thread: Arc<Mutex<Option<Thread>>>`**. The WPTR write callback sets `wake_pending` and `unpark()`s the worker; the worker swaps the flag back before parking.
- **`crossbeam_channel::unbounded`** for cmd_tx/cmd_rx and int_tx/int_rx.
- **`bounded(1)`** reply channels for `DrainFence`: the CPU waits with `recv_timeout(1s)`, and the worker enforces its own `Instant`-based 900 ms internal deadline.
- **`Arc<Mutex<GpuDigestSnapshot>>`** refreshed once per worker iteration; CPU reads via `digest_snapshot()`.
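
The `DrainFence` request/reply handshake can be approximated as below. This sketch swaps in `std::sync::mpsc::sync_channel(1)` for crossbeam's `bounded(1)`; the `DrainFence` struct, `worker`, and `drain_to` are illustrative names, and incrementing `rptr` stands in for executing a packet.

```rust
use std::sync::mpsc::{sync_channel, Receiver, RecvTimeoutError, Sender, SyncSender};
use std::time::{Duration, Instant};

/// Hypothetical reduced form of the `DrainFence` command.
struct DrainFence {
    target_wptr: u32,
    reply_tx: SyncSender<u32>,
}

/// Worker side: advance toward the target, give up at the internal
/// deadline, then report how far the read pointer got.
fn worker(cmd_rx: Receiver<DrainFence>, budget: Duration) {
    let mut rptr = 0u32;
    while let Ok(cmd) = cmd_rx.recv() {
        let deadline = Instant::now() + budget;
        while rptr < cmd.target_wptr && Instant::now() < deadline {
            rptr += 1; // stand-in for executing one packet
        }
        // Capacity 1 means a single reply never blocks the worker,
        // even if the requester has already timed out and left.
        let _ = cmd.reply_tx.send(rptr);
    }
}

/// CPU side: send the fence, then wait for the reply with a timeout.
fn drain_to(
    cmd_tx: &Sender<DrainFence>,
    target_wptr: u32,
    wait: Duration,
) -> Result<u32, RecvTimeoutError> {
    let (reply_tx, reply_rx) = sync_channel(1);
    cmd_tx
        .send(DrainFence { target_wptr, reply_tx })
        .map_err(|_| RecvTimeoutError::Disconnected)?;
    reply_rx.recv_timeout(wait)
}
```

Keeping the worker's deadline (900 ms) shorter than the caller's timeout (1 s) means the worker normally answers, possibly with a partial drain, before the caller gives up.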

### CLI / env defaults

```
default                               → threaded
--gpu-inline (or XENIA_GPU_INLINE=1)  → inline
--gpu-thread (or XENIA_GPU_THREAD=1)  → threaded (explicit)
--ui                                  → forces inline (UI worker not yet shared-mem-aware)
```
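
The table above reduces to a small decision function. The sketch below is hypothetical: `Flags`, `BackendMode`, and `select_backend` are illustrative names rather than the code in `main.rs`, environment values are passed in as `Option<&str>`, and the choice that an inline request wins over an explicit threaded request is an assumption the notes do not confirm.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendMode {
    Inline,
    Threaded,
}

/// Hypothetical flattened view of the parsed CLI flags.
#[derive(Debug, Default, Clone, Copy)]
struct Flags {
    gpu_inline: bool,
    gpu_thread: bool,
    ui: bool,
}

/// Treats only the literal value "1" as set, matching `XENIA_GPU_INLINE=1`.
fn env_is_set(value: Option<&str>) -> bool {
    value == Some("1")
}

fn select_backend(
    flags: Flags,
    env_inline: Option<&str>,
    env_thread: Option<&str>,
) -> BackendMode {
    // --ui forces inline: the UI path is not yet shared-memory-aware.
    if flags.ui {
        return BackendMode::Inline;
    }
    // ASSUMPTION: an inline request takes precedence over a threaded one.
    if flags.gpu_inline || env_is_set(env_inline) {
        return BackendMode::Inline;
    }
    // Explicit threaded request; same outcome as the default below.
    if flags.gpu_thread || env_is_set(env_thread) {
        return BackendMode::Threaded;
    }
    // Default.
    BackendMode::Threaded
}
```

For example, `select_backend(Flags::default(), None, None)` yields `BackendMode::Threaded`, and setting `ui: true` yields `BackendMode::Inline` regardless of the other inputs.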

### Verification (all green)

| Check | Result |
|---|---|
| `cargo build --workspace` | clean |
| `cargo test --workspace` | 395 passed, 0 failed |
| `xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json` (default = threaded) | matches |
| Same with `--gpu-inline` | matches |
| `xenia-rs exec sylpheed.iso -n 30_000_000 --halt-on-deadlock` (default = threaded) | exit 0 |
| VdSwap=1 + VdSwap=2 under threaded mode | both fire (~18M + ~28M cycles) |
| GPU worker shutdown clean within 1 s | yes |

Beyond ~50M instructions, both threaded and inline modes hit the same pre-existing `RtlRaiseException` bug, which is unrelated to the concurrency rollout.

### Known limitations / deferred

- **`--ui` + threaded backend**: `cmd_exec_inner` panics if both are set; `--ui` auto-forces inline. Rationale: `run_with_ui` consumes `GuestMemory` by value; migrating it to `Arc<GuestMemory>` is a separate work item.
- **Inline path retained**: kept as the rollback rail and the `--ui` path. M1.10 cleanup deferred to post-M3 per plan.
- **Beyond ~50M instructions**: both modes hit a pre-existing `RtlRaiseException`. Not a regression.

### Next milestone (M2)

`KernelStateInner + Arc<Mutex<...>>` refactor, per-slot `Mutex<HwSlot>`, `ThreadRef` generation packing, `ReservationTable` for `lwarx`/`stwcx.`. Some M2 work was pulled forward by M1.4 (page_table atomization) — that's already complete.

### Files of note

- [crates/xenia-gpu/src/handle.rs](xenia-rs/crates/xenia-gpu/src/handle.rs) — `GpuBackend`, `GpuCommand`, `GpuHandle::send_cmd`, `GpuWorker::run`, `GpuDigestSnapshot`, parker
- [crates/xenia-gpu/src/gpu_system.rs](xenia-rs/crates/xenia-gpu/src/gpu_system.rs) — `GpuMmio` with `wake_pending` + `worker_thread`; `EVENT_WRITE_SHD` / `writeback_read_ptr` use fenced writes
- [crates/xenia-gpu/src/mmio_region.rs](xenia-rs/crates/xenia-gpu/src/mmio_region.rs) — `CP_RB_WPTR` write callback sets `wake_pending` + `unpark()`s worker
- [crates/xenia-memory/src/heap.rs](xenia-rs/crates/xenia-memory/src/heap.rs) — `Vec<AtomicU64>` page table, `&self` writes
- [crates/xenia-memory/src/access.rs](xenia-rs/crates/xenia-memory/src/access.rs) — `write_u32_fence` / `read_u32_fence`
- [crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) — `KernelState::with_gpu(GpuBackend)`
- [crates/xenia-kernel/src/exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs) — `vd_swap` rewritten to use `GpuBackend` accessors; UI publish gated on `as_inline_mut()`
- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) — backend selection, worker spawn+join, `Arc<GuestMemory>` wrap