---
name: xenia-rs --ui architecture (stable facts)
description: Threading/bridge design, shader pipeline, GPU integration, HUD; stable across sessions. History + live state in `project_xenia_rs_current_state.md`.
type: project
originSessionId: 1e348be4-7f53-438a-9c1b-e0c2fcb7ec0d
---

## Threading & bridge

`exec --ui` runs winit on the main thread and the scheduler/interpreter on a worker thread. Cross-thread communication: `Arc`-shared atomics + `EventLoopProxy` user events. `KernelState::ui: Option` carries closures that (a) read the host gamepad and (b) post `SwapInfo` + frontbuffer bytes to the UI. `GuestMemory` stays pinned to the interpreter thread; only cooked bytes cross.

**Why:** winit 0.30+ `ApplicationHandler` requires the main thread and wgpu's Surface is tied to `Window`. The interpreter is single-threaded (6 cooperative HW slots); making it multithread-safe would require `Arc<Mutex<…>>` locking on every guest instruction.

**How to apply:** when adding cross-thread UI state, extend `SwapInfo` (post-swap) or add an atomic on `UiHandles`; don't reach across threads directly.

## GPU pipeline (P2–P7 stable)

- **`xenia-gpu::GpuSystem`**: one per `KernelState`. Owns the `RegisterFile`, the `RingBufferView` (+ IB stack for nested `PM4_INDIRECT_BUFFER`), the `TextureCache` / `RenderTargetCache` (P4/P5), and the `GpuMmio` atomic mailbox exposed via the `0x7FC8_0000` MMIO aperture (Canary `graphics_system.cc:141`). Per scheduler round: `sync_with_mmio()`, then `execute_one()` of whatever's ready.
- **Type-3 packet coverage**: every non-draw Type-3 opcode is implemented (NOP, INDIRECT_BUFFER[_PFD], WAIT_REG_MEM, REG_RMW, REG_TO_MEM, MEM_WRITE, COND_WRITE, EVENT_WRITE[_SHD/_EXT/_ZPD], SET_CONSTANT[2], SET_SHADER_CONSTANTS, LOAD_ALU_CONSTANT, IM_LOAD[_IMMEDIATE], CONTEXT_UPDATE, INVALIDATE_STATE, VIZ_QUERY, ME_INIT, SET_BIN_MASK/SELECT, INTERRUPT, XE_SWAP). `DRAW_INDX*` captures `DrawState` + `ProcessedPrimitive` + metrics.
- **WGSL shader interpreter (P3b/c + P7)**: `xenia-gpu::ucode` decoder + `pack_for_wgsl` dense layout; `xenos_interp.wgsl` (~465 LOC) implements the CF walker + 13 vec ALU ops + 6 scalar ops + R32G32B32A32_FLOAT vertex fetch + texture sampling. `XenosPipeline::new` builds two bind groups; uploads shader+constants+vertex before each batch in `dispatch_xenos_draws`. P7 added a direct Xenos→WGSL translator for when shader-bug isolation is needed.
- **Texture cache (P5)**: page-version invalidated via `GuestMemory::page_version`. Formats supported: `K8888`, `K565`, `Dxt1`, `Dxt2_3`, `Dxt4_5` (M5). Host side `texture_cache_host.rs` maps each to `Rgba8Unorm`/`Bc{1,2,3}RgbaUnorm` with format-aware `bytes_per_row`.
- **Render target cache (P4)**: EDRAM resolve handler `handle_event_initiator` wired into all four `PM4_EVENT_WRITE*` variants. On event code 15 (`TILE_FLUSH`), snapshots `RB_COPY_*` into `last_resolve`, bumps `stats.resolves_total`. Actual EDRAM→memory byte copy still deferred.

## MMIO aperture (stable)

- Base `0x7FC8_0000`, mask `0xFFFF_0000`, size `0x0001_0000`. Install via `MmioRegion` on `GuestMemory`.
- Registers served (others trace+zero): `CP_RB_WPTR`, `CP_RB_RPTR`, `CP_INT_STATUS`, `CP_INT_ACK` (0x071D, write-echo), `D1MODE_VBLANK_VLINE_STATUS` (0x1951 / byte offset `0x6544`, W1TC on bit 0).
- Bit 0 of `D1MODE_VBLANK_VLINE_STATUS` is set by the app main loop on every synthetic vsync tick; Sylpheed's callback `rlwinm. r,r,0,31,31; bc 12,2,skip` gates all vsync work on it.

## Scheduler + interrupts

- **`HwState` variants**: `Idle`, `Ready`, `Blocked(BlockReason)`, `Exited(code)`, `ServicingIrq(BlockReason)`. `ServicingIrq` is used by the graphics-interrupt injector to stash a block reason while running the callback; `wake()` and `round_schedule` both treat `ServicingIrq` as runnable.
- **Graphics interrupt injection** (post-M8): `try_inject_graphics_interrupt` picks any non-`Idle`/`Exited` HW slot (prefers `Ready`, falls back to `Blocked`).
`InterruptState::injected_hw` tracks which slot ran the callback. The LR-sentinel return path restores the pre-injection ctx and re-blocks with the stashed reason (unless a `wake()` during the callback cleared it).
- **Deadlock recovery**: when all live threads are `Blocked/Idle/Exited` and no timer is pending, force-wake every blocked thread with `STATUS_TIMEOUT` in `gpr[3]`. The `scheduler.deadlock_recoveries` counter tracks this.
- **Main thread exit is NOT a halt**: when `tid=1` hits `LR_HALT_SENTINEL` we mark it `Exited` and continue; the outer loop halts only when `has_live_thread()` is false. Sylpheed's design spawns workers then returns from main.

## HLE primitives (stable)

- **Pseudo-handle resolution** `resolve_pseudo_handle(state, h)`: `0xFFFFFFFE` → current thread handle, `0xFFFFFFFF` → 0, others pass through. Called at the top of every `Ob*`/`Nt*Wait*` export.
- **PKEVENT shim** `ensure_dispatcher_object(state, mem, ptr)`: `Ke*` sync functions take `PKEVENT` pointers; first touch reads the Xenon DISPATCHER_HEADER (type byte + SignalState at +4 + Limit at +0x10 for semaphores) and mints a shadow `KernelObject` keyed by the pointer. `refresh_pkevent_shadow_from_guest` re-syncs `SignalState` on each wait.
- **WaitAny handle-index return**: Canary's `WaitMultiple` returns `STATUS_WAIT_0 + index` for WaitAny. `do_wait_multiple` matches; `set_wake_status_for_waitany` updates `gpr[3]` on wake.
- **I/O completion signaling**: `signal_io_completion_event(state, event_handle)` fires at every completion path of `NtReadFile`/`NtWriteFile` (r4 = event).
- **Empty-path / root-device opens** (`NtCreateFile("game:\")` etc.): synthesize a zero-byte `KernelObject::File` with empty `path`. `NtQueryInformationFile` class 5 reports `Directory=1` for empty/`/`/`:`-tail paths; class 34 (`FileNetworkOpenInformation`, 56 B) reports `FILE_ATTRIBUTE_DIRECTORY` at offset +48.

## HUD

6 rows, well-spaced, cyan accents:

1. Title + uptime + instr/kIPS (live counter via `instructions_counter` atomic).
2. Swaps.
3. GPU stats (packets, draws_total, resolves_total, interrupts).
4. Last-draw prim/verts.
5. Pad state.
6. Render path: `xdispatch: xlated=N interp=M xlated-pipelines=P tex-cache=T fb=WxH`.

One-shot `tracing::info!` latches: "first Xenos draw dispatched" and "first translator pipeline compiled".

## Observability defaults

Silences wgpu/winit/naga/gilrs at `warn` (wgpu at `error`). Override via `--log-filter='info,wgpu_core=trace'` during bring-up. `--trace-chrome PATH` captures a Chrome/Perfetto trace; `--profile PATH.svg` emits a flamegraph.

## Interpreter performance (post-Tier-3)

~10 MIPS end-to-end on Sylpheed. Three wins stacked: de-hot-pathed `metrics::counter!` per instruction; direct-mapped 64k `DecodeCache` keyed by PC with page-version invalidation; `Debugger::wants_hooks()` short-circuit + `trace_enabled = false` default (the previous O(n²) `Vec::remove(0)` on the trace log was the real bottleneck, not `metrics`).

**Deferred Tier 4**: threaded-code dispatch / JIT. Only worth doing after the shader translator + HLE coverage gaps narrow; fast-but-wrong produces fast-wrong output.

## Phase history

Complete roadmap P1–P8 + perf Tiers 1–3 + first-pixels M1–M9 all landed. Details are deliberately elided here; they're in the individual commit messages and the `project_xenia_rs_current_state.md` next-steps file. This doc stays focused on stable facts a new session needs before touching the code.
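
## Appendix: MMIO aperture sketch

As a quick reference for the MMIO aperture section, the following is a minimal sketch of the address hit-test and the W1TC (write-1-to-clear) behaviour on bit 0 of `D1MODE_VBLANK_VLINE_STATUS`. Only the base, mask, and register index come from the notes above. The names `aperture_offset`, `reg_byte_offset`, and `VblankStatus` are hypothetical and are not the xenia-rs API, and the index-times-4 mapping is an assumption (it matches the documented `0x1951` → `0x6544` pair). Endianness handling and every other register are left out.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Values taken from the "MMIO aperture" notes.
const MMIO_BASE: u32 = 0x7FC8_0000;
const MMIO_MASK: u32 = 0xFFFF_0000;
const D1MODE_VBLANK_VLINE_STATUS: u32 = 0x1951; // register index

/// Hypothetical helper: byte offset into the aperture, or `None` if
/// `addr` lies outside it.
fn aperture_offset(addr: u32) -> Option<u32> {
    if (addr & MMIO_MASK) == MMIO_BASE {
        Some(addr & !MMIO_MASK)
    } else {
        None
    }
}

/// Assumption: registers are 32 bits wide, so byte offset = index * 4.
fn reg_byte_offset(reg_index: u32) -> u32 {
    reg_index * 4
}

/// Hypothetical stand-in for the atomic backing the vblank status register.
struct VblankStatus(AtomicU32);

impl VblankStatus {
    fn new() -> Self {
        VblankStatus(AtomicU32::new(0))
    }

    /// UI side: latch bit 0 on each synthetic vsync tick.
    fn on_vsync_tick(&self) {
        self.0.fetch_or(1, Ordering::SeqCst);
    }

    /// Guest read returns the current latch state.
    fn guest_read(&self) -> u32 {
        self.0.load(Ordering::SeqCst)
    }

    /// Guest write: W1TC on bit 0. Writing 1 clears the bit; writing 0
    /// leaves it untouched.
    fn guest_write(&self, value: u32) {
        self.0.fetch_and(!(value & 1), Ordering::SeqCst);
    }
}
```

With this shape, a guest store to `0x7FC8_6544` resolves to offset `0x6544`, which equals `reg_byte_offset(D1MODE_VBLANK_VLINE_STATUS)`. A store of `1` there acknowledges the vsync latch, while a store of `0` is a no-op.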