xenia-rs

Author	SHA1	Message	Date
MechaCat02	cedee3c385	fix(cpu): PPCBUG-510 stvewx128 writes 16 bytes instead of 4 stvewx128 was aligning EA to 16 bytes and writing all 16 bytes of the vector, corrupting 12 adjacent bytes on every call. ISA semantics: word-align EA, extract word lane (EA & 0xF) >> 2, write 4 bytes only. The non-128 stvewx was already correct; stvewx128 was never updated. Mirror the stvewx body with instr.vs128() substituted for instr.rs(). The invalidate_for_write call from P1 now covers the correct word-aligned EA rather than the over-wide 16-byte range. interpreter.rs: stvewx128 arm (~line 2984) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 10:05:37 +02:00
MechaCat02	6b9de17925	fix(cpu): PPCBUG-363 PPCBUG-369 vpkd3d128 post-pack permutation vpkd3d128 was storing the pack codec output directly into vd128 without applying the MakePermuteMask permutation that merges the packed scalar(s) into the previous register value according to pack (slot layout) and shift (destination lane offset). PPCBUG-363: vpkd3d128 was missing the post-pack lane-placement step. PPCBUG-369: vpkd3d128 pack field not extracted; pack=0 still worked (identity), but pack=1/2/3 always wrote raw out instead of blending. Fix: extract `pack = uimm & 3` and `shift = instr.vx128_4_z()` from the VX128_4 IMM and z fields. For pack==0 (identity) store out directly as before. For pack 1-3, read the existing vd128 value and select 4 u32 words from {prev, out} using the 3×4 static permutation tables from canary ppc_emit_altivec.cc:2126-2188. Tables derived from canary MakePermuteMask(r0,l0,…r3,l3): pack=1 (VPACK_32): out[3] placed at lane (3-shift), prev elsewhere pack=2 (64-bit): out[2..3] placed at lanes (2-shift)..(3-shift) pack=3 (64-bit): same as pack=2 except shift=3 → out[2] at lane 3 Tests: vpkd3d128_pack0_legacy_unchanged, vpkd3d128_pack1_shift0_d3d_vertex_pack, vpkd3d128_pack1_shift3_puts_out3_at_lane0 interpreter.rs: vpkd3d128 arm (~line 3999) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-01 22:06:00 +02:00
MechaCat02	64e8ecbfd0	fix(cpu): PPCBUG-361 PPCBUG-565 fix vsldoi128 SH field extraction PPCBUG-565: Add vx128_5_sh() to decoder.rs — 4-bit shift at PPC bits 22-25 (host bits 6-9). The correct MSB is at PPC bit 22 (host bit 9). PPCBUG-361: vsldoi128 was reading the SH MSB from host bit 4 (PPC bit 27, reserved) instead of host bit 9 (PPC bit 22). All shift amounts >= 8 decoded incorrectly (e.g. shift=8 executed as shift=0). Replace the inline bit-shuffle with instr.vx128_5_sh(). Also fix vx128_p_perm_assembles_correctly test: replace nonexistent DecodedInstr::from_raw() calls with struct literal construction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 21:29:12 +02:00
MechaCat02	197d76c44e	fix(cpu): PPCBUG-315 PPCBUG-563 fix vrlimi128 z and IMM field extraction PPCBUG-563: Add vx128_4_imm() (PPC bits 11-15) and vx128_4_z() (PPC bits 24-25) accessors to decoder.rs for VX128_4-form instructions. PPCBUG-315: vrlimi128 was reading z from host bits 16-17 (a subset of IMM) and mask from host bits 2-5 (a reserved/XO region). Replace with the correct accessors: z selects which word-lane to start the rotation from (0-3); IMM is the 5-bit per-lane blend mask. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 21:26:26 +02:00
MechaCat02	d51b9346df	fix(cpu): PPCBUG-275 276 420 421 422 423 562 600 fix vcmp Rc bit + decode dot forms PPCBUG-562: Add vc_rc_bit() (PPC bit 21) and vx128r_rc_bit() (PPC bit 27) to decoder.rs. The generic rc_bit() reads bit 0 (PPC bit 31); all vcmp XO values are even so bit 0 is always 0, making CR6 permanently dead. PPCBUG-275/276/420/421: Replace rc_bit() with vc_rc_bit() at all 8 pure VC-form vcmp arms (vcmpequb, vcmpequh, vcmpgtub, vcmpgtsb, vcmpgtuh, vcmpgtsh, vcmpgtuw, vcmpgtsw) and with the correct per-form accessor at the 4 combined arms (vcmpeqfp|128, vcmpgefp|128, vcmpgtfp|128, vcmpequw|128) and vcmpbfp|128. PPCBUG-422: VX128_R-form 128-variants in combined arms now use vx128r_rc_bit() instead of vc_rc_bit(). PPCBUG-423/600: Add 5 dot-form key entries to decode_op6 so vcmp*fp128./vcmpequw128. decode as the correct opcode instead of Invalid. Uses a 5-bit key (bits22-24 + bit25 + bit27) for dot-forms to avoid aliasing against the shift/merge group (which sets bit25=1 when bit27=1). Interpreter uses vx128r_rc_bit() to conditionally update CR6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 21:15:06 +02:00
MechaCat02	75544fa9db	fix(cpu): PPCBUG-046 PPCBUG-561 add mb_md() accessor; fix all 6 rld* mb fields PPCBUG-561: Add DecodedInstr::mb_md() to decoder.rs — the correct MD-form 6-bit mask-begin reconstruction (MB[4:0] at PPC bits 21-25, MB[5] at PPC bit 26). The disassembler already had the correct local formula; this promotes it to a single source of truth on DecodedInstr. PPCBUG-046: All 6 doubleword-rotate arms (rldicl, rldicr, rldic, rldimi, rldcl, rldcr) inlined "(instr.mb() << 1) | ((instr.raw >> 1) & 1)" which reads SH5 (host bit 1) instead of MB5 (host bit 5). For the canonical "clrldi r3, r4, 32" zero-extend idiom (mb=32 → MB5=1, MB[4:0]=0), the wrong formula produced mb=0, making the instruction a no-op and leaving upper 32 bits of the GPR polluted. Replace all 6 sites with instr.mb_md(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 21:01:03 +02:00
MechaCat02	147daa0721	fix(cpu): PPCBUG-040 PPCBUG-560 fix sh64() bit order and rldicl test helper PPCBUG-040: decoder.rs sh64() assembled the XS-form shift amount as (SH[4:0] << 1) | SH[5] instead of (SH[5] << 5) | SH[4:0]. Every `sradi` with shift N ∈ 1..=62 executed with a completely wrong shift count (e.g. shift=32 executed as shift=1). PPCBUG-560: disasm_goldens.rs rldicl() test helper was encoding sh[5:1] at PPC bits 16-20 and sh[0] at PPC bit 30 — exactly backwards. The wrong encoder and wrong decoder cancelled out, hiding PPCBUG-040 from tests. Fix both together so tests validate ISA-correct encodings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 20:54:40 +02:00
MechaCat02	c9f194dda1	fix(cpu): review fixes — stswi/stswx two-line guard, dcbz/dcbz128 invalidate PPCBUG-160 partial: stswi's single invalidate_for_write(ea) only covered the first cache line; with nb up to 32, the write span can cross a 128-byte line boundary. Replace with two-call guard: first_line = ea & !RESERVATION_MASK last_line = ea.wrapping_add(nb - 1) & !RESERVATION_MASK invalidate first; if last != first, invalidate last. PPCBUG-160 partial: stswx had the same single-call gap; nb from XER[0:6] can be up to 127 bytes. Same two-call guard applied; wrapped in `if nb > 0` to guard against nb==0 underflow (XER TBC field is 0 when no bytes to store). dcbz: zeroes 32 bytes at a 32-byte-aligned EA — touches exactly one 128-byte cache line; add canonical single-call invalidate guard (was entirely missing). dcbz128: zeroes 128 bytes at a 128-byte-aligned EA — one full reservation line; add canonical single-call invalidate guard (was entirely missing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 20:47:32 +02:00
MechaCat02	d75c4edf67	docs(cpu): PPCBUG-108 document legacy reservation path's strict-lockstep requirement Adds doc comments above lwarx/ldarx/stwcx./stdcx. clarifying that the legacy per-ctx reservation path is only correct in strict lockstep (single host thread); under --parallel the M3 scheduler must enable the cross-thread ReservationTable before spawning a second host thread. A debug_assert fires in the legacy stwcx./stdcx. branch if a non-primary HW slot (hw_id != 0) takes that path — surfacing ReservationTable-disabled misconfiguration early in debug builds. Note: the primary slot (hw_id==0) racing other parallel slots is not caught by the assert; that case requires the table to be enabled. Affected: PPCBUG-108 legacy per-ctx reservation path cannot invalidate cross-thread; informational — no behavioral change Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:55:13 +02:00
MechaCat02	a107ac9ae7	fix(cpu): PPCBUG-151 add reservation_width discriminator to stwcx./stdcx. Track lwarx vs ldarx reservation width in PpcContext as a u8 (4 = word, 8 = doubleword, 0 = none). stwcx. requires width==4; stdcx. requires width==8. Cross-width pairs (lwarx + stdcx., ldarx + stwcx.) now fail deterministically with CR0.EQ=0 instead of spuriously succeeding. The width is held per-thread; the cross-thread reservation table keeps its existing slot encoding because each host thread consults its own ctx.reservation_width before committing. Affected: PPCBUG-151 stwcx./stdcx. shared the same reservation slot without width discriminator; cross-width commits silently succeeded Tests: lwarx_then_stdcx_cross_width_fails, ldarx_then_stwcx_cross_width_fails Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:44:48 +02:00
MechaCat02	d4e227eeab	fix(cpu): PPCBUG-511 PPCBUG-512 PPCBUG-513 PPCBUG-514 add invalidate_for_write to VMX stores Continuation of the PPCBUG-107 cascade sweep. All 16 VMX store opcodes (stvx/stvxl, stvebx/stvehx/stvewx, stvlx/stvrx and 128 variants of each) now invalidate the reservation table before writing. stvlx/stvrx partial-vector stores can write at non-16-byte-aligned EAs; they invalidate both potentially-touched cache lines. stvewx128 currently writes 16 bytes at the wrong EA scope (PPCBUG-510); the invalidate guard fires at that over-wide EA today and will narrow automatically when PPCBUG-510 is fixed in P3. Affected: PPCBUG-511 stvx, stvx128, stvxl, stvxl128 PPCBUG-512 stvebx, stvehx, stvewx, stvewx128 PPCBUG-513 stvlx, stvlx128, stvlxl, stvlxl128 PPCBUG-514 stvrx, stvrx128, stvrxl, stvrxl128 Tests: lwarx_then_plain_stvx_invalidates_reservation, lwarx_then_plain_stvlx_invalidates_reservation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:36:17 +02:00
MechaCat02	af54eb28bd	fix(cpu): PPCBUG-160 PPCBUG-167 add invalidate_for_write to multiple/string + FP stores Continuation of the PPCBUG-107 cascade sweep. stmw/stswi/stswx (multiple and string stores) and the 9 floating-point stores now invalidate the reservation table before writing. stmw can span two cache lines when the writeback range crosses a line boundary; the guard iterates over all touched lines so multi-line atomic holds the same guarantee as single-line stores. Affected: PPCBUG-160 3 multiple/string stores: stmw, stswi, stswx PPCBUG-167 9 FP stores: stfs, stfsu, stfsx, stfsux, stfd, stfdu, stfdx, stfdux, stfiwx Tests: lwarx_then_plain_stmw_spans_two_lines_and_invalidates, lwarx_then_plain_stfd_invalidates_reservation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:24:46 +02:00
MechaCat02	24d347436a	fix(cpu): PPCBUG-130 PPCBUG-150 add invalidate_for_write to byte/halfword/doubleword stores Continuation of the PPCBUG-107 cascade sweep (batch 1: word stores landed in `4538fa9`). Plain stb/stbu/stbx/stbux, sth/sthu/sthx/sthux/sthbrx, and std/stdu/stdx/stdux/stdbrx now invalidate the reservation table before writing, so cross-thread lwarx/stwcx. atomicity holds when these widths are written by another host thread. Affected: PPCBUG-130 9 byte/halfword stores missing invalidate_for_write stb, stbu, stbx, stbux, sth, sthu, sthx, sthux, sthbrx PPCBUG-150 5 doubleword stores missing invalidate_for_write std, stdu, stdx, stdux, stdbrx Tests: lwarx_then_plain_stb_invalidates_reservation, lwarx_then_plain_std_invalidates_reservation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:13:31 +02:00
MechaCat02	4538fa9e70	fix(cpu): PPCBUG-107 PPCBUG-140-144 add invalidate_for_write to word stores Word stores (stw, stwu, stwx, stwux, stwbrx) now invalidate the reservation table for the target line before writing. Without this, plain stores by other host threads silently fail to clear reservations held by lwarx, causing stwcx. to spuriously succeed under --parallel. Affected: PPCBUG-107 ReservationTable::invalidate_for_write never called from any store PPCBUG-140 stw missing invalidate_for_write (interpreter.rs:1183) PPCBUG-141 stwu missing invalidate_for_write (interpreter.rs:1189) PPCBUG-142 stwx missing invalidate_for_write (interpreter.rs:1195) PPCBUG-143 stwux missing invalidate_for_write (interpreter.rs:1201) PPCBUG-144 stwbrx missing invalidate_for_write (interpreter.rs:1568) Tests: lwarx_then_plain_stw_invalidates_reservation, lwarx_then_stwcx_succeeds_without_intervening_store Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:57:05 +02:00
MechaCat02	bae9305982	xenia-app: observability subsystem, --parallel runtime, stress harness observability.rs installs the tracing subscriber stack (env-filter + JSON file appender + chrome trace + error layer) and the metrics recorder shared by the workspace. main.rs grows the new CLI surface: --parallel, --reservations-table, --trace-handles, --analyze= {rust,sql,both}, xenia dis --json, --ui, plus the wiring that runs the CPU through the new scheduler, drives the GPU's threaded backend, and surfaces the framebuffer + HUD via xenia-ui. Add tests/parallel_stress.rs (#[ignore]-gated long form, short form runs 20×@5M) and tests/golden/sylpheed_n2m.json — the digest the lockstep/parallel combos compare against. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:30:26 +02:00
MechaCat02	b1285ba560	xenia-hid + xenia-debugger: gamepad serializer; debugger fast-skip hook xenia-hid grows a guest-facing X_INPUT_GAMEPAD writer (big-endian on the wire, host-neutral GamepadState in memory) so XamInputGetState in the kernel and the UI input thread share one POD snapshot type. Adds the GUIDE button flag. xenia-debugger gains Debugger::wants_hooks(), a single-branch fast path the hot interpreter loop checks to skip the pre_step/post_step HashMap+match work when the debugger is in cold-run mode (no bps, no trace, StepMode::Run, not paused). Part of the Tier-3 perf landing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:30:03 +02:00
MechaCat02	79eb52c378	xenia-gpu: end-to-end Xenos pipeline (PM4, ucode, EDRAM, resolve) First real GPU implementation. Ring/PM4 frontend (ring_view, ring_drain, pm4) drains the command processor; gpu_system owns the threaded backend (DrainFence RPC + parker/fence helpers from M1) and the MMIO-mapped register block (mmio_region). Xenos shader frontend: ucode/{alu,control_flow,fetch,mod}.rs decode the Xbox 360 microcode, translator.rs lowers it onto the WGSL xenos_interp interpreter shader (shaders/xenos_interp.wgsl). shader_metrics.rs counts decode/translate work. Render state: draw_state, primitive, render_target_cache, texture_cache, tiled_address (Xenos's swizzled tiled-memory layout), xenos_constants (register field constants), edram (the 10 MiB EDRAM model with MSAA), and resolve.rs (TILE_FLUSH copy-out — clear-resolve plus bitwise-equivalent 32 bpp + 64 bpp paths landed). handle.rs owns the typed GPU-resource handles the kernel hands out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:29:38 +02:00
MechaCat02	5f0d6487ea	xenia-kernel: HLE expansion, scheduler integration, audit + UI bridge Major HLE buildout in exports.rs: KeInitializeSemaphore now seeds count/limit, XexGet{Module,Procedure}Address use distinct HMODULE_XBOXKRNL/HMODULE_XAM pseudo-handles with a reverse (ModuleId,ordinal)→thunk_addr map, plus sweeping additions across sync primitives, file I/O, semaphores, events, threads, and allocator paths needed to advance Sylpheed past VdSwap=2. New modules: - thread.rs — ThreadRef + per-thread suspension/wake plumbing - interrupts.rs — IRQ delivery, pending-IRQ slots, IPI helpers - path.rs — guest path normalization (D:\\, game:\\, etc.) - audit.rs — --trace-handles harness backing the handle audit - ui_bridge.rs — kernel-side endpoint of the xenia-ui bridge (input snapshots, framebuffer publish handles) state.rs grows to own the HW-slot scheduler state, the new audit / UI bridge handles, and the per-handle reverse maps. xam.rs and objects.rs follow suit for the HLE additions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:29:00 +02:00
MechaCat02	f1fadb5398	xenia-vfs/xex: cache full disc tree; instrument XEX load DiscImageDevice now walks the GDFX tree at open() and caches every file/dir entry by full relative path; the previous root-only scan returned ENOENT for any path under a subdirectory (dat/tables.pak, media/x.wav). Lookups become O(n) over the cached vec. xex::load_image gains a tracing span plus per-load metrics (xex.load_image_ms histogram, xex.bytes_{in,out} counters) so the observability subscriber the app installs can see decompression cost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:28:32 +02:00
MechaCat02	45e15d7885	xenia-analysis: unify disasm via xenia-cpu, split ingest/analyze, add sinks The old src/ppc.rs that re-implemented PPC formatting collapses into a 30-line shim that delegates to xenia-cpu's single-source-of-truth disasm. A new disasm.rs wraps the shared iterator and feeds enriched items (analysis context: function membership, xrefs, mnemonics) into pluggable sinks. Sinks split: text.rs (objdump-like output), json.rs (JSONL stream matching the new xenia dis --json mode), duckdb.rs (the analysis DB ingest). db.rs is restructured into ingest_instructions + write_analysis_results so a run can stop after raw ingest, and a new target_hex column lands on the instructions table. sql_views.rs adds five additive views layered on top of the raw tables. Tests: assert-based JSON-fixture goldens (disasm_goldens) and a PRAGMA-table_info schema golden (db_schema_golden) covering all ingested tables and the SQL views. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:28:06 +02:00
MechaCat02	c36cca14f9	xenia-cpu: VMX128, FPSCR, decoder split, scheduler, decode/block caches Split the monolithic interpreter into cohesive modules: dedicated decoder (decoder.rs) producing 8-byte DecodedInstr; opcode tables (opcode.rs); explicit traps (trap.rs); FPSCR helpers (fpscr.rs); overflow/carry helpers (overflow.rs); a 4 KiB-page-versioned decode cache and basic-block cache (block_cache.rs); and a full VMX/VMX128 implementation (vmx.rs) covering AltiVec + Xenon's 128-bit extensions. Add the parallel-execution substrate behind --parallel: a 7-party phaser (phaser.rs) for round-based barrier sync, ReservationTable (reservation.rs) for guest LL/SC, and the per-HW-thread scheduler core (scheduler.rs) that owns ThreadRefs, runqueues, and pending IRQs. Disassembler is now the single source of truth: disasm.rs gains the full base + extended + VMX128 mnemonic set, with golden JSON fixtures and a disasm_goldens test suite. Add a criterion-style interpreter bench. context.rs grows the per-thread state the new modules need (reservation slot, FPSCR, vector regs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:27:43 +02:00
MechaCat02	e9b2b57a44	xenia-memory: interior-mutable writes, page versioning, fenced ops Re-shape MemoryAccess so write methods take &self and rely on interior mutability (atomics in GuestMemory, Cell in test mocks). This unblocks the &Arc<KernelState>-only execution model the CPU/HLE crates moved to. GuestMemory grows: per-4 KiB-page write-version counter (page_version) that the CPU's decode cache and the texture cache observe via Acquire, fenced 32-bit/64-bit read/write helpers (Release on writer / Acquire on reader) that PM4_EVENT_WRITE_SHD and the matching CPU consumers use to synchronize fence publication, and broader page-table / heap accounting needed by the new HLE allocators. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:27:13 +02:00
MechaCat02	e2b8860e10	Add xenia-ui crate; switch analysis store to DuckDB Workspace gains a new xenia-ui member that owns the winit/wgpu window, the Xenos display pipeline (xenos_pipeline + render + texture_cache_host), HUD font/blit shaders, and the input-bridge plumbing the app uses to surface guest framebuffers and overlays. Workspace dependencies grow accordingly: rusqlite is replaced with duckdb (analysis pipeline now writes DuckDB stores), and tracing / metrics / pprof / winit / wgpu / gilrs / pollster / crossbeam / bytemuck are added at workspace level so xenia-ui and xenia-app share versions. Cargo.lock regenerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:26:48 +02:00
MechaCat02	c694bb3f43	Initial commit: xenia-rs workspace for Xbox 360 RE Rust reimplementation of the xenia Xbox 360 emulator targeting reverse- engineering and preservation, initially scoped to Project Sylpheed. Includes: - XEX2 loader (LZX decompression, AES decryption, PE parsing) - XISO / XGD2 disc image VFS - PPC interpreter with 200+ opcodes and VMX128 decoding - Static analyzer: functions, cross-references, labels, asm + SQLite output - HLE kernel covering the xboxkrnl/xam subset used by Sylpheed init - Debugger with in-memory and SQLite-backed execution tracing - `xenia-rs` CLI with extract/dis/exec commands that produce cumulative, superset SQLite databases and opt-in instruction/import/branch traces Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-16 23:14:56 +02:00
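
The split-field fixes in 75544fa9db (mb_md) and 147daa0721 (sh64) can be illustrated with a minimal sketch. The free functions below are hypothetical stand-ins for the DecodedInstr accessors named in those commits, not the repository's code; the bit positions follow the commit messages (host bit n is PPC bit 31 - n), and the two raw words are hand-assembled encodings of `clrldi r3, r4, 32` and `sradi r3, r4, 32`.

```rust
/// XS-form shift amount: SH[4:0] at PPC bits 16-20, SH[5] at PPC bit 30.
/// Hypothetical free function; the commit puts this on DecodedInstr.
fn sh64(raw: u32) -> u32 {
    let lo = (raw >> 11) & 0x1F; // host bits 11-15
    let hi = (raw >> 1) & 1; // host bit 1
    (hi << 5) | lo
}

/// The pre-fix formula quoted in PPCBUG-040, kept for comparison.
fn sh64_buggy(raw: u32) -> u32 {
    let lo = (raw >> 11) & 0x1F;
    let hi = (raw >> 1) & 1;
    (lo << 1) | hi
}

/// MD-form mask begin: MB[4:0] at PPC bits 21-25, MB[5] at PPC bit 26.
fn mb_md(raw: u32) -> u32 {
    let lo = (raw >> 6) & 0x1F; // host bits 6-10
    let hi = (raw >> 5) & 1; // host bit 5
    (hi << 5) | lo
}

/// The pre-fix formula quoted in PPCBUG-046: it reads SH5 (host bit 1)
/// where MB5 (host bit 5) was intended.
fn mb_md_buggy(raw: u32) -> u32 {
    let lo = (raw >> 6) & 0x1F;
    (lo << 1) | ((raw >> 1) & 1)
}

fn main() {
    let clrldi_r3_r4_32: u32 = 0x7883_0020; // rldicl r3, r4, 0, 32
    let sradi_r3_r4_32: u32 = 0x7C83_0676;

    assert_eq!(mb_md(clrldi_r3_r4_32), 32);
    assert_eq!(mb_md_buggy(clrldi_r3_r4_32), 0); // the no-op described in the commit

    assert_eq!(sh64(sradi_r3_r4_32), 32);
    assert_eq!(sh64_buggy(sradi_r3_r4_32), 1); // "shift=32 executed as shift=1"
}
```

Both buggy variants reproduce the symptoms the commit messages report, which is a quick way to check that a test encoder and the decoder are not wrong in matching ways.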


74 Commits
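
The two-call reservation guard described in c9f194dda1 (stswi/stswx) can be sketched as follows. This is an illustration under stated assumptions, not the interpreter's code: `lines_to_invalidate` is a hypothetical helper that returns the line addresses instead of calling invalidate_for_write, and RESERVATION_MASK = 127 is assumed from the 128-byte line size mentioned in the commit message.

```rust
/// Assumed value: low bits of an address within a 128-byte reservation line.
const RESERVATION_MASK: u32 = 127;

/// Hypothetical helper: which reservation lines does a store of `nb` bytes
/// at `ea` touch? With nb <= 127 the span covers at most two lines, so
/// checking only the first and last byte is sufficient.
fn lines_to_invalidate(ea: u32, nb: u32) -> Vec<u32> {
    if nb == 0 {
        // Guard against nb - 1 underflow (stswx with an empty byte count).
        return Vec::new();
    }
    let first_line = ea & !RESERVATION_MASK;
    let last_line = ea.wrapping_add(nb - 1) & !RESERVATION_MASK;
    if last_line != first_line {
        vec![first_line, last_line]
    } else {
        vec![first_line]
    }
}

fn main() {
    // 32 bytes fully inside one line.
    assert_eq!(lines_to_invalidate(0x1000, 32), vec![0x1000]);
    // 32 bytes starting 16 bytes before a line boundary: two lines.
    assert_eq!(lines_to_invalidate(0x1070, 32), vec![0x1000, 0x1080]);
    // Nothing to store, nothing to invalidate.
    assert!(lines_to_invalidate(0x1070, 0).is_empty());
}
```

Returning the lines rather than invalidating in place keeps the sketch testable without a reservation table; a real store arm would presumably call the table once per returned line before writing.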