xenia-rs

Author	SHA1	Message	Date
MechaCat02	c5c6713419	fix(gpu): GPUBUG-100 — apply per-operand swizzle + negate to ALU sources Word-1 of every ALU triple holds three 8-bit component-relative swizzles (`src1_swiz`/`src2_swiz`/`src3_swiz` at bits 16-23/8-15/0-7 per canary ucode.h:2064-2066) and three per-operand negate flags (bits 24/25/26). Pre-fix, both the WGSL interpreter and the AOT translator discarded word-1 entirely with `_ = w1;` — every ALU result was missing its swizzle (broadcast/permute patterns like `.zyxw`, `.xxxx`) and any negated operand was used positive instead. Component-relative semantics (canary's `AluInstruction::GetSwizzledComponentIndex`, ucode.h:1996): for output component i, the source component is `((swizzle >> (2*i)) + i) & 3`. Identity swizzle is 0x00, NOT 0xE4 — the original `apply_swizzle` in the interpreter shader treated it as absolute, also incorrect. Mechanical changes: - crates/xenia-gpu/src/ucode/alu.rs: extend AluInstruction with src_X_swiz (u8) and src_X_negate (bool) fields. decode_alu unpacks them from word 1. - crates/xenia-gpu/src/shaders/xenos_interp.wgsl: apply_swizzle uses component-relative semantics. interpret_alu decodes the modifiers and applies via apply_swizzle + apply_modifiers (with abs=false). - crates/xenia-gpu/src/translator.rs: src_operand emits the precomputed swizzle inline as `vec4<f32>(base.x, base.y, ...)`, then wraps in `(-…)` when negated. Identity swizzle (0x00) emits a bare base expression so it round-trips with the trivial-shader fixture. Abs is omitted in this commit — the abs flag is dual-meaning (for temps it lives at bit 7 of the src byte; for constants at word-2 bit 7 `abs_constants`). Wiring it up correctly requires more careful case-split logic; deferred to Phase G. Verification at -n 100M lockstep: swaps: 2 → 2 (gated by Phase E for draws) draws: 0 → 0 packets: ~58M (within noise) Tests: 554 → 555 (+1 swizzle/negate test, no count change otherwise because identity swizzle test merged into D1's parameterised test). 
WGSL still validates via naga (combined_module_parses_as_wgsl). Closes GPUBUG-100 (P0). Abs deferred to Phase G. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:15:07 +02:00
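The component-relative swizzle rule from the GPUBUG-100 commit above can be sketched in a few lines of Rust. This is a minimal illustration, not the repository's code: `swizzled_component` and `apply_operand` are hypothetical names, and the operand is modelled as a plain `[f32; 4]`.

```rust
/// Source component index read for output component `i` (0 = x .. 3 = w),
/// using the component-relative rule quoted in the commit message:
/// ((swizzle >> (2*i)) + i) & 3.
fn swizzled_component(swizzle: u8, i: u32) -> u32 {
    ((u32::from(swizzle) >> (2 * i)) + i) & 3
}

/// Apply a swizzle and an optional negate to a 4-component source.
/// Hypothetical helper for illustration only; abs is left out, as in the commit.
fn apply_operand(src: [f32; 4], swizzle: u8, negate: bool) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4u32 {
        let v = src[swizzled_component(swizzle, i) as usize];
        out[i as usize] = if negate { -v } else { v };
    }
    out
}
```

Under this rule 0x00 is the identity, and a `.xxxx` broadcast works out to 0x6C because each 2-bit field stores the offset from its own lane back to x. The absolute-style value 0xE4 would select `.xzxz` here, which is why treating it as the identity was incorrect.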
MechaCat02	78ea81c12a	fix(gpu): GPUBUG-101 — decode src1/2/3_sel temp-vs-constant selector Per canary AluInstruction layout (xenia-canary/src/xenia/gpu/ucode.h:2078-2086), word-0 bits 29-31 are the per-operand `srcN_sel` flags selecting temp register (1) vs ALU constant (0); the corresponding 8-bit src byte indexes either: - a temp register (bits 5:0 = index, bits 6/7 reserved for relative-addressing / abs flags consumed by Phase D2), or - an ALU constant (full 8-bit index). Pre-fix, the WGSL interpreter and AOT translator both masked `& 0x7F` on the src byte and emitted `r[low7]` regardless of the operand class. Every shader's WVP matrix / light constant / per-frame uniform read came back as r[low7] — typically zero — yielding invisible rendering. Mechanical changes: - crates/xenia-gpu/src/ucode/alu.rs: decode src_a_is_temp / src_b_is_temp / src_c_is_temp from w0 bits 29/30/31. Note that our src_a (low byte of w0) is canary's third operand, hence its selector is bit 29 (canary src3_sel), not bit 31. - crates/xenia-gpu/src/shaders/xenos_interp.wgsl: `read_src` now takes the is_temp flag; constants index xenos_consts.alu directly. - crates/xenia-gpu/src/translator.rs: `src_operand` mirrors the interpreter — `r[idx]` when temp, `xenos_consts.alu[idx]` when constant. The trivial-shader synthetic test was updated to set the temp flags so its `r[0u] = (r[0u] + r[0u])` assertion remains valid; without the flags set, all sources would now resolve as constants. Bank-selection (cf-level relative addressing for higher banks of the 512 ALU constants) remains a Phase G+ extension — covers c0..c127 in bank 0, which most Sylpheed shaders use directly. Verification at -n 100M lockstep: swaps: 2 → 2 (unchanged — gated by D2/D3/E for draws) draws: 0 → 0 packets: ~61M (within noise) Tests: 552 → 554 (+2 translator tests for the temp/constant decode). Closes GPUBUG-101 (P0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:10:11 +02:00
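The temp-versus-constant decode described in GPUBUG-101 might look roughly like the sketch below. `SrcOperand` and `classify_src` are hypothetical names introduced for illustration; only the bit positions (selector for `src_a` at word-0 bit 29, temp index in bits 5:0, full 8-bit constant index) come from the commit message.

```rust
/// Operand class after decoding; a hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
enum SrcOperand {
    /// Temp register r[idx], index taken from bits 5:0 of the src byte.
    Temp(u8),
    /// ALU constant, indexed by the full 8-bit src byte.
    Constant(u8),
}

/// Selector for src_a (the low byte of w0), which the commit places at bit 29.
fn src_a_is_temp(w0: u32) -> bool {
    (w0 >> 29) & 1 == 1
}

/// Classify one src byte given its selector flag (1 = temp, 0 = constant).
fn classify_src(src_byte: u8, is_temp: bool) -> SrcOperand {
    if is_temp {
        SrcOperand::Temp(src_byte & 0x3F)
    } else {
        SrcOperand::Constant(src_byte)
    }
}
```

With the pre-fix `& 0x7F` mask, a constant read such as index 0x85 would have collapsed to `r[5]`; here it stays `Constant(0x85)`.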
MechaCat02	1b74db6fa7	Merge audit-2026-05-fix/renderer-p0-vdswap-pm4: VdSwap PM4 ring path	2026-05-03 14:00:27 +02:00
MechaCat02	82f3d611e2	fix(gpu,kernel): KRNBUG-Vd-04 / GPUBUG-001 / XMODBUG-013 — VdSwap PM4 ring path The pre-fix VdSwap zero-filled the guest's reserved buffer with NOPs and called `state.gpu.notify_xe_swap` directly — bypassing the ring, leaving the PM4_XE_SWAP handler at gpu_system.rs:1232 dead code, and skipping the PM4_TYPE0(SHADER_CONSTANT_FETCH_00_0, 6) patch. Sylpheed's bloom/blur "sample frame N for frame N+1" path samples fetch-constant slot 0 expecting the frontbuffer descriptor; without the patch, slot 0 stayed stale and any shader sampling it read garbage. This commit writes the canary VdSwap PM4 sequence directly into the primary ring at the current write pointer (read via the shared MMIO atomic), then advances WPTR over the injection. The natural CP drain consumes PM4_XE_SWAP — bumping `swaps_seen` and patching fetch-constant slot 0 — without going through any direct kernel→GPU bypass. Sequence per xenia-canary VdSwap_entry (xboxkrnl_video.cc:438-521): 1) PM4_TYPE0(0x4800, count=6) + 6 fetch-header dwords (with base_address re-patched from virtual to physical >> 12). 2) PM4_TYPE3(PM4_XE_SWAP, count=4) + signature + frontbuffer_phys + width + height. Mechanism notes: - buffer_ptr in xenia-rs is in the system command buffer, NOT the primary ring (verified empirically: buffer_ptr=0x4acd4df8 vs ring_base=0x0accb000, size 4 KB). Canary's VdSwap writes to buffer_ptr because its ring layout maps the reserved slot inside the ring; xenia-rs's doesn't, so we have to write at the actual ring WPTR address (cached on KernelState.ring_base from VdInitializeRingBuffer). - The original "buffer_ptr zero-fill + bump WPTR by 64" path is preserved before the injection — it exposes any game-batched PM4 packets and keeps the buffer_ptr region skippable per existing game compat behavior. - A safety-net fallback at the end calls `notify_xe_swap` directly if swaps_seen didn't advance during the drain (e.g. a ring-arithmetic edge case). 
Idempotent — only fires when the PM4 path didn't. - KRNBUG-Mm-04 deferred: virt→phys uses the masked stub `virt & 0x1FFF_FFFF`, sufficient for the standard heap. Mechanical changes: - crates/xenia-gpu/src/pm4.rs: add make_packet_type0 / type2 / type3 helpers + round-trip unit test (mirrors canary xenos.h:1682-1709). - crates/xenia-gpu/src/handle.rs: add mmio_cp_rb_wptr_load accessor (Acquire-load) so the kernel can compute ring offsets. - crates/xenia-kernel/src/state.rs: cache ring_base / ring_size_dwords on KernelState (set by VdInitializeRingBuffer). - crates/xenia-kernel/src/exports.rs: rewrite the vd_swap PM4-emit block; patch fetch_dwords[1] base_address virt→phys before injection. Verification at -n 100M lockstep: swaps: 2 → 2 (game fires VdSwap exactly twice) draws: 0 → 0 (gated by Phases D+E) fallback warning: 0 occurrences (PM4 path consumed both swaps) instructions: ~100M Tests: 552 passing (553 with new pm4 round-trip test). Lockstep stable-fields determinism: byte-identical across two 100M runs. The "swaps > 2" prediction in the audit's plan assumed the game would fire VdSwap more often once the path worked; empirically Sylpheed only calls VdSwap twice within 100M instructions (this is the renderer plateau the audit identified). The success criterion for Phase C is that the PM4 path is now operational, which Phases D+E require for visible draws. Closes KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:00:23 +02:00
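As a rough illustration of the packet headers the VdSwap commit injects, the sketch below packs PM4 type-0 and type-3 headers and applies the masked virt→phys stub. The names `make_packet_type0` and `make_packet_type3` appear in the commit, but the signatures and the bit layout used here (type in bits 31:30, `count - 1` in bits 29:16, register index in bits 14:0, opcode in bits 14:8) are assumptions based on the conventional PM4 format, and `virt_to_phys_stub` is a hypothetical name.

```rust
/// Type-0 header: write `count` consecutive registers starting at `reg_index`.
/// Assumed layout: type (0) in bits 31:30, count-1 in bits 29:16, index in bits 14:0.
fn make_packet_type0(reg_index: u32, count: u32) -> u32 {
    debug_assert!(count >= 1);
    (((count - 1) & 0x3FFF) << 16) | (reg_index & 0x7FFF)
}

/// Type-3 header: `opcode` followed by `count` payload dwords.
/// Assumed layout: type (3) in bits 31:30, count-1 in bits 29:16, opcode in bits 14:8.
fn make_packet_type3(opcode: u32, count: u32) -> u32 {
    debug_assert!(count >= 1);
    (3u32 << 30) | (((count - 1) & 0x3FFF) << 16) | ((opcode & 0x7F) << 8)
}

/// Masked virt-to-phys stub quoted in the commit (KRNBUG-Mm-04 deferred).
fn virt_to_phys_stub(virt: u32) -> u32 {
    virt & 0x1FFF_FFFF
}
```

Under these assumptions, `make_packet_type0(0x4800, 6)` yields `0x0005_4800`, and the `buffer_ptr` value from the commit, `0x4acd4df8`, maps to `0x0acd4df8`.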
MechaCat02	0590bffdd9	Merge audit-2026-05-fix/oracle-sylpheed-n50m-n4b: ORACBUG-004 sylpheed_n50m oracle	2026-05-03 13:46:06 +02:00
MechaCat02	1f416aaa2e	test(check): ORACBUG-004 — sylpheed_n50m stable-digest oracle Adds a regression-catcher golden for Sylpheed boot at -n 50M lockstep, covering the first VdSwap pair (the n2m oracle is swap-blind because the first VdSwap fires at ~18M instructions). The new --stable-digest flag emits/compares only fields that are deterministic in lockstep: instructions, imports, unimpl, draws, swaps, unique_render_targets, shader_blobs_live, texture_cache_entries Excluded: packets — empirically ±2-8% lockstep variance (GPU thread race per audit M11) resolves, interrupts_delivered, interrupts_dropped, texture_decodes — scheduling-sensitive under --parallel path — cwd-dependent Empirical determinism: 3 consecutive lockstep -n 50M runs produce byte-identical stable-digest output. The n4b canonical-invocation golden the audit's recommended next sprint also called for is deferred. Per audit memory `--parallel --reservations-table` is pathologically slow (>32 min for -n 100M), so -n 4B in that mode would be many hours per run, not the 5-15 min the plan estimated. n4b will be captured one-shot post-renderer-unblock as a manual artifact under audit-runs/post-fix/, not as a test golden. See crates/xenia-app/tests/golden/README.md. Test infrastructure: - crates/xenia-app/tests/sylpheed_oracles.rs — invokes CARGO_BIN_EXE_xenia-rs against the ISO. Path resolved via SYLPHEED_ISO env var (skips gracefully if missing). - #[ignore]-gated; run via: cargo test --release -p xenia-app --test sylpheed_oracles -- --ignored --nocapture Closes ORACBUG-004 (P0). Partial: ORACBUG-006 (P1 deferred). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 13:46:02 +02:00
MechaCat02	62f673d094	Merge audit-2026-05-fix/swapbug-001-revert-addi-truncation: SWAPBUG-001 revert	2026-05-03 13:38:05 +02:00
MechaCat02	9ab986ec09	fix(cpu): SWAPBUG-001 — revert addi 32-bit truncation The addi opcode was truncating its result to 32 bits per the post-P4-batch3 "32-bit ABI" rationale (commit `bf8208e`). Hunk-level bisection during the 2026-05 audit (M11) isolated this single cast as the cause of the post-P8 swap regression: swaps dropped 2 → 1 and the renderer lost a frame. PowerISA mandates sign-extension to 64 bits; canary does not truncate addi. The truncation was a canary-divergent over-extension of the addis fix (which IS canary-divergent by design, see addis at interpreter.rs:121-134). The addi_li_neg_one_zero_extends_upper test encoded the wrong invariant. Replaced with a sign-extension test asserting canonical PowerISA behavior (gpr[3] == 0xFFFF_FFFF_FFFF_FFFF for `li r3, -1`). Verification at -n 100M lockstep: swaps: 1 → 2 (gate met) draws: 0 → 0 (unchanged — gated by Phase C+D+E) instructions: ~100M (unchanged) imports: 11.4M → 987k (game escapes retry loop) packets: 281M → 57M (same) interrupts_delivered: 629 → 630 Tests: 551 passing (unchanged). Lockstep determinism: byte-identical across two 100M runs except packets (±5%, GPU-thread-race noise floor). Closes SWAPBUG-001 / PPCBUG-001. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 13:37:51 +02:00
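The difference between the reverted truncation and the restored behaviour in SWAPBUG-001 is easiest to see on `li r3, -1` (that is, `addi r3, 0, -1`). The functions below are a simplified sketch with hypothetical names; they take the already-resolved rA value rather than a register file.

```rust
/// addi with the 16-bit immediate sign-extended to 64 bits (the restored behaviour).
fn addi(ra_value: u64, simm: i16) -> u64 {
    ra_value.wrapping_add(simm as i64 as u64)
}

/// The reverted variant: same sum, then truncated to the low 32 bits.
fn addi_truncated(ra_value: u64, simm: i16) -> u64 {
    addi(ra_value, simm) as u32 as u64
}
```

`addi(0, -1)` produces `0xFFFF_FFFF_FFFF_FFFF`, matching the replacement test's assertion, while the truncated form produces `0x0000_0000_FFFF_FFFF`.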
MechaCat02	caa37fc595	docs(audit): post-P8 end-to-end review findings + acid test result Document the post-P8 cross-cutting review and acid test outcome: End-to-end reviewer caught: - BLOCKING-LIKELY: lwa/lwax/lwaux ISA deviation (fixed in `f1166d0`) - Cosmetic: fpscr round_single_toward_zero duplicate-branch (fixed in `09c6c92`) - Minor performance: reservation table active_reservers as slot-occupancy - Asymmetry note: extswx remains 64-bit ABI per audit PPCBUG-038 (wontfix) Acid test (-n 4B --parallel --reservations-table, pre-lwa-hotfix build): - swaps=1, draws=0 - exit 0, no panics, no errors, no RtlRaiseException - 14 thread spawns, 2 LR-sentinel exits - Renderer plateau NOT unblocked by cumulative P1-P8 correctness fixes Implication: the Sylpheed `draws=0` plateau has a non-PPC-correctness root cause. PPC fixes were correctness-justified independent of the renderer (well-grounded against canary). Next investigation tracks: graphics pipeline (EDRAM resolve, RT readback), kernel HLE (event signaling, timers), or the unresolved BST-validation paradox per `project_xenia_rs_sylpheed_event_chain_2026_04_29.md`. Out of scope for the PPC instruction audit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:49:43 +02:00
MechaCat02	09c6c927bd	refactor(cpu): fpscr round_single_toward_zero — collapse duplicate-branch ULP step Post-P8 review nit: the if/else branches were identical (`adj_bits - 1` either way). Both positive and negative finite f32 values use the IEEE-754 sign bit as the MSB, and subtracting 1 from `to_bits()` always reduces magnitude by one ULP. Replace the mock-conditional with the unconditional form + a comment explaining why one operation works for both signs. No behavior change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:45:55 +02:00
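The reasoning in the refactor above (one bit-pattern decrement works for both signs) can be checked with a small sketch. `step_toward_zero` is a hypothetical name, and the sketch assumes a finite, non-zero input, since decrementing the bits of ±0.0 would not be meaningful.

```rust
/// Move a finite, non-zero f32 one ULP toward zero.
/// The sign lives in the MSB, so decrementing the raw bits shrinks the
/// magnitude for positive and negative values alike.
fn step_toward_zero(x: f32) -> f32 {
    debug_assert!(x.is_finite() && x != 0.0);
    f32::from_bits(x.to_bits() - 1)
}
```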
MechaCat02	f1166d0f75	fix(cpu): revert PPCBUG-105 — lwa/lwax/lwaux sign-extend per PowerISA Post-P8 end-to-end review caught an ISA deviation introduced by P4 batch 5. The original code used `as i32 as i64 as u64` (correct PowerISA sign-extension; canary's `SignExtend(INT64_TYPE)`). My P4 batch 5 commit (`20a730d`) changed all three to `as u64` (zero-extend), citing the audit's "32-bit-ABI hazard" note for PPCBUG-105. This deviation is wrong per PowerISA and any 64-bit-mode kernel code that uses `lwa rT, off(rA)` will silently produce the wrong rT for negative words (e.g. memory 0x80000000 should yield 0xFFFFFFFF_80000000 but was yielding 0x00000000_80000000). Restore ISA-spec sign-extension for all three forms (lwa, lwax, lwaux). The audit's 32-bit-ABI hazard concern was speculative — there's no evidence that Xbox 360 user code emits `lwa` (compilers use `lwz`). If a real bug surfaces from a 32-bit-ABI consumer that feeds an `lwa`-loaded value into a u64 unsigned compare, that's a separate issue to debug at the consumer site. Test renamed: lwa_high_bit_set_zero_extends_upper → lwa_sign_extends_to_i64 with assertion flipped to expect 0xFFFFFFFF_80000000. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:43:47 +02:00
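The two cast chains contrasted in the lwa revert behave as follows. The function names are hypothetical; the casts are the ones quoted in the commit message.

```rust
/// Sign-extending load result, as restored for lwa/lwax/lwaux.
fn lwa_extend(word: u32) -> u64 {
    word as i32 as i64 as u64
}

/// Zero-extending result, i.e. what the reverted `as u64` change produced.
fn zero_extend(word: u32) -> u64 {
    word as u64
}
```

For the memory word `0x80000000` the first yields `0xFFFFFFFF_80000000` and the second `0x00000000_80000000`; for words with the high bit clear the two agree.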
MechaCat02	9de18a9eec	chore(audit): mark P8 PPCBUGs applied; append P8 progress section; AUDIT-FIX-COMPLETE P8 phase merged at `4029041`. Update audit-findings.md status fields (38 PPCBUGs marked applied) and append the P8 progress section to audit-report-2026-04-29.md. This closes the eight-phase audit-application sweep. Total ~161 PPCBUGs applied across P1-P8. ~12 LOW test-gap IDs remain Status: open and can be closed incrementally without blocking any functionality. Next session: deferred acid test (`xenia-rs check sylpheed.iso -n 4B --parallel --reservations-table`) to see if cumulative correctness fixes unblock the Sylpheed renderer plateau (draws=0). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:24:24 +02:00
MechaCat02	4029041618	Merge branch 'ppc-audit-fix/p8-tests' — Phase 8 test gap closure Phase 8 of the PPC instruction audit fix application: pure test gap closure for opcode groups that previously had near-zero unit test coverage. 53 new tests across 5 commits (4 batches + review-nit rename). - `9827b03`: Batch 1 — branch/CR-logical/SPR/MSR/FPSCR/sync (12 tests) - `2d223ee`: Batch 2 — load/store base + lswx/stswx with XER TBC (15 tests) - `ebfd18a`: Batch 3 — FPU + VMX float (14 tests) - `2614806`: Batch 4 — VMX integer/permute/load-store (12 tests) - `1f9696a`: review-fix nit — vmsum3fp_… → vmaddfp_lane_fma rename Independent reviewer verdict: LGTM, no blocking issues, no rubber-stamp tests, no encoding bugs (every hand-encoded raw cross-checked against canary's INSTRUCTION table). Two minor follow-ups: the test rename was applied immediately; the audit cross-reference in batch-4 body is loose (one representative test per group, not 1:1) — accepted. The XER-TBC tests (`lswx_uses_xer_tbc_for_byte_count`, `stswx_uses_xer_tbc_for_byte_count`) are load-bearing: they directly exercise the P6 XER TBC infrastructure, both opcodes were permanent no-ops pre-P6. Closed IDs (28): 055, 067, 070, 081, 082, 083, 084, 085, 089, 091, 100, 109, 110, 111, 118, 127, 129, 132, 146, 147, 153, 163, 171, 187, 208, 228, 240, 277, 316/320, 321/323, 370, 438, 439, 440, 490, 517. Remaining `Status: Open` test-gap LOW IDs are tracked in audit-findings.md; they don't block any functionality and can be closed in incremental future work. Verification at merge: cargo test --workspace --release reports 551 passed, 0 failed (up from 498 at P7 merge; 53 net new tests). Acid test deferred to end of all phases per user direction.	2026-05-02 14:23:04 +02:00
MechaCat02	1f9696ad47	test(cpu): rename vmsum3fp_… to vmaddfp_lane_fma per reviewer nit P8 review feedback (non-blocking): the test fn name said vmsum3fp but the encoding/body actually tests vmaddfp. Rename + clarify comment; no behavior change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:22:39 +02:00
MechaCat02	261480616c	test(cpu): PPCBUG-240/277/278/316/321/370/490/517 P8 batch 4 — VMX integer/permute/load-store Phase 8 batch 4 — VMX integer + permute/pack + multiply-sum + load/store. 12 new tests: - VMX add/sub (240): vaddubm byte add, vsubuwm word sub. - VMX compare (277): vcmpequb lane mask. - VMX min/max (278): vmaxsw signed lane max. - VMX shift/rotate (316): vsl 128-bit left shift, vsraw arithmetic per-lane. - VMX logical (321): vand lane-wise AND. - VMX permute (370): vsldoi byte concatenation + shift. - VMX multiply-sum (490): vmaddfp lane FMA. - VMX load/store (517): lvx aligned quadword load, stvx aligned store, lvebx byte-lane load. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:16:51 +02:00
MechaCat02	ebfd18a64e	test(cpu): PPCBUG-187/208/228/438/439/440 P8 batch 3 — FPU + VMX float Phase 8 batch 3 — FPU and VMX float test gap closure. 14 new tests: - Single FPU (187): fadds, fmuls - Double FPU (208): fmul, fdiv (zero-numerator), fneg, fabs, fmr - FPU convert/compare (228): fcmpu, fcfid - VMX float compare (438): vcmpeqfp lane mask - VMX rounding (439): vrfip, vrfim, vrfiz - VMX convert (440): vctsxs saturation to INT_MAX/INT_MIN The VMX VX-form encoding nit (XO is 11 bits at PPC 21-31, host bits 10-0, with bit 0 the LSB — not bit 1) was caught by initial test failures and fixed before commit. VC-form (vcmpeqfp) has the same "XO at bit 0" layout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:14:10 +02:00
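The VX-form encoding nit in batch 3 (the 11-bit XO sits at host bits 10-0, with no shift) can be illustrated with a hand-encoder of the kind such tests need. `encode_vx` and `decode_vx_xo` are hypothetical helpers; the sketch assumes primary opcode 4 for VMX and the usual VD/VA/VB field positions.

```rust
/// Hand-encode a VX-form instruction word.
/// Assumes primary opcode 4 in the top 6 bits, then VD, VA, VB (5 bits each),
/// and the 11-bit XO in host bits 10..0 with bit 0 as the LSB.
fn encode_vx(vd: u32, va: u32, vb: u32, xo: u32) -> u32 {
    (4 << 26) | ((vd & 31) << 21) | ((va & 31) << 16) | ((vb & 31) << 11) | (xo & 0x7FF)
}

/// Extract the XO field; no right shift is needed because the field starts at bit 0.
fn decode_vx_xo(raw: u32) -> u32 {
    raw & 0x7FF
}
```

Placing XO one bit higher (as if the LSB were bit 1) would double the extended opcode and dispatch to a different instruction, which is consistent with the initial test failures the commit mentions.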
MechaCat02	2d223eee69	test(cpu): PPCBUG-091/100/109-111/118/127/129/132/146-147/153/163/171 P8 batch 2 — load/store Phase 8 batch 2 — load/store test gap closure. 15 new tests across the load/store opcodes: - lbz zero-extend (091), lwbrx byte-swap (109/110), lwarx smoke (111), ld doubleword (118), lmw + lswi (127), lswx with XER TBC (127), lfs single-to-double widening (129). - stb (132), sth, stw (146), std (153), stmw + stswx (163), stfs (171). `lswx_uses_xer_tbc_for_byte_count` and `stswx_uses_xer_tbc_for_byte_count` specifically lock in the new XER TBC infrastructure landed in P6 (`68c0ee5`); both opcodes were permanent no-ops before that. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:10:26 +02:00
MechaCat02	9827b03f1a	test(cpu): PPCBUG-055/067/070/081-085/089 P8 batch 1 — branch/CR/SPR/sync Phase 8 batch 1 — test gap closure for the branch/CR-logical/SPR/MSR/FPSCR/cache+sync groups. 12 new tests across the affected groups: - PPCBUG-055 branch: blr, bctr, bcl-LK-on-not-taken - PPCBUG-070 CR logical: cror, crand, crxor (crclr idiom) - PPCBUG-067 trap+sc: sc smoke, tw TO=0 never-traps - PPCBUG-081-085 SPR/MSR/FPSCR moves: mfcr 8-field assembly, mtfsb1/mtfsb0 - PPCBUG-089 cache+sync: sync state-non-mutation smoke These groups previously had near-zero unit test coverage. New tests lock in the current ISA-correct behavior; would catch a regression in any of the dispatch/encoding/result paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:08:54 +02:00
MechaCat02	a7155f4571	chore(audit): mark P7 frozen-snapshot drift cleared (manual regen, no code change) P7 of the PPC instruction audit fix application: re-ran the ppc-manual generator (`python3 ppc-manual/generator/generate_manual.py`) to regenerate all 350 family pages from current xenia-rs and xenia-canary source. The 3 audit-cited stale snapshots (PPCBUG-066/117/145) are now refreshed. Note: the `ppc-manual/` directory is not versioned in xenia-rs/.git, so this commit is purely the audit-findings status update + report section. The regen itself happened in-place outside this repo. Verification: post-regen grep confirms the old "For now, just trace and continue" stub is gone from every page, and modern constructs (trap::evaluate, current reservation_line model) appear correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:03:23 +02:00
MechaCat02	8b9fddc488	chore(audit): mark P6 PPCBUGs applied; append P6 progress section P6 phase merged at `112202c`. Update audit-findings.md status fields (13 PPCBUGs marked applied) and append the P6 progress section to audit-report-2026-04-29.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 13:58:26 +02:00
MechaCat02	112202c2b9	Merge branch 'ppc-audit-fix/p6-medium' — Phase 6 Other MEDIUM correctness Phase 6 of the PPC instruction audit fix application: misc MEDIUM correctness items across trap/sc, XER TBC, MSR/VSCR/FPSCR semantics. ~13 PPCBUGs landed across 4 batches. - `d96986a`: Batch 1 — trap PC fix, sc LEV logging, twi typed-trap logging (PPCBUG-063/064/065) - `68c0ee5`: Batch 2 — XER TBC infrastructure (enabling lswx/stswx) + lswi/stswi nb fix + lmw RA-skip (PPCBUG-123/124/125/126/161/162/566) - `0f2a26c`: Batch 3 — mcrfs VX recompute, mtmsrd L=1 partial, mfvscr zero (PPCBUG-068/078/080) - `99e7814`: Batch 4 — mulld_ov INT_MIN*-1 verification + auto-resolved markers for PPCBUG-021/022/027/039 - `5ece5e3`: review-fix nit — mcrfs uses existing fpscr::VX_ALL constant Independent reviewer verdict: all 4 commits LGTM, one cosmetic nit (applied immediately in `5ece5e3`). Audit fix-shapes match canary prescriptions; trap-PC change verified against all StepResult::Trap consumers; XER TBC field initialization verified through the single PpcContext::new() construction path. Two structural enum extensions deferred (not yet needed by any consumer): - StepResult::HypervisorCall variant (would enable PPCBUG-064 routing for sc 2) - StepResult::Trap { type_code: u16 } payload (would enable PPCBUG-065 routing for typed C++ traps; relevant if SEH dispatch is added) Cosmetic / test-coverage items left for future cleanup batch: PPCBUG-642 (cosmetic disasm), PPCBUG-643/644 (SIMM/D-form hex display), PPCBUG-367/368 (vupkhpx/vpkpx channel ordering), PPCBUG-487/495 (vsum naming), PPCBUG-515/516 (lvebx/lvsr docs), PPCBUG-601 (decode_op6 doc). Verification at merge: cargo test --workspace --release reports 498 passed, 0 failed. Acid test deferred to end of all phases.	2026-05-02 13:57:00 +02:00
MechaCat02	5ece5e315f	refactor(cpu): mcrfs uses fpscr::VX_ALL constant per reviewer nit P6 review nit: replace the inline `const VX_ALL_MASK` in the mcrfs arm with the existing `fpscr::VX_ALL` constant (single source of truth). Behaviorally identical. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 13:56:34 +02:00
MechaCat02	99e7814836	test(cpu): PPCBUG-022 verify mulld_ov INT_MIN-1 + auto-resolved markers Phase 6 batch 4 — overflow/cleanup verification. - PPCBUG-022 mulld_ov INT_MIN -1: the audit-claimed missing edge case is actually handled by `i64::checked_mul()` (returns None when the result would be -i64::MIN = i64::MAX+1, which doesn't fit). New regression tests in overflow.rs confirm: INT_MIN * -1 overflows; INT_MIN * 1 doesn't; (INT_MIN+1) * -1 = INT_MAX, no overflow. Audit's claim was incorrect; documented by the new tests. - PPCBUG-021 (overflow.rs OE checks at bit 63): largely auto-resolved by P4 batch 6 (`16993bb`), which switched all 32-bit ABI ops to inline `true_sum != (result32 as i32) as i128`. Helpers like add_ov_64 are now only called from 64-bit ABI ops where bit-63 is correct. - PPCBUG-027 (rlwimix upper-32 zeros): auto-resolved by P4 (rlwimix now writes via `as u32 as u64` truncation). - PPCBUG-039 (cntlzdx 32-bit-ABI): wontfix per audit — only matters if a 32-bit-ABI binary emits cntlzd, which compilers don't. Remaining low-impact items (PPCBUG-642 ISA-undefined fmt_bcctr decr, PPCBUG-643/644 SIMM/D-form hex display, PPCBUG-367/368 vupkhpx/vpkpx channel ordering, PPCBUG-487/495 vsum operand naming, PPCBUG-515/516 lvebx/lvsr documentation, PPCBUG-601 decode_op6 invariant doc) are left for a P9 or follow-up batch — they're cosmetic/test-coverage items rather than correctness bugs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 13:51:43 +02:00
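The PPCBUG-022 verification above rests on the behaviour of `i64::checked_mul`, which can be reproduced in isolation. `mulld_overflows` is a hypothetical wrapper, not the helper in overflow.rs.

```rust
/// True when the signed 64-bit product does not fit in an i64.
/// checked_mul returns None in that case, including INT_MIN * -1.
fn mulld_overflows(a: i64, b: i64) -> bool {
    a.checked_mul(b).is_none()
}
```

The three cases listed in the commit correspond to `mulld_overflows(i64::MIN, -1)` being true, and both `mulld_overflows(i64::MIN, 1)` and `mulld_overflows(i64::MIN + 1, -1)` being false.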
MechaCat02	0f2a26c460	fix(cpu): PPCBUG-068/078/080 mcrfs VX recompute + mtmsrd L=1 + mfvscr zero Phase 6 batch 3 — SPR/MSR/VSCR semantics. - PPCBUG-078 mtmsrd L=1: PowerISA requires partial-MSR-write — only MSR[EE] (u64 bit 15) and MSR[RI] (u64 bit 0) modified, all other MSR bits preserved. Used by kernel code to toggle external interrupts. Previously merged with mtmsr (full overwrite), silently corrupting MSR for any L=1 caller. - PPCBUG-080 mfvscr: ISA places VSCR in the rightmost word of VD with bytes 0-11 zeroed. Previously copied the full 128-bit ctx.vscr, leaking stale upper data to guest. Now zero-extends per canary. - PPCBUG-068 mcrfs VX summary: when mcrfs clears VX* exception bits, the VX summary bit at FPSCR[2] must be recomputed (clears if all contributors are 0; remains 1 otherwise). Previously left stale, causing subsequent CR-test sequences to misread the FPU state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 13:50:10 +02:00
MechaCat02	68c0ee55ce	fix(cpu): PPCBUG-123/124/125/126/161/162/566 XER TBC + lswi/stswi/lmw Phase 6 batch 2 — XER TBC enabling + load/store-multiple cleanups. - PPCBUG-123/124/161/566 (coupled): XER TBC field was unmodelled — `ctx.xer()` always returned 0 in bits 0-6, and `ctx.set_xer()` silently discarded any TBC writes. Result: `lswx` and `stswx` were permanent no-ops (the `while bytes_left > 0` loop never executed). Fix: add `pub xer_tbc: u8` to `PpcContext`; wire into `xer()` and `set_xer()`. Initialize to 0 in `PpcContext::new()`. lswx/stswx bodies are correct as-is once the infrastructure is wired. - PPCBUG-125 lmw: PowerISA marks `lmw rT, D(rA)` invalid when rA is in [rT..31]; canary skips the write to rA to preserve the EA base. Now matches canary. - PPCBUG-126/162 lswi/stswi: replaced `instr.rb()` with `instr.nb()` for the NB field. Both accessors return identical values today (bits 16-20), but the maintenance hazard from the misnomer is now removed. A future `rb()` type-system refactor would have broken lswi/stswi silently. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 13:48:03 +02:00
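A minimal model of the XER TBC wiring from batch 2 is sketched below. Only the `xer_tbc: u8` field name and the `xer()` / `set_xer()` accessor names come from the commit; the `XerModel` struct, the `so`/`ov`/`ca` fields, and their placement at bits 31/30/29 are assumptions for illustration.

```rust
/// Simplified stand-in for the XER-related part of PpcContext (hypothetical).
#[derive(Default)]
struct XerModel {
    so: bool,
    ov: bool,
    ca: bool,
    /// Transfer byte count used by lswx/stswx, held in XER bits 0-6.
    xer_tbc: u8,
}

impl XerModel {
    /// Assemble the 32-bit XER image, now including the TBC field.
    fn xer(&self) -> u32 {
        (u32::from(self.so) << 31)
            | (u32::from(self.ov) << 30)
            | (u32::from(self.ca) << 29)
            | u32::from(self.xer_tbc & 0x7F)
    }

    /// Split a written XER image back into fields, keeping TBC instead of discarding it.
    fn set_xer(&mut self, value: u32) {
        self.so = (value >> 31) & 1 == 1;
        self.ov = (value >> 30) & 1 == 1;
        self.ca = (value >> 29) & 1 == 1;
        self.xer_tbc = (value & 0x7F) as u8;
    }
}
```

With the field modelled, a guest write of a non-zero byte count survives the round trip, so a `while bytes_left > 0` loop seeded from it actually runs.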
MechaCat02	d96986a10e	fix(cpu): PPCBUG-063/064/065 trap PC + sc LEV + twi typed-trap logging Phase 6 batch 1 — trap/sc semantics. - PPCBUG-063 trap PC: previously ctx.pc was incremented to CIA+4 BEFORE StepResult::Trap returned, forcing handlers to .wrapping_sub(4) to recover the faulting instruction address. Now ctx.pc stays at CIA on trap, matching SRR0 semantics on real hardware. Critical for any future SEH/exception-delivery path (e.g. the Sylpheed C++ throw work). - PPCBUG-065 typed-trap logging: `twi 31, r0, IMM` is the Xbox 360 CRT/kernel typed-trap convention encoding C++ exception class via SIMM. The trace now logs the SIMM type code when this pattern fires. Routing the type code via a StepResult payload requires an enum extension (multiple consumer sites) that's deferred. - PPCBUG-064 sc LEV logging: `sc 2` is the Xbox 360 hypervisor-call convention; canary dispatches it to a different handler than `sc 0`. Now logs a warning when LEV != 0. Routing LEV=2 to a HypervisorCall variant also requires a StepResult enum extension; deferred. The two enum-extension follow-ups can land as a structural sub-batch once a clear consumer (SEH dispatch, hypervisor-call HLE) is in place. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 13:42:50 +02:00
MechaCat02	9f88e275b8	chore(audit): mark P5 PPCBUGs applied; append P5 progress section P5 phase merged at `d39d0ba`. Update audit-findings.md status fields (21 PPCBUGs marked applied) and append the P5 progress section to audit-report-2026-04-29.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:39:46 +02:00
MechaCat02	d39d0bab4d	Merge branch 'ppc-audit-fix/p5-fpu' — Phase 5 FPU correctness Phase 5 of the PPC instruction audit fix application: FPU correctness across the scalar FPU and VMX float arithmetic. ~22 PPCBUGs across 6 sub-sections (5a-5f). - `f6a444b`: 5a — round_to_i64 + vrfin round-to-even (PPCBUG-221+227, 432) - `26b9897`: 5b — FMA VXISI + NaN sign preservation (PPCBUG-181/182/183/202/203/205) - `49bf74f`: 5c — FPU XX-on-inexact for conversions (PPCBUG-223/224/225/229/230) - `538fa5a`: 5d — VSCR.NJ subnormal flush for VMX float (PPCBUG-435/436/437) - `6ba8f83`: 5e — fresx canary parity (PPCBUG-184) - `6fe2cbf`: 5f — single-FMA vnmsubfp + vctsxs NaN saturation (PPCBUG-426/427/433) - `05f2f72`: review-fix nit — vrfin uses stdlib round_ties_even Independent reviewer found no blocking issues; two minor follow-up items remain open for tracking. The vrfin nit was applied immediately in `05f2f72`. Three substantive PPCBUGs were explicitly deferred — each requires substantial helper rework that's better landed as focused sub-batches: - PPCBUG-201: FPSCR.RN for double arithmetic (MXCSR set/restore wrappers) - PPCBUG-185: FPSCR.NI flush for scalar FPU (NI bit constant + post-op flush) - PPCBUG-180/200: XX/FR/FI in update_after_op (pre-vs-post-round comparison) These remain Status: open in audit-findings.md and will be picked up in a P5b sub-batch or P9 (test gaps) per planning. Verification at merge: cargo test --workspace --release reports 498 passed, 0 failed. Acid test deferred to end of all phases per user direction.	2026-05-02 12:38:18 +02:00
MechaCat02	05f2f72c71	refactor(cpu): vrfin uses stdlib f32::round_ties_even() per reviewer nit P5 review feedback (non-blocking): replace the inline round-to-even implementation with the stable stdlib intrinsic (Rust 1.77+). Functionally equivalent; cleaner. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:37:54 +02:00
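For the vrfin refactor above, the per-lane behaviour of `f32::round_ties_even` (stable since Rust 1.77, as the commit notes) can be sketched as follows. `vrfin_lanes` is a hypothetical name and the vector is modelled as a plain `[f32; 4]`.

```rust
/// Round each lane to the nearest integer, ties to even.
fn vrfin_lanes(v: [f32; 4]) -> [f32; 4] {
    v.map(f32::round_ties_even)
}
```

The difference from `f32::round` shows up only on exact halves: 2.5 rounds to 2.0 here, whereas `round` (ties away from zero) would give 3.0.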
MechaCat02	6fe2cbf251	fix(cpu): PPCBUG-426/427/433 single-FMA vnmsubfp + vctsxs NaN saturation Phase 5 batch 6 (5f): saturation and FMA-rounding fixes. - PPCBUG-426 vnmsubfp: was `bi - ai * ci` (two rounding steps); now `-ai.mul_add(ci, -bi)` which is mathematically equivalent (= bi - ai*ci) but uses a single FMA round per ISA. - PPCBUG-427 vnmsubfp128: same single-FMA fix. - PPCBUG-433 vctsxs / vcfpsxws128 NaN saturation: AltiVec ISA saturates NaN to INT_MIN (0x80000000); xenia returned 0. The vctuxs (unsigned) NaN→0 is correct per ISA. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:31:10 +02:00
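Both fixes in batch 5f can be shown on a single lane. The sketch below uses hypothetical function names and deliberately omits the vctsxs scale operand; it only illustrates the single-FMA form of `bi - ai * ci` and the NaN-to-INT_MIN saturation.

```rust
/// vnmsubfp lane: -(a*c - b) computed with one fused multiply-add,
/// so the product is not rounded separately before the subtraction.
fn vnmsubfp_lane(a: f32, c: f32, b: f32) -> f32 {
    -a.mul_add(c, -b)
}

/// Signed saturating float-to-int conversion with NaN mapped to INT_MIN.
/// Rust's `as` cast already saturates out-of-range values but maps NaN to 0,
/// so the NaN case needs explicit handling. Scaling is omitted in this sketch.
fn cvt_sat_i32(x: f32) -> i32 {
    if x.is_nan() {
        i32::MIN
    } else {
        x as i32
    }
}
```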
MechaCat02	6ba8f83c30	fix(cpu): PPCBUG-184 fresx pre-quantize input to f32 (canary parity) Phase 5 batch 5 (5e): minimal-viable fix for the estimate-precision family. Hardware Xenon `fres` produces a ~12-bit LUT estimate; xenia and canary both produce a fully IEEE single reciprocal, but canary pre-quantizes the f64 input to f32 to at least match the input precision. Now matches canary. PPCBUG-428..431 (vrefp/vrsqrtefp/vexptefp/vlogefp) already operate on f32 inputs naturally (no f64 → f32 quantization step needed); the estimate-precision deviation is purely the output side. Newton-Raphson convergence is unaffected. Documented in audit-findings.md as LOW-impact full-fix-requires-LUT. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:29:07 +02:00
MechaCat02	538fa5ab74	fix(cpu): PPCBUG-435/436/437 VSCR.NJ subnormal flush for VMX float Phase 5 batch 4 (5d) — partial: VSCR.NJ subnormal flush for VMX float arithmetic. Xbox 360 always boots with NJ=1, so games expect inputs and outputs flushed to ±0. - PPCBUG-435 vaddfp/vaddfp128/vsubfp/vsubfp128/vmulfp128: previously no flush at all on these opcodes (only vmaddfp family flushed). Now flushes both inputs and output per Canary's unconditional model. - PPCBUG-436 vmsum3fp128/vmsum4fp128: per-product intermediates now flushed individually (was only the final sum). - PPCBUG-437 vmaddfp/vmaddfp128/vmaddcfp128/vnmsubfp/vnmsubfp128: outputs now flushed (inputs were already flushed). PPCBUG-185 (FPSCR.NI flush for scalar FPU) deferred — requires adding a NI bit constant and post-op flush wrapper across all *sx arms; will land in a focused sub-batch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:26:36 +02:00
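The NJ flush in batch 5d amounts to replacing subnormal values with a zero of the same sign on both sides of the arithmetic. A minimal sketch, with hypothetical names `flush_nj` and `vaddfp_lane`, assuming the unconditional flush model the commit attributes to canary:

```rust
/// Flush a subnormal to zero, keeping its sign; other values pass through.
fn flush_nj(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// One lane of an NJ-mode add: flush both inputs, add, then flush the output.
fn vaddfp_lane(a: f32, b: f32) -> f32 {
    flush_nj(flush_nj(a) + flush_nj(b))
}
```

The output flush matters even when both inputs are normal: the difference of two adjacent small normals is subnormal and would otherwise leak through.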
MechaCat02	49bf74fae6	fix(cpu): PPCBUG-223/224/225/229/230 FPU XX bit on inexact conversions Phase 5 batch 3 (5c) — partial: targeted XX-on-inexact fixes for the float-to-int and double-to-single conversion family. (PPCBUG-180/200, the broader update_after_op XX/FR/FI rework, deferred to a focused sub-batch.) - PPCBUG-225 frspx: set XX when the f64→f32 round produces a different value (i.e. precision loss). Almost every frsp call is inexact — previously games polling FPSCR.XX never saw the set bit after a frsp. - PPCBUG-224 fcfidx: set XX when the i64 input has > 53 significant bits (precision lost in conversion to f64). - PPCBUG-229 fctidx/fctidzx: set XX when input is non-integer (fractional part discarded by the conversion). - PPCBUG-230 fctiwx/fctiwzx: same shape for word-width conversions. - PPCBUG-223 verified already correct in current code (fcmpo sets VXSNAN/VXVC on NaN operands; the audit-cited drift was already fixed). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:22:47 +02:00
MechaCat02	26b98975c3	fix(cpu): PPCBUG-181/182/183/202/203/205 FMA VXISI + NaN sign preservation Phase 5 batch 2 (5b): VXISI / NaN handling for the FMA family. The 8 FMA opcodes (fmaddx/fmaddsx/fmsubx/fmsubsx/fnmaddx/fnmaddsx/fnmsubx/fnmsubsx) all share two fix shapes: 1. VXISI on the add/sub step. The previous code passed `ac` to check_invalid_add, which has separate rounding from the FMA. In extreme cases this gives the wrong sign (PPCBUG-202) or wrong infinity status. Worse, fmsub/fnmadd/fnmsub had NO add-step VXISI check at all (PPCBUG-181/182/203). The fnmsub pattern is the canonical Newton-Raphson step — the most common FPU path in Xbox 360 graphics code. 2. NaN sign preservation in fnmadd/fnmsub. ISA Book I §4.3.4 forbids negation of a NaN FMA result; Rust's unary `-` flips the IEEE-754 sign bit (PPCBUG-183/205). Fixes: - fpscr.rs: new helper `check_invalid_fma_add(ctx, a, c, b, sub)` that derives VXISI from input properties (mathematical-product sign + b sign) instead of from the lossy `ac` value. Also covers SNaN. - interpreter.rs: all 8 FMA arms now use the new helper; fnmadd[s]/fnmsub[s] gate the negation on `!fma.is_nan()`. Tests: - fmsub_inf_minus_inf_sets_vxisi: regression for PPCBUG-203. - fnmadd_nan_input_preserves_nan_sign: regression for PPCBUG-205. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:20:02 +02:00
MechaCat02	f6a444b9d1	fix(cpu): PPCBUG-221+227 round_to_i64 + PPCBUG-432 vrfin round-to-even Phase 5 batch 1 (5a): round-to-int correctness. PPCBUG-221+227 (coupled): round_to_i64 NearestEven tie-breaking used `(diff - 0.5).abs() < f64::EPSILON` to detect half-integers, but for |v| > 2^52 every f64 value is an exact integer (v.trunc() == v), giving diff == 0. The buggy check fell through to v.round() (round-half-away-from-zero), giving wrong results for large odd half-integers. Replaced with a fractional-part-only check that's exact for |v| <= 2^52 and degenerates to truncation above. PPCBUG-432: vrfin/vrfin128 used Rust's `f32::round()` which is round-half-away-from-zero. ISA requires round-to-nearest-even (banker's rounding). Implemented inline. PPCBUG-201 (FPSCR.RN for double arithmetic) deferred — requires MXCSR-set/restore wrappers around 10+ FPU arms; will land in a focused sub-batch after the remaining 5a-5f fixes. Tests: - round_to_i64_nearest_even_on_tie: extended with 0.5, 1.5, -0.5, -1.5. - round_to_i64_non_tie_cases: 0.4/0.6 (non-tie sanity). - round_to_i32_nearest_even_on_tie: PPCBUG-227 coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:13:08 +02:00
MechaCat02	5c45108249	chore(audit): mark P4 PPCBUGs applied; append P4 progress section P4 phase merged at `d945aea`. Update audit-findings.md status fields (43 PPCBUGs marked applied) and append the P4 progress section to audit-report-2026-04-29.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:09:26 +02:00
MechaCat02	d945aeae83	Merge branch 'ppc-audit-fix/p4-abi-truncation' — Phase 4 ABI truncation Phase 4 of the PPC instruction audit fix application: 32-bit ABI writeback truncation across the integer ALU. Six commits + one review-fix land ~43 PPCBUG IDs. - `e18a0a4`: 4a active poisoning, NOT/SUB family (006/008/018/019/028/029/030/031/033) - `145a7a4`: 4a/4d coupled extsbx+extshx + CR0 (034+035+036+037) - `bf8208e`: 4b immediate ALU (001/002/003/004/005/007) - `82a9bff`: 4b mul/div + srawx coupled (009/010+011/041+042+043) - `20a730d`: 4b halfword + lwa loads (095/096/097/098/105) - `16993bb`: 4c latent + 4d CR0 catch-all (012-017/020/023-026/032/044) - `49103bb`: review-fix — subfx/subfcx OE predicate + mulli test rigor Independent reviewer caught a blocking issue: subfx/subfcx OE handlers in batch 6 hadn't been migrated to the inline 32-bit overflow predicate (`true_diff != (result32 as i32) as i128`), still using the legacy `sum_overflow_64` which gave spurious OV=1 for any legitimate i32::MIN result. Fixed in `49103bb` with two new discriminating regression tests. Verification at merge: cargo test --workspace --release reports 494 passed, 0 failed. Acid test deferred to end of all phases per user direction. The 32-bit ABI invariant — every GPR write zero-extends from a u32 result, every CR0 update views the result as i32 — is now systematically restored across the integer ALU. Downstream 64-bit unsigned compares (the addis-incident shape) can no longer be fed polluted upper bits.	2026-05-02 12:07:53 +02:00
MechaCat02	49103bb898	fix(cpu): P4 review-fix — subfx/subfcx OE predicate + mulli test rigor Independent reviewer of the P4 branch found two issues: (1) BLOCKING — subfx and subfcx OE handlers still called the legacy `overflow::sum_overflow_64(true_diff, result32 as u64)` while batch 6 had migrated all add* sites to the inline `true_sum != (result32 as i32) as i128` form. The legacy helper compares `true_diff` against `(result32 as u64) as i64 as i128`, which views any bit-31-set result as a positive i64 (e.g. result=0x80000000 → +2147483648 in i64). For a legitimate i32::MIN result with no actual 32-bit overflow, this caused spurious OV=1. Concrete repro now caught by `subfo_no_spurious_ov_when_result_has_bit31_set`: r3=1, r4=0x80000001 → result=0x80000000, true_diff=-2147483648, no OV. Pre-fix: spurious OV=1. (2) Minor — `mulli_overflow_wraps_to_32` rubber-stamped: with ra=0x80000000 and imm=2, both pre-fix (`as i64 as u64`) and post-fix (`as u32 as u64`) write the same value. Replaced with ra=u64::MAX (polluted upper bits) where pre-fix writes 0xFFFFFFFF_FFFFFFFE and post-fix writes 0x00000000_FFFFFFFE. Fixes: - interpreter.rs subfx/subfcx OE: switch to inline 32-bit predicate matching the rest of batch 6. - subfo_sets_xer_ov_on_min_minus_one: renamed and updated to test 32-bit overflow (r4=0x80000000 - 1 = 0x7FFFFFFF, OV=1). - New: subfo_no_spurious_ov_when_result_has_bit31_set (PPCBUG-017 review-fix regression). - New: subfco_no_spurious_ov_when_result_has_bit31_set (same for PPCBUG-007). - mulli_overflow_wraps_to_32: redesigned with polluted upper bits to actually discriminate pre/post fix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 12:07:32 +02:00
MechaCat02	16993bb8af	fix(cpu): PPCBUG-012-017/020/023-026/032/044 4c+4d latent + CR0 catch-all Phase 4 batch 6: latent writeback truncation (4c) and CR0 catch-all (4d). ~13 PPCBUGs across all remaining 32-bit ABI ALU sites. Latent writeback (4c) — the 4a/4b fixes already eliminate the upstream poisoning, but a defensive truncation here catches any future regression: - PPCBUG-012 addx, PPCBUG-013 addcx, PPCBUG-014 addex, PPCBUG-015 addzex, PPCBUG-016 addmex, PPCBUG-017 subfx — all rewritten to compute on u32 operands and write `as u64`. CA computed via 32-bit unsigned compare. Overflow now uses `true_sum != (result32 as i32) as i128` (32-bit predicate, since sum_overflow_64 is i64-bounded). - PPCBUG-032 andx/orx/xorx — CR0 catch-all only (results inherit upper bits from operands; once those are clean, no truncation needed). CR0 catch-all (4d) — fix the `update_cr_signed(0, X as i64)` pattern at every 32-bit-ABI Rc=1 path: - PPCBUG-020 catch-all: applied to mulhwx, mulhwux, divwux, mullwx (was already done in batch 4), addx/addcx/addex/addzex/addmex/subfx (now in 4c above), andx/orx/xorx, andix, andisx, slwx, srwx, cntlzwx, rlwinmx, rlwimix, rlwnmx, mullwx (already), divwx (already), srawx/srawix (already in batch 4). - PPCBUG-023 andisx: now correctly classifies bit-31 results as CR0.LT. - PPCBUG-024 rlwinmx, PPCBUG-025 rlwimix, PPCBUG-026 rlwnmx. - PPCBUG-044 slwx/srwx: bit-31 result like 0x80000000 now CR0.LT. 64-bit ABI ops (rldicl/rldicr/rldic/rldimi/rldcl/rldcr, sldx/srdx/sradx/ sradix, mulhdx/mulhdux/mulldx, divdx/divdux, cntlzdx) intentionally retain the 64-bit `as i64` form per ISA — these are 64-bit-mode instructions. Updated old tests: - addo_sets_xer_ov_on_signed_overflow_and_stickies_so: i32::MAX + 1 → INT_MIN. - addx_rc_uses_64bit_compare_not_32bit: renamed to ..._uses_32bit_compare_in_xbox_abi with assertions flipped to the correct 32-bit ABI behavior. New tests: - andisx_sign_bit_set_classifies_lt (PPCBUG-023). 
- slwx_high_bit_result_classifies_lt (PPCBUG-044). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 11:55:50 +02:00
MechaCat02	20a730d69e	fix(cpu): PPCBUG-095/096/097/098/105 halfword + lwa load truncation Phase 4 batch 5: 5 PPCBUGs in the load family. lha/lhax/lhau/lhaux sign-extended halfword results to u64 (active poisoning for negative halfwords); lwa/lwax/lwaux sign-extended u32 results. - PPCBUG-095/096/097/098 lha[ux]: `as i16 as i64 as u64` → `as i16 as i32 as u32 as u64`. Sign-extend to i32 then zero-extend. Common trigger: int16_t struct fields, PCM samples, packed vertex deltas. Memory 0x8000 was producing 0xFFFFFFFF_FFFF8000. - PPCBUG-105 lwa/lwax/lwaux: `as i32 as i64 as u64` → `as u64`. Per-canary the 64-bit-mode form sign-extends, but in 32-bit ABI we must zero-extend (canary's behavior is rescued by x86 register zeroing in JIT; pure interpreter has no escape). Memory 0x80000000 was producing 0xFFFFFFFF_80000000. Tests: - lha_negative_halfword_zero_extends_upper (PPCBUG-095). - lhaux_negative_halfword_clean_writeback (PPCBUG-098 + EA update). - lwa_high_bit_set_zero_extends_upper (PPCBUG-105). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 11:47:24 +02:00
MechaCat02	82a9bff934	fix(cpu): PPCBUG-009/010+011/041+042+043 mul/div + srawx truncation Phase 4 batch 4: mullwx, divwx (coupled +CR0), srawx/srawix (coupled +CR0). - PPCBUG-009 mullwx: 32-bit ABI. Product truncated to u32 before write. OE handler still uses full i64 product to detect overflow. - PPCBUG-010+011 divwx (coupled): quotient zero-extended (canary uses ZeroExtend(v, INT64_TYPE)). CR0 view via i32 — without this, a negative i32 quotient (e.g. -3 from -10/3) would be classified as positive in i64 view of the now-zero-extended writeback. - PPCBUG-041+042+043 srawx/srawix (coupled): writeback uses `as u32 as u64` (was `as i64 as u64`). All-ones case (sh>=32 with negative input) writes 0x00000000_FFFFFFFF instead of u64::MAX. CR0 view via i32. CA logic preserved unchanged (audit-verified independently correct). Tests: - mullwx_overflow_truncates_to_32 (PPCBUG-009). - divwx_negative_quotient_zero_extends (PPCBUG-010+011). - srawx_negative_value_zero_extends_upper (PPCBUG-041+043). - srawix_high_count_negative_input_yields_low32_all_ones (PPCBUG-042+043). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 11:44:34 +02:00
MechaCat02	bf8208e88c	fix(cpu): PPCBUG-001/002/003/004/005/007 4b immediate ALU truncation Phase 4 batch 3: 6 PPCBUGs in the same-shape-as-addis (4b) sub-section. All share the pattern of computing on 64-bit values when the 32-bit ABI requires u32 arithmetic. - PPCBUG-001 addi: `li rT, -1` produced 0xFFFFFFFF_FFFFFFFF; now 0x00000000_FFFFFFFF. - PPCBUG-002 addic: writeback truncated + CA from u32 unsigned compare matching canary's `AddDidCarry`. - PPCBUG-003 addicx: same plus CR0 i32 view (regression vs. the frozen ppc-manual snapshot which had the correct form). - PPCBUG-004 mulli: 64-bit signed product now truncated to 32 bits. - PPCBUG-005 subficx: writeback + CA in u32 space; removes the bits-32-63 pollution from sign-extended negative SIMM. - PPCBUG-007 subfcx: defensive 32-bit truncation of CA compare. Same shape as the compare that broke addis (0x828F3F98 / 0x828F3F68 case). Tests: - addi_li_neg_one_zero_extends_upper (PPCBUG-001). - addic_carry_uses_32bit_compare (PPCBUG-002). - mulli_overflow_wraps_to_32 (PPCBUG-004). - subficx_neg_simm_zero_extends (PPCBUG-005). - subfcx_addis_incident_case (PPCBUG-007 — exact addis-incident case). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 11:41:49 +02:00
MechaCat02	145a7a4019	fix(cpu): PPCBUG-034+035+036+037 extsbx/extshx writeback + CR0 (coupled) Phase 4 batch 2: extsbx and extshx writeback truncation + CR0 view fix. Coupled per audit — must land together because the writeback fix would silently break CR0 sign classification if the CR0 fix didn't ship in the same commit. Before: - extsbx: `as i8 as i64 as u64` — every negative byte poisoned upper 32 bits (active poisoning, not latent). 0x80 → 0xFFFFFFFF_FFFFFF80. - extshx: same shape for halfwords. - CR0: `as i64` view — accidentally correct on the buggy 64-bit form because the high bits matched the byte's sign bit. After: - extsbx: `as i8 as i32 as u32 as u64` — sign-extend to i32 then zero-extend to u64. 0x80 → 0x00000000_FFFFFF80. - extshx: same for halfwords. - CR0: `as u32 as i32 as i64` — i32 view, so a result with bit 31 set is correctly classified as negative under the 32-bit ABI. Tests: - extsbx_negative_byte_zero_extends_upper: 0x80 input → 0x00000000_FFFFFF80 with CR0.LT set. - extshx_negative_halfword_zero_extends_upper: same shape for 0x8000. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 11:38:22 +02:00
MechaCat02	e18a0a40b8	fix(cpu): PPCBUG-006/008/018/019/028/029/030/031/033 4a active poisoning Phase 4 batch 1: 9 PPCBUGs in the active-poisoning sub-section. All follow the pattern `!val` on u64, which unconditionally flips the upper 32 bits and poisons the GPR even with clean inputs — every execution corrupts the high 32 bits regardless of upstream state. Sub/neg family: - PPCBUG-006 negx: `(!ra).wrapping_add(1)` on u64 + neg_ov_64 checks 64-bit INT_MIN. Fix: do arithmetic in u32, OE checks PPC[ra32==0x80000000]. - PPCBUG-008 subfex: same shape as above plus 64-bit unsigned CA compare. Fix: cast all operands to u32, compute, write `as u64`. - PPCBUG-018 subfzex: `!ra` on u64. Fix: u32 arithmetic. - PPCBUG-019 subfmex: `!ra` on u64 + always-true CA edge (`!ra != 0` was always true for clean ra<0xFFFFFFFF because high bits of !u64 are non-zero). Fix: u32 arithmetic; CA predicate now correct. Logical NOT family: - PPCBUG-028 orcx: rs | !rb on u64 → high-bit poison. - PPCBUG-029 norx: !(rs|rb) — the `not` simplified mnemonic. Hot path, every `not` corrupted GPR upper 32 bits. - PPCBUG-030 nandx: !(rs&rb). - PPCBUG-031 eqvx: !(rs^rb). The common `eqv rA,rA,rA` set-to-all-ones idiom now produces 0x00000000_FFFFFFFF instead of 0xFFFFFFFF_FFFFFFFF. - PPCBUG-033 andcx: rs & !rb. CR0 update at every Rc=1 path now uses `as u32 as i32 as i64` so a result with bit 31 set gets classified as negative under the 32-bit ABI (was negative before because upper bits were ones; would be positive in the new truncated form unless we cast through i32). This pre-emptively addresses PPCBUG-020 for these specific opcodes; the catch-all sweep in batch 6 covers the remaining sites. Tests: - nego_sets_ov_only_on_int_min: updated from i64::MIN → 0x80000000 (32-bit). - test_subfze_carry_only_when_ra_zero_and_ca_one: result expectations updated from u64::MAX → 0xFFFFFFFF (low 32 bits, upper 32 zero). - New: neg_clean_input_no_upper_bits (PPCBUG-006 regression). 
- New: norx_not_simplified_keeps_upper_bits_clean (PPCBUG-029 regression). - New: eqvx_self_self_self_sets_low32_to_all_ones (PPCBUG-031 regression). - New: andcx_bit_clear_keeps_upper_clean (PPCBUG-033 regression). - New: subfex_clean_inputs_no_upper_bits (PPCBUG-008 regression). - New: subfmex_ra_max_ca_zero_clears_ca (PPCBUG-019 always-true CA fix). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 11:35:05 +02:00
MechaCat02	f424132a5b	chore(audit): mark P3 PPCBUGs applied; append P3 progress section P3 phase merged at `f3ebaba`. Update audit-findings.md status fields and append the P3 progress section to audit-report-2026-04-29.md, including the new PPCBUG-700 discovery (VMX128 register accessor canary-compliance). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 11:28:38 +02:00
MechaCat02	f3ebaba5c9	Merge branch 'ppc-audit-fix/p3-isolated-high' — Phase 3 isolated HIGH bugs Phase 3 of the PPC instruction audit fix application. Six commits land six independent (or coupled) PPCBUG fixes: - `cedee3c`: PPCBUG-510 stvewx128 16-byte corruption → 4-byte word write - `52ece4b`: PPCBUG-424+425 vmaddfp128/vmaddcfp128 operand swap (VAVD+VB) - `3d8e2ce`: PPCBUG-053+054 32-bit CTR semantics in bcx/bclrx + mtspr CTR - `d4f6ea7`: PPCBUG-640+650 fmt_bc spurious bdnzge/bdzge condition suffix - `2be25bd`: PPCBUG-641+649 sync vs lwsync L-field disambiguation - `7609dcd`: PPCBUG-700 VMX128 register accessors → canary bitfield layout PPCBUG-700 was a discovery during phase end-to-end review: an independent reviewer cross-checked our va128/vb128/vd128/vx128r_rc_bit accessors against canary's `FormatVX128` bitfield struct (xenia-canary `ppc_decode_data.h:484-663`) and found the bit positions were wrong on all four. The audit's line-2958 "confirmed-clean" assessment was based on a miscount of LSB-first packed C++ bitfields. Real Xbox 360 game code follows canary's convention, so any production VMX128 instruction with register VR >= 32 was silently mis-decoded — though no unit test exercised that path until 52ece4b's operand-swap fix exposed the inconsistency. Subsumes PPCBUG-422's prescribed Rc-bit position. Verification at merge: `cargo test --workspace --release` clean across all crates; targeted vmx128/decoder/disasm-golden tests green. Acid test (`-n 4B --parallel`) deferred to end-of-all-phases per user direction.	2026-05-02 11:22:54 +02:00
MechaCat02	7609dcd406	fix(cpu): PPCBUG-700 VMX128 register accessors match canary bitfield layout Independent review of P3 batch 2 (`52ece4b`) found that all three VMX128 register accessors disagreed with canary's FormatVX128/VX128_R bitfield struct (`xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663`). The audit at line 2958 had marked these "confirmed-clean" but had miscounted LSB-first bitfield offsets. Canary's actual layout (LSB-first, GCC/Clang/MSVC on x86): VA128 = VA128l(5) | VA128h(1)<<5 | VA128H(1)<<6 = PPC[11:15] | PPC[26]<<5 | PPC[21]<<6 (7-bit selector, 3 fields) VB128 = VB128l(5) | VB128h(2)<<5 = PPC[16:20] | PPC[30:31]<<5 (7-bit selector, 2 fields) VD128 = VD128l(5) | VD128h(2)<<5 = PPC[6:10] | PPC[28:29]<<5 (7-bit selector, 2 fields) VX128_R Rc = PPC[25] (host bit 6) not PPC[27] as prior fix had The buggy convention was internally consistent with hand-crafted test fixtures (which set bits 29/21/22 to encode the high registers, matching the buggy accessor). Real Xbox 360 game code follows canary's convention, so any production VMX128 instruction with VR >= 32 was silently mis-decoded — but no unit test exercised that path until the va128 fix in `52ece4b` exposed the inconsistency. Changes: - decoder.rs: rewrite va128/vb128/vd128/vx128r_rc_bit to canary positions. Drop the speculative `key4_dt` dot-form dispatch in decode_op6 — canary has no separate dot-form opcodes for VX128_R compute ops; Rc is a runtime modifier read by the interpreter via vx128r_rc_bit(). - decoder.rs tests: rewrite vmx128_test_word helper for canary layout; rename/re-encode vmx128_vd128_, vmx128_va128_, vmx128_vb128_* tests. - interpreter.rs: update encode_vpkd3d128 test helper to encode VD via canary's VD128h field; tests now pass vd=96 explicitly. - tests/disasm_goldens.rs: replace the vrlimi128/vsrw128/vpermwi128/ vperm128 hand-encoded raws with canary-compliant encodings; introduce a shared `encode_vx128` helper. 
- tests/golden/vmx128_registers.json: re-encode 9 entries (vperm128, vsrw128 ×2, vpermwi128, vrlimi128 ×2, vmaddfp128, vmaddcfp128, vnmsubfp128) to canary-compliant raws preserving the same expected operand strings. - audit-findings.md: new PPCBUG-700 entry documenting the discovery and invalidating the audit's "confirmed-clean" assessment. Affects all VMX128 binary ops (vaddfp128, vsubfp128, vmulfp128, vand128, vor128, vxor128, vnor128, vandc128, vsel128, vslo128, vsro128, vperm128, vsrw128, vmaddfp128, vmaddcfp128, vnmsubfp128, vpkd3d128, vpkshss128, vpkshus128, vpkswss128, vpkswus128, vpkuhum128, vpkuhus128, vpkuwum128, vpkuwus128, vmsum3fp128, vmsum4fp128, vrlimi128, vpermwi128 — 30+ opcodes), plus VX128_R compare dot-forms. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 11:22:20 +02:00
MechaCat02	2be25bdd41	fix(disasm): PPCBUG-641+649 sync/lwsync L-field discrimination PPCBUG-641: PpcOpcode::sync emitted "sync" regardless of the L-field at PPC bit 10. The Xbox 360 acquire barrier (encoding 0x7C2004AC, L=1) is lwsync, used in every spinlock. The disassembly DB stored every lwsync as `mnemonic='sync'`, so `SELECT WHERE mnemonic='lwsync'` returned zero rows regardless of binary content. PPCBUG-649 (companion): the golden fixture for lwsync had no ext_mnemonic field, pinning the wrong output and defeating regression detection. Fix: in disasm.rs, gate on `(instr.raw >> 21) & 1` (PPC bit 10) — when set, emit the lwsync extended form. Update extended_mnemonics.json fixture to expect `ext_mnemonic: "lwsync"`. Note: this is the disassembler-side fix only. The interpreter-side PPCBUG-088 (lwsync vs sync semantics) is separate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 10:43:24 +02:00
MechaCat02	d4f6ea787b	fix(disasm): PPCBUG-640+650 fmt_bc spurious condition suffix on bdnz/bdz PPCBUG-640: For BO=16 (bdnz: decrement CTR, branch if non-zero, ignore CR) and BO=18 (bdz: same with branch-if-zero), `fmt_bc` fell through to the `if decr` block and computed `cond_name_opt` from the don't-care BI=0 / cond_true=false pair, yielding `Some("ge")`. The output was therefore `bdnzge` / `bdzge` — a CTR-only branch with a spurious CR-derived suffix. PPCBUG-650 (companion): the golden fixture pinned the wrong output, so the regression had no detection signal until now. `fmt_bclr` already had the correct `if decr && uncond` guard at line 872 producing `bdnzlr` / `bdzlr`. `fmt_bc` lacked the equivalent. Fix: gate the condition string on `!uncond` inside the `if decr` block. For BO=16/18 (uncond bit set), the condition suffix is now empty. Tests: extended_mnemonics.json fixture rows for bdnz/bdz now expect the correct `ext_mnemonic: "bdnz"` / `"bdz"`. Impact: every analysis-DB query for `bdnz` loops (common in pixel-shader and vertex processing) was returning zero rows; matches stored as `bdnzge`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 10:40:45 +02:00
MechaCat02	3d8e2ced2e	fix(cpu): PPCBUG-053+054 32-bit CTR semantics in bcx/bclrx + mtspr CTR PPCBUG-053: bcx and bclrx tested `ctx.ctr != 0` against the full 64-bit register, but the Xbox 360 ABI runs CTR as a 32-bit counter (canary explicitly truncates: `f.Truncate(ctr, INT32_TYPE)`). When upstream 64-bit GPR pollution flowed through `mtspr CTR, rN`, the upper 32 bits stayed non-zero forever; bdnz then looped past the intended 32-bit zero point because the 64-bit comparison still saw the high bits. PPCBUG-054: `mtspr CTR` writeback wrote the full 64-bit GPR value, acting as a firewall gap that fed PPCBUG-053. Defensive truncation prevents CTR from ever acquiring non-zero upper 32 bits independently of the GPR-pollution source. Fixes: - interpreter.rs:849, 879: ctr_ok now uses `(ctx.ctr as u32) != 0` - interpreter.rs:1523: mtspr CTR writes `val as u32 as u64` Tests: - bcx_bdnz_uses_32bit_ctr_compare: bdnz with CTR=0x0000_0001_0000_0001 decrements to 0x0000_0001_0000_0000 and exits (low 32 bits = 0). - bclrx_uses_32bit_ctr_compare: same coverage for bdnzlr. - mtspr_ctr_truncates_to_32_bits: gpr=0xFFFF_FFFF_8000_0001 → ctr=0x8000_0001. Coupled fix per the audit: PPCBUG-053 and PPCBUG-054 land together because either alone is necessary-but-not-sufficient — the truncation prevents new pollution, the 32-bit compare protects against any pollution that slipped in via routes other than mtspr (e.g. mfctr-mtctr roundtrips). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 10:38:18 +02:00
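
The 32-bit CTR rule from `3d8e2ce` (PPCBUG-053+054) can be sketched as below. This is a minimal illustration, not the project's code: `Ctx`, `mtspr_ctr`, and `bdnz_step` are hypothetical names, and only the two casts (`val as u32 as u64` on write, `(ctx.ctr as u32) != 0` on compare) come from the commit message. The values in `main` mirror the cases the commit's tests describe.

```rust
/// Hypothetical stand-in for the interpreter context; only CTR is modelled.
struct Ctx {
    ctr: u64,
}

/// mtspr CTR: keep only the low 32 bits of the source GPR value.
fn mtspr_ctr(ctx: &mut Ctx, val: u64) {
    ctx.ctr = val as u32 as u64;
}

/// bdnz-style step: decrement CTR, then decide whether the branch is taken
/// by looking at the low 32 bits only.
fn bdnz_step(ctx: &mut Ctx) -> bool {
    ctx.ctr = ctx.ctr.wrapping_sub(1);
    (ctx.ctr as u32) != 0
}

fn main() {
    let mut ctx = Ctx { ctr: 0 };

    // A polluted GPR value loses its upper 32 bits on the way into CTR.
    mtspr_ctr(&mut ctx, 0xFFFF_FFFF_8000_0001);
    assert_eq!(ctx.ctr, 0x8000_0001);

    // CTR polluted by some other route: the loop still exits once the
    // low 32 bits reach zero, even though the upper bits are non-zero.
    ctx.ctr = 0x0000_0001_0000_0001;
    assert!(!bdnz_step(&mut ctx));
    assert_eq!(ctx.ctr, 0x0000_0001_0000_0000);
}
```

Either half alone leaves a gap, which is why the commit lands both: the truncating write stops new pollution, and the 32-bit compare tolerates pollution that arrived some other way.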

79 Commits