diff --git a/audit-findings.md b/audit-findings.md new file mode 100644 index 0000000..26cbb97 --- /dev/null +++ b/audit-findings.md @@ -0,0 +1,3416 @@ +# PPC Instruction Audit — Findings Tracker + +**Started**: 2026-04-29 (single session, audit-only) +**Trigger**: `addis` 32-bit-ABI sign-extension fix surfaced a likely systemic class of bugs. +**Status**: in flight. Per-group reports live in `audit-out/`. This file is the consolidated, stable-ID index. +**Workflow**: audit only this session; fix session(s) reference these IDs. + +## Conventions + +- Every finding has an ID `PPCBUG-NNN` for cross-referencing. +- **Status**: `open` (audit found it, not yet fixed) | `applied` (fix landed) | `wontfix` (intentional) | `dup-of:NNN` (collapsed into another finding). +- **Severity**: + - **HIGH** = wrong arithmetic / control flow on plausible Xbox 360 user code. + - **MEDIUM** = wrong status flag / latent under broken upstream invariants / edge case. + - **LOW** = test gap / cosmetic / dead-code-only. +- All file:line refs are `xenia-rs/crates/xenia-cpu/src/interpreter.rs` unless otherwise noted. +- Suggested fixes are written as one-line patches where possible; see the per-group report for full context. + +## Cross-cutting recommendation + +The single recurring root cause is **violating the 32-bit ABI invariant that all GPR writes truncate to 32 bits**. The cleanest fix is to systematically apply `as u32 as u64` at every GPR writeback in every integer ALU op. The existing CA/CR0/OE helpers will then be correct without further changes (because their inputs become guaranteed-clean). The audit reports list each fix individually; the fix session may choose to apply them as one sweep or one-at-a-time. + +A defensive secondary recommendation: even after the writeback truncation, instructions whose CA computation does its own internal arithmetic on 64-bit operands (`subfcx`, `subfex`, `addic`, `addicx`, `subficx`) should additionally truncate their compare operands. 
This guards against any future regression that re-pollutes the GPR file. + +--- + +## Batch 1 — integer ALU (groups 1-5) + +Per-group reports: `audit-out/group-01-add-imm.md`, `group-02-add-reg.md`, `group-03-sub-reg.md`, `group-04-multiply.md`, `group-05-divide.md`. + +### PPCBUG-001 — addi sign-extension, no truncation +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:114-118 +- **Symptom**: `addi rT, r0, -1` (= `li rT, -1`) writes `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`. Identical shape to addis. +- **Fix**: + ```rust + ctx.gpr[instr.rd()] = ra_val.wrapping_add(instr.simm16() as i64 as u64) as u32 as u64; + ``` +- **Test gap**: existing `test_addi` only covers positive simm16. Add a test for `li rT, -1` and verify the upper 32 bits are zero. + +### PPCBUG-002 — addic untruncated writeback + 64-bit CA compare +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:133-140 +- **Symptom**: (a) GPR writeback not truncated (same shape as addi). (b) CA computed via 64-bit `result < ra` — Canary's `AddDidCarry` explicitly truncates both operands to int32 first. +- **Fix**: + ```rust + let ra32 = ra as u32; + let imm = instr.simm16() as i32 as u32; + let result32 = ra32.wrapping_add(imm); + ctx.xer_ca = if result32 < ra32 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + ``` +- **Test gap**: zero unit tests for addic. + +### PPCBUG-003 — addicx untruncated writeback + 64-bit CA + CR0 regression +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:141-150 +- **Symptom**: same as PPCBUG-002 plus a CR0 regression: live code uses `update_cr_signed(0, result as i64)` (64-bit signed). The frozen snapshot in `ppc-manual/alu/addicx.md` shows the previously-correct `result as i32 as i64` form. Live code has drifted. +- **Fix**: PPCBUG-002 fix plus `update_cr_signed(0, result32 as i32 as i64)`. +- **Test gap**: zero unit tests. 
+- **Note**: confirms the manual's frozen snapshots are useful drift detectors — see if other opcodes have similarly regressed. + +### PPCBUG-004 — mulli untruncated 64-bit signed product +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:159-164 +- **Symptom**: RA read as full `i64`, product stored as `u64` without truncation. Per ISA in 32-bit ABI, both factors should be i32 and product should fit in 32 bits (overflow silently wraps per ISA). +- **Fix**: + ```rust + let ra = ctx.gpr[instr.ra()] as i32 as i64; + let imm = instr.simm16() as i64; + ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64; + ``` +- **Test gap**: zero unit tests. + +### PPCBUG-005 — subficx untruncated writeback + 64-bit CA compare +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:151-158 +- **Symptom**: (a) `imm.wrapping_sub(ra)` on 64-bit values writes poisoned upper bits; sign-extended `imm` for negative SIMM has bits 32-63 set. (b) CA `imm >= ra` is 64-bit unsigned compare; wrong relative to Canary's 32-bit form. +- **Fix**: + ```rust + let ra32 = ra as u32; + let imm32 = instr.simm16() as i32 as u32; + let result32 = imm32.wrapping_sub(ra32); + ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + ``` +- **Test gap**: zero unit tests. + +### PPCBUG-006 — negx active GPR poisoning + 64-bit OE overflow check +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:319-330 +- **Symptom**: (a) `(!ra).wrapping_add(1)` unconditionally sets upper 32 bits to all-ones because `!ra` flips them. Even a clean `r3 = 5` produces `0xFFFFFFFF_FFFFFFFB` instead of `0x00000000_FFFFFFFB`. **This is active, not latent — every neg in 32-bit-ABI code poisons the GPR.** (b) `neg_ov_64` overflow predicate tests `ra == 0x8000_0000_0000_0000` (64-bit INT_MIN) instead of `ra == 0x0000_0000_8000_0000` (32-bit INT_MIN). 
+- **Fix**: + ```rust + let result = (!(ra as u32)).wrapping_add(1); + ctx.gpr[instr.rd()] = result as u64; + if instr.oe() { + overflow::apply(ctx, (ra as u32) == 0x8000_0000); + } + if instr.rc_bit() { ctx.update_cr_signed(0, result as i32 as i64); } + ``` +- **Test gap**: existing `nego_sets_ov_only_on_int_min` tests 64-bit INT_MIN — add a 32-bit INT_MIN case. + +### PPCBUG-007 — subfcx CA via 64-bit unsigned compare +- **Severity**: HIGH (defensive — same shape as the compare that broke addis) +- **Status**: open +- **Location**: interpreter.rs:258 +- **Symptom**: `if rb >= ra { 1 } else { 0 }` is the exact 64-bit unsigned compare that the addis bug exploited. Wrong CA when either operand has poisoned upper 32 bits. Apply defensively even if all upstream sources are cleaned, because a wrong CA bit is unrecoverable downstream. +- **Fix**: + ```rust + let ra32 = ra as u32; + let rb32 = rb as u32; + let result32 = rb32.wrapping_sub(ra32); + ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + ``` +- **Test gap**: zero dedicated unit tests for subfcx — the most critical opcode in Group 3 had no coverage. Add 6+ tests including the exact 0x828F3F98 / 0x828F3F68 case from the addis incident. + +### PPCBUG-008 — subfex CA via 64-bit unsigned compare + `!ra` poisons writeback +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:268-284 +- **Symptom**: (a) CA `if rb > ra || (rb == ra && ca != 0)` is 64-bit; same shape as PPCBUG-007. (b) Writeback uses `(!ra).wrapping_add(rb).wrapping_add(ca)` — `!ra` always sets upper 32 bits, guaranteed GPR poison even with clean inputs (same shape as PPCBUG-006). 
+- **Fix**: + ```rust + let ra32 = ra as u32; + let rb32 = rb as u32; + let ca = ctx.xer_ca as u32; + let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca); + ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + ``` + +### PPCBUG-009 — mullwx untruncated 64-bit signed product +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:331-344 +- **Symptom**: 32x32 multiply produces 64-bit signed `i64` product, written to GPR via `as u64` without truncation. When product overflows i32 (which `mullw_ov` correctly detects), upper 32 bits are non-zero and corrupt downstream 64-bit unsigned compares — same class as addis. +- **Fix** (one line; OE handler unchanged): + ```rust + ctx.gpr[instr.rd()] = product as u32 as u64; + ``` + +### PPCBUG-010 — divwx quotient sign-extended to 64 bits +- **Severity**: HIGH +- **Status**: open (must be applied in same commit as PPCBUG-011) +- **Location**: interpreter.rs:373 +- **Symptom**: `(ra / rb) as i64 as u64` sign-extends a negative i32 quotient. `-10 / 3 = -3` writes `0xFFFFFFFF_FFFFFFFD` instead of `0x00000000_FFFFFFFD`. Canary's `InstrEmit_divwx` uses `f.ZeroExtend(v, INT64_TYPE)` — explicit zero-extension. +- **Fix**: `ctx.gpr[instr.rd()] = (ra / rb) as u32 as u64;` + +### PPCBUG-011 — divwx CR0 update breaks after PPCBUG-010 fix +- **Severity**: MEDIUM (coupled to PPCBUG-010 — must land together) +- **Status**: open +- **Location**: interpreter.rs:379 +- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.rd()] as i64)` accidentally works today because the sign-extended GPR has consistent sign in i64 view. After PPCBUG-010, GPR holds `0x00000000_FFFFFFFD` for `-3` and `as i64` reads positive — CR0.LT will be wrong for negative quotients. 
+- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64);` + +### PPCBUG-012 — addx writeback not truncated (latent) +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:167-179 +- **Symptom**: 64-bit `wrapping_add` result written to GPR untruncated. Latent: only triggers if upstream operands have poisoned upper 32 bits. With PPCBUG-001 etc. unfixed, that invariant is broken — addx amplifies the poison. +- **Fix**: `ctx.gpr[instr.rd()] = result as u32 as u64;` + +### PPCBUG-013 — addcx writeback not truncated (latent) +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:180-193 +- **Fix**: same shape as PPCBUG-012. + +### PPCBUG-014 — addex writeback not truncated (latent) +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:194-209 +- **Fix**: same shape as PPCBUG-012. + +### PPCBUG-015 — addzex writeback not truncated (latent) +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:210-224 +- **Fix**: same shape as PPCBUG-012. + +### PPCBUG-016 — addmex writeback not truncated (latent + edge case) +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:225-240 +- **Symptom**: same writeback issue plus the `wrapping_sub(1)` produces all-ones upper 32 bits when low 32 bits underflow — guaranteed poison even if inputs are clean (same shape as PPCBUG-006/008). +- **Fix**: truncate operands and result to 32 bits. + +### PPCBUG-017 — subfx writeback not truncated (latent) +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:241-253 +- **Fix**: same shape as PPCBUG-012. + +### PPCBUG-018 — subfzex writeback not truncated + `!ra` poisons +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:285-302 +- **Symptom**: `(!ra).wrapping_add(ca)` flips upper 32 bits — guaranteed poison. +- **Fix**: truncate ra to u32, do arithmetic on u32, write `as u64`. 
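The two writeback shapes in PPCBUG-012 through PPCBUG-018 can be sketched in isolation. This is a minimal standalone illustration that assumes nothing about the real interpreter beyond the expressions quoted in the entries above. The function names (`addx_64`, `addx_32`, `subfze_64`, `subfze_32`) are hypothetical and are not helpers in `interpreter.rs`.

```rust
// Hypothetical helpers for illustration only; none of these exist in interpreter.rs.

/// Latent shape (PPCBUG-012): a 64-bit add passes poisoned upper bits through.
fn addx_64(ra: u64, rb: u64) -> u64 {
    ra.wrapping_add(rb)
}

/// Proposed shape: compute in u32, then zero-extend at writeback.
fn addx_32(ra: u64, rb: u64) -> u64 {
    (ra as u32).wrapping_add(rb as u32) as u64
}

/// Active shape (PPCBUG-018): `!ra` on u64 sets the upper 32 bits even for clean input.
fn subfze_64(ra: u64, ca: u64) -> u64 {
    (!ra).wrapping_add(ca)
}

/// Proposed shape: returns (GPR writeback, carry out of the 32-bit sum !ra + ca).
fn subfze_32(ra: u64, ca_in: u32) -> (u64, u32) {
    let (sum, carry) = (!(ra as u32)).overflowing_add(ca_in);
    (sum as u64, carry as u32)
}
```

With a clean `ra = 5` and `ca = 1`, `subfze_64` returns `0xFFFFFFFF_FFFFFFFB`, while `subfze_32` returns `0x00000000_FFFFFFFB` with carry 0. The add pair differs only when an operand already carries non-zero upper bits, which is why those entries are classified as latent.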
+ +### PPCBUG-019 — subfmex writeback poisoning + always-true CA edge +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:303-318 +- **Symptom**: (a) writeback poisoned via `(!ra)`. (b) CA predicate `(!ra) != 0` is always true when ra has clean upper 32 bits (because `!ra` flips them) — so CA is always 1, even in the documented edge case where 32-bit `ra == 0xFFFFFFFF && ca == 0` should yield CA=0. +- **Fix**: operate on u32, then `xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 }`. + +### PPCBUG-020 — CR0 update uses 64-bit signed compare in all sub-register ops +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:250, 264, 281, 299, 315, 327, 341, 379, 396, 410, 419, 428, 445, 462 (every Rc=1 path in groups 2-5) +- **Symptom**: `update_cr_signed(0, result as i64)` views result as 64-bit signed. In 32-bit ABI, bit 31 determines LT/GT, not bit 63. A result like `0x00000000_80000000` is negative in 32-bit but positive in 64-bit — CR0.LT inverted. +- **Fix (catch-all)**: change to `result as u32 as i32 as i64` everywhere. Once PPCBUG-001..-019 truncate writebacks, the upper 32 bits of `result` are zero and this distinction becomes moot — but applying both is cheap and provides defense in depth. +- **Note**: this is one logical fix duplicated across all rc paths; the fix session should grep `update_cr_signed(0, .* as i64)` to find them all. + +### PPCBUG-021 — OE overflow checks at bit 63 in all sub-register ops +- **Severity**: LOW +- **Status**: open +- **Locations**: throughout — `add_ov_64`, `sub_ov_64`, `sum_overflow_64`, `mullw_ov`, etc. (defined in `xenia-cpu/src/overflow.rs`) +- **Symptom**: signed-overflow check operates on 64-bit boundary. For 32-bit-ABI ops (`addo`, `subfo`, `subfco`, etc.), should check at bit 31. With PPCBUG-006 a tighter form was given for `negx`. The pattern probably needs systematic review across overflow.rs. +- **Fix**: open a follow-up audit of overflow.rs after batch B completes. 
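The bit-31 versus bit-63 distinction behind PPCBUG-020 and PPCBUG-021 can be sketched as follows. This is an illustration under stated assumptions, not the tracker's actual code: `Cr0`, `classify`, `cr0_view_64`, `cr0_view_32`, `ov_at_bit63` and `ov_at_bit31` are hypothetical names and do not correspond to `update_cr_signed` or to anything in `overflow.rs`.

```rust
// Hypothetical helpers for illustration only; not the real update_cr_signed or overflow.rs API.

#[derive(Debug, PartialEq)]
enum Cr0 {
    Lt,
    Gt,
    Eq,
}

fn classify(v: i64) -> Cr0 {
    if v < 0 {
        Cr0::Lt
    } else if v > 0 {
        Cr0::Gt
    } else {
        Cr0::Eq
    }
}

/// Live shape: views the whole GPR as a 64-bit signed value.
fn cr0_view_64(gpr: u64) -> Cr0 {
    classify(gpr as i64)
}

/// Proposed shape: the sign comes from bit 31 of the low word.
fn cr0_view_32(gpr: u64) -> Cr0 {
    classify(gpr as u32 as i32 as i64)
}

/// Signed-add overflow judged at the 64-bit boundary.
fn ov_at_bit63(a: u64, b: u64) -> bool {
    (a as i64).overflowing_add(b as i64).1
}

/// Signed-add overflow judged at the 32-bit boundary.
fn ov_at_bit31(a: u64, b: u64) -> bool {
    (a as u32 as i32).overflowing_add(b as u32 as i32).1
}
```

For a GPR holding `0x00000000_80000000`, the 64-bit view reports GT and the 32-bit view reports LT. Likewise `0x7FFFFFFF + 1` overflows at bit 31 but not at bit 63, which is the kind of case a follow-up review of the overflow helpers would likely need to cover.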
+ +### PPCBUG-022 — mulld_ov missing INT_MIN * -1 edge case +- **Severity**: LOW +- **Status**: open +- **Location**: `xenia-cpu/src/overflow.rs` (`mulld_ov` helper) +- **Symptom**: 64-bit signed multiply overflow check doesn't handle `i64::MIN * -1`. +- **Fix**: add the special case to the helper. + +### PPCBUG-023 — andisx CR0 update uses 64-bit signed compare; should use 32-bit +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:475 +- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` interprets the result as 64-bit signed. The `andisx` result is bounded by `0x0000_0000_FFFF_0000`, which is always non-negative in 64-bit view. In 32-bit ABI, bit 31 is the sign bit — results with bit 31 set (e.g. `andis. rA, rS, 0x8000` with rS=0x80000000 → result=0x80000000) should yield CR0.LT=1, but xenia-rs gives CR0.GT=1. The ppc-manual frozen snapshot for `andisx` shows the correct `as i32 as i64` form; the live code has drifted. Common trigger: `andis. rA, rS, 0x8000` to test the sign bit of a 32-bit word. +- **Fix**: + ```rust + ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); + ``` +- **Test gap**: zero tests for `andisx`. Add at minimum: result with bit 31 set (expect LT=1), result with bits 0–30 set (expect GT=1), result=0 (expect EQ=1). + +--- + +## Batch 2 — logical immediate (group 6) + +Per-group report: `audit-out/group-06-logic-imm.md`. + +Group 6 summary: only 1 new bug found. The `simm16` sign-extension pattern does not apply (all ops use `uimm16`). `ori`, `oris`, `xori`, `xoris`, and `andix` are ISA-correct; `andisx` has a CR0 interpretation bug (PPCBUG-023). All 6 opcodes have inadequate test coverage (LOW gaps for 5 of them, MEDIUM gap for `andisx` tied to the bug). + +--- + +## Batch 3 — word rotate-and-mask (group 9) + +Per-group report: `audit-out/group-09-word-rotate.md`. + +Group 9 summary: core arithmetic is clean — `rlw_mask`, rotate logic, and result write are all ISA-correct. 
The single recurring defect is the Rc=1 CR0 path using `as i64` instead of `as u32 as i32 as i64` (instances of PPCBUG-020 specific to these three opcodes). `rlwimix` zeroes the upper 32 bits of RA instead of preserving them per ISA, but this is safe under 32-bit ABI invariant and classified LOW. Test coverage is poor: 1 partial test for `rlwinmx`, zero for the other two. + +### PPCBUG-024 — rlwinmx CR0 update uses 64-bit signed compare; should use 32-bit +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:667 +- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` — result is a zero-extended u32, so bit 31 set yields +2147483648 in 64-bit signed view but -2147483648 in 32-bit ABI. CR0.LT/GT inverted for results with bit 31 set. `rlwinm.` is the most common dot-form instruction in compiler output (all `slwi.`, `srwi.`, `clrlwi.`, bitfield-test-and-branch idioms). +- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);` +- **Test gap**: `test_rlwinm` exists but non-Rc only, result has bit 31 clear. Add Rc=1 tests with bit 31 set in result. + +### PPCBUG-025 — rlwimix CR0 update uses 64-bit signed compare; should use 32-bit +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:679 +- **Symptom**: same class as PPCBUG-024. `rlwimi.` is compiler-generated for struct bitfield writes; when the inserted value occupies or sets bit 31 of RA, CR0.LT is wrong. +- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);` +- **Test gap**: zero tests for `rlwimix`. Add basic insert (non-Rc) + Rc=1 with bit-31-set case. + +### PPCBUG-026 — rlwnmx CR0 update uses 64-bit signed compare; should use 32-bit +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:690 +- **Symptom**: same class as PPCBUG-024. `rlwnm.` is less frequent but used in variable-shift normalisation patterns. 
+- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);` +- **Test gap**: zero tests for `rlwnmx`. + +### PPCBUG-027 — rlwimix zeroes upper 32 bits of RA instead of preserving them (ISA deviation, LOW) +- **Severity**: LOW +- **Status**: open (no fix action required for 32-bit ABI emulation) +- **Location**: interpreter.rs:677-678 +- **Symptom**: `let ra = ctx.gpr[instr.ra()] as u32` discards upper 32 bits; result written as `as u64` zero-extends. Per ISA, `(RA) & ¬MASK(MB+32, ME+32)` preserves upper 32 bits of RA. Canary confirms: `f.And(f.LoadGPR(i.M.RA), f.LoadConstantUint64(~m))` with `~m` non-zero in upper half. +- **Impact**: under 32-bit ABI, if the 32-bit GPR invariant holds, upper 32 bits of RA are already zero before `rlwimix`, so both behaviours are identical. The deviation is only observable if an upstream bug (PPCBUG-001..023) has leaked non-zero upper bits into RA — in which case `rlwimix` would silently clean them (beneficial side-effect). No isolated fix needed; resolves automatically when upstream bugs are fixed. +- **Note**: if 64-bit mode support is ever added, this will become a HIGH bug. + +--- + +## Batch 2 — logical register (group 7) [renumbered from collision] + +Per-group report: `audit-out/group-07-logic-reg.md` (note: report uses original IDs PPCBUG-023..029 from the subagent's local numbering; tracker uses PPCBUG-028..033 here to avoid collision with groups 6 and 9). + +The group 7 subagent also flagged a CR0 regression across all 8 opcodes — that is an extension of PPCBUG-020 (catch-all for CR0 64-bit-signed regressions). Adding andx, andcx, orx, orcx, xorx, norx, nandx, eqvx Rc=1 paths to PPCBUG-020's scope rather than creating a new ID. + +### PPCBUG-028 — orcx active GPR poisoning +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:509-513 +- **Symptom**: writes `rs | !rb`. 
Rust's `!` on `u64` flips all 64 bits — the upper 32 bits of `!rb` are unconditionally all-ones, OR'd into the result. With clean inputs `orc r5, r3, r4` writes `0xFFFFFFFF_xxxxxxxx`. Active poisoning, same shape as PPCBUG-006/008. +- **Fix**: operate on u32, write `as u64`: + ```rust + let result = (ctx.gpr[instr.rs()] as u32) | !(ctx.gpr[instr.rb()] as u32); + ctx.gpr[instr.ra()] = result as u64; + ``` +- **Test gap**: zero tests. + +### PPCBUG-029 — norx active GPR poisoning (the `not` simplified mnemonic) +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:519-523 +- **Symptom**: writes `!(rs | rb)` — outer `!` flips upper 32 bits unconditionally. **`nor rA, rS, rS` is the canonical `not` simplified mnemonic** used pervasively in PPC code; every `not` in 32-bit-ABI Xbox 360 binaries actively poisons the GPR. +- **Fix**: u32 arithmetic, write `as u64`. + +### PPCBUG-030 — nandx active GPR poisoning +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:524-528 +- **Symptom**: writes `!(rs & rb)` — same shape as norx. `nand rA, rS, rS` also computes the bitwise complement of rS (the same result as `nor rA, rS, rS`), so code that expresses `not` through `nand` is poisoned identically. +- **Fix**: u32 arithmetic. + +### PPCBUG-031 — eqvx active GPR poisoning +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:529-533 +- **Symptom**: writes `!(rs ^ rb)` — same shape. The idiom `eqv rA, rS, rS`, which sets rA to all-ones (-1 in 32-bit ABI), produces `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`. +- **Fix**: u32 arithmetic. + +### PPCBUG-032 — andx / orx / xorx writeback not truncated (latent) +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:494-498 (andx), 504-508 (orx), 514-518 (xorx) +- **Symptom**: 64-bit bitwise on full GPR values. Latent — clean if both operands are clean; pollutes if either is poisoned upstream. +- **Fix**: `as u32 as u64` truncation at writeback. 
Once all upstream poison sources are fixed, these become unnecessary; until then, defensive truncation. + +### PPCBUG-033 — andcx active poisoning via `!rb` sub-expression +- **Severity**: MEDIUM (the `!rb` always poisons; outer `&` masks it away when rs is clean — fully active when rs is poisoned) +- **Status**: open +- **Location**: interpreter.rs:499-503 +- **Symptom**: writes `rs & !rb`. With a clean rb, the upper 32 bits of `!rb` are all ones; if rs has clean upper bits (zero), the AND masks them away and the result is clean. If rs is poisoned upstream, every poisoned upper bit of rs passes through the AND unchanged, so the instruction can never clean a dirty rs. This is closer to active than latent. +- **Fix**: `(rs as u32) & !(rb as u32)` then `as u64`. + +## Batch 2 — sign-extend / count-leading-zeros (group 8) [renumbered] + +Per-group report: `audit-out/group-08-extend-clz.md` (report uses local IDs PPCBUG-023..030; tracker uses PPCBUG-034..039). + +### PPCBUG-034 — extsbx writeback sign-extends to 64 bits +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:537 +- **Symptom**: `as i8 as i64 as u64` — a byte with high bit set (0x80) writes `0xFFFFFFFF_FFFFFF80` instead of `0x00000000_FFFFFF80`. Active poisoning on every negative byte. `extsb` is emitted by compilers to canonicalize signed-byte arguments — common code path. +- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i8 as i32 as u32 as u64;` +- **Test gap**: zero unit tests. +- **Note**: Canary's JIT does the same sign-extension but is rescued because a 32-bit register write on x86-64 zeroes the upper 32 bits of the host register. A pure interpreter has no such escape. + +### PPCBUG-035 — extshx writeback sign-extends to 64 bits +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:542 +- **Symptom**: `as i16 as i64 as u64` — same shape as PPCBUG-034 for halfwords. 
+- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i16 as i32 as u32 as u64;` + +### PPCBUG-036 — extsbx CR0 coupling +- **Severity**: MEDIUM (must land in same commit as PPCBUG-034) +- **Status**: open +- **Location**: interpreter.rs:538 +- **Symptom**: `update_cr_signed(0, ra as i64)` — currently latent because the unfixed sign-extended value's i64 sign matches bit 7 of the byte. After PPCBUG-034 lands, the truncated value's i64 view becomes always non-negative — CR0.LT will never fire for negative byte results. +- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);` — must land with PPCBUG-034. + +### PPCBUG-037 — extshx CR0 coupling +- **Severity**: MEDIUM (must land with PPCBUG-035) +- **Status**: open +- **Location**: interpreter.rs:543 +- **Symptom**: same coupling shape as PPCBUG-036 for halfwords. + +### PPCBUG-038 — extswx ISA-correct, document asymmetry +- **Severity**: LOW (informational / wontfix) +- **Status**: wontfix +- **Location**: interpreter.rs:547 +- **Symptom**: `as i32 as i64 as u64` produces full 64-bit sign-extension. This IS the documented purpose of extsw — argument-register canonicalization in 64-bit mode. Behavior is intentional. After PPCBUG-034/035 land, document the asymmetry with extsb/extsh in a comment. + +### PPCBUG-039 — cntlzdx counts upper 32 always-zero bits in 32-bit ABI +- **Severity**: LOW +- **Status**: open (probably dead code in Xbox 360 binaries) +- **Location**: interpreter.rs:556-562 +- **Symptom**: counts leading zeros in full 64. If a 32-bit-ABI binary emits cntlzd, the result is `32 + cntlzw(low32)` not `cntlzw(low32)`. ISA-correct for 64-bit mode; only matters if the binary actually emits it. +- **Test gap**: zero tests. + +#### Clean opcodes from group 8 + +- `cntlzwx` (interpreter.rs:551-555) — `(rs as u32).leading_zeros()` reads only low 32 bits, result range 0..=32, upper 32 zero. CR0 path benign because result is small. **Test gap only**, LOW. 
+- `extswx` CR0 path is correct per ISA (PPCBUG-038 wontfix). + +## Batch 2 — shift (group 11) [renumbered] + +Per-group report: `audit-out/group-11-shift.md` (uses local IDs PPCBUG-050..055; tracker uses PPCBUG-040..045). + +### PPCBUG-040 — DECODER BUG: `sh64()` wrong bit order for sradi (HIGH) +- **Severity**: HIGH (this is a decoder-level bug, file:line is in `decoder.rs` not `interpreter.rs`) +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `xenia-rs/crates/xenia-cpu/src/decoder.rs:91-93` (the `sh64()` accessor method on `DecodedInstr`) +- **Symptom**: the XS-form `sradix` (sradi) shift amount is assembled as `SH[4:0] << 1 | SH[5]` instead of the correct `SH[5] << 5 | SH[4:0]`. **Every `sradi rA, rS, N` instruction where N is not 0 or 63 executes with a completely wrong shift count.** Example: `sradi rA, rS, 32` shifts by 1 instead. This is a silent, structural mis-decoding — none of the interpreter changes can paper over it. +- **Cross-reference**: Canary's `(i.XS.SH5 << 5) | i.XS.SH` pattern is the correct ISA encoding. +- **Fix**: in `decoder.rs:sh64()` body, swap the bit order: + ```rust + pub fn sh64(&self) -> u32 { + // SH5 is at bit 30 of the encoded word; SH[4:0] is at bits 16-20. + let sh_lo = extract_bits(self.raw, 16, 20); + let sh_hi = extract_bits(self.raw, 30, 30); + (sh_hi << 5) | sh_lo + } + ``` +- **Impact**: `sradi` is used by compilers for arithmetic right shifts on 64-bit values. In Xbox 360 32-bit-ABI binaries it should not be common, but it's emitted by some compilers for sign-magnitude conversions and 64-bit fixed-point arithmetic. **This is the kind of silent decoder bug the user explicitly wanted the audit to catch.** +- **Test gap**: no decoder unit test pins `sh64()` for non-trivial SH values. Add fixture cases in `disasm_goldens.rs` for `sradi rA, rS, 1`, `sradi rA, rS, 32`, `sradi rA, rS, 63`. +- **Note**: any other instruction that uses the same XS-form SH split-encoding is suspect. 
Phase C decoder audit must verify `sradi` and `sradix` are the only consumers of `sh64()`. + +### PPCBUG-041 — srawx writeback sign-extends to 64 bits +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:583, 588 (two writeback paths for the count<32 and count>=32 branches) +- **Symptom**: `result as i64 as u64` violates the 32-bit-ABI zero-extension convention. A negative shifted value writes `0xFFFFFFFF_xxxxxxxx` instead of `0x00000000_xxxxxxxx`. +- **Fix**: `result as u32 as u64` in both writeback paths. +- **Note**: subagent verified the CA computation is **independently correct** — uses `(rs as u32) << (32 - sh) != 0` which is the canonical ISA shifted-out-bits test on 32-bit operands. **Do not change CA logic.** + +### PPCBUG-042 — srawix writeback sign-extends to 64 bits +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:600, 605 (same shape as PPCBUG-041 for srawi) +- **Fix**: `result as u32 as u64`. + +### PPCBUG-043 — srawx / srawix CR0 coupling +- **Severity**: MEDIUM (must land with PPCBUG-041 and PPCBUG-042) +- **Status**: open +- **Locations**: interpreter.rs:593, 607 +- **Symptom**: currently masked by the sign-extended writeback (sign-extension makes the 64-bit and 32-bit sign agree). After truncating the writeback, `as i64` will misread the sign for negative results. +- **Fix**: `as u32 as i32 as i64` in both Rc=1 paths, applied with PPCBUG-041/042. + +### PPCBUG-044 — slwx / srwx CR0 misclassifies negative 32-bit results +- **Severity**: LOW (zero-extended results have bit 31 set in low 32, but always positive in i64 view → CR0.LT never fires for slw/srw with bit-31-set results) +- **Status**: open +- **Locations**: interpreter.rs:568, 576 +- **Fix**: `as u32 as i32 as i64`. 
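The coupled srawi changes in PPCBUG-042 and PPCBUG-043 can be sketched together. This is an illustrative model only, assuming the immediate shift amount is already decoded into `sh` (0..=31). `srawi_32`, `srawi_sign_extended` and `cr0_lt` are hypothetical names, not the interpreter's `srawix` arm.

```rust
// Hypothetical helpers for illustration only; not the interpreter's srawix arm.

/// Sketch of the proposed srawi shape for sh in 0..=31.
/// Returns (GPR writeback, CA).
fn srawi_32(rs: u64, sh: u32) -> (u64, u32) {
    assert!(sh < 32, "srawi shift amount is a 5-bit field");
    let src = rs as u32 as i32;
    let result = src >> sh; // arithmetic shift: replicates bit 31
    // Bits shifted out on the right; guarded because `<< 32` would overflow on u32.
    let shifted_out = if sh == 0 { 0 } else { (rs as u32) << (32 - sh) };
    // CA is set only when the source is negative and a 1-bit was shifted out.
    let ca = (src < 0 && shifted_out != 0) as u32;
    (result as u32 as u64, ca)
}

/// The sign-extending writeback the tracker flags, for contrast.
fn srawi_sign_extended(rs: u64, sh: u32) -> u64 {
    ((rs as u32 as i32) >> sh) as i64 as u64
}

/// CR0.LT as it should be read once the writeback is truncated.
fn cr0_lt(gpr: u64) -> bool {
    (gpr as u32 as i32) < 0
}
```

For `rs = 0x80000001` and `sh = 1` the sketch returns `0x00000000_C0000000` with CA = 1, whereas the sign-extending form writes `0xFFFFFFFF_C0000000`. Reading the truncated value with a plain `as i64` would see a positive number, which is why the CR0 change has to land in the same commit as the writeback change.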
+ +### PPCBUG-045 — Zero unit tests for any shift opcode +- **Severity**: LOW (test gap only) +- **Status**: open +- **Locations**: interpreter.rs:563-658 (entire shift group: slwx, srwx, srawx, srawix, sldx, srdx, sradx, sradix) +- **Recommendation**: add at least one functional test per opcode. Especially: `srawix r3, r3, 1` with rs=0xFFFFFFFE (CA should be 0), `srawix r3, r3, 1` with rs=0x80000001 (CA should be 1, result=0xC0000000); `sradix r3, r3, 32` (currently wrong per PPCBUG-040). + +#### Clean opcodes from group 11 + +- `slwx` writeback at line 568 (zero-ext 32-bit result via `(rs as u32 << count) as u64`) — clean. +- `srwx` writeback at line 576 — clean. +- `sldx`, `srdx`, `sradx` — 64-bit ops, ISA-correct (probably dead in 32-bit-ABI binaries). +- `sradix` body logic is structurally correct; failure is solely from PPCBUG-040 giving it a wrong shift count. + +## Batch 2 — doubleword rotate (group 10) [renumbered] + +Per-group report: `audit-out/group-10-dword-rotate.md` (uses local IDs PPCBUG-027/028; tracker uses PPCBUG-046/047). + +### PPCBUG-046 — DECODER BUG: wrong bit position for MB[5] in all 6 doubleword-rotate opcodes (HIGH) +- **Severity**: HIGH (decoder-level; impacts the canonical zero-extend-to-32 idiom) +- **Status**: applied (52b05b1, 2026-05-01) +- **Locations**: interpreter.rs — every arm of `rldiclx`, `rldicrx`, `rldicx`, `rldimix`, `rldclx`, `rldcrx` (lines 693-754) +- **Symptom**: each arm computes `let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1)`. The bit at `(instr.raw >> 1) & 1` is **PPC bit 30**, which in MD form is `sh[5]` (the split-off high bit of the 6-bit shift amount, the same bit PPCBUG-040 reads as SH5) — NOT `mb[5]`. The high bit of the 6-bit MB field lives at PPC bit 26 = `(instr.raw >> 5) & 1`. + + As written, the code computes `(mb[4:0] << 1) | sh[5]`. Ironically `disasm.rs:1256` (the `mb_md()` helper) has the correct formula. The interpreter was written independently with the wrong bit position, probably a copy-error from `sh64()`, where bit 30 really is the split bit. 
+- **Concrete impact**: + - `clrldi r3, r4, 32` is the canonical "zero-extend low 32 bits" idiom emitted constantly in 32-bit-ABI PPC code. Encoded as `rldicl r3, r4, 0, mb=32`. With mb=32, `mb[5]=1, mb[4:0]=0`. The interpreter decodes mb=0 → mask is all-ones → instruction becomes a no-op. Any downstream 64-bit compare (subfcx CA, cmpld) on that register sees a polluted 64-bit value instead of a clean 32-bit zero-extended one. **This is the same class of bug that caused the addis/BST incident.** + - For `rldcr` (MDS form), the XO field's LSB at bit 30 is always 1 (Rc=0 opcode), so `me[5]` is forcibly set to 1 for every non-record-form invocation — effectively adding 32 to all me values. +- **Fix** (one line per opcode): + ```rust + // Replace in all 6 arms: + let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1); + // With: + let mb = instr.mb() | (((instr.raw >> 5) & 1) << 5); + ``` + Or, cleaner: expose `mb_md()` (currently in disasm.rs:1256) as a method on `DecodedInstr` in `decoder.rs` and have the interpreter call `instr.mb_md()` — single source of truth for MD-form mb extraction. +- **Test gap**: zero execution tests for any of the 6 opcodes; only disasm-golden string-output tests. +- **Note**: this is the second decoder bug found by the audit (PPCBUG-040 / `sh64()` for `sradi` is the first). Phase C decoder audit must verify whether other MD/MDS/XS form accessors have similar bit-position errors. + +### PPCBUG-047 — Zero execution tests for any doubleword-rotate opcode +- **Severity**: LOW (test gap) +- **Status**: open +- **Locations**: interpreter.rs:693-754 (all 6 opcodes) +- **Recommendation**: at minimum, a `clrldi r3, r4, 32` test verifying the result is exactly the low 32 bits of r4. After PPCBUG-046 lands, this test would have caught the MB-reconstruction bug. + +#### What's correct in group 10 + +- `sh64()` accessor — correctly reconstructs 6-bit shift from MD split encoding (cross-check: `disasm.rs` agrees). 
+- `rld_mask_left()` / `rld_mask_right()` mask helpers — verified against Canary's XEMASK. +- `rldicx`/`rldimix` mask formulas (`63 - sh` for right edge) — correct. +- `rldimix` read-modify-write merge — correct 64-bit mask-insert. +- CR0 `as i64` — correct here because these ARE genuine 64-bit ops (unlike word rotate). +- `rldcl`/`rldcr` register-shift extraction (`gpr[rb] & 0x3F`) — correct. +- No 32-bit writeback truncation needed: these are intentionally 64-bit; 32-bit-ABI compilers only emit them with masks that yield 32-bit-clean results. + +## Batch 3 — branch (group 13) + +Per-group report: `audit-out/group-13-branch.md`. + +Group 13 summary: the branch implementation is substantively correct. All BO/BI bit masks, +CTR decrement-before-test ordering, AA absolute vs relative dispatch, LK unconditional write +(including not-taken path in `bcx`), LR-read-before-LR-write atomicity in `bclrx`, and +`get_cr_bit()` field indexing are all ISA-correct and match Canary. The only execution bugs +are a latent 64-bit CTR zero-test (PPCBUG-053/054, active under current GPR-pollution environment) +and severely thin test coverage (PPCBUG-055). + +### PPCBUG-053 — CTR zero-test uses 64-bit compare; should use 32-bit in `bcx`/`bclrx` +- **Severity**: MEDIUM (effectively HIGH given unfixed PPCBUG-001..031 GPR pollution) +- **Status**: open +- **Locations**: `interpreter.rs:849` (`bcx` `ctr_ok`), `interpreter.rs:879` (`bclrx` `ctr_ok`) +- **Symptom**: `ctx.ctr != 0` compares all 64 bits. In 32-bit ABI the CTR is logically 32-bit. + Canary explicitly truncates to 32 bits: `ctr = f.Truncate(ctr, INT32_TYPE)`. When CTR upper + 32 bits are non-zero (due to upstream GPR pollution flowing through `mtspr CTR, rN`), the + 64-bit test disagrees with the 32-bit ISA semantic. Most dangerous with `neg; mtctr; bdnz`: + `negx` (PPCBUG-006) always sets upper 32 bits, so the 32-bit CTR counter can reach zero + while the 64-bit CTR is still non-zero → infinite loop. 
+- **Fix**: + ```rust + // Replace in both bcx and bclrx: + let ctr_ok = (bo & 0b00100) != 0 + || (((ctx.ctr as u32) != 0) ^ ((bo & 0b00010) != 0)); + ``` + Or, alternatively, truncate at decrement: + ```rust + if bo & 0b00100 == 0 { + ctx.ctr = ctx.ctr.wrapping_sub(1) as u32 as u64; + } + ``` +- **Test gap**: zero tests for CTR-decrement branches (bdnz, bdz, bdnzt, bdnzf, bdzt, bdzf). + +### PPCBUG-054 — `mtspr CTR` writeback not truncated to 32 bits +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:1411` +- **Symptom**: `crate::context::spr::CTR => ctx.ctr = val` writes the full 64-bit GPR to CTR. + Acts as a firewall gap: any upstream 64-bit GPR pollution flows directly into CTR, where it + will be tested by PPCBUG-053's 64-bit comparison. Defensive fix prevents CTR from ever + acquiring non-zero upper 32 bits independently of the GPR-pollution fix. +- **Note**: the `bcctrx` branch-target read (`(ctx.ctr as u32) & !3`) already truncates + correctly; the bug is confined to the `ctr != 0` zero-test in `bcx`/`bclrx`. +- **Fix**: `crate::context::spr::CTR => ctx.ctr = val as u32 as u64,` +- **Cross-reference**: Group 16 (SPR/MSR) subagent should verify this write-point. + +### PPCBUG-055 — Severely inadequate test coverage for all four branch opcodes +- **Severity**: LOW (test gap) +- **Status**: open +- **Locations**: `interpreter.rs` test module (lines 4455–4491) +- **Current coverage**: `bx` forward (1 test), `bl` LR update (1 test), `bcx` taken beq (1 test via `test_cmp_and_bc`). Zero tests for: `bclrx`, `bcctrx`, any CTR-decrement variant, not-taken path, backward branch, AA=1 absolute, `bcl` LR-write-on-not-taken. +- **Recommended minimum**: blr, bctr, bdnz (taken and not-taken at boundary CTR=1), bclrl old-LR-as-target, bcl LK-write-on-not-taken. See per-group report for concrete encoding patterns. + +--- + +## Batch 3 — trap + system call (group 14) + +Per-group report: `audit-out/group-14-trap-sc.md`. 
+ +Group 14 summary: the core trap evaluation (`trap.rs`) is correct — TO bit constants, signed/unsigned +comparison dispatch, and word-vs-doubleword width handling are all ISA-conformant. The live interpreter +arm properly evaluates the TO field (replacing the old unconditional-trap stub). Three MEDIUM issues +found: PC ordering on trap return, missing LEV dispatch for `sc`, and the Xbox 360 typed-trap +convention (`twi 31, r0, IMM`) not handled. Two LOW findings for stale manual snapshots and test gaps. + +### PPCBUG-063 — `ctx.pc` already at CIA+4 when `StepResult::Trap` returns +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:1543 (`ctx.pc += 4`) before interpreter.rs:1549 (`return StepResult::Trap`) +- **Symptom**: any trap handler that reads `ctx.pc` to find the faulting instruction sees CIA+4 instead + of CIA. The existing `tracing::warn!` compensates with `.wrapping_sub(4)`, confirming the asymmetry. + On real hardware, SRR0 = CIA (trapping instruction address). Current risk LOW (no handler inspects + pc), but HIGH if any SEH/exception-delivery path is added (critical for the C++ throw investigation). +- **Fix**: save CIA before incrementing, restore it when firing the trap: + ```rust + let trap_pc = ctx.pc; + ctx.pc += 4; + if fired { ctx.pc = trap_pc; return StepResult::Trap; } + ``` + Alternatively store CIA in a separate `ctx.srr0`-equivalent field and leave `ctx.pc` at NIA. +- **Note**: `sc` correctly leaves `ctx.pc` at NIA (the return address) — that is a different and + correct design choice. The inconsistency between sc and trap is the bug. + +### PPCBUG-064 — `sc` ignores `LEV` field; `sc 2` (HVcall) silently misdispatched +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:915-918 +- **Symptom**: `sc 2` (Xbox 360 hypervisor call) returns `StepResult::SystemCall` identically to + `sc 0`. Canary dispatches LEV=0 to `syscall_handler` and LEV=2 to `f.function()` (the HVcall + path). 
For pure game-title code (LEV=0 only) this is invisible; XDK kernel-mode components and + some HV-aware titles may use `sc 2`. +- **Fix**: decode the 7-bit LEV field (bits 20-26 of SC-form encoding), add a `HypervisorCall` + variant to `StepResult`, and dispatch accordingly. + +### PPCBUG-065 — `twi 31, r0, IMM` typed-trap not handled; SIMM type code discarded +- **Severity**: MEDIUM +- **Status**: open +- **Location**: interpreter.rs:1532-1551 (trap arm) +- **Symptom**: `twi 31, r0, IMM` (TO=31=unconditional, RA=r0) is used by the Xbox 360 CRT/kernel + to encode typed C++ exceptions — the 16-bit SIMM carries the exception type discriminator. xenia-rs + fires the trap correctly but discards SIMM. The caller sees a generic `StepResult::Trap` with no + type information, preventing correct C++ SEH dispatch. +- **Canary reference**: `ppc_emit_control.cc:611-616` special-cases `RA==0 && TO==31` and calls + `f.Trap(type)` with the SIMM as the type code. +- **Fix**: add a `trap_type: Option<u16>` payload to `StepResult::Trap`. Detect `twi` with `to()==31` + and `ra()==0` and populate it with `instr.simm16() as u16`. +- **Note**: directly relevant to the Sylpheed `std::runtime_error` throw investigation + (project_xenia_rs_sylpheed_throw_2026_04_28.md) — the typed-trap SIMM carries the CRT exception + class that the kernel uses to route to the correct handler. + +### PPCBUG-066 — Stale frozen snapshots in ppc-manual for td/tdi/tw/twi +- **Severity**: LOW +- **Status**: open +- **Location**: `ppc-manual/branch/td.md`, `tdi.md`, `tw.md`, `twi.md` +- **Symptom**: all four show the old unconditional-trap stub (`// For now, just trace and continue`) + instead of the current TO-field-evaluating implementation. +- **Fix**: regenerate after PPCBUG-063 and PPCBUG-065 are resolved.
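
The field extraction that PPCBUG-064 and PPCBUG-065 ask for can be sketched as two pure functions over the raw 32-bit instruction word. This is a minimal sketch, not the xenia-rs decoder: `sc_lev` and `typed_trap_code` are hypothetical names, and the bit positions are taken from the findings above (LEV at PPC bits 20-26, TO at bits 6-10, RA at bits 11-15, SIMM in the low 16 bits).

```rust
/// Hypothetical helper: extracts the 7-bit LEV field of an SC-form word.
/// PPC bits 20-26 (MSB-0) sit at LSB-0 bits 11..5, hence the shift by 5.
fn sc_lev(raw: u32) -> u32 {
    (raw >> 5) & 0x7F
}

/// Hypothetical helper: returns the SIMM type code when `raw` encodes
/// `twi 31, r0, IMM`, and `None` for every other word.
fn typed_trap_code(raw: u32) -> Option<u16> {
    let primary = raw >> 26; // primary opcode, PPC bits 0-5 (twi = 3)
    let to = (raw >> 21) & 0x1F; // TO field, PPC bits 6-10
    let ra = (raw >> 16) & 0x1F; // RA field, PPC bits 11-15
    if primary == 3 && to == 31 && ra == 0 {
        Some(raw as u16) // low 16 bits are the SIMM field
    } else {
        None
    }
}
```

For example, `0x44000042` decodes as `sc 2` and `0x0FE00016` as `twi 31, r0, 22`. Keeping the type code as `Option<u16>` lets ordinary conditional traps carry `None` without a second `StepResult` variant.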
+ +### PPCBUG-067 — Test gaps for trap and sc +- **Severity**: LOW +- **Status**: open +- **Location**: interpreter.rs `#[cfg(test)] mod tests` +- **Missing coverage**: `sc` smoke test (fires SystemCall, advances PC); `td` vs `tw` on 64-bit-clean + operands (width discrimination); `tdi`/`td` signed/unsigned LT/GT conditions; `tw 31, r0, r0` + unconditional `trap` encoding; `twi 31, r0, N` typed-trap; negative simm16 in `twi`. + +--- + +## Batch 3 — SPR / MSR / TB / FPSCR / VSCR moves (group 16) + +Per-group report: `audit-out/group-16-spr-msr.md`. + +Group 16 summary: the core paths are clean — `mfcr`, `mtcrf`, `mfspr`, `mtspr`, `mftb`, `mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`, `mfvscr`, `mtvscr` are all functionally ISA-correct. The `spr()` decoder accessor correctly inverts the PPC XFX half-swap encoding. The one MEDIUM finding is `mtmsrd` silently ignoring the `L=1` partial-MSR-write semantics. Five LOW test-gap findings cover near-total absence of unit tests for this entire group. + +### PPCBUG-078 — `mtmsrd` L=1 partial-MSR-write not modelled + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:1458-1461` +- **Symptom**: xenia-rs merges `mtmsr` and `mtmsrd` into a single body that unconditionally writes `ctx.msr = ctx.gpr[instr.rs()]`. PowerISA specifies that `mtmsrd` with instruction bit 15 (`L`) = 1 performs a partial update: only `MSR[EE]` (PowerISA bit 48 = u64 bit 15) and `MSR[RI]` (PowerISA bit 62 = u64 bit 1) are modified; all other MSR bits preserved. Kernel code using `mtmsrd L=1` to re-enable external interrupts silently corrupts the entire MSR in xenia-rs. Canary acknowledges the same TODO. +- **Fix**: + ```rust + PpcOpcode::mtmsrd => { + let l = (instr.raw >> (31 - 15)) & 1; + if l == 1 { + let mask: u64 = (1u64 << 15) | (1u64 << 1); + let rs = ctx.gpr[instr.rs()]; + ctx.msr = (ctx.msr & !mask) | (rs & mask); + } else { + ctx.msr = ctx.gpr[instr.rs()]; + } + ctx.pc += 4; + } + ``` +- **Test gap**: zero tests for `mtmsr` or `mtmsrd`.
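
The L=1 merge described in PPCBUG-078 can be exercised as a pure function, which is also a convenient shape for the missing unit test. This is a sketch under stated assumptions: `mtmsrd_value`, `MSR_EE`, and `MSR_RI` are hypothetical names rather than xenia-rs APIs, the bit positions assume MSR[EE] is PowerISA bit 48 (`0x8000`) and MSR[RI] is PowerISA bit 62 (`0x2`), and the L=0 path mirrors the full-register write described above.

```rust
/// MSR[EE]: PowerISA bit 48 (MSB-0), i.e. LSB-0 bit 15 of a u64.
const MSR_EE: u64 = 1 << 15;
/// MSR[RI]: PowerISA bit 62 (MSB-0), i.e. LSB-0 bit 1 of a u64.
const MSR_RI: u64 = 1 << 1;

/// Hypothetical helper: computes the MSR value after `mtmsrd RS, L`.
fn mtmsrd_value(old_msr: u64, rs: u64, l: bool) -> u64 {
    if l {
        // Partial update: only EE and RI are taken from RS.
        let mask = MSR_EE | MSR_RI;
        (old_msr & !mask) | (rs & mask)
    } else {
        // Full write, as in the merged mtmsr/mtmsrd arm described above.
        rs
    }
}
```

With L=1, unrelated MSR bits in the old value survive and unrelated bits in RS are ignored, which is the property a regression test should pin down.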
+ +### PPCBUG-079 — `mtspr` silent drop of unknown-SPR writes without value logging + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:1430-1433` +- **Symptom**: Unknown SPR writes are silently discarded with only a `tracing::warn!()` that omits the value being written. Reduces debuggability; no correctness impact for known Xbox 360 titles. +- **Fix** (optional): `tracing::warn!("mtspr: unimplemented SPR {} <= 0x{:016x}", spr, val)`. + +### PPCBUG-080 — `mfvscr` does not zero the upper 96 bits of VD per ISA + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:2198-2201` +- **Symptom**: ISA requires `mfvscr VD` to place VSCR in the rightmost word of VD and zero bytes 0-11. xenia-rs copies the full 128-bit `ctx.vscr` into `ctx.vr[VD]`, leaving stale data in bytes 0-11 if `ctx.vscr` was populated from a non-zeroed vector. Canary explicitly zero-extends. +- **Fix**: + ```rust + PpcOpcode::mfvscr => { + let vscr_word = ctx.vscr.as_u32x4()[3]; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array([0, 0, 0, vscr_word]); + ctx.pc += 4; + } + ``` + +### PPCBUG-081 — Zero unit tests for `mfcr` / `mtcrf` + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:1436-1453` +- **Recommended additions**: full mfcr round-trip; `mtcrf 0xFF`; `mtcrf 0x80` (CR0 only); `mtcrf 0x38` (ABI CR2|CR3|CR4 restore). + +### PPCBUG-082 — Minimal unit tests for `mfspr` / `mtspr` + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:1376-1435` +- **Note**: only DEC and TBL_WRITE covered; add LR, CTR, XER, TBL/TBU, VRSAVE. 
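
The `mtcrf` cases recommended in PPCBUG-081 all reduce to expanding the 8-bit FXM field into a 32-bit nibble mask. The following is a minimal sketch assuming the CR is held as a single `u32` with CR0 in the top nibble; `mtcrf_mask` and `mtcrf` are hypothetical names, and the live interpreter may store the CR differently.

```rust
/// Hypothetical helper: expands FXM into a CR mask.
/// FXM bit 0x80 selects CR0 (top nibble), 0x01 selects CR7 (bottom nibble).
fn mtcrf_mask(fxm: u8) -> u32 {
    let mut mask = 0u32;
    for field in 0..8 {
        if fxm & (0x80 >> field) != 0 {
            mask |= 0xF000_0000u32 >> (4 * field);
        }
    }
    mask
}

/// Hypothetical helper: merges the selected fields of `rs` into `cr`.
fn mtcrf(cr: u32, rs: u32, fxm: u8) -> u32 {
    let mask = mtcrf_mask(fxm);
    (cr & !mask) | (rs & mask)
}
```

The `0x38` case (CR2, CR3, CR4) is the one worth asserting explicitly, since an off-by-one in the field order would still pass `0xFF`.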
+ +### PPCBUG-083 — Zero unit tests for `mftb` + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:1462-1470` + +### PPCBUG-084 — Zero interpreter-level round-trip tests for FPSCR move instructions + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:2678-2720` +- **Note**: `fpscr.rs` helper-level tests exist; interpreter dispatch (`mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`) is untested end-to-end. + +### PPCBUG-085 — Zero unit tests for `mfvscr` / `mtvscr` + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:2198-2205` + +IDs PPCBUG-086 and PPCBUG-087 are unallocated — reserved for group 16 follow-up findings. + +--- + +## Batch 3 — cache + sync (group 17) + +Per-group report: `audit-out/group-17-cache-sync.md`. + +Group 17 summary: the cleanest group audited so far. Both `dcbz` and `dcbz128` have correct EA computation (ra=0 special case, 64-bit→u32 truncation, alignment masks `& !31` / `& !127`, byte counts 32/128). The nine no-op opcodes (dcbf, dcbi, dcbst, dcbt, dcbtst, icbi, sync, eieio, isync) are all listed in one arm and complete. The `dcbz128` Xbox 360 specific opcode (RT=1 bit distinguishes from dcbz) dispatches correctly. **0 HIGH, 0 MEDIUM, 2 LOW** findings. + +### PPCBUG-088 — sync disasm ignores L field; `lwsync` (L=1) shows as "sync" +- **Severity**: LOW +- **Status**: open +- **Location**: `xenia-rs/crates/xenia-cpu/src/disasm.rs:364` +- **Symptom**: The `PpcOpcode::sync` disasm arm outputs `"sync"` unconditionally regardless of the L field (PPC bit 10). When L=1 (word `0x7C2004AC`), the instruction should disassemble as `"lwsync"`. The `extended_mnemonics.json` golden already accepts `"sync"` as output for the lwsync case, meaning the test currently passes with the wrong string. +- **Impact**: Disassembly output for `lwsync` (very common in Xbox 360 acquire-barrier idioms) shows as `sync`. No interpreter impact; both L=0 and L=1 are correctly treated as no-op PC advance. 
+- **Fix**: + ```rust + PpcOpcode::sync => { + // L field at PPC bit 10 + if extract_bits(instr.raw, 10, 10) == 1 { + base("lwsync", String::new(), 0) + } else { + base("sync", String::new(), 0) + } + } + ``` + Update `extended_mnemonics.json` golden to add `"ext_mnemonic": "lwsync"` for that entry. + +### PPCBUG-089 — Zero interpreter execution tests for group 17 +- **Severity**: LOW +- **Status**: open +- **Location**: `xenia-rs/crates/xenia-cpu/src/interpreter.rs` (test module) +- **Symptom**: No `#[test]` covers `dcbz`, `dcbz128`, or any no-op (sync/isync/eieio/dcbf/icbi). A regression in dcbz byte count or alignment would go undetected. +- **Recommended additions**: `dcbz` with misaligned address (verifies 32-byte aligned zero), `dcbz128` with misaligned address (verifies 128-byte aligned zero), both ra=0 and ra!=0 cases, `sync`/`isync`/`dcbf` no-op PC-advance smoke tests. + +--- + +## Batch 3 — CR logical + CR moves (group 15) + +Per-group report: `audit-out/group-15-cr-logical.md`. + +Group 15 summary: **cleanest group audited to date**. All 8 CR logical ops (`crand`, `crandc`, +`creqv`, `crnand`, `crnor`, `cror`, `crorc`, `crxor`), `mcrf`, and `mcrxr` are ISA-correct. +The `cr_logical` helper's use of `fn(bool, bool) -> bool` prevents the `!u64` bit-pollution class +(PPCBUG-028–031 in group 7). CR bit indexing in `get_cr_bit`/`set_cr_bit` is correct (bit/4 = +field, bit%4 = within-field sub-index matching PPC MSB-0 numbering, with sub `{0=LT, 1=GT, 2=EQ, +3=SO}`). `mcrxr` correctly maps XER{SO,OV,CA} to CR{LT,GT,EQ} with SO=false and unconditionally +clears the XER bits. `mcrfs` nibble extraction, field shift formula (`28 - crfs*4`), and +CLEARABLE_MASK (all 14 ISA-clearable exception bits, no FEX/VX) are all correct. One MEDIUM ISA +violation: `mcrfs` omits VX summary recomputation. Two LOW findings: a misleading test comment and +zero coverage for all 8 CR logical ops + `mcrf`. 
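
The indexing rule in the group 15 summary (bit/4 selects the field, bit%4 selects LT/GT/EQ/SO) and the `fn(bool, bool) -> bool` helper shape can be sketched as follows. `Cr` is a hypothetical stand-in for the interpreter's CR storage, and this `cr_logical` signature is an assumption, not the one in `interpreter.rs`.

```rust
/// Hypothetical CR model: eight fields, each stored as [LT, GT, EQ, SO].
struct Cr {
    fields: [[bool; 4]; 8],
}

impl Cr {
    /// `bit` uses PPC MSB-0 numbering: 0 = CR0.LT, 31 = CR7.SO.
    fn get_bit(&self, bit: usize) -> bool {
        self.fields[bit / 4][bit % 4]
    }

    fn set_bit(&mut self, bit: usize, value: bool) {
        self.fields[bit / 4][bit % 4] = value;
    }
}

/// Applies a two-input boolean op to CR bits BA and BB and writes BT.
/// Working on `bool` means `!` is a logical NOT, so no stray upper bits
/// can leak in the way they can with `!` on a u64.
fn cr_logical(cr: &mut Cr, bt: usize, ba: usize, bb: usize, op: fn(bool, bool) -> bool) {
    let result = op(cr.get_bit(ba), cr.get_bit(bb));
    cr.set_bit(bt, result);
}
```

Under this model `creqv 6,6,6` (the `crset` idiom) is `cr_logical(&mut cr, 6, 6, 6, |a, b| !(a ^ b))` and sets CR1.EQ regardless of its previous value.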
+ +### PPCBUG-068 — `mcrfs` does not recompute VX summary bit after clearing VX* exception bits + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:4250` (`ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK)`) +- **Symptom**: When `mcrfs` clears VX* exception bits (VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ, + VXVC, VXSOFT, VXSQRT, VXCVI) from any source field, the VX summary bit (FPSCR[2], `fpscr::VX + = 1<<29`) is left stale. If those VX* bits were the only contributors to VX, it should become + 0 but remains 1. A subsequent `mcrfs cr0, 0` will then report VX=1 in CR0.EQ, misleading the + caller into thinking an invalid-operation exception is still active. +- **Fix**: + ```rust + // After ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK); add: + if (ctx.fpscr & fpscr::VX_ALL) != 0 { + ctx.fpscr |= fpscr::VX; + } else { + ctx.fpscr &= !fpscr::VX; + } + // FEX recomputation omitted — xenia doesn't model enabled-exception dispatch. + ``` +- **Test gap**: existing test only covers crfS=0 (FX+OX) — no VX* bits involved. Add a test + that sets only VXSNAN, runs `mcrfs cr0, 1`, then verifies VX is now 0. + +### PPCBUG-069 — `mcrfs` test comment claims OX(so)=0 but OX is set in the test + +- **Severity**: LOW (cosmetic; the assert is correct, only the comment is wrong) +- **Status**: open +- **Location**: `interpreter.rs:5402` +- **Symptom**: Comment reads `"FX(lt)=1 and OX(so)=0"`. FPSCR was set to `(1<<31)|(1<<28)`, + which sets both FX and OX. The nibble is `0b1001`, so `so=true`. The assert `cr[2].as_u8() + == 0b1001` is correct; only the comment is wrong. 
+- **Fix**: `// FX(lt)=1, FEX(gt)=0, VX(eq)=0, OX(so)=1 → 0b1001 = 9` + +### PPCBUG-070 — Zero execution tests for all 8 CR logical ops and `mcrf` + +- **Severity**: LOW (test gap) +- **Status**: open +- **Locations**: `interpreter.rs:1473–1484` +- **Missing minimum**: `crclr` idiom (`crxor BT,BT,BT`, BT=1 → 0), `crset` idiom + (`creqv BT,BT,BT`, BT=0 → 1), `crmove` idiom (`cror BT,BA,BA`), `crnot` idiom + (`crnor BT,BA,BA`, BA=1 → 0), cross-field `crand`/`crandc`, and a full `mcrf + cr0, cr3` field-copy + source-field-intact test. + +--- + +## Pre-pass hints REFUTED by audit + +These were flagged by the orchestrator's regex scan but the subagents found them to be safe: + +- **`divwux` writeback** (interpreter.rs:390) — both operands cast to `u32` before division, `as u64` zero-extends correctly. **Clean.** +- **`mulhwx` intermediate cast** (interpreter.rs:349) — `((result >> 32) as i32 as i64 as u64) & 0xFFFF_FFFF` is redundant but the trailing mask saves correctness. Cosmetic only. +- **`mulhwux` writeback** (interpreter.rs:359) — `(result >> 32) & 0xFFFF_FFFF` clean unsigned. Clean. +- **CR0 stale-prepass-claim**: pre-pass document mentioned `result as i32 as i64`; live code actually uses `result as i64` — so the *claim that the live form is i64* is **correct**, but the prepass implied an i32 form was already there. PPCBUG-020 is the real finding. + +--- + +## Batch 4 — load float (group 23) + +Per-group report: `audit-out/group-23-load-float.md`. + +Group 23 summary: the double-precision load family (`lfd`, `lfdu`, `lfdux`, `lfdx`) is fully +ISA-correct — EA computation, endianness, update-form writeback, and bit-pattern fidelity are +all clean. The single-precision family (`lfs`, `lfsu`, `lfsux`, `lfsx`) has one HIGH bug: +Rust's `as f64` float cast compiles to x86 `CVTSS2SD` which unconditionally sets the IEEE quiet +bit in the output, silently converting f32 SNaN loads to f64 QNaN. The ISA requires the SNaN +to pass through unchanged. 
FPSCR.NI does not apply to loads (correct by omission). One LOW +test-gap finding. **2 IDs used (PPCBUG-128, PPCBUG-129). 8 IDs unallocated (PPCBUG-130..137).** + +### PPCBUG-128 — lfs/lfsu/lfsx/lfsux silently quieten SNaN via `as f64` Rust float cast + +- **Severity**: HIGH +- **Status**: open +- **Locations**: interpreter.rs:1064 (lfs), 1070 (lfsx), 1087 (lfsu), 1093 (lfsux) +- **Symptom**: All four single-precision load arms use `mem.read_f32(ea) as f64` where + `read_f32` = `f32::from_bits(read_u32(ea))`. The `as f64` Rust float cast compiles to x86 + `CVTSS2SD`, which unconditionally sets bit 51 of the f64 mantissa (the IEEE quiet/signalling + discriminator bit) for any NaN input. An f32 SNaN (e.g. `0x7F800001`) is loaded and written + to the FPR as the f64 QNaN `0x7FF8000020000000` instead of the SNaN `0x7FF0000020000000` + (the f32 mantissa shifted left by 29 bits). + + **ISA requirement**: "A signalling NaN passes through unchanged into the FPR — it will signal + at the next FP arithmetic instruction." (lfs.md Special Cases). The FPR must hold the SNaN; + VXSNAN fires at the consuming arithmetic op, not at the load. + + **Impact**: (a) Game code storing f32 SNaN sentinels (physics engines mark unset float slots + with SNaN) and then loading+inspecting them: `fpscr::is_snan(ctx.fpr[rd])` returns false + after the load, breaking sentinel detection. (b) Arithmetic ops consuming the loaded value + see a QNaN rather than SNaN, so VXSNAN is never set; games relying on VXSNAN to detect + uninitialized-read bugs get false negatives. + +- **Canary parity**: Canary's JIT also uses CVTSS2SD via `f.Convert()`. Both emulators share + this deviation. The bug is a structural consequence of using semantic float widening rather + than a bit-pattern-preserving widening routine.
+- **Fix**: replace the float cast with a bit-manipulation widening that preserves the SNaN bit: + ```rust + fn widen_f32_bits_to_f64(raw32: u32) -> u64 { + let sign = ((raw32 >> 31) as u64) << 63; + let exp32 = ((raw32 >> 23) & 0xFF) as i32; + let mant32 = (raw32 & 0x007F_FFFF) as u64; + if exp32 == 0xFF { + // NaN or Infinity — propagate mantissa left-shifted by 29 bits. + // SNaN (bit22=0) stays SNaN (bit51=0); QNaN (bit22=1) stays QNaN (bit51=1). + sign | (0x7FFu64 << 52) | (mant32 << 29) + } else if exp32 == 0 { + // ±Zero or subnormal f32. + if mant32 == 0 { return sign; } // ±zero + // Subnormal: normalize by finding leading bit, then adjust exponent. + // `shift` = leading zeros inside the 23-bit mantissa field (0..=22). + let shift = mant32.leading_zeros() - (64 - 23); + // Leading bit at position 22 has weight 2^-127, so the biased exponent is 1023 - 127 - shift. + let exp64 = (1023u64 - 127).wrapping_sub(shift as u64); + let mant64 = (mant32 << (shift + 1 + 29)) & 0x000F_FFFF_FFFF_FFFF; + sign | (exp64 << 52) | mant64 + } else { + // Normal f32 → normal f64. Rebias without an intermediate u64 underflow. + let exp64 = (exp32 as u64) + (1023 - 127); + sign | (exp64 << 52) | (mant32 << 29) + } + } + // In each lfs* arm: + ctx.fpr[instr.rd()] = f64::from_bits(widen_f32_bits_to_f64(mem.read_u32(ea))); + ``` + This function also correctly handles subnormal f32 → normal f64 widening (which the `as f64` + cast already gets right numerically, but now goes through a consistent code path). +- **Test gap**: add a test loading an f32 SNaN (`0x7F800001`) via `lfs` and asserting + `fpscr::is_snan(ctx.fpr[rd])` is `true` and bit 51 of `ctx.fpr[rd].to_bits()` is 0. + +### PPCBUG-129 — Zero interpreter execution tests for all 8 float-load opcodes + +- **Severity**: LOW (test gap) +- **Status**: open +- **Locations**: interpreter.rs test module; `tests/disasm_goldens.rs:249-250` (disasm-only) +- **Symptom**: No `#[test]`-decorated function exercises any float-load interpreter arm. + A regression in EA computation, endianness, f32→f64 widening, or update-form writeback + would go undetected. The SNaN bug (PPCBUG-128) was undetected partly due to this gap.
+- **Recommended minimum**: + 1. `lfs` normal: `0x3F800000` (1.0f32) → assert `fpr[rd] == 1.0f64` exact. + 2. `lfs` negative displacement: base minus 4. + 3. `lfs` ra=0 path (absolute addressing). + 4. `lfd` normal: store PI bits, assert exact bit equality via `.to_bits()`. + 5. `lfd` SNaN: store `0x7FF0_0000_0000_0001u64`, assert exact bit equality after load. + 6. `lfsu` / `lfsux` / `lfdu` / `lfdux`: verify loaded FPR value AND rA update address. + 7. After PPCBUG-128 fix: `lfs` SNaN round-trip test. + +IDs PPCBUG-130 through PPCBUG-137 are unallocated — no further bugs found in group 23. + +--- + +## Files modified by the audit + +- `xenia-rs/audit-prepass-findings.md` — Phase A pre-pass red flags (orchestrator regex output). +- `xenia-rs/audit-out/group-01-add-imm.md` — Group 1 report (Sonnet subagent). +- `xenia-rs/audit-out/group-02-add-reg.md` — Group 2 report. +- `xenia-rs/audit-out/group-03-sub-reg.md` — Group 3 report. +- `xenia-rs/audit-out/group-04-multiply.md` — Group 4 report. +- `xenia-rs/audit-out/group-05-divide.md` — Group 5 report. +- `xenia-rs/audit-out/group-06-logic-imm.md` — Group 6 report. +- `xenia-rs/audit-out/group-09-word-rotate.md` — Group 9 report. +- `xenia-rs/audit-out/group-13-branch.md` — Group 13 report. +- `xenia-rs/audit-out/group-14-trap-sc.md` — Group 14 report. +- `xenia-rs/audit-out/group-15-cr-logical.md` — Group 15 report. +- `xenia-rs/audit-out/group-16-spr-msr.md` — Group 16 report. +- `xenia-rs/audit-out/group-17-cache-sync.md` — Group 17 report. +- `xenia-rs/audit-out/group-18-load-byte.md` — Group 18 report. +- `xenia-rs/audit-out/group-19-load-halfword.md` — Group 19 report. +- `xenia-rs/audit-out/group-21-load-doubleword.md` — Group 21 report. +- `xenia-rs/audit-out/group-22-load-mlsr.md` — Group 22 report. +- `xenia-rs/audit-out/group-23-load-float.md` — Group 23 report. +- `xenia-rs/audit-out/group-24-store-byte-half.md` — Group 24 report. +- `xenia-rs/audit-out/group-26-store-doubleword.md` — Group 26 report. 
+- `xenia-rs/audit-findings.md` — this consolidated tracker. + +**No source code under `xenia-rs/crates/` has been modified.** + +--- + +## Batch 4 — load byte (group 18) + +Per-group report: `audit-out/group-18-load-byte.md`. + +Group 18 summary: **cleanest group audited to date — zero HIGH or MEDIUM bugs.** All four opcodes +(`lbz`, `lbzu`, `lbzx`, `lbzux`) are ISA-correct: EA computation (rA=0 special case, D-field +sign-extension, 32-bit EA truncation), zero-extension of the byte result to 64 bits, and +update-form writeback all match the ISA spec and Canary cross-reference. Two LOW findings only. + +### PPCBUG-090 — lbzu/lbzux: rD==rA "invalid form" silently misloads rD + +- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding) +- **Status**: open +- **Location**: interpreter.rs:951-956 (lbzu), 963-968 (lbzux) +- **Symptom**: When `rD == rA` (invalid form, UISA undefined), the byte load into `gpr[rD]` at + line 953/965 is immediately overwritten by the EA writeback at line 954/966. Net result: + `gpr[rD]` holds the EA, not the loaded byte. Canary has the same behaviour. No practical impact + under normal compiler output. +- **Recommendation**: add `debug_assert!(instr.rd() != instr.ra())` in debug builds. + +### PPCBUG-091 — Zero interpreter execution tests for all four lbz* opcodes + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs test module; disasm_goldens.rs:247 (disasm-only, no execution) +- **Symptom**: No `#[test]` exercises lines 945-968. A regression in EA computation, + zero-extension, or the update writeback would go undetected. +- **Recommended minimum**: `lbz` with ra=0 + negative displacement; `lbzu` normal case (verify + both byte result and rA update); `lbzx` with ra=0; `lbzux` normal case. Each test should + assert `gpr[rD] <= 0xFF` to catch any future accidental sign-extension. + +IDs PPCBUG-092, PPCBUG-093, PPCBUG-094 are unallocated — no further bugs found in group 18. 
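
The EA rules that the group 18 summary lists (rA=0 special case, D-field sign-extension, 32-bit EA truncation) fit in one small function, which is also the natural unit to test for PPCBUG-091. This is a sketch only: `d_form_ea` is a hypothetical name and the `[u64; 32]` register file is an assumption about how the GPRs are held.

```rust
/// Hypothetical helper: D-form effective address for lbz-style loads.
/// rA = 0 means "use literal zero", not the contents of r0.
fn d_form_ea(gpr: &[u64; 32], ra: usize, d: i16) -> u32 {
    let base = if ra == 0 { 0 } else { gpr[ra] };
    // Sign-extend the displacement, add with wraparound, keep the low 32 bits.
    base.wrapping_add(d as i64 as u64) as u32
}

/// Hypothetical helper: the loaded byte is zero-extended to 64 bits.
fn lbz_value(byte: u8) -> u64 {
    byte as u64
}
```

Truncating the sum, rather than each operand, means a register with polluted upper 32 bits still yields the intended 32-bit address.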
+ +--- + +## Batch 4 — load halfword (group 19) + +Per-group report: `audit-out/group-19-load-halfword.md`. + +Group 19 summary: **4 HIGH bugs confirmed — all pre-pass flags validated.** The four `lha*` opcodes +(`lha`, `lhax`, `lhau`, `lhaux`) all use `as i16 as i64 as u64`, sign-extending a negative halfword +to 64 bits in violation of the 32-bit ABI. Every negative halfword load (common for `int16_t` PCM +samples, packed vertex deltas, `short[]` arrays) actively poisons the upper 32 bits of the +destination GPR — identical shape to the `addis` bug. The four `lhz*` opcodes and `lhbrx` are all +clean (`as u64` zero-extension; `swap_bytes() as u64` byte-reversal; correct endian handling; correct +EA computation and update writebacks). Two LOW findings: rD==rA invalid-form in update variants, +and zero unit tests for all nine opcodes. + +### PPCBUG-095 — `lha`: GPR writeback sign-extends to 64 bits +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:990 +- **Symptom**: `mem.read_u16(ea) as i16 as i64 as u64` — memory `0x8000` writes + `0xFFFFFFFF_FFFF8000` instead of `0x00000000_FFFF8000`. Active GPR poisoning for every + negative halfword. Common trigger: `int16_t` struct fields, PCM samples, packed vertex deltas. +- **Fix**: + ```rust + ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64; + ``` +- **Test gap**: zero unit tests. Add: memory `0x8000` → `gpr[rD] == 0x00000000_FFFF8000`; + memory `0x7FFF` → `gpr[rD] == 0x00000000_00007FFF`. + +### PPCBUG-096 — `lhax`: GPR writeback sign-extends to 64 bits +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:996 +- **Symptom**: identical to PPCBUG-095. Indexed form emitted for array access with GPR index. +- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64` +- **Test gap**: zero unit tests. 
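
The difference between the live cast chain and the proposed one in PPCBUG-095 is easiest to see side by side. In this sketch the two function names are hypothetical; the cast chains themselves are the ones quoted in the finding.

```rust
/// Cast chain reported in the live `lha*` arms: sign-extends to 64 bits.
fn lha_writeback_live(half: u16) -> u64 {
    half as i16 as i64 as u64
}

/// Proposed chain: sign-extend to 32 bits, then zero-extend to 64.
fn lha_writeback_fixed(half: u16) -> u64 {
    half as i16 as i32 as u32 as u64
}
```

For the halfword `0x8000` the first form yields `0xFFFF_FFFF_FFFF_8000` and the second `0x0000_0000_FFFF_8000`; for non-negative halfwords such as `0x7FFF` the two agree, which is why a test with only positive data would not catch the bug.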
+ +### PPCBUG-097 — `lhau`: GPR writeback sign-extends to 64 bits +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:1007 +- **Symptom**: identical to PPCBUG-095. Update form emitted for auto-incrementing `short[]` loops; + poison accumulates across all iterations. +- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64` +- **Test gap**: zero unit tests. Add: verify both `gpr[rD]` (upper-32 = 0) and `gpr[rA]` (EA update). + +### PPCBUG-098 — `lhaux`: GPR writeback sign-extends to 64 bits +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:1013 +- **Symptom**: identical to PPCBUG-095, update+indexed form. +- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64` +- **Test gap**: zero unit tests. +- **Note**: PPCBUG-095..098 are the same one-line fix at four sites. Fix session sweep: + `rg -n 'as i16 as i64 as u64' interpreter.rs` finds exactly these four lines. + +### PPCBUG-099 — `lhau`/`lhaux`: rD==rA invalid-form silently destroys load result +- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding) +- **Status**: open +- **Location**: interpreter.rs:1005-1016 +- **Symptom**: same as PPCBUG-090 (`lbzu`/`lbzux`) — EA writeback overwrites `gpr[rD]` when + `rD == rA`. Net: `gpr[rD]` holds EA, not the loaded value. +- **Recommendation**: `debug_assert!(instr.rd() != instr.ra())` in both arms. + +### PPCBUG-100 — Zero execution tests for all nine halfword-load opcodes +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs test module +- **Symptom**: No `#[test]` exercises any of the 9 opcodes. The HIGH sign-extension bug would + have been caught by any test that checks `gpr[rD] <= 0x0000_0000_FFFF_FFFF`. +- **Recommended minimum**: `lha` with negative halfword (assert upper 32 zero), `lhz` same, + `lhau` verify both rD and rA, `lhzux` verify both rD and rA, `lhbrx` verify byte-swap. 
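
For the zero-extending and byte-reversed loads named in PPCBUG-100, a test only needs a big-endian read plus the expected transform. The sketch below uses a plain byte slice as guest memory; `read_u16_be`, `lhz_value`, and `lhbrx_value` are hypothetical names, not the interpreter's memory API.

```rust
/// Hypothetical helper: big-endian halfword read from a byte slice.
fn read_u16_be(mem: &[u8], ea: usize) -> u16 {
    u16::from_be_bytes([mem[ea], mem[ea + 1]])
}

/// `lhz`-style result: zero-extended, so the value never exceeds 0xFFFF.
fn lhz_value(mem: &[u8], ea: usize) -> u64 {
    read_u16_be(mem, ea) as u64
}

/// `lhbrx`-style result: the two bytes are swapped, then zero-extended.
fn lhbrx_value(mem: &[u8], ea: usize) -> u64 {
    read_u16_be(mem, ea).swap_bytes() as u64
}
```

Using a halfword with the top bit set (for example bytes `0x80, 0x01`) makes the same test double as a guard against accidental sign-extension.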
+ +IDs PPCBUG-101, PPCBUG-102, PPCBUG-103, PPCBUG-104 are unallocated — no further bugs found in group 19. + +--- + +## Batch 4 — load word (group 20) + +Per-group report: `audit-out/group-20-load-word.md`. + +Group 20 summary: **1 HIGH bug (reservation invalidation never called), 1 MEDIUM (cross-thread +reservation isolation), 1 MEDIUM (lwa 64-bit sign-extension hazard), 3 LOW test gaps.** The +zero-extending family (`lwz`/`lwzu`/`lwzx`/`lwzux`) is entirely correct — `mem.read_u32(ea) as u64` +cleanly zero-extends; EA computation, update writebacks, and RA0 handling all match ISA and Canary. +`lwbrx` is correct: the double-swap (`from_be_bytes` then `swap_bytes()`) correctly produces a +little-endian word read, zero-extended. The sign-extending family (`lwa`/`lwax`/`lwaux`) is +ISA-correct for 64-bit mode but a 32-bit-ABI hazard — classified MEDIUM because `lwa` is a +64-bit-mode instruction unlikely to appear in Xbox 360 32-bit-ABI binaries. The HIGH finding is +that `ReservationTable::invalidate_for_write` is defined and unit-tested but **never called** from +any store instruction, breaking multi-threaded `lwarx`/`stwcx.` atomicity under `--parallel`. + +### PPCBUG-105 — lwa / lwax / lwaux sign-extend to 64 bits; 32-bit-ABI hazard + +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:1032 (lwa), 1038 (lwax), 1043 (lwaux) +- **Symptom**: `mem.read_u32(ea) as i32 as i64 as u64` — a word with high bit set (e.g. `0x8000_0000`) + writes `0xFFFF_FFFF_8000_0000` to rD. ISA-correct for 64-bit-mode `lwa`. In 32-bit ABI, the poisoned + upper 32 bits produce wrong CA / CR results in downstream 64-bit unsigned compares — same shape as + the `addis` bug. +- **Likelihood**: LOW on real Xbox 360 32-bit-ABI binaries (compilers use `lwz` for word loads; `lwa` + is a 64-bit-mode instruction). Risk elevated if the binary contains 64-bit-mode kernel code. +- **Note**: Canary also uses `SignExtend(..., INT64_TYPE)` — both are ISA-correct. 
Pre-pass flagged + HIGH; audit downgrades to MEDIUM because `lwa` is unlikely in 32-bit-ABI Xbox 360 code. + +### PPCBUG-106 — lwa no-update-form undocumented (LOW / informational) + +- **Severity**: LOW +- **Status**: open +- **Location**: interpreter.rs:1029-1034 +- **Symptom**: `lwa` arm has no RA writeback. Correct per ISA (no `lwau` in PowerISA). Undocumented. +- **Fix**: add comment `// No lwau in PowerISA; lwa is DS-form non-update only.` + +### PPCBUG-107 — `invalidate_for_write` never called from stores; lwarx/stwcx. atomicity broken under `--parallel` (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Locations**: `reservation.rs:234` (definition, never called from interpreter); `interpreter.rs:1182-1278` (all store arms, none call it) +- **Symptom**: `ReservationTable::invalidate_for_write(addr)` is defined and correctly unit-tested but + no interpreter store arm calls it. Under M3 `--parallel` with the table enabled, a plain `stw` by + thread B to a cache line reserved by thread A does NOT clear thread A's table slot. Thread A's + subsequent `stwcx.` calls `t.try_commit()`, which succeeds — spurious success, violating + store-conditional atomicity. All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic + counters) built on `lwarx`/`stwcx.` are broken in multi-threaded mode. +- **Concrete scenario**: thread A: `lwarx r3, 0, r4` (reserves line). Thread B: `stw r5, 0(r4)` + (same address; should invalidate). Thread A: `stwcx. r6, 0, r4` → should fail (CR0.EQ=0) but + succeeds (CR0.EQ=1). Thread A's store silently overwrites thread B's store. +- **Fix**: in every store arm, before `mem.write_*`, add: + ```rust + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + ``` + `has_active_reservers()` is a single `Relaxed` atomic load — negligible cost for non-atomic code + (common case returns false immediately). 
Alternative: inject the table into the memory layer so + `write_u32`/`write_u64` call it automatically. +- **Test gap**: add interpreter-level test: `lwarx` reserve a line, intervening `stw` to the same + line, `stwcx.` must fail (CR0.EQ=0). + +### PPCBUG-108 — Legacy per-ctx reservation path: cross-thread invalidation impossible (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: applied (ca5b90b, 2026-05-01) +- **Location**: interpreter.rs:1148-1153 (stwcx legacy path) +- **Symptom**: When table is None/disabled, reservation state lives in per-thread `PpcContext` fields. + A store by thread B cannot clear `ctx_A.has_reservation`. Safe in strict lockstep (one host thread). + Broken under real parallelism with the table inadvertently disabled. +- **Fix**: add a `debug_assert!` in `lwarx`/`stwcx.` that table is enabled when multiple host threads + are active. The M3 scheduler should always enable the table before spawning a second host thread. + +### PPCBUG-109 — Zero unit tests for lwa / lwax / lwaux + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs test module +- **Recommended minimum**: + - `lwa` with `0x8000_0000` → `gpr[rD] == 0xFFFF_FFFF_8000_0000`. + - `lwa` with `0x7FFF_FFFF` → `gpr[rD] == 0x0000_0000_7FFF_FFFF`. + - `lwax` with ra=0. + - `lwaux`: verify loaded value and rA update. + +### PPCBUG-110 — Zero unit tests for lwbrx + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs test module +- **Recommended minimum**: memory `[0x11, 0x22, 0x33, 0x44]` at EA → `gpr[rD] == 0x4433_2211`; ra=0; + assert `gpr[rD] <= 0xFFFF_FFFF`. + +### PPCBUG-111 — lwarx / stwcx test suite missing key cases + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs:5167-5207 (two tests exist) +- **Missing**: `lwarx` ra=0; `stwcx.` without prior `lwarx` → CR0.EQ=0; second `lwarx` displaces + first; post-PPCBUG-107-fix store-invalidation test; `lwarx` zero-extension assertion. 
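
The missing `lwarx` / store / `stwcx.` test cases can be prototyped against a toy model before they are written against the interpreter. The sketch below is an assumption-laden stand-in, not the project's `ReservationTable`: it holds a single reserver, the 128-byte granule (`TOY_LINE_MASK`) is assumed rather than read from `RESERVATION_MASK`, and the method signatures are illustrative even where the names match those used in this tracker.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const TOY_LINE_MASK: u32 = 0x7F; // assumption: 128-byte reservation granule
const TOY_EMPTY: u64 = u64::MAX; // sentinel: no reservation held

/// Toy single-reserver table; NOT the project's `ReservationTable`.
struct ToyReservation {
    slot: AtomicU64,
}

impl ToyReservation {
    fn new() -> Self {
        Self { slot: AtomicU64::new(TOY_EMPTY) }
    }

    /// Models `lwarx`: remember the line that contains `ea`.
    fn reserve(&self, ea: u32) {
        self.slot.store((ea & !TOY_LINE_MASK) as u64, Ordering::SeqCst);
    }

    /// Models the hook a plain store would call: clear the slot
    /// only if it currently holds the line being written.
    fn invalidate_for_write(&self, ea: u32) {
        let line = (ea & !TOY_LINE_MASK) as u64;
        let _ = self
            .slot
            .compare_exchange(line, TOY_EMPTY, Ordering::SeqCst, Ordering::SeqCst);
    }

    /// Models `stwcx.`: succeeds only if the reservation on this line
    /// survived; the reservation is cleared either way.
    fn try_commit(&self, ea: u32) -> bool {
        let line = (ea & !TOY_LINE_MASK) as u64;
        self.slot.swap(TOY_EMPTY, Ordering::SeqCst) == line
    }
}
```

In this model, a store to any address inside the reserved line makes the next `try_commit` fail, a store to a different line leaves it intact, and a commit without a prior reserve always fails. Those are the same three outcomes the interpreter-level tests would need to assert through CR0.EQ.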
+ +IDs PPCBUG-112, PPCBUG-113, PPCBUG-114 are unallocated — reserved for group 20 follow-up. + +--- + +## Batch 4 — load doubleword (group 21) + +Per-group report: `audit-out/group-21-load-doubleword.md`. + +Group 21 summary: **cleanest load group audited — zero HIGH bugs.** All six instructions (`ld`, +`ldu`, `ldux`, `ldx`, `ldbrx`, `ldarx`) are ISA-correct: 64-bit load, big-endian byte order, +EA computation (RA=0, DS-form, u32 truncation), update-form writebacks, and reservation tracking +all pass scrutiny against Canary and the ISA spec. `ldbrx`'s double-swap pattern was investigated +and confirmed correct (PPCBUG-115 informational). One MEDIUM documentation finding, two LOW findings. + +### PPCBUG-115 — `ldbrx` byte-swap confirmed correct (informational) + +- **Severity**: LOW (confirmed clean, informational only) +- **Status**: wontfix +- **Location**: `interpreter.rs:4157-4159` +- **Analysis**: `mem.read_u64` uses `u64::from_be_bytes` internally (confirmed in `heap.rs:404` + and interpreter's `TestMem`), so it returns the BE-decoded value. Calling `.swap_bytes()` + re-reverses to give the LE interpretation, which is exactly what `ldbrx` specifies. Canary + achieves the same result by skipping `ByteSwap` at the HIR level. Both approaches are correct. + See per-group report for full byte-level worked example. + +### PPCBUG-116 — `ld`/`ldx`/`ldu`/`ldux` as 32-bit-ABI poison sources (documentation) + +- **Severity**: MEDIUM (awareness/documentation; no change to load instructions themselves) +- **Status**: open +- **Location**: `interpreter.rs:1017-1058` +- **Symptom**: These instructions correctly write full 64-bit values to the destination GPR. + Xbox 360 32-bit-ABI binaries legitimately emit them for TOC loads, vtable loads, and kernel + structure accesses — all of which may have non-zero upper 32 bits. 
Until PPCBUG-001..089 + arithmetic truncation fixes land, such values can flow into 64-bit compares and corrupt CA + bits and CR fields — the inverse of the `addis` bug (pollution from memory side vs. sign-ext). +- **Key guard already in place**: PPCBUG-007's `subfcx` CA fix truncates operands to u32 before + the compare, correctly handling `ld`-originated 64-bit values. This is the most critical + downstream consumer and the fix is already specified. + +### PPCBUG-117 — Stale frozen snapshot in `ppc-manual/memory/ldarx.md` + +- **Severity**: LOW +- **Status**: open +- **Location**: `ppc-manual/memory/ldarx.md` (frozen snapshot section) +- **Symptom**: Snapshot uses old field name `ctx.reserved_addr`; live code uses + `ctx.reserved_line = ea & !RESERVATION_MASK` (M3 refactor). Cosmetic only. +- **Fix**: Regenerate snapshot after M3 field names settle. + +### PPCBUG-118 — Zero functional tests for `ld`, `ldx`, `ldu`, `ldux`, `ldbrx` + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` test module +- **Symptom**: `test_ldarx_stdcx_pair` covers `ldarx`/`stdcx` only. Five doubleword load + variants are untested. Recommended minimum: `ld` with positive DS, negative DS, and RA=0; + `ldx` basic; `ldu` with RA writeback check; `ldux` with RA writeback check; `ldbrx` with + asymmetric data to distinguish output from plain `ldx`. + +IDs PPCBUG-119 through PPCBUG-122 are unallocated — reserved for group 21 follow-up. + +--- + +## Batch 4 — load multiple/string (group 22) + +Per-group report: `audit-out/group-22-load-mlsr.md`. + +Group 22 summary: one structural HIGH bug (`lswx` is always a no-op due to missing XER TBC field), +one MEDIUM coupling bug (the write path discards TBC on `mtspr XER`), one MEDIUM ISA-form deviation +(`lmw` does not skip RA-in-range stores unlike Canary), and two LOW findings. The `lswi` body itself +is correct; `lmw` core logic (loop bound, zero-extension, byte-packing, register wraparound) is clean. 
+Zero unit tests across all three opcodes. + +### PPCBUG-123 — `lswx` XER TBC field not modeled; always loads 0 bytes + +- **Severity**: HIGH +- **Status**: open +- **Location**: `context.rs:235-237` (`xer()` method) + `interpreter.rs:4172` +- **Symptom**: `ctx.xer()` assembles only SO[31], OV[30], CA[29] — bits 0–28 are always zero. + `lswx` reads `ctx.xer() & 0x7F` expecting the XER TBC byte-count field at bits 0–6, but always + gets 0. The `while bytes_left > 0` loop never executes; **`lswx` is permanently a no-op** — + no bytes are loaded, no destination registers are written. The companion `stswx` at + `interpreter.rs:4191` has the identical pattern and is equally broken. +- **Root cause**: `PpcContext` has no `xer_tbc` field. Neither `xer()` nor `set_xer()` model + XER[25:31]. Any `mtspr XER, rN` that sets a non-zero byte count silently discards it (PPCBUG-124). +- **Cross-reference**: Canary marks `lswx` as `XEINSTRNOTIMPLEMENTED()` — xenia-rs implemented the + body but left the XER infrastructure incomplete. +- **Fix**: + 1. Add `pub xer_tbc: u8` to `PpcContext`. + 2. In `xer()`: `| (self.xer_tbc as u32)` for bits 0–6. + 3. In `set_xer()`: `self.xer_tbc = (val & 0x7F) as u8`. + The `lswx` body is then correct as-is. +- **Test gap**: zero unit tests. After fix: `mtspr XER, r3` (r3=4) then `lswx r5, 0, r4` should + write exactly 4 bytes into r5 (high byte = first byte at EA). + +### PPCBUG-124 — `set_xer()` discards TBC on `mtspr XER` (structural coupling to PPCBUG-123) + +- **Severity**: MEDIUM (must land with PPCBUG-123) +- **Status**: open +- **Location**: `context.rs:239-244` +- **Symptom**: `set_xer()` writes only SO/OV/CA from the 32-bit value, silently discarding bits 0–28 + (including the 7-bit TBC field). Any guest `mtspr XER, rN` with a non-zero byte count loses that + count; subsequent `lswx`/`stswx` see TBC=0. Fix is the same three-line change as PPCBUG-123. 
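
The three-step fix shared by PPCBUG-123 and PPCBUG-124 can be sketched on a standalone struct. This is a hypothetical model, not `PpcContext`: the `ToyXer` type and its field names are illustrative, and the bit positions (SO at bit 31, OV at 30, CA at 29, TBC in bits 0-6) follow the numbering used in the findings above.

```rust
/// Toy XER model; NOT the project's `PpcContext`. Field names are illustrative.
#[derive(Debug, Default, PartialEq)]
struct ToyXer {
    so: bool,
    ov: bool,
    ca: bool,
    tbc: u8, // 7-bit byte count consumed by lswx/stswx
}

impl ToyXer {
    /// Assemble the 32-bit XER image, including the low 7 TBC bits.
    fn xer(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | ((self.tbc as u32) & 0x7F)
    }

    /// Decompose a value written by `mtspr XER`, keeping TBC instead of
    /// discarding it.
    fn set_xer(&mut self, val: u32) {
        self.so = ((val >> 31) & 1) != 0;
        self.ov = ((val >> 30) & 1) != 0;
        self.ca = ((val >> 29) & 1) != 0;
        self.tbc = (val & 0x7F) as u8;
    }
}
```

With this shape, a write of `0xA000_0005` round-trips through `set_xer()` and `xer()` unchanged, and the `xer() & 0x7F` read that `lswx` and `stswx` already perform sees a byte count of 5 instead of 0.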
+ +### PPCBUG-125 — `lmw` missing RA-in-destination-range skip + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:1515` +- **Symptom**: PowerISA declares `lmw rT, D(rA)` invalid when `rA` is in `[rT..31]`. Canary skips + the store to `rA` in that case (`if (i.D.RT + j == i.D.RA) continue`). xenia-rs pre-computes EA + before the loop (so EA values remain correct), but overwrites `rA` with the loaded word instead of + preserving it. Result differs from Canary for this invalid encoding. Any program that relies on RA + surviving a nominally invalid `lmw` will see the wrong value. +- **Fix**: + ```rust + for r in instr.rd()..32 { + if r == instr.ra() { ea = ea.wrapping_add(4); continue; } + ctx.gpr[r] = mem.read_u32(ea as u32) as u64; + ea = ea.wrapping_add(4); + } + ``` +- **Test gap**: zero tests. Add: `lmw r28, 0(r28)` (RA=RT=28) — after fix, gpr[28] unchanged. + +### PPCBUG-126 — `lswi` uses `instr.rb()` instead of `instr.nb()` for the NB field + +- **Severity**: LOW (maintenance hazard, not a correctness bug) +- **Status**: open +- **Location**: `interpreter.rs:1340` +- **Symptom**: `instr.rb()` and `instr.nb()` both extract bits 16–20 and return identical values. + Using `rb()` misrepresents the operand as a register reference rather than a 5-bit immediate count. + The companion `stswi` at line 1359 has the same pattern. A future `rb()` type-system refactor + could break `lswi`/`stswi` silently. +- **Fix**: `instr.nb()` at both sites. + +### PPCBUG-127 — Zero execution tests for lmw, lswi, lswx + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` test module +- **Symptom**: No `#[test]` exists for any of the three opcodes. A regression in loop bounds, + byte-packing, EA computation, or the NB=0 special case would go undetected. 
+- **Recommended minimum**: `lmw r30, 0(r1)` (2-word load); `lswi r3, r4, 8` (2-word byte pack); + `lswi r31, r4, 8` (register wraparound → r31 and r0); `lswi r3, r4, 0` (NB=0→32 special case); + post-PPCBUG-123 fix: `lswx` with XER TBC=4 (1-word load), TBC=0 (no-op), TBC=5 (partial word). + +--- + +## Batch 5 — store byte/halfword (group 24) + +Per-group report: `audit-out/group-24-store-byte-half.md`. + +Group 24 summary: **3 findings: 1 HIGH (cross-cutting reservation invalidation), 1 LOW/informational +(update-form zero-extension correct but undocumented), 1 LOW (zero test coverage).** EA computation, +value truncation (`as u8`, `as u16`), RA=0 special cases, update-form writeback zero-extension, +big-endian `mem.write_u16` path, and `sthbrx` byte-reverse logic are all ISA-correct. The single +HIGH finding is the systemic absence of `invalidate_for_write` calls — same class as PPCBUG-107, +now documented for all 9 byte/halfword store opcodes. + +### PPCBUG-130 — All 9 store-byte/halfword opcodes missing `invalidate_for_write` (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Locations**: `interpreter.rs:1207` (stb), `1213` (stbu), `1219` (stbx), `1225` (stbux), + `1231` (sth), `1237` (sthu), `1243` (sthx), `1249` (sthux), `1337` (sthbrx) +- **Class**: same root cause as PPCBUG-107 (stw/stdcx family — `invalidate_for_write` never called + from any store arm). +- **Symptom**: Under `--parallel`, a `stb`, `sth`, or `sthbrx` (or any variant in this group) to a + cache line reserved by another thread via `lwarx`/`ldarx` does NOT clear the table slot. + The reserving thread's subsequent `stwcx.`/`stdcx.` spuriously succeeds even though an + intervening sub-word store has modified the line — violating store-conditional atomicity. Affects + any lock-free protocol that uses byte or halfword stores adjacent to or inside a `lwarx`/`stwcx.` + loop (e.g. byte-level spinlocks, tagged-pointer updates, audio ring-buffer flags). 
+- **Fix** (per PPCBUG-107 pattern): before each `mem.write_u8/u16`, add: + ```rust + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + ``` +- **Note**: PPCBUG-107 is the canonical parent finding. PPCBUG-130 documents that the byte/halfword + group must be included in the same fix sweep. + +### PPCBUG-131 — Update-form rA zero-extension correct but undocumented (LOW / informational) + +- **Severity**: LOW (informational — behavior is correct) +- **Status**: open (documentation gap) +- **Locations**: `interpreter.rs:1216` (stbu), `1228` (stbux), `1240` (sthu), `1252` (sthux) +- **Symptom**: Each update-form arm writes `ctx.gpr[instr.ra()] = ea as u64` where `ea: u32`. + This zero-extends to 64 bits — correct in the 32-bit ABI (addresses are 32-bit; upper half must + be zero). No bug, but there is no comment explaining the deliberate zero-extension. A maintainer + who computes EA as `u64` throughout and drops the `as u32` intermediate would silently + sign-extend negative displacements into rA, mirroring the `addis` bug shape. +- **Fix**: add comment `// EA is u32; zero-extend into rA (32-bit ABI: upper 32 bits must be 0).` + at each update-form writeback line. + +### PPCBUG-132 — Zero unit tests for all 9 store-byte/halfword opcodes (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` test module +- **Symptom**: No `test_stb*` or `test_sth*` functions exist. Any regression in EA computation, + value truncation, update-form writeback order, or `sthbrx` byte-swap logic would be invisible. +- **Recommended minimum**: `stb` basic + ra=0; `stbu`/`stbux` with rA writeback check; `stbx` + ra=0; `sth` big-endian byte check (`0xDEAD` → `[0xDE, 0xAD]`); `sthu`/`sthux` writeback; + `sthbrx` byte-reversed check (`0xDEAD` → `[0xAD, 0xDE]`); post-PPCBUG-130 fix: `lwarx` + `stb` + to same line + `stwcx.` → CR0.EQ=0. 
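
The expected byte patterns in the recommended `sth` and `sthbrx` tests follow directly from truncation plus byte order. A minimal sketch, assuming hypothetical helper names and modeling only the value-to-bytes step (no EA computation, no memory object):

```rust
/// Bytes a big-endian `sth` would place at EA and EA+1:
/// truncate the GPR to its low halfword, then emit it big-endian.
/// (Hypothetical helper; models only the value-to-bytes step.)
fn sth_bytes(rs: u64) -> [u8; 2] {
    (rs as u16).to_be_bytes()
}

/// Bytes a `sthbrx` would place at EA and EA+1:
/// same truncation, but with the two bytes reversed before the store.
fn sthbrx_bytes(rs: u64) -> [u8; 2] {
    (rs as u16).swap_bytes().to_be_bytes()
}
```

For a source register holding `0xFFFF_FFFF_1234_DEAD`, `sth_bytes` gives `[0xDE, 0xAD]` and `sthbrx_bytes` gives `[0xAD, 0xDE]`; the upper 48 bits of the register never reach memory in either case.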
+ +IDs PPCBUG-133 through PPCBUG-139 are unallocated — reserved for group 24 follow-up. + +--- + +## Batch 5 — store word (group 25) + +Per-group report: `audit-out/group-25-store-word.md`. + +Group 25 summary: **8 findings: 5 HIGH (reservation invalidation, one per plain store opcode), 0 MEDIUM, 3 LOW.** +Core arithmetic and semantics are entirely clean for all 6 opcodes. EA computation (RA=0 guards, +simm16 sign-extend, u32 truncation), value truncation (`as u32`), update-form writebacks +(`ea as u64` zero-extension), big-endian `mem.write_u32`, `stwbrx` byte-reversal, and `stwcx` +conditional-store logic (cache-line reservation check, CAS, CR0 update, reservation always +cleared) all match the ISA and Canary exactly. The `stwcx` manual snapshot is stale (uses old +`reserved_addr` field name; live code correctly uses `reserved_line` at cache-line granularity — +actually MORE correct than the snapshot). Dominant finding is the same systemic miss as PPCBUG-107 +and PPCBUG-130: `invalidate_for_write` is never called from any plain store arm. + +### PPCBUG-140 — stw: missing `invalidate_for_write` call (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Location**: `interpreter.rs:1183-1188` +- **Systemic root cause**: PPCBUG-107 +- **Symptom**: Under `--parallel` with the ReservationTable enabled, a plain `stw` by thread B + to a cache line reserved by thread A does not clear thread A's table slot. Thread A's + subsequent `stwcx.` spuriously succeeds (CR0.EQ=1) even though thread B has written the line. + All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic counters) built on + `lwarx`/`stwcx.` are broken in multi-threaded mode. `stw` is the most common store instruction — + every stack write, pointer store, and integer field write is affected. 
+- **Fix**: Before `mem.write_u32(ea, ...)`: + ```rust + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + ``` + `has_active_reservers()` is a single `Relaxed` load — zero cost in the common non-atomic case. + +### PPCBUG-141 — stwu: missing `invalidate_for_write` call (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Location**: `interpreter.rs:1189-1194` +- **Systemic root cause**: PPCBUG-107 +- **Symptom**: Same class as PPCBUG-140. `stwu r1, -N(r1)` is the canonical function-prologue + stack-allocation idiom emitted by every compiled function. A thread holding a reservation on + the stack region would see spurious `stwcx.` success after any prologue store. +- **Fix**: Same pattern as PPCBUG-140, inserted before `mem.write_u32`. + +### PPCBUG-142 — stwx: missing `invalidate_for_write` call (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Location**: `interpreter.rs:1195-1200` +- **Systemic root cause**: PPCBUG-107 +- **Symptom**: Same class as PPCBUG-140. `stwx` is the indexed store used for array writes and + indirect dereferences — common in loops that may run concurrently with reservation holders. +- **Fix**: Same pattern as PPCBUG-140. + +### PPCBUG-143 — stwux: missing `invalidate_for_write` call (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Location**: `interpreter.rs:1201-1206` +- **Systemic root cause**: PPCBUG-107 +- **Symptom**: Same class as PPCBUG-140. Less common than stw/stwu but still a plain store + that must participate in reservation invalidation. +- **Fix**: Same pattern as PPCBUG-140. + +### PPCBUG-144 — stwbrx: missing `invalidate_for_write` call (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Location**: `interpreter.rs:1568-1573` +- **Systemic root cause**: PPCBUG-107 +- **Symptom**: Same class as PPCBUG-140. 
Byte-reversed stores (used for LE-payload GPU command + buffers, file format fields) are still plain stores with respect to the reservation protocol. +- **Fix**: Same pattern as PPCBUG-140. `ea` is already a `u32` at this point (line 1570). + +### PPCBUG-145 — stwcx: stale manual snapshot uses `reserved_addr` (LOW) + +- **Severity**: LOW (documentation only; live code is correct) +- **Status**: open +- **Location**: `ppc-manual/memory/stwcx.md` (frozen snapshot section) +- **Symptom**: The frozen snapshot shows `ctx.reserved_addr == ea` (exact-word comparison). + The live code at `interpreter.rs:1137-1153` uses `ctx.reserved_line == line` where + `line = ea & !RESERVATION_MASK` (cache-line comparison). The live code is MORE correct per + ISA (PowerISA 2.07B defines reservation at cache-line granularity). Snapshot reflects an + earlier implementation before M3 introduced `RESERVATION_MASK` and `reserved_line`. + Tests confirm live behavior is correct (`stwcx_succeeds_within_same_cache_line`). +- **Fix**: Regenerate the `stwcx.md` snapshot to show current field names and add a note on + the ISA cache-line granule. + +### PPCBUG-146 — Zero unit tests for stwu / stwx / stwux / stwbrx (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` test module +- **Symptom**: Four of the six group-25 opcodes have zero dedicated unit tests. +- **Recommended minimum**: + - `stwu r3, -8(r1)`: verify memory at `r1-8` and `gpr[1]` updated to `old_r1 - 8`. + - `stwx ra=0`: store at `gpr[rb]`, verify memory and no RA writeback. + - `stwux`: indexed update — verify store and RA writeback. + - `stwbrx 0x11223344`: bytes at EA should be `[0x44, 0x33, 0x22, 0x11]`. + +### PPCBUG-147 — stwcx test suite missing key cases (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs:5167-5208` (two existing tests) +- **Missing**: + - `stwcx.` without prior `lwarx` → CR0.EQ=0, memory not written. 
+ - Post-PPCBUG-140-fix: `lwarx` then `stw` to same line then `stwcx.` → CR0.EQ=0. + - RA=0 form: `stwcx. rS, 0, rB`. + - Explicit memory check on failure path (assert memory unchanged). + +IDs PPCBUG-148 and PPCBUG-149 are unallocated — reserved for group 25 follow-up. + +--- + +## Batch 5 (continued) — store multiple/string (group 27) + +Per-group report: `audit-out/group-27-store-mlsr.md`. + +Group 27 summary: **4 findings: 2 HIGH, 1 MEDIUM, 1 LOW.** `stswx` is a permanent no-op (same +root cause as PPCBUG-123 for `lswx` — XER TBC field not modeled; fixed as side effect of +PPCBUG-123/124). `stmw`, `stswi`, and `stswx` all omit `invalidate_for_write`, aggravated vs. +single-word stores because a single `stmw` can dirty multiple cache lines. `stswi` uses `instr.rb()` +instead of `instr.nb()` (maintenance hazard, same shape as PPCBUG-126 for `lswi`). Zero unit tests +across all three opcodes. + +### PPCBUG-160 — stmw, stswi, stswx missing `invalidate_for_write`; multi-line atomicity exposure (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Locations**: `interpreter.rs:1521` (stmw), `interpreter.rs:1357` (stswi), `interpreter.rs:4189` (stswx) +- **Extends**: PPCBUG-107. The prior stated range `1182-1278` does not cover these three arms. + Multi-word instructions (stmw up to 128 bytes = 2 lines; stswx up to 127 bytes = ~2 lines) make + the probability of missing a reservation invalidation much higher than single-word stores. +- **Symptom**: thread B's `stmw` saves 18+ non-volatile registers across two cache lines. Thread A's + `lwarx` reservation on the second line is not cleared. Thread A's `stwcx.` spuriously succeeds. + Because `stmw` is the ABI-standard non-volatile register save, this is triggered constantly in + function prologues — any lock-free primitive inside a prologue/epilogue window is at risk. 
+- **Fix** (same pattern as PPCBUG-107): before each `mem.write_u32`/`mem.write_u8` call, add the + `invalidate_for_write` guard. See group-27 report for per-opcode code snippets. +- **Test gap**: `lwarx` reserve a line, `stmw` across that line, `stwcx.` must return CR0.EQ=0. + +### PPCBUG-161 — `stswx` is a permanent no-op: XER TBC not modeled (HIGH) + +- **Severity**: HIGH +- **Status**: open +- **Location**: `interpreter.rs:4189` (`stswx` arm) + `context.rs:235-243` (`xer()`/`set_xer()`) +- **Companion**: PPCBUG-123 (lswx), PPCBUG-124 (mtspr XER). This finding covers the store side. +- **Symptom**: `ctx.xer() & 0x7F` always returns 0 (no `xer_tbc` field). `stswx` unconditionally + stores zero bytes. The byte-loop body is otherwise correct and requires no further changes. +- **Fix**: same three-line fix as PPCBUG-123 (add `xer_tbc: u8` to `PpcContext`; update `xer()` + and `set_xer()`). The `stswx` body is correct once TBC is live. +- **Test gap**: `mtspr XER` (TBC=5) + `stswx r3, 0, r4` → 5 bytes written big-endian. + +### PPCBUG-162 — `stswi` uses `instr.rb()` instead of `instr.nb()` for NB field (MEDIUM) + +- **Severity**: MEDIUM (maintenance hazard; not a runtime correctness bug today) +- **Status**: open +- **Location**: `interpreter.rs:1359` +- **Companion**: PPCBUG-126 (`lswi` identical pattern at line 1340). +- **Symptom**: `instr.rb()` and `instr.nb()` extract the same bits 16-20, so values are equal now. + If `rb()` is ever given a newtype wrapper (e.g. `RegIdx`) to enforce register semantics, the cast + `instr.rb() as u32` will either fail or yield wrong semantics — silently treating a register index + as a byte count. +- **Fix**: `let nb = if instr.nb() == 0 { 32 } else { instr.nb() };` + +### PPCBUG-163 — Zero unit tests for stmw, stswi, stswx (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` test module +- **Symptom**: No `#[test]` exists for any of the three opcodes. 
Regressions in loop bounds, byte + order, EA computation, NB=0 handling, or register wraparound are invisible. +- **Recommended minimum**: stmw 2-word and 32-word cases; stswi 4-byte / 0 to 32 / wraparound / + partial; stswx (post PPCBUG-123 fix) TBC=4, TBC=0, TBC=5. See group-27 report for full list. + +ID PPCBUG-164 is unallocated — reserved for group 27 follow-up. + +--- + +## Batch 5 (continued) — store doubleword (group 26) + +Per-group report: `audit-out/group-26-store-doubleword.md`. + +Group 26 summary: **0 HIGH, 2 MEDIUM, 2 LOW.** The core semantics of all six opcodes are +ISA-correct: `ds()` decoder extracts the DS-form displacement correctly; `mem.write_u64` handles +big-endian byte ordering; update-form writebacks are zero-extended and in the right order; `stdcx.` +CR0 encoding, reservation check, and table-path interaction all match the ISA. `stdbrx` correctly +applies `swap_bytes()`. No 32-bit writeback truncation issues (these are store ops, not ALU ops). +Two MEDIUM findings: (1) PPCBUG-150 extends PPCBUG-107 to the doubleword stores (same gap — +`invalidate_for_write` never called); (2) PPCBUG-151 identifies that `stwcx.` and `stdcx.` share +the same reservation slot without a width discriminator, allowing a `lwarx`+`stdcx.` or +`ldarx`+`stwcx.` cross-pair to succeed when it should fail. Four IDs used (PPCBUG-150..153). 
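
The cross-width pairing gap summarized above (a `lwarx` reservation committed by `stdcx.`, or the reverse) can be modeled with a width discriminator on the per-context reservation state. This is a sketch under stated assumptions, not the interpreter's code: `ToyCtx` is a hypothetical stand-in for `PpcContext`, the 128-byte granule (`WIDTH_DEMO_LINE_MASK`) is assumed, and the two methods are illustrative names that collapse the word and doubleword opcodes into a `width` argument.

```rust
const WIDTH_DEMO_LINE_MASK: u32 = 0x7F; // assumption: 128-byte reservation granule

/// Toy per-thread reservation state; NOT the project's `PpcContext`.
#[derive(Default)]
struct ToyCtx {
    has_reservation: bool,
    reserved_line: u32,
    reservation_width: u8, // 4 for lwarx, 8 for ldarx
}

impl ToyCtx {
    /// Models `lwarx` (width = 4) or `ldarx` (width = 8).
    fn load_reserve(&mut self, ea: u32, width: u8) {
        self.has_reservation = true;
        self.reserved_line = ea & !WIDTH_DEMO_LINE_MASK;
        self.reservation_width = width;
    }

    /// Models `stwcx.` (width = 4) or `stdcx.` (width = 8).
    /// Succeeds only when line AND width match; always clears the reservation.
    fn store_conditional(&mut self, ea: u32, width: u8) -> bool {
        let ok = self.has_reservation
            && self.reserved_line == (ea & !WIDTH_DEMO_LINE_MASK)
            && self.reservation_width == width;
        self.has_reservation = false;
        ok
    }
}
```

Without the `reservation_width` comparison, the first conditional store after a word reservation would succeed at either width as long as the line matched, which is the spurious CR0.EQ=1 case this group reports.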
+ +### PPCBUG-150 — `std`/`stdu`/`stdx`/`stdux`/`stdbrx` do not call `invalidate_for_write` (scope extension of PPCBUG-107) + +- **Severity**: MEDIUM (same root cause as PPCBUG-107, which is itself rated HIGH) +- **Status**: applied (ca5b90b, 2026-05-01) +- **Locations**: + - `interpreter.rs:1258` (`std`) + - `interpreter.rs:1264` (`stdx`) + - `interpreter.rs:1269` (`stdu`) + - `interpreter.rs:1275` (`stdux`) + - `interpreter.rs:4163` (`stdbrx`) +- **Symptom**: When `--parallel` is active and the `ReservationTable` is enabled, any of these + five stores to an address another HW thread has reserved via `ldarx` will NOT invalidate that + thread's reservation. The `ldarx`-holding thread's `stdcx.` can subsequently succeed even though + the memory was overwritten — a classic LL/SC ABA gap. Fix session for PPCBUG-107 must include + these five sites. +- **Fix**: in each arm, after `mem.write_u64(ea, ...)`, add: + ```rust + if let Some(t) = &ctx.reservation_table { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + ``` + +### PPCBUG-151 — `stdcx.`/`stwcx.` reservation width not discriminated: cross-width pair silently succeeds + +- **Severity**: MEDIUM +- **Status**: applied (ca5b90b, 2026-05-01) +- **Location**: `interpreter.rs:4119-4155` (`stdcx`) vs `interpreter.rs:1134-1180` (`stwcx`) +- **Symptom**: Both `stwcx.` and `stdcx.` match reservations using only `(has_reservation, + reserved_line)`. A `lwarx` reservation can be spuriously committed by `stdcx.`, or a `ldarx` + reservation by `stwcx.`, as long as the cache line matches. The ISA requires pairing — `lwarx` + must be committed by `stwcx.`, and `ldarx` by `stdcx.`. Cross-width commit reads the wrong width + from memory and writes back the wrong width, with no failure indication (CR0.EQ=1). +- **Fix**: add a `reservation_width: u8` field (4 or 8) to `PpcContext`. `stwcx.` requires + `reservation_width==4`; `stdcx.` requires `reservation_width==8`. 
In the table path, pack the + 1-bit width flag into one of the spare bits of the 64-bit slot (bits 39–32 are always zero for + line addresses in the 32-bit guest address space). + +### PPCBUG-152 — `stdu`/`stdux` no invalid-form guard for RS==RA (LOW) + +- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this) +- **Status**: open +- **Location**: `interpreter.rs:1267-1278` +- **Symptom**: When `RA==RS`, the store writes the original RS value, then RA (==RS) is + overwritten with EA, destroying the source. ISA marks this invalid-form. Consistent with + policy of other update-form stores in groups 18-22. +- **Fix**: `debug_assert!(instr.ra() != 0 && instr.ra() != instr.rs())` in debug builds. + +### PPCBUG-153 — Zero unit tests for std/stdu/stdx/stdux/stdbrx; stdcx. happy-path only (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` test module (only `test_ldarx_stdcx_pair` at line 4629) +- **Missing coverage**: `std` with negative DS; `std` with RA=0; `stdu` update writeback; `stdx` + with RA=0; `stdux` indexed update; `stdbrx` byte-reversed output; `stdcx.` failure path (no + prior reservation or EA mismatch); `stdcx.` `has_reservation` cleared on failure. +- **Recommended minimum**: 6 tests — see per-group report for encodings. + +IDs PPCBUG-154 through PPCBUG-159 are unallocated — reserved for group 26 follow-up. + +--- + +## Batch 5 (continued) — store float (group 28) + +Per-group report: `audit-out/group-28-store-float.md`. + +Group 28 summary: **7 findings: 3 HIGH, 1 MEDIUM, 3 LOW.** EA computation, endianness, update-form +writebacks, and `stfiwx` integer-word extraction are all correct. Critical bugs: (1) `stfs*` never +raises FPSCR exception bits (VXSNAN, XX, OX, UX) required by PowerISA for double→single narrowing; +(2) `stfs*` ignores FPSCR.RN rounding mode, always using round-to-nearest-even; (3) all 9 FP store +arms omit `invalidate_for_write` (same class as PPCBUG-107). 
The `stfd*` family and `stfiwx` are +clean (bit-pattern stores with no conversion). Zero unit tests across all 9 opcodes. +**7 IDs used (PPCBUG-165..171). 3 IDs unallocated (PPCBUG-172..174).** + +### PPCBUG-165 — stfs* does not raise FPSCR exception bits (VXSNAN, XX, OX, UX) + +- **Severity**: HIGH +- **Status**: open +- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux) +- **Symptom**: PowerISA requires that `stfs` double→single narrowing raises FPSCR[VXSNAN] for SNaN + input, FPSCR[OX] on overflow to ±∞, FPSCR[UX] on underflow to ±0/denormal, and FPSCR[XX] when the + result is inexact. None of these bits are ever set. The narrowing is done via `ctx.fpr[instr.rs()] as f32` + (x86 `CVTSD2SS`); no FPSCR inspection or update follows. Games that poll FPSCR[OX] to detect + overflow (physics engines clamping large velocities), or FPSCR[VXSNAN] after sentinel SNaN writes, + get false negatives. +- **Canary parity**: Canary also omits these FPSCR updates for `stfs*`. Both share the deviation. +- **Fix**: after the narrowing, check `fpscr::is_snan(src)` → set `VXSNAN`; compare source vs. + f64 round-trip of narrowed value for inexact; compare src.is_finite() && f32.is_infinite() for + overflow. See group-28 report for illustrative code sketch. + +### PPCBUG-166 — stfs* ignores FPSCR.RN; always uses round-to-nearest-even + +- **Severity**: HIGH +- **Status**: open +- **Locations**: interpreter.rs:1284, 1289, 1296, 1301 +- **Symptom**: `ctx.fpr[instr.rs()] as f32` uses the host MXCSR rounding mode, never consulting + `ctx.fpscr & fpscr::RN_MASK`. Any game that configures FPSCR.RN to truncate/ceil/floor and then + stores via `stfs` gets the wrong f32 in memory (wrong by at most 1 ULP). The stfs.md spec + explicitly acknowledges this gap. +- **Canary parity**: Canary also ignores FPSCR.RN for stfs. Both share the deviation. +- **Fix**: read `ctx.fpscr & fpscr::RN_MASK` and set host MXCSR before narrowing, then restore. 
+ Minimum viable: `debug_assert_eq!(ctx.fpscr & fpscr::RN_MASK, 0)` for debug-build visibility. + +### PPCBUG-167 — All 9 FP store arms missing `invalidate_for_write` (PPCBUG-107 class) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux), + 1308 (stfd), 1313 (stfdu), 1320 (stfdx), 1325 (stfdux), 1333 (stfiwx) +- **Symptom**: Same class as PPCBUG-107. Under M3 `--parallel`, a FP store by thread B to a + cache line reserved by thread A via `lwarx` does not clear thread A's reservation table slot. + Thread A's subsequent `stwcx.` spuriously succeeds. Rendering workers using FP stores to shared + transform/particle buffers co-located with spinlock sites are at risk. +- **Fix**: before each `mem.write_f32`/`write_f64`/`write_u32` in every FP store arm: + ```rust + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + ``` + Recommend a single sweep of all store groups (PPCBUG-107, 130, 160, 167) to avoid further drift. + +### PPCBUG-168 — stfs* SNaN narrowing: `as f32` quietens SNaN without raising FPSCR.VXSNAN + +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:1284, 1289, 1296, 1301 +- **Symptom**: When FRS holds an f64 SNaN (bit 51 = 0), `CVTSD2SS` sets the f32 quiet bit (bit 22), + producing a QNaN in memory, without raising FPSCR[VXSNAN]. The stored memory bytes are correct per + IEEE-754 (narrowing an SNaN produces a QNaN). The bug is the missing FPSCR signal, a subset of + PPCBUG-165. **Contrast with PPCBUG-128** (lfs stores wrong FPR bits — HIGH severity; here memory + bytes are right, only the flag is missing). +- **Note**: fixed as a side effect of the PPCBUG-165 fix. No independent code change needed. 
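If the FPSCR updates proposed in PPCBUG-165 are adopted, the classification step could look roughly like the sketch below. It is a minimal illustration, not xenia-rs code: `NarrowFlags` and `narrow_with_flags` are hypothetical names, the narrowing uses the host's default round-to-nearest-even (so the PPCBUG-166 rounding-mode gap is not addressed), and the underflow test is simplified to "a nonzero source became zero or subnormal".

```rust
/// Flags a double-to-single narrowing could report.
/// Hypothetical type for illustration; not part of xenia-rs.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct NarrowFlags {
    pub vxsnan: bool, // source was a signalling NaN
    pub ox: bool,     // finite source overflowed to +/- infinity
    pub ux: bool,     // nonzero source became zero or subnormal (simplified)
    pub xx: bool,     // narrowed value differs from the source
}

/// True if `x` is a NaN whose quiet bit (fraction bit 51) is clear.
pub fn is_snan(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & (1u64 << 51)) == 0
}

/// Narrow with the host's round-to-nearest-even and classify the outcome.
pub fn narrow_with_flags(src: f64) -> (f32, NarrowFlags) {
    let narrowed = src as f32;
    let mut flags = NarrowFlags::default();
    if src.is_nan() {
        // NaNs never count as inexact, overflow, or underflow.
        flags.vxsnan = is_snan(src);
        return (narrowed, flags);
    }
    // Widening f32 -> f64 is exact, so any difference means rounding occurred.
    flags.xx = (narrowed as f64) != src;
    flags.ox = src.is_finite() && narrowed.is_infinite();
    flags.ux = src != 0.0 && (narrowed == 0.0 || narrowed.is_subnormal());
    (narrowed, flags)
}
```

A real fix would map these booleans onto the existing `fpscr` constants and helpers; the struct only keeps the sketch self-contained.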
+ +### PPCBUG-169 — stfd* bit-pattern store: confirmed correct (informational) + +- **Severity**: LOW (confirmed clean, informational) +- **Status**: wontfix +- **Locations**: interpreter.rs:1305, 1311, 1317, 1323 +- **Analysis**: `write_f64(ea, fpr)` → `write_u64(ea, fpr.to_bits())` → `val.to_be_bytes()`. Pure + bit-pattern, correct big-endian. SNaN preserved. EA computation and update-form writebacks all + correct. Canary parity confirmed. No bugs. + +### PPCBUG-170 — stfiwx: confirmed correct (informational) + +- **Severity**: LOW (confirmed clean, informational) +- **Status**: wontfix +- **Location**: interpreter.rs:1329-1335 +- **Analysis**: `write_u32(ea, fpr.to_bits() as u32)` correctly extracts the low 32 bits of the + 64-bit FPR as a raw bit pattern (the integer word produced by `fctiw`/`fctiwz`) and stores + big-endian. RA=0 handled correctly. No FPSCR effects required. Canary parity confirmed. No bugs. + +### PPCBUG-171 — Zero unit tests for all 9 store-float opcodes + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs test module +- **Symptom**: No `#[test]` covers any of the 9 FP store arms. Regressions in EA computation, + endianness, update-form writeback order, or double→single narrowing are invisible. +- **Recommended minimum** (10 tests): `stfd` normal + SNaN bit-exact; `stfdu` update writeback; + `stfs` round-trip (1.0); `stfs` overflow (→ ±∞); `stfsx` ra=0; `stfsux` update; `stfiwx` integer + word extract; post-PPCBUG-165 fix: SNaN → FPSCR.VXSNAN set; post-PPCBUG-166 fix: RN=truncate. + +IDs PPCBUG-172 through PPCBUG-174 are unallocated — reserved for group 28 follow-up. + +--- + +## Batch 6 — FPU single-precision (group 29) + +Per-group report: `audit-out/group-29-fpu-single.md`. + +**Context**: The live implementation is substantially more capable than the frozen ppc-manual +snapshots indicated. 
`to_single()` correctly dispatches on FPSCR.RN; `check_invalid_*` helpers +correctly set VXSNAN, VXISI, VXIMZ, VXZDZ, VXIDI, ZX; `update_after_op` sets OX, UX, and +FPRF. The remaining bugs are: (1) XX/FI/FR (inexact) never set anywhere; (2) fmadd/fmsub +*sx variants missing the VXISI check for the add-phase infinity collision (their *x double +siblings other than `fmaddx` have the same gap, see PPCBUG-203); (3) fnmadd/fnmsub NaN sign bit incorrectly flipped by Rust `-`; +(4) fresx produces a full IEEE 1/b instead of the ~12-bit hardware estimate; (5) FPSCR.NI +flush-to-zero not modelled; (6) SNaN→QNaN propagation relies on host SSE behavior rather than +the ISA-canonical derivation. + +**8 IDs used (PPCBUG-180..187). 12 IDs unallocated (PPCBUG-188..199).** + +### PPCBUG-180 — XX / FI / FR bits never set across all FPU *sx opcodes (and double siblings) + +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: `fpscr.rs:184-194` (`update_after_op`); affects interpreter.rs:2252-2494 +- **Symptom**: `FPSCR[XX]` (sticky inexact) should be set whenever the mathematical result of an + FP operation cannot be represented exactly in the destination format (single or double) and + a rounding step occurs. `FPSCR[FI]` (fraction inexact) records that the current operation + rounded; `FPSCR[FR]` (fraction rounded) records that rounding incremented the fraction. + `update_after_op` sets `OX` (overflow to ±∞) and `UX` (subnormal + result) but has no inexact-detection logic. Since most `*sx` operations on arbitrary inputs + require rounding to single precision, XX is almost always wrongly left at zero. Games that + poll FPSCR to check exactness receive false "exact" results. +- **Canary parity**: Canary's `UpdateFPSCR` also does not set XX/FI/FR. Both share this gap. +- **Fix**: In `update_after_op` (or a post-`to_single` helper), compare the pre-round f64 + result with the post-round f64 result. If they differ, set `XX` and `FI`; set `FR` only when + the rounded magnitude is larger than the pre-round magnitude, and clear `FR` otherwise. 
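The comparison described in the PPCBUG-180 fix can be sketched as follows. This is an illustration under stated assumptions: `Inexact`, `classify_rounding`, and `accumulate_xx` are hypothetical names, and the pre-round value is taken to be an f64 that has not yet been narrowed. In a real arm the f64 intermediate has itself already been rounded once, which this sketch does not attempt to detect.

```rust
/// Hypothetical per-operation rounding summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Inexact {
    pub fi: bool, // the result was rounded (inexact for this operation)
    pub fr: bool, // rounding increased the magnitude
}

/// Compare the value before and after rounding to single precision.
pub fn classify_rounding(pre: f64, post: f32) -> Inexact {
    // NaN and infinite inputs are never reported as inexact.
    if pre.is_nan() || pre.is_infinite() {
        return Inexact { fi: false, fr: false };
    }
    // Widening f32 -> f64 is exact, so the comparison is reliable.
    let post64 = post as f64;
    Inexact {
        fi: post64 != pre,
        fr: post64.abs() > pre.abs(),
    }
}

/// XX is sticky: it accumulates FI and is only cleared by explicit FPSCR writes.
pub fn accumulate_xx(xx: bool, r: Inexact) -> bool {
    xx | r.fi
}
```

For example, narrowing 0.1 rounds up in magnitude (FI and FR both set), while narrowing 0.7 rounds down (FI set, FR clear).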
+ +### PPCBUG-181 — fmaddsx / fnmaddsx missing VXISI check for add-phase ±∞ collision + +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:2339-2348 (fmaddsx), 2383-2392 (fnmaddsx) +- **Symptom**: When `FRA × FRC = +∞` and `FRB = -∞` (or vice versa), PowerISA §4.3.4 + requires `FPSCR[VXISI]` to be set and the result to be a QNaN. The double-precision sibling + `fmaddx` (line 2327) correctly calls `fpscr::check_invalid_add(ctx, a * c, b, false)` after + the multiply-check. `fmaddsx` omits this call entirely — only `check_invalid_mul` runs. + Games using fused-madd in dot-product accumulators that might overflow to ±∞ (e.g. lighting + accumulators with very large normals) lose the VXISI signal. +- **Fix**: + ```rust + // inside fmaddsx arm, after check_invalid_mul: + fpscr::check_invalid_add(ctx, a * c, b, false); + ``` + Same for fnmaddsx (same operand pair, same `false` sense for the add). + +### PPCBUG-182 — fmsubsx / fnmsubsx missing VXISI check for subtract-phase ±∞ collision + +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:2361-2370 (fmsubsx), 2405-2414 (fnmsubsx) +- **Symptom**: When `FRA × FRC = ±∞` and `FRB = ±∞` with the same sign, `(±∞) − (±∞)` + should fire `FPSCR[VXISI]`. Neither `fmsubsx` nor `fnmsubsx` calls `check_invalid_add`. +- **Fix**: + ```rust + // inside fmsubsx arm, after check_invalid_mul: + fpscr::check_invalid_add(ctx, a * c, -b, false); + ``` + Same for fnmsubsx. The negated `b` turns the subtract into the add-form so that + `check_invalid_add(..., false)` uses the correct infinity-sign comparison. + +### PPCBUG-183 — fnmaddsx / fnmsubsx NaN sign bit incorrectly flipped by Rust unary `-` + +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:2388 (fnmaddsx), 2410 (fnmsubsx) +- **Symptom**: `to_single(ctx, -(a.mul_add(c, b)))` — Rust's unary `-f64` always flips the + IEEE sign bit, including when the value is NaN. 
PowerISA §4.3.2 specifies that the final + negation in `fnmadd`/`fnmsub` is NOT applied to a QNaN result: if the fused computation + yields a NaN (due to SNaN input, VXIMZ, or VXISI), the negation is skipped and the NaN is + propagated with its canonical sign unchanged. xenia-rs flips the sign bit of any NaN result, + producing a QNaN with the wrong sign. Observable by storing via `stfd` and inspecting bits. + Games using sign-bit NaN tagging (e.g. `0xFFC00000` vs `0x7FC00000` as distinct sentinels) + are affected. +- **Fix**: + ```rust + // fnmaddsx arm: + let inner = a.mul_add(c, b); + let result = to_single(ctx, if inner.is_nan() { inner } else { -inner }); + // fnmsubsx arm: + let inner = a.mul_add(c, -b); + let result = to_single(ctx, if inner.is_nan() { inner } else { -inner }); + ``` + +### PPCBUG-184 — fresx produces full-precision IEEE 1/b instead of ~12-bit hardware estimate + +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:2481-2494 +- **Symptom**: `fres` on Xenon hardware produces a reciprocal approximation via a 256-entry + LUT with linear interpolation, accurate to roughly 1/4096 relative error (~12 mantissa + bits). xenia-rs computes `to_single(1.0 / b)` — the fully IEEE-754 correctly-rounded + single-precision reciprocal. The result is up to ~4096× more accurate than hardware. + Newton-Raphson refinement code `x = fres(d); x = x*(2 - d*x)` is not broken by this (NR + converges even from an accurate seed), but code that checks the seed's error magnitude for + convergence termination, or that relies on `fres(d)*d ≠ 1.0` to decide whether to refine, + may take the wrong branch. Also, `fres(d)*d` on xenia is much closer to 1.0 than on hardware, + so a "was the estimate good enough?" check based on the residual will give wrong answers. +- **Canary parity**: Canary uses `f.Recip(f.Convert(frB, FLOAT32_TYPE))` — approximates by + first converting to f32 (quantizing the input), then applying the host reciprocal. 
This still + produces a fully accurate IEEE single reciprocal rather than the 12-bit table estimate. + Both emulators share the deviation. Canary's conversion-first approach is slightly closer to + hardware (the input is quantized before the reciprocal), so if a future fix is desired, + Canary's approach is the better reference. +- **Fix (minimal viable)**: Pre-convert input to f32 to match Canary's quantization: + `let b32 = b as f32; to_single(ctx, 1.0_f64 / b32 as f64)`. This matches Canary but still + does not emulate the 12-bit LUT. Full fix requires an `fres` LUT matching Xenon's hardware + table (documented in Xbox 360 SDK / GamePPCLisa docs). + +### PPCBUG-185 — FPSCR.NI flush-to-zero not modelled; subnormal results propagate through *sx + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: All *sx arms in interpreter.rs; `fpscr.rs` does not define an `NI` constant +- **Symptom**: Xenon firmware sets `FPSCR.NI = 1` at boot. With NI=1, the Xenon FPU flushes + subnormal inputs and results to the appropriate signed zero before and after every + floating-point operation. xenia-rs inherits the host x86 IEEE-754 default (NI=0), which + propagates subnormals. The differences are: (a) subnormal FPR inputs are used as-is by + xenia but treated as ±0 by hardware; (b) subnormal results are stored by xenia but flushed + to ±0 by hardware. `update_after_op` sets `UX` when the result is subnormal, but does NOT + flush it. Games with NI-dependent behavior (most Xbox 360 titles compiled with default + Xenon ABI settings) may see different float results in subnormal-touching paths. +- **Canary parity**: Canary also inherits host IEEE NI=0 semantics. Both share this gap. +- **Fix**: After `to_single` (or the double-precision result), check `ctx.fpscr & fpscr::NI_BIT` + (a constant that would have to be added) and if set, flush subnormals: `if result.is_subnormal() { result = + result.signum() * 0.0 }`. Apply to inputs as well for strict correctness. 
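A minimal sketch of the flush described in the PPCBUG-185 fix, with hypothetical helper names. One point worth checking in the real code: if `to_single` returns its result widened to f64 (an assumption here, not confirmed by this tracker), a value that is subnormal in single precision is a normal f64, so `f64::is_subnormal` alone would not catch it. The second helper tests in the f32 domain for that case.

```rust
/// Flush a subnormal double to a zero of the same sign.
pub fn flush_subnormal(x: f64) -> f64 {
    if x.is_subnormal() {
        0.0_f64.copysign(x)
    } else {
        x
    }
}

/// Flush a single-precision result that is held widened in an f64.
/// Assumes `x` is already exactly representable as an f32.
pub fn flush_single_in_f64(x: f64) -> f64 {
    let s = x as f32; // exact under the assumption above
    if s.is_subnormal() {
        0.0_f64.copysign(x)
    } else {
        x
    }
}
```

`copysign` is used instead of `signum() * 0.0` only because it states the intent (keep the sign, zero the magnitude) directly; both produce a signed zero. NaNs and infinities pass through unchanged because `is_subnormal` is false for them.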
+ +### PPCBUG-186 — SNaN → QNaN propagation relies on host SSE; not ISA-canonical for all *sx + +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:2252-2414 (all arithmetic *sx arms without explicit SNaN guard) +- **Symptom**: When an SNaN input reaches `faddsx`/`fsubsx`/`fmulsx`/`fdivsx`, the code calls + `check_invalid_add/mul/div` (correctly sets VXSNAN) but then performs the operation on the + raw SNaN value: `a + b`, `a * c`, etc. On x86-64 SSE2, the hardware `ADDSD`/`MULSD` ops + produce a QNaN from the first SNaN operand (bit 51 set, other mantissa bits preserved). This + matches ISA §4.3.2.2 for the common case. However, for `mul_add` (VFMADD231SD on AVX), the + SNaN propagation priority may differ: the ISA specifies FRA takes priority over FRB, but + hardware FMA may use a different priority for the three-operand form. The `fsqrtsx` and + `fresx` arms handle SNaN explicitly (via `is_snan` check) but do not synthesize the correct + QNaN result — they rely on `b.sqrt()` / `1.0/b` to produce a NaN, which the host does. + This is a latent risk; active wrong-result cases require bit-level NaN inspection. + +### PPCBUG-187 — Zero interpreter execution tests for all 10 group-29 opcodes + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs test module (no `#[test]` covers any *sx or fresx) +- **Symptom**: Regressions in rounding, FPSCR side effects, or operand-field decoding are + invisible to CI. The existing fpscr unit tests cover helper functions in isolation; no test + exercises the full `step()` path for any single-precision FPU opcode. 
+- **Recommended minimum** (12 tests — see group-29 report for encodings): + `fadds` exact; `fadds` VXISI; `fsubs` VXISI; `fmuls` 0×∞; `fdivs` ZX; + `fmadds` VXISI regression (PPCBUG-181); `fmsubs` VXISI regression (PPCBUG-182); + `fnmadds` NaN-sign (PPCBUG-183); `fnmsubs` NaN-sign (PPCBUG-183); + `fsqrts` negative input VXSQRT; `fsqrts` round-trip; `fres` basic reciprocal. + +IDs PPCBUG-188 through PPCBUG-199 are unallocated — reserved for group 29 follow-up. + +--- + +## Batch 6 (continued) — FPU arithmetic double (group 30) + +Per-group report: `audit-out/group-30-fpu-double.md`. + +Group 30 summary: **9 findings (PPCBUG-200..208). 2 MEDIUM cross-cutting, 3 MEDIUM opcode-specific, 4 LOW.** Result arithmetic is correct for all 10 opcodes. FPSCR infrastructure is partially wired: VXSNAN, OX, UX, ZX, VXISI (add/sub), VXIMZ, VXZDZ, VXIDI, VXSQRT all set correctly for the opcodes that need them. Critical gaps: (1) XX/FR/FI bits never set by any opcode — same gap as PPCBUG-180 but now confirmed on the double-precision path; (2) FPSCR.RN not honored for double arithmetic — single-precision has `round_to_single` but double has no equivalent; (3) fmsubx/fnmaddx/fnmsubx omit the VXISI check for ∞-collision in the add step; (4) fnmaddx/fnmsubx flip NaN sign bit via Rust `-` operator but ISA requires NaN sign preserved. frsqrtex uses full-precision 1/sqrt(b) instead of the hardware estimate — acceptable. All FMA forms use `f64::mul_add` for correct single-rounding semantics. +**9 IDs used (PPCBUG-200..208). 11 IDs unallocated (PPCBUG-209..219).** + +### PPCBUG-200 — All group-30 opcodes: XX, FR, FI bits never set +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `fpscr.rs:184-194` (`update_after_op`); `interpreter.rs:2248,2268,2289,2310,2335,2357,2379,2401,2463,2510` +- **Symptom**: Same gap as PPCBUG-180 but confirmed for the double-precision path. `update_after_op` only tracks OX (overflow to infinity) and UX (subnormal). 
FPSCR[XX] (inexact sticky), FPSCR[FR] (round direction), and FPSCR[FI] (inexact for current op) are never updated by any group-30 opcode. Every double-precision arithmetic operation that rounds a non-representable result silently omits these bits. +- **Fix**: Same as PPCBUG-180 — read MXCSR exception flags after each f64 operation and map to FI/XX/FR. For double, no `to_single` step is involved so the comparison must be done via MXCSR or by a post-op bit-level comparison of inputs vs. result. +- **Test gap**: Zero tests verify XX set after any inexact double-precision operation. + +### PPCBUG-201 — All group-30 opcodes: FPSCR.RN not honored for double arithmetic +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:2242-2512` (all 10 arms) +- **Symptom**: Host f64 operators always use nearest-even (host MXCSR default). `fpscr.rs` has a complete `rounding_mode(ctx)` helper and directed rounding helpers for single-precision (`round_to_single`), but no equivalent for double arithmetic. Guest `mtfsfi` RN changes have no effect on faddx/fsubx/fdivx/fsqrtx etc. +- **Fix**: Wrap each double-precision arithmetic arm with an MXCSR round-mode set/restore when `ctx.fpscr & fpscr::RN_MASK != 0`. Fast path (RN=0) stays zero-cost. +- **Test gap**: No test changes RN and verifies directed rounding on any double arithmetic opcode. + +### PPCBUG-202 — fmaddx: non-FMA `a * c` used in check_invalid_add can spuriously raise/miss VXISI +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:2332` +- **Symptom**: `check_invalid_add(ctx, a * c, b, false)` uses a separate two-rounding multiply to approximate the FMA intermediate product. When the true FMA intermediate is finite but the standalone product overflows to ±∞, VXISI fires spuriously. When the true intermediate is ±∞ but the standalone product is finite (extreme cancellation), VXISI is missed. 
+- **Fix**: Derive VXISI from input-value properties directly: if `a.is_infinite() || c.is_infinite()` with neither operand zero (so the product is mathematically infinite) and `b.is_infinite()` with the sign that cancels the product in the add/subtract step → VXISI. +- **Test gap**: No test covers the large-value cancellation case in fmaddx. + +### PPCBUG-203 — fmsubx, fnmaddx, fnmsubx: VXISI never raised for ∞-collision in add/sub step +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: `interpreter.rs:2354` (fmsubx), `2376` (fnmaddx), `2398` (fnmsubx) +- **Symptom**: Same pattern as PPCBUG-181/182 for the double-precision variants. These three arms call only `check_invalid_mul` and omit `check_invalid_add`. Per ISA, all four FMA variants must raise VXISI when the add step combines infinities of opposite effective sign, i.e. `(+∞) + (−∞)`. Example for fmsub: `A×C = +∞`, `B = +∞` → `+∞ − +∞` → VXISI. Currently the result NaN propagates silently with no FPSCR update. The fnmsub pattern is the canonical Newton-Raphson step, the most common FPU path in Xbox 360 graphics code. +- **Fix**: Add `fpscr::check_invalid_add(ctx, a * c, b, true)` for `fmsubx`/`fnmsubx` and `fpscr::check_invalid_add(ctx, a * c, b, false)` for `fnmaddx` (apply PPCBUG-202 sign-fix simultaneously). +- **Test gap**: Zero tests for VXISI on any of the three opcodes. + +### PPCBUG-204 — fmaddx check_invalid_add sub-issue (sign logic reliant on imprecise product) +- **Severity**: LOW (sub-issue of PPCBUG-202) +- **Status**: open +- **Location**: `interpreter.rs:2332` +- **Symptom**: VXISI logic is internally consistent with the passed `a * c` value, but that value can have the wrong sign in extreme overflow/underflow cases. Resolve as part of PPCBUG-202. + +### PPCBUG-205 — fnmaddx / fnmsubx: Rust `-` flips NaN sign bit; ISA requires NaN sign preserved +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: `interpreter.rs:2377` (fnmaddx), `interpreter.rs:2399` (fnmsubx) +- **Symptom**: Same pattern as PPCBUG-183 for the double-precision variants. 
Rust's unary `-` applied to a NaN result flips the IEEE-754 sign bit. PowerISA Book I §4.3.4 states the negation is not applied to NaN results. Title code using NaN sentinels (audio middleware, debug fills) receives sign-flipped NaN payloads. +- **Fix**: + ```rust + let fma = a.mul_add(c, b); // fnmaddx + let result = if fma.is_nan() { fma } else { -fma }; + // and analogously for fnmsubx + ``` +- **Test gap**: No test exercises fnmaddx/fnmsubx with NaN-producing inputs to check sign of result NaN. + +### PPCBUG-206 — frsqrtex edge cases correct; no code change needed (informational) +- **Severity**: LOW (confirmed clean, informational) +- **Status**: wontfix +- **Location**: `interpreter.rs:2496-2512` +- **Analysis**: ZX fires for ±0. VXSQRT guard correctly excludes -0.0. frsqrte(+∞)=+0 correct. Full-precision is acceptable over-precision. +- **Fix**: Add comment: `// Full-precision: hardware gives ~12-14 bit estimate. NR converges identically.` +- **Test gap**: Zero frsqrtex unit tests — add 4 (±0 inputs, negative input+VXSQRT, SNaN, +∞). + +### PPCBUG-207 — FMA opcode OX logic correct, OX edge cases untested (informational) +- **Severity**: LOW (confirmed clean, informational) +- **Status**: wontfix +- **Location**: `interpreter.rs:2335,2357,2379,2401` +- **Analysis**: `inputs_were_finite` correctly suppresses OX when an input is already infinite. OX fires when all inputs are finite but the FMA result overflows — ISA-correct. +- **Test gap**: Zero tests for OX scenario in any FMA opcode. + +### PPCBUG-208 — Zero tests for fsubx, fdivx, fmsubx, fnmaddx, fnmsubx, fsqrtx, frsqrtex +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` test module +- **Symptom**: 7 of 10 group-30 opcodes have zero tests. `faddx` has 1 happy-path test; `fmulx` has 1; `fmaddx` has 1. None have FPSCR/Rc=1/edge-case coverage. 
+- **Recommended minimum** (12 tests): `fsubx` normal; `fsubx` VXISI; `fdivx` normal; `fdivx` ZX; `fdivx` VXZDZ; `fmsubx` normal; `fnmaddx` normal; `fnmsubx` normal; `fnmaddx` NaN-sign regression (PPCBUG-205); `fsqrtx` normal; `fsqrtx` negative+VXSQRT; `frsqrtex` positive. + +IDs PPCBUG-209 through PPCBUG-219 are unallocated — reserved for group 30 follow-up. + +--- + +## Pending batches + +- Batch 2: groups 6-11 — logical immediate, logical register, sign-extend/CLZ, word rotate, doubleword rotate, shift. +- Batch 3: groups 12-17 — compare, branch, trap+sc, CR logical, SPR/MSR, cache+sync. +- Batch 4: groups 18-23 — loads (byte, halfword, word, doubleword, multiple/string, float). +- Batch 5 (partial): groups 24, 26, 27, 28 done; group 25 (store word) pending. +- Batch 6 (partial): groups 29, 30 done; group 31 (FPU convert/compare) pending. +- Batch 7: groups 32-34 — VMX integer (add/sub, compare/min/max, logical/shift). +- Batch 8: groups 35-38 — VMX permute/pack, VMX float, VMX multiply-sum, VMX load/store. +- Phase C: decoder field extractors, decoder opcode-lookup, disassembler formatter parity. +- Phase D: this file gets re-sorted by severity and finalized. + +--- + +## Batch 6 (continued) — FPU sign/move/compare/convert/round (group 31) + +Per-group report: `audit-out/group-31-fpu-misc.md`. + +Group 31 summary: **9 findings (PPCBUG-221..231; IDs 220/222/226 retracted after analysis). +1 HIGH, 3 MEDIUM, 5 LOW.** The sign-bit manipulation family (`fabsx`, `fnegx`, `fnabsx`, `fmrx`) +and `fselx` are all ISA-correct — Rust arithmetic maps to bit-level operations that preserve SNaN +payloads. `fcmpu` is correct (FPRF and VXSNAN set; no spurious VXVC). The conversion group is +mostly correct for result values and overflow sentinels; the main gaps are FPSCR inexact/FR/FI +tracking (shared with groups 29/30) and one subtle NearestEven tie-breaking defect in +`round_to_i64` that affects `fctidx`. 
`fcmpo` silently omits VXSNAN/VXVC despite having a +comment acknowledging the gap. + +**9 IDs used (PPCBUG-221, 223, 224, 225, 227, 228, 229, 230, 231). IDs 220/222/226 retracted. +IDs PPCBUG-232..239 unallocated.** + +### PPCBUG-221 — `fctidx` / `round_to_i64` NearestEven tie-breaking uses f64::EPSILON; broken for |v| > 2^52 + +- **Severity**: HIGH +- **Status**: open +- **Location**: `fpscr.rs:220–238` (`round_to_i64`, `NearestEven` case) +- **Symptom**: The tie-breaking code computes `diff = (v - v.trunc()).abs()` and tests + `(diff - 0.5).abs() < f64::EPSILON` to detect a half-integer. Above `|v| = 2^52`, + `v.trunc() == v` for all representable f64 values (all are exact integers), so `diff == 0.0` + and the tie-breaking branch is never taken — the code falls through to `v.round() as i64`, + which is round-half-away-from-zero instead of round-half-to-even. Every fctid call on a + large odd half-integer (e.g. `(2^52 + 1).5`) produces the wrong integer. In practice these + exact 0.5 cases are rare for large values but can appear in audio sample-count arithmetic + and physics fixed-point pipelines. +- **Fix**: replace the NearestEven arm with a fractional-part-only tie check that is exact for + |v| <= 2^52 and degenerates correctly to truncation above 2^52: + ```rust + RoundingMode::NearestEven => { + let t = v.trunc(); + let frac = v - t; // exact for |v| <= 2^52; ==0 above (already integer) + let fa = frac.abs(); + if fa > 0.5 { t as i64 + if v >= 0.0 { 1 } else { -1 } } + else if fa < 0.5 { t as i64 } + else { + // Exact 0.5 tie — round to even. + let fi = t as i64; + if fi & 1 == 0 { fi } else { fi + if v >= 0.0 { 1 } else { -1 } } + } + } + ``` +- **Test gap**: add `round_to_i64` tests in `fpscr.rs:tests`: 0.5→0, 1.5→2, 2.5→2, 3.5→4, + -0.5→0, -1.5→-2. Existing tests cover 2.5→2 and 3.5→4 (currently accidentally correct). 
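The tie cases listed in the PPCBUG-221 test gap can be exercised against a standalone copy of the exact-comparison logic. The function below is a sketch with a hypothetical name (`round_nearest_even`); range clamping and NaN handling, which the real `round_to_i64` needs, are left out. Because `v - v.trunc()` is computed exactly, the comparisons against 0.5 need no tolerance. That matters for near-ties: the value one ULP above 0.5 lies within `f64::EPSILON` of a tie, so a tolerance-based test of the kind described in the symptom would presumably misclassify it.

```rust
/// Round to the nearest integer, ties to even.
/// Sketch only: assumes `v` is finite and within i64 range.
pub fn round_nearest_even(v: f64) -> i64 {
    let t = v.trunc();
    let frac = (v - t).abs(); // exact; 0.0 once |v| >= 2^52
    let ti = t as i64;
    let step: i64 = if v >= 0.0 { 1 } else { -1 };
    if frac > 0.5 {
        ti + step
    } else if frac < 0.5 {
        ti
    } else if ti & 1 == 0 {
        ti // exact tie, truncated value already even
    } else {
        ti + step // exact tie, move away from zero to the even neighbour
    }
}
```

Recent Rust toolchains also provide `f64::round_ties_even`, which may be usable as a cross-check in the unit tests if the project's minimum supported Rust version allows it.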
+ +### PPCBUG-223 — `fcmpo` omits FPSCR[VXSNAN] and FPSCR[VXVC] on NaN operands + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:2645–2675` +- **Symptom**: The `fcmpo` arm sets FPRF and the CR field correctly, as `fcmpu` does, + but never calls `fpscr::set_exception`. PowerISA requires: QNaN → `FPSCR[VXVC, VX, FX]`; + SNaN → additionally `FPSCR[VXSNAN]`. `fcmpu` correctly sets VXSNAN for SNaN; `fcmpo` does + not. A comment in the source acknowledges "not modeled yet." +- **Impact**: `fcmpo` has no record (Rc=1) form, so the loss is visible through FPSCR itself: + a guest that reads FPSCR after a NaN compare (for example with `mffsx`) sees FX=0 and VXVC=0 + instead of 1. Xbox 360 CRT comparison primitives + (`islessgreater`, ordered relational operators) use `fcmpo`. +- **Fix**: + ```rust + if fra.is_nan() || frb.is_nan() { + ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: false, so: true }; + if fpscr::is_snan(fra) || fpscr::is_snan(frb) { + fpscr::set_exception(ctx, fpscr::VXSNAN | fpscr::VXVC); + } else { + fpscr::set_exception(ctx, fpscr::VXVC); + } + } + ``` + +### PPCBUG-224 — `fcfidx` does not set FPSCR[XX/FX] for inexact i64→f64 conversion + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:2528–2536` +- **Symptom**: Only FPRF is updated. Per ISA, `fcfid` sets `FPSCR[XX, FX]` (and FR/FI) when + the i64 value has more than 53 significant bits and precision is lost. An i64 with + `|v| > 2^53` is inexact unless the low-order bits that do not fit are all zero. Common trigger: large frame/sample counters, address values. +- **Fix**: after the conversion, compare `(result as i64) != (bits as i64)` and call + `fpscr::set_exception(ctx, fpscr::XX)` if inexact. + +### PPCBUG-225 — `frspx` does not set FPSCR[XX/FX/FR/FI] on inexact rounding + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:2516–2527` +- **Symptom**: `update_after_op` sets OX/UX only. The ISA requires FR/FI/XX/FX on any f64→f32 + rounding that is not exact. 
`frsp` is the canonical double→single-precision narrowing idiom + in compiler output — virtually every call is inexact. +- **Fix**: after `to_single`, compare result vs b; if different and both finite, call + `fpscr::set_exception(ctx, fpscr::XX | fpscr::FI | ...)` with FR set if magnitude increased. + +### PPCBUG-227 — `fctiwx` rounding: `round_to_i32` inherits NearestEven defect via `round_to_i64` + +- **Severity**: LOW +- **Status**: open +- **Location**: `fpscr.rs:241–243` +- **Symptom**: `round_to_i32` calls `round_to_i64` then clamps. The PPCBUG-221 defect in + `round_to_i64` does not manifest for i32-range values (the epsilon check accidentally works + at this scale), but the structural fragility is inherited. Fixing PPCBUG-221 cures this. +- **Recommendation**: add unit tests `round_to_i32(0.5)==0`, `round_to_i32(1.5)==2`, + `round_to_i32(2.5)==2` to verify correct round-to-even behavior. + +### PPCBUG-228 — Zero interpreter execution tests for fabsx/fnegx/fnabsx/fmrx/fselx/fcmpo/fcfidx/fctidx/fctidzx/frspx + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` module +- **Symptom**: 10 of the 13 group-31 opcodes have zero dedicated tests. `test_fcmpu` covers + only the ordered comparison `5.0 > 3.0`. `test_fctiwzx` covers one positive truncation. + `test_fadd`/`test_fmul` are group-30 tests, not group-31. +- **Recommended minimum**: SNaN-preservation test for fabsx/fnegx/fnabsx; fselx with NaN/−0/−1; + fcmpo QNaN→VXVC (after PPCBUG-223 fix); fcfidx exact and inexact; fctidx tie cases; frspx + inexact → XX set (after PPCBUG-225 fix); fctiwx nearest-even tie; fctiwzx NaN sentinel. + +### PPCBUG-229 — `fctidx` / `fctidzx` do not set FPSCR[XX/FX] for inexact inputs + +- **Severity**: LOW +- **Status**: open +- **Locations**: `interpreter.rs:2537–2574` +- **Symptom**: Per ISA, float-to-integer conversions set `FPSCR[XX, FX]` when the source + value is not an integer (the fractional part is discarded). 
Neither opcode sets XX. + Shared root cause with PPCBUG-224/225. + +### PPCBUG-230 — `fctiwx` / `fctiwzx` do not set FPSCR[XX/FX] for inexact inputs + +- **Severity**: LOW +- **Status**: open +- **Locations**: `interpreter.rs:2575–2612` +- **Symptom**: Same omission as PPCBUG-229 for the word-width conversion pair. + +### PPCBUG-231 — `frspx` SNaN input result written as QNaN (host platform dependency) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:2519–2524` +- **Symptom**: Rust's `as f32` (CVTSD2SS) can set the quiet bit on SNaN input, producing a + QNaN in the FPR. Per ISA, `frsp` on SNaN should quieten it — so the QNaN result is + correct in kind. The risk is that the exact QNaN bit-pattern may differ from PPC's + canonical quietening (which ORs bit 22 into the f32 mantissa). Game code inspecting the + NaN payload after frsp may see a different payload. Same structural root cause as + PPCBUG-128 (`lfs` SNaN quietening), but lower severity because frsp IS arithmetic. + +IDs PPCBUG-232 through PPCBUG-239 are unallocated — no further bugs found in group 31. + +--- + +## Batch 7 — VMX integer add/sub (group 32) + +Per-group report: `audit-out/group-32-vmx-int-addsub.md`. + +**Scope**: `vaddubm`, `vaddubs`, `vadduhm`, `vadduhs`, `vadduwm`, `vadduws`, `vaddsbs`, `vaddshs`, +`vaddsws`, `vaddcuw`, `vsububm`, `vsububs`, `vsubuhm`, `vsubuhs`, `vsubuwm`, `vsubuws`, `vsubsbs`, +`vsubshs`, `vsubsws`, `vsubcuw`. + +**Overall verdict**: All 20 opcodes are arithmetically correct. No HIGH-severity bugs found. +Lane indexing (big-endian, PPC element 0 = `Vec128::bytes[0]`), saturation arithmetic, VSCR.SAT +sticky-set, and vaddcuw/vsubcuw carry/borrow semantics are all implemented correctly. +4 LOW-severity findings (2 test gaps, 1 code organization, 1 API hazard). 
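The sticky-set behaviour the verdict refers to can be summarised in a few lines. The sketch below is illustrative only: `Vscr`, `vaddubs`, and `vaddsbs` are hypothetical stand-ins that model plain 16-lane arrays, not the `Vec128` type or the `vmx.rs` helpers. The point is that a saturating handler only ever ORs into SAT, so a later non-saturating instruction leaves an earlier SAT intact.

```rust
/// Hypothetical stand-in for the VSCR; only the SAT bit is modelled.
#[derive(Debug, Default)]
pub struct Vscr {
    pub sat: bool,
}

/// Unsigned byte saturating add across 16 lanes.
pub fn vaddubs(a: [u8; 16], b: [u8; 16], vscr: &mut Vscr) -> [u8; 16] {
    let mut out = [0u8; 16];
    let mut any_sat = false;
    for i in 0..16 {
        let (sum, overflowed) = a[i].overflowing_add(b[i]);
        out[i] = if overflowed { u8::MAX } else { sum };
        any_sat |= overflowed;
    }
    vscr.sat |= any_sat; // sticky: OR in, never clear
    out
}

/// Signed byte saturating add across 16 lanes.
pub fn vaddsbs(a: [i8; 16], b: [i8; 16], vscr: &mut Vscr) -> [i8; 16] {
    let mut out = [0i8; 16];
    let mut any_sat = false;
    for i in 0..16 {
        any_sat |= a[i].checked_add(b[i]).is_none();
        out[i] = a[i].saturating_add(b[i]);
    }
    vscr.sat |= any_sat; // sticky: OR in, never clear
    out
}
```

A test in this shape (saturate once, then run a non-saturating add and check SAT is still set) covers the "sticky-set across two successive instructions" item in the recommended minimum.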
+ +### PPCBUG-240 — 18 of 20 group-32 opcodes have zero interpreter-level tests + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` module +- **Symptom**: Only `test_vaddubs_saturates_and_sets_vscr_sat` covers any group-32 opcode. + `vaddubm`, `vsububm`, `vadduhm`, `vsubuhm`, `vadduwm`, `vsubuwm`, `vaddsbs`, `vsubsbs`, + `vadduhs`, `vsubuhs`, `vaddshs`, `vsubshs`, `vadduws`, `vsubuws`, `vaddsws`, `vsubsws`, + `vaddcuw`, `vsubcuw` — all 18 have no tests. No high risk today but no regression guard. +- **Recommended minimum**: wrap-around test (byte, halfword, word); sat-at-max and sat-at-min tests; + VSCR.SAT sticky-set across two successive saturating instructions; vaddcuw carry lane; vsubcuw + no-borrow lane. + +### PPCBUG-241 — `vadduwm` / `vsubuwm` stranded in a separate section from the rest of group-32 + +- **Severity**: LOW (maintenance hazard) +- **Status**: open +- **Location**: `interpreter.rs:2090–2104` (stranded) vs. `interpreter.rs:2784` (§4a group-32 section) +- **Symptom**: The two word-modulo opcodes are matched 700 lines above the rest of the group, with + only a comment at line 2819 as a cross-reference. A future sweep of §4a for group-32 changes + would miss them. +- **Fix**: Move both arms into §4a and remove the comment at line 2819. + +### PPCBUG-242 — `set_vscr_sat(false)` can non-stickily clear SAT from arithmetic handlers + +- **Severity**: LOW (API hazard) +- **Status**: open +- **Location**: `context.rs:252–259` +- **Symptom**: `set_vscr_sat(bool)` accepts `false`, which would clear the sticky SAT bit. All + current arithmetic callers pass `true` only (inside `if sat { ... }` guards), so no mis-clear + occurs today. But the API is misleading — a future saturating handler that writes + `set_vscr_sat(lane_sat)` with `lane_sat = false` would silently clear a previously-set bit. +- **Fix**: Rename to `sticky_set_vscr_sat()` (no bool argument, always ORs). 
Retain + `force_vscr_sat(bool)` for `mtvscr`. + +### PPCBUG-243 — `vmx.rs` saturation helpers: u16/i16/u32/i32 variants have zero unit tests + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `crates/xenia-cpu/src/vmx.rs:705–799` +- **Symptom**: `vmx.rs` tests cover 5 cases of `sat_add/sub_i8/u8`. The 8 helpers for wider + types (`sat_add_u16`, `sat_sub_u16`, `sat_add_i16`, `sat_sub_i16`, `sat_add_u32`, `sat_sub_u32`, + `sat_add_i32`, `sat_sub_i32`) are mathematically correct but unguarded by any test. Recommended + additions listed in the per-group report. + +IDs PPCBUG-244 through PPCBUG-274 are unallocated — no further bugs found in group 32. + +--- + +## Batch 7 — VMX integer compare / min / max / avg (group 33) + +Per-group report: `audit-out/group-33-vmx-int-compare.md`. + +### PPCBUG-275 — All VC-form vector compare dot forms: `rc_bit()` reads wrong bit; CR6 never updated + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Affected opcodes**: `vcmpequb.`, `vcmpequh.`, `vcmpgtsb.`, `vcmpgtsh.`, `vcmpgtub.`, `vcmpgtuh.` +- **Location**: `decoder.rs:75` + `interpreter.rs:3318`, `3331`, `3344`, `3357`, `3370`, `3383` +- **Symptom**: `rc_bit()` is implemented as `self.raw & 1 != 0` (reads LSB = bit 0 of the word). + For VC-form instructions the Rc flag is at **PPC bit 21 = LSB bit 10**, not bit 0. Bit 0 is + the LSB of the 10-bit XO field. All integer compare XO values are even (XO=6, 70, 518, 774, 582, 838), + so their bit 0 is always 0. The CR6 update block is **unconditionally dead** regardless of + whether the programmer wrote the dot form. `vcmpequb. vMask, vData, vNeedle` + `bc 12,24` + (branch on CR6.LT = all-true) is the canonical AltiVec memchr idiom; it will always fall through. +- **Fix**: + ```rust + // decoder.rs — add: + /// Rc bit for VC-form vector compare instructions (PPC bit 21 = LSB bit 10).
+ #[inline] pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 } + ``` + Replace `instr.rc_bit()` with `instr.vc_rc_bit()` at interpreter.rs:3318, 3331, 3344, 3357, + 3370, 3383. + +### PPCBUG-276 — `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`: same VC-form Rc bug + +- **Severity**: MEDIUM +- **Status**: applied (52b05b1, 2026-05-01) +- **Affected opcodes**: `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.` +- **Location**: `interpreter.rs:2237`, `3396`, `3406` +- **Symptom**: Same root cause as PPCBUG-275. XO for vcmpequw=134, vcmpgtuw=646, vcmpgtsw=902 — + all even, bit 0 always 0. Word-compare dot forms never update CR6. `vcmpequw128` uses the + VMX128_R Rc encoding which also likely reads the wrong bit. +- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:2237, 3396, 3406. Separately verify + VMX128_R Rc bit position for `vcmpequw128` (may require its own extractor). + +### PPCBUG-277 — Zero tests for all `vcmp*` dot forms and CR6 correctness + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` module +- **Symptom**: No test exercises any of the 10 integer vector compare opcodes. Critical missing: + `vcmpequb.` all-true → CR6.LT=1; `vcmpequb.` all-false → CR6.EQ=1; `vcmpgtsb` signed + boundary (0x80 vs 0x7F must yield false, not true); `vcmpgtsh` at 0x8000 vs 0x7FFF. + +### PPCBUG-278 — Zero tests for all 12 `vmax*` / `vmin*` opcodes + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` module +- **Symptom**: None of vmaxub/uh/uw/sb/sh/sw, vminub/uh/uw/sb/sh/sw are tested. Critical missing: + `vmaxsb(0x80, 0x7F)` = 0x7F (signed max of -128 and +127); `vminsb(0x80, 0x7F)` = 0x80. + Without these, signed vs unsigned confusion in min/max would not be caught. 
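The signed-boundary cases called out in PPCBUG-277 and PPCBUG-278 can be checked on single lanes before wiring up full interpreter tests. The sketch below is an assumption-laden illustration: the `*_lane` helpers are hypothetical names and operate on one byte lane, not on `Vec128`.

```rust
/// Signed byte max for one lane; inputs are raw lane bytes.
pub fn vmaxsb_lane(a: u8, b: u8) -> u8 {
    (a as i8).max(b as i8) as u8
}

/// Signed byte min for one lane.
pub fn vminsb_lane(a: u8, b: u8) -> u8 {
    (a as i8).min(b as i8) as u8
}

/// Unsigned byte max for one lane, shown for contrast with the signed form.
pub fn vmaxub_lane(a: u8, b: u8) -> u8 {
    a.max(b)
}

/// Signed greater-than compare for one lane: all-ones if true, zero if false.
pub fn vcmpgtsb_lane(a: u8, b: u8) -> u8 {
    if (a as i8) > (b as i8) { 0xFF } else { 0x00 }
}
```

With inputs `0x80` and `0x7F` the signed and unsigned max disagree (`0x7F` versus `0x80`), which is exactly the confusion these tests are meant to catch.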
+ +### PPCBUG-279 — Zero tests for all 6 `vavg*` opcodes; no signed-boundary or rounding coverage + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` module; `vmx.rs` test module +- **Symptom**: `avg_u8` through `avg_i32` helpers have no unit tests. Key rounding case: + `avg_u8(0, 1)` must be 1 (round up), not 0 (truncation). `avg_i32(i32::MIN, i32::MIN)` must + be `i32::MIN` without overflow. + +IDs PPCBUG-280 through PPCBUG-314 are unallocated — no further bugs found in group 33. + +--- + +## Batch 6 — VMX integer logical / shift / rotate / select (group 34) + +Per-group report: `audit-out/group-34-vmx-logic-shift.md`. + +Group 34 summary: the bitwise logical ops (vand/vandc/vor/vxor/vnor and their 128 variants) +are all ISA-correct — Vec128 is `[u8; 16]` with no padding bits, so `!(u32)` flips exactly +32 bits per lane with no upper-bit pollution (the PPCBUG-029/030/031 class does not apply to +VMX register files). The per-lane shifts (vslb/vsrb/vsrab, vslh/vsrh/vsrah, vslw/vsrw/vsraw +and their 128 variants) all correctly mask the shift count to the lane width before shifting; +vsraw uses i32 arithmetic right shift which is correctly defined in Rust for shift-by-31. +The per-lane rotates (vrlb/vrlh/vrlw and 128 variants) are correct. The whole-register bit +shifts (vsl/vsr) and whole-register byte shifts (vslo/vsro and 128 variants) correctly +extract the shift count from VB.b[15] with the proper bit masks. vsel and vsel128 are correct +including the read-before-write ordering on vsel128's vc=vd aliasing. + +**One HIGH bug found**: vrlimi128 extracts both the rotate-amount (z) field and the +blend-mask (IMM) field from the wrong bit positions of the instruction word. + +**0 MEDIUM bugs with code change needed. 1 HIGH. 
10 LOW (test gaps and informational).** + +### PPCBUG-315 — vrlimi128 z and IMM fields extracted from wrong bit positions + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: interpreter.rs:3551–3552 +- **Symptom**: `shift = ((instr.raw >> 16) & 0x3)` reads integer bits 16–17 — the low 2 bits + of the 5-bit IMM (blend-mask) field — instead of the 2-bit `z` (rotate) field at integer + bits 6–7. `mask = (instr.raw >> 2) & 0xF` reads integer bits 2–5 — VD128h extension bits + and a reserved field — instead of the low 4 bits of IMM at integer bits 16–19. + **Every `vrlimi128` executes with a wrong rotate amount and a wrong per-word select mask.** + The only benign case is the degenerate encoding where `z == IMM[1:0]` and the garbage mask + happens to equal the intended mask — unlikely in real code. +- **VX128_4 field layout** (LSB-0 integer bit numbering after PPC big-endian byte-swap to host): + - `VD128l : 5` at integer bits 21–25 (PPC bits 6–10) + - `IMM : 5` at integer bits 16–20 (PPC bits 11–15) — blend mask, 4 bits used + - `VB128l : 5` at integer bits 11–15 (PPC bits 16–20) + - `z : 2` at integer bits 6–7 (PPC bits 24–25) — rotate amount 0..3 + - `VD128h : 2` at integer bits 2–3 (PPC bits 28–29) +- **Fix**: + ```rust + let shift = ((instr.raw >> 6) & 0x3) as usize; // z field: integer bits 6-7 + let mask = (instr.raw >> 16) & 0xF; // IMM low 4 bits: integer bits 16-19 + ``` +- **Canary reference**: `ppc_decode_data.h:585–608` `FormatVX128_4`; `ppc_emit_altivec.cc:1318,1324`. +- **Note**: the rotate logic (`b[(shift + i) % 4]`) and mask-select logic (`(mask >> (3-i)) & 1`) + in the interpreter body are ISA-correct — only the field extraction is wrong. +- **Test gap**: no interpreter execution test for vrlimi128 (PPCBUG-325). 
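The PPCBUG-315 field positions can be exercised on a synthetic instruction word. The sketch below assumes the VX128_4 layout listed in that finding and reuses the rotate and mask-select expressions quoted in its note; `vx128_4_z`, `vx128_4_imm4`, and `vrlimi128_words` are hypothetical helper names, and the vectors are modelled as four `u32` words rather than `Vec128`.

```rust
/// z (rotate amount) field of a VX128_4 word: integer bits 6-7.
pub fn vx128_4_z(raw: u32) -> usize {
    ((raw >> 6) & 0x3) as usize
}

/// Low 4 bits of the IMM (blend mask) field: integer bits 16-19.
pub fn vx128_4_imm4(raw: u32) -> u32 {
    (raw >> 16) & 0xF
}

/// Word-level model of the body: rotate VB by `z` words, then for each
/// word lane take the rotated VB word if its mask bit is set, else keep VD.
pub fn vrlimi128_words(d: [u32; 4], b: [u32; 4], raw: u32) -> [u32; 4] {
    let shift = vx128_4_z(raw);
    let mask = vx128_4_imm4(raw);
    let mut out = d;
    for i in 0..4 {
        if (mask >> (3 - i)) & 1 != 0 {
            out[i] = b[(shift + i) % 4];
        }
    }
    out
}
```

For a word with `z = 2` and `IMM = 0b0101`, the pre-fix expressions `(raw >> 16) & 0x3` and `(raw >> 2) & 0xF` yield 1 and 0, so neither the rotate amount nor the mask matches the encoded values.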
+ +### PPCBUG-316 — Zero interpreter execution tests for vslb/vsrb/vsrab (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs:3440–3463 + +### PPCBUG-317 — Zero interpreter execution tests for vslh/vsrh/vsrah (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs:3472–3503 + +### PPCBUG-318 — vslo/vsro byte-shift count max is 15 (correct; informational) + +- **Severity**: LOW (informational / wontfix) +- **Status**: wontfix +- `N` is a 4-bit field; max shift is 15 bytes = 120 bits (not 128). VD retains + the 8 LSBs of VA in position [127:120] at N=15. ISA-correct. + +### PPCBUG-319 — vsel128 vc=vd read-before-write ordering (correct; informational) + +- **Severity**: LOW (informational / wontfix) +- **Status**: wontfix +- `c = ctx.vr[vc]` is read before `ctx.vr[vd] = result`. Correctly sequenced. + +### PPCBUG-320 — Zero interpreter execution tests for vslw/vsrw/vsraw/vrlw (+128 variants) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs:2108–2155 + +### PPCBUG-321 — Zero interpreter execution tests for vsl/vsr + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs:3508–3521 + +### PPCBUG-322 — Zero interpreter execution tests for vslo/vsro (+128 variants) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs:3523–3541 + +### PPCBUG-323 — Zero interpreter execution tests for vand/vandc/vor/vxor/vnor (+128 variants) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs:1900–1944 + +### PPCBUG-324 — Zero interpreter execution tests for vsel/vsel128 + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs:1945–1967 + +### PPCBUG-325 — Zero interpreter execution tests for vrlb/vrlh/vrlw/vrlimi128 (+128 variants) + +- **Severity**: LOW (test gap; fix PPCBUG-315 before writing vrlimi128 tests) +- **Status**: open +- **Location**: 
interpreter.rs:3464–3503, 2144–2155, 3550–3565 + +IDs PPCBUG-326 through PPCBUG-354 are unallocated — no further bugs found in group 34. + +--- + +## Batch 8 — VMX permute / merge / splat / pack / unpack (group 35) + +Per-group report: `audit-out/group-35-vmx-permute.md`. + +**Summary**: 5 HIGH, 3 MEDIUM, 9 LOW. Four VX128_* field-extraction bugs; one missing post-pack permutation logic; VSCR.SAT and pack saturation semantics are all correct. Zero interpreter tests for any group-35 opcode. + +### PPCBUG-360 — vperm128: VC register read from wrong field (vd128() instead of VX128_2 VC bits 23-25) + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `interpreter.rs:1979` +- **Symptom**: `vperm128` uses the VX128_2 instruction form. The permute-control register VC is a 3-bit field at PPC bits 23-25 (LSB integer bits 6-8). The code does `vc = instr.vd128()` which reads PPC bits 6-10 + 21-22 — a completely different set of bits. Every `vperm128` therefore permutes with a control vector read from the wrong register, producing garbage output. `vperm128` is one of the most-used VMX128 ops in Xbox 360 graphics code (texture/vertex data layout). +- **Fix**: + ```rust + // decoder.rs — add accessor: + #[inline] pub fn vc128_2(&self) -> usize { ((self.raw >> 6) & 0x7) as usize } + // interpreter.rs:1979 — replace: + vc = instr.vc128_2(); // VX128_2 VC field at PPC bits 23-25 + ``` +- **ISA ref**: `ppc-manual/vmx/vperm.md`, VX128_2 encoding; `ppc_decode_data.h:541-561`; `ppc_emit_altivec.cc:1203-1204` (`VX128_2_VC`). + +### PPCBUG-361 — vsldoi128: SH field MSB reads bit 4 (reserved) instead of bit 9 + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `interpreter.rs:2012` +- **Symptom**: VX128_5 SH is a 4-bit field at LSB integer bits 6-9. Code does `((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)`. This reads bit 4 (a reserved field, always 0 in valid encodings) as the MSB of SH instead of bit 9. 
Shifts of 8-15 bytes silently resolve as shifts of 0-7 bytes. `vsldoi128` with `SH >= 8` (common in vector rotation patterns) always produces the wrong result. +- **Fix**: + ```rust + let sh = ((instr.raw >> 6) & 0xF) as usize; // SH field: integer bits 6-9 + ``` +- **ISA ref**: `ppc-manual/vmx/vsldoi.md`, VX128_5 encoding; `ppc_decode_data.h:609-634`; canary `VX128_5_SH`. + +### PPCBUG-362 — vpermwi128: PERMh (high 3 bits of 8-bit PERM immediate) read from VD128l bits instead of bits 6-8 + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `interpreter.rs:4089` +- **Symptom**: VX128_P PERM = `PERMl[4:0] | (PERMh[2:0] << 5)` where PERMl is at integer bits 16-20 and PERMh is at integer bits 6-8. Code does `(raw >> 16) & 0xFF` which reads bits 16-23. Bits 21-23 are VD128l[4:2], not PERMh. The top 3 bits of the 8-bit PERM immediate are wrong; output word lane selections for lanes 0 and 1 are controlled by garbage bits. Same pattern as PPCBUG-315. +- **Fix**: + ```rust + let imm = ((instr.raw >> 16) & 0x1F) | (((instr.raw >> 6) & 0x7) << 5); // VX128_P PERM + ``` +- **ISA ref**: `ppc_decode_data.h:664-686`; `ppc_emit_altivec.cc:1214`. + +### PPCBUG-363 — vpkd3d128: post-pack permutation (pack + z fields) entirely absent; output always placed in wrong lane when pack != 0 + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `interpreter.rs:3783-3808` +- **Symptom**: Canary's `vpkd3d128` does three things: (1) pack VB by type, (2) permute the result with the existing VD register using a control determined by `pack` (IMM[1:0]) and `shift` (z field at integer bits 6-7), (3) store. Xenia-rs does only (1) and (3), skipping the entire lane-placement permutation. When `pack != 0`, the packed value must be merged into a specific 32-bit or 64-bit slot of VD — this merge never happens. `pack=0` is the only safe case. Most D3D vertex pack sequences use `pack=1` (32-bit slot) with varying `shift`. 
+- **Fix**: Extract `pack = uimm & 3` and `shift = (instr.raw >> 6) & 3` (z field), read existing `ctx.vr[vd]`, apply the permutation table from `ppc_emit_altivec.cc:2125-2188`, write back. +- **ISA ref**: `ppc_emit_altivec.cc:2088-2191`. + +### PPCBUG-364 — vsldoi (non-128): correct; PPCBUG-365 — vsplt*: correct; informational + +- **Severity**: LOW (wontfix) +- **Status**: wontfix +- `vsldoi` correctly extracts SH as `(raw >> 6) & 0xF`. `vspltb/vsplth/vspltw` correctly read UIMM from the VA position (integer bits 16-20, masked to lane width). No bugs. + +### PPCBUG-366 — vspltisb / vspltish: sign-extension idiom is correct but non-obvious; future regression risk + +- **Severity**: MEDIUM +- **Status**: open (clarity fix recommended) +- **Location**: `interpreter.rs:2059-2060`, `2064-2066` +- **Symptom**: `simm | !0x1F` where `simm` is typed `i8`/`i16` is functionally correct (Rust narrows `!0x1F` to the target type), but the pattern is fragile under refactoring. Recommend: + ```rust + let simm = (((instr.raw >> 16) & 0x1F) as i32).wrapping_shl(27).wrapping_shr(27) as i8; + ``` + +### PPCBUG-367 — vupkhpx / vupklpx: channel replication vs zero-extend divergence; canary is unimplemented + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `vmx.rs:318-330` +- **Symptom**: `unpack_pixel_555` replicates 5-bit RGB channels (`r << 3 | r >> 2`) to fill 8 bits. ISA specifies zero-extension into bits 7:3, leaving bits 2:0 as zero. The replicate approach produces slightly different values (and slightly higher values), diverging from hardware. +- **Fix**: `let r8 = r << 3;` (drop the `| r >> 2` replication term). + +### PPCBUG-368 — vpkpx: pack_pixel_555 channel assignment unverified against hardware; canary comparison inconclusive + +- **Severity**: MEDIUM +- **Status**: open (needs hardware trace or more detailed canary analysis) +- **Location**: `vmx.rs:310-316` +- **Symptom**: The xenia-rs layout comment says R=bits 8-15, G=16-23, B=24-31. 
Canary's `vkpkx_in_low` uses different shift amounts (`>> 9` for R, `>> 6` for G, `>> 3` for B), suggesting either a different input layout assumption or the channels are swapped. Without a hardware reference, cannot determine which is authoritative. + +### PPCBUG-369 — vpkd3d128 z-field not extracted (sub-issue of PPCBUG-363) + +- **Severity**: LOW (tracked under PPCBUG-363) +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `interpreter.rs:3785` +- The `z` field (VX128_4, integer bits 6-7) is never extracted. Correct extraction: `(instr.raw >> 6) & 0x3`. + +### PPCBUG-370 — Zero interpreter tests for vperm / vperm128 (test gap) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:1970-1995` + +### PPCBUG-371 — Zero interpreter tests for vsldoi / vsldoi128 (test gap) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:1997-2020` + +### PPCBUG-372 — Zero interpreter tests for vpermwi128 (test gap) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:4087-4099` + +### PPCBUG-373 — Zero interpreter tests for vmrghb / vmrglb / vmrghh / vmrglh (test gap) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:3570-3600` + +### PPCBUG-374 — Zero interpreter tests for vspltb / vsplth / vspltw / vspltw128 (test gap) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:2022-2048` + +### PPCBUG-375 — Zero interpreter tests for vspltisb / vspltish / vspltisw / vspltisw128 (test gap) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:2050-2068` + +### PPCBUG-376 — Zero interpreter tests for all vpk* (16 ops) + VSCR.SAT coverage (test gap) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:3607-3718` + +### PPCBUG-377 — Zero interpreter tests for vupkhsb / vupklsb / vupkhsh / vupklsh (test gap) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:3722-3754` + +### PPCBUG-378 — Zero 
interpreter tests for vpkd3d128 / vupkd3d128 (test gap; blocked on PPCBUG-363) + +- **Severity**: LOW +- **Status**: open +- **Location**: `interpreter.rs:3783-3835` + +IDs PPCBUG-379 through PPCBUG-419 are unallocated — no further bugs found in group 35. + +--- + +## Batch 9 — VMX float arithmetic / compare / convert / estimate (group 36) + +Per-group report: `audit-out/group-36-vmx-float.md`. + +Group 36 summary: **21 findings (PPCBUG-420..440). 6 HIGH, 8 MEDIUM, 7 LOW.** The most +critical bugs are: (1) four VMX float compare VC-form opcodes use `rc_bit()` (bit 0) instead +of the correct VC-form Rc bit (bit 10) — CR6 is never updated, same root cause as PPCBUG-275; +(2) vmaddfp128 and vmaddcfp128 have their multiplicand and accumulator operands swapped — +every matrix multiply / Newton-Raphson step using these opcodes produces the wrong result; +(3) VMX128_R dot-form compares (vcmpeqfp128. etc.) decode as Invalid due to missing key4 +entries in decode_op6. + +**6 HIGH, 8 MEDIUM, 7 LOW. 21 IDs used (PPCBUG-420..440). 39 IDs unallocated (PPCBUG-441..479).** + +### PPCBUG-420 — vcmpeqfp / vcmpgefp / vcmpgtfp: `rc_bit()` reads wrong bit; CR6 never updated + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Affected opcodes**: `vcmpeqfp.`, `vcmpgefp.`, `vcmpgtfp.` +- **Location**: `interpreter.rs:1875`, `1885`, `1895` +- **Symptom**: `rc_bit()` = `self.raw & 1` reads LSB bit 0. For VC-form the Rc flag is at + PPC bit 21 = LSB bit 10. All XO values (vcmpeqfp=198, vcmpgefp=454, vcmpgtfp=710) have + bit 0 = 0, so CR6 is never updated for any float compare dot form. `vcmpeqfp.` + `bc 12,24` + (branch all-equal) always falls through. +- **Cross-reference**: PPCBUG-275 (identical root cause for integer vcmp). Canary reads + `i.VXR.Rc` (ppc_emit_altivec.cc:625, 633, 641). +- **Fix**: Add `pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }` to + `decoder.rs` and replace `instr.rc_bit()` at interpreter.rs:1875, 1885, 1895. 
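The dead Rc gate described in PPCBUG-420 (and PPCBUG-275) is easy to reproduce on an encoded word. This is a minimal sketch: `encode_vc` and `rc_bit_lsb` are hypothetical helpers written for the illustration, and the encoder assumes the standard VC-form layout (primary opcode 4, Rc at LSB bit 10, 10-bit XO).

```rust
/// Assemble a VC-form word: opcode 4, VD, VA, VB, Rc at LSB bit 10, 10-bit XO.
pub fn encode_vc(vd: u32, va: u32, vb: u32, rc: bool, xo: u32) -> u32 {
    (4 << 26) | (vd << 21) | (va << 16) | (vb << 11) | ((rc as u32) << 10) | (xo & 0x3FF)
}

/// The pre-fix extractor: reads bit 0, which is the LSB of XO for VC-form.
pub fn rc_bit_lsb(raw: u32) -> bool {
    raw & 1 != 0
}

/// The extractor proposed in the finding: LSB bit 10 (PPC bit 21).
pub fn vc_rc_bit(raw: u32) -> bool {
    (raw >> 10) & 1 != 0
}
```

Because every XO value listed for these compares is even, `rc_bit_lsb` returns `false` for both `vcmpeqfp` and `vcmpeqfp.`, while `vc_rc_bit` distinguishes them.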
+ +### PPCBUG-421 — vcmpbfp: `rc_bit()` reads wrong bit (VC-form); Rc gate permanently dead + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `interpreter.rs:3428` +- **Symptom**: Same root cause as PPCBUG-420. XO=966, bit 0 = 0; CR6 update never fires + for `vcmpbfp.`. The CR6 value logic (`eq = !any_out`) is correct; only the gate is wrong. +- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:3428. + +### PPCBUG-422 — vcmpeqfp128 / vcmpgefp128 / vcmpgtfp128 / vcmpbfp128: `rc_bit()` reads wrong bit (VX128_R-form) + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `interpreter.rs:1875`, `1885`, `1895`, `3428` (shared arms with non-128 forms) +- **Symptom**: For VX128_R-form, Rc is at PPC bit 27 = LSB bit 4 (confirmed from canary's + `VX128_R` bitfield: `uint32_t Rc : 1` at bit 4 from LSB). `rc_bit()` reads bit 0. Fix + PPCBUG-423 first (dot forms decode as Invalid before this even matters). +- **Fix**: Add `pub fn vx128r_rc_bit(&self) -> bool { (self.raw >> 4) & 1 != 0 }` and use + it in the VX128_R compare arms. + +### PPCBUG-423 — vcmpeqfp128. / vcmpgefp128. / vcmpgtfp128. / vcmpbfp128.: dot forms decode as `Invalid` + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `decoder.rs:640-648` (decode_op6 VMX128 compare key4 table) +- **Symptom**: decode_op6 extracts `key4 = (bits22-24 << 3) | bit27`. When Rc=1, PPC bit 27 + is set, making key4 = non-dot value + 1. Dot-form key4 values (1, 9, 17, 25, 33) are all + absent from the match table. Decoder returns `PpcOpcode::Invalid`. Any game shader using a + VMX128-form float compare dot form traps with unimplemented opcode. 
+- **Fix**: Add dot-form entries to the key4 match table mapping to the same opcodes (the + interpreter arm uses `instr.vx128r_rc_bit()` to conditionally update CR6): + ```rust + 0b000001 => return PpcOpcode::vcmpeqfp128, + 0b001001 => return PpcOpcode::vcmpgefp128, + 0b010001 => return PpcOpcode::vcmpgtfp128, + 0b011001 => return PpcOpcode::vcmpbfp128, + 0b100001 => return PpcOpcode::vcmpequw128, + ``` + +### PPCBUG-424 — vmaddfp128: operand swap — computes VA×VB+VD instead of VA×VD+VB + +- **Severity**: HIGH +- **Status**: open +- **Location**: `interpreter.rs:1771` (`r[i] = ai.mul_add(bi, di)`) +- **Symptom**: Canary (ppc_emit_altivec.cc:806-809) documents `(VD) <- (VA × VD) + VB` and + routes as `MulAdd(VA, VD, VB)`. Xenia-rs reads VA, VB, VD then computes + `ai.mul_add(bi, di)` = `VA × VB + VD` — VB and VD roles swapped. Every shader using + vmaddfp128 for matrix multiply or Newton-Raphson accumulation accumulates the wrong value. + The existing denorm-flush test aliases vA=vD=v2, making the swap invisible. +- **Fix**: `r[i] = ai.mul_add(di, bi);` + +### PPCBUG-425 — vmaddcfp128: operand swap — computes VD×VB+VA instead of VA×VD+VB + +- **Severity**: HIGH +- **Status**: open +- **Location**: `interpreter.rs:4065` (`r[i] = di.mul_add(bi, ai)`) +- **Symptom**: Canary (ppc_emit_altivec.cc:819) documents `(VD) <- (VA × VD) + VB`. + Xenia-rs computes `VD × VB + VA`. Both the first multiplicand and the addend are wrong. +- **Fix**: `r[i] = ai.mul_add(di, bi);` +- **Test gap**: zero tests for `vmaddcfp128`. Add test with distinct VA, VB, VD registers. + +### PPCBUG-426 — vnmsubfp: two rounding steps instead of fused FMA; NaN sign may be flipped + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:1786` (`r[i] = bi - ai * ci`) +- **Symptom**: `vmaddfp` uses single-rounded `ai.mul_add(ci, bi)`, but `vnmsubfp` uses + `bi - ai * ci` (two operations, two rounding steps). ISA specifies a single fused operation. 
+ Canary acknowledges the same limitation (ppc_emit_altivec.cc:1136). Additionally, the + implicit negation in subtraction may flip the sign bit of a NaN result (see PPCBUG-183). +- **Fix**: `r[i] = -ai.mul_add(ci, -bi);` — single FMA rounding: `-(ai*ci + (-bi))` = `bi - ai*ci`. + +### PPCBUG-427 — vnmsubfp128: same two-rounding form as vnmsubfp + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:1803` (`r[i] = di - ai * bi`) +- **Symptom**: Same class as PPCBUG-426 for the VMX128 form. +- **Fix**: `r[i] = -ai.mul_add(bi, -di);` + +### PPCBUG-428 — vrefp / vrefp128: full-precision 1/x instead of ~12-bit hardware estimate + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:1853` (`r[i] = 1.0 / b[i]`) +- **Symptom**: Same class as PPCBUG-184 (fresx). Xenon vrefp provides ~12-bit accuracy; + xenia-rs computes full IEEE-754 division. Canary also uses full precision in practice. + +### PPCBUG-429 — vrsqrtefp / vrsqrtefp128: full-precision 1/sqrt(x) instead of ~12-bit estimate + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:1862` (`r[i] = 1.0 / b[i].sqrt()`) +- **Symptom**: Same class as PPCBUG-428 for reciprocal square root. + +### PPCBUG-430 — vexptefp / vexptefp128: full-precision exp2(x) instead of ~12-bit estimate + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:3934` (`r[i] = b[i].exp2()`) +- **Symptom**: Same class as PPCBUG-428. NaN/Inf edge cases may diverge. + +### PPCBUG-431 — vlogefp / vlogefp128: full-precision log2(x) instead of ~12-bit estimate + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:3944` (`r[i] = b[i].log2()`) +- **Symptom**: Same class as PPCBUG-428. 
+ +### PPCBUG-432 — vrfin / vrfin128: Rust `round()` is round-half-away-from-zero; ISA requires round-to-nearest-even + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:2172` (`r[i] = b[i].round()`) +- **Symptom**: `vrfin(0.5)` → ISA = 0.0; Rust = 1.0. `vrfin(2.5)` → ISA = 2.0; Rust = 3.0. + Canary uses SSE4.1 `ROUNDPS` which is round-to-nearest-even. +- **Fix**: Use `f32::round_ties_even()` (stable since Rust 1.77). + +### PPCBUG-433 — vctsxs / vcfpsxws128: NaN input returns 0 instead of saturating to INT_MIN (0x80000000) + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `vmx.rs:217` (`if x.is_nan() { return (0, true); }`) +- **Symptom**: AltiVec ISA: NaN in vctsxs saturates to INT_MIN (0x80000000). Xenia-rs returns 0. +- **Fix**: `if x.is_nan() { return (i32::MIN, true); }` + +### PPCBUG-434 — vctuxs NaN → 0 is correct; informational + +- **Severity**: LOW (wontfix) +- **Status**: wontfix +- **Location**: `vmx.rs:225` +- **Note**: Unsigned NaN saturates to 0 per ISA. Xenia-rs is correct. Add a comment. + +### PPCBUG-435 — vaddfp / vsubfp / vmulfp128: subnormal inputs not flushed when VSCR.NJ=1 + +- **Severity**: MEDIUM (latent — Xbox 360 always boots with NJ=1) +- **Status**: open +- **Location**: `interpreter.rs:1713`, `1729`, `1812` +- **Symptom**: VSCR.NJ=1 requires flush-to-zero for subnormal inputs. vmaddfp family correctly + calls `vmx::flush_denorm()`; plain add/sub/mul do not check VSCR. + +### PPCBUG-436 — vmsum3fp128 / vmsum4fp128: per-product intermediates not individually flushed + +- **Severity**: MEDIUM (latent) +- **Status**: open +- **Location**: `interpreter.rs:4076`, `4083` +- **Symptom**: `flush_denorm` on final sum only. Per-lane products can be subnormal and + accumulate before the final flush.
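The numerical difference behind PPCBUG-436 can be shown with three subnormal products whose sum is a normal number. This is only a sketch of the two flushing strategies, not a statement about what the hardware does: `flush_denorm_sketch` is a hypothetical stand-in for `vmx::flush_denorm()`, and both dot-product helpers are invented for the illustration.

```rust
/// Hypothetical flush-to-zero helper: subnormals become a same-signed zero.
pub fn flush_denorm_sketch(x: f32) -> f32 {
    if x.is_subnormal() { 0.0f32.copysign(x) } else { x }
}

/// Flush applied to the final sum only (the behaviour the finding describes).
pub fn dot3_flush_final(a: [f32; 3], b: [f32; 3]) -> f32 {
    flush_denorm_sketch(a[0] * b[0] + a[1] * b[1] + a[2] * b[2])
}

/// Flush applied to every product and then to the sum.
pub fn dot3_flush_each(a: [f32; 3], b: [f32; 3]) -> f32 {
    let mut sum = 0.0f32;
    for i in 0..3 {
        sum += flush_denorm_sketch(a[i] * b[i]);
    }
    flush_denorm_sketch(sum)
}
```

With `a = [f32::MIN_POSITIVE; 3]` and `b = [0.5; 3]`, each product is subnormal, so the per-product variant returns zero while the final-only variant returns a small normal value.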
+ +### PPCBUG-437 — vmaddfp / vmaddfp128 / vmaddcfp128 / vnmsubfp128: subnormal output not flushed + +- **Severity**: MEDIUM (latent) +- **Status**: open +- **Location**: `interpreter.rs:1752–1754`, `1771–1773`, `4064–4067`, `1803–1805` +- **Symptom**: VSCR.NJ=1 requires flushing subnormal results. Inputs flushed; outputs are not. + +### PPCBUG-438 — Zero tests for vcmpeqfp / vcmpgefp / vcmpgtfp / vcmpbfp and dot forms + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` test module + +### PPCBUG-439 — Zero tests for vrfiz / vrfin / vrfip / vrfim and 128-bit variants + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs:2158–2192` + +### PPCBUG-440 — Zero tests for vctsxs / vctuxs / vcfsx / vcfux and 128-bit variants + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs:3842–3923` + +IDs PPCBUG-441 through PPCBUG-479 are unallocated — no further bugs found in group 36. + +--- + +## Batch 8 — VMX integer multiply-sum / multiply-half / sums / special (group 37) + +Per-group report: `audit-out/group-37-vmx-mulsum.md`. + +**Note**: All opcodes in this group are `XEINSTRNOTIMPLEMENTED()` stubs in xenia-canary; correctness is derived from the IBM ISA and `ppc-manual/vmx/` snapshots. `vrlimi128` is already tracked as PPCBUG-315. + +### PPCBUG-482 — `vmhaddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct) + +- **Severity**: WITHDRAWN +- **Status**: no bug +- **Note**: Draft analysis suggested >>16; the spec snapshot `ppc-manual/vmx/vmhaddshs.md` + explicitly shows `prod = (VA[i]*VB[i]) >> 15` and the pathological-case example confirms + `0x8000*0x8000 >> 15 = 32768`. Xenia-rs matches the spec exactly. No code change. + +### PPCBUG-483 — `vmhraddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct) + +- **Severity**: WITHDRAWN +- **Status**: no bug +- **Note**: `ppc-manual/vmx/vmhraddshs.md` explicitly shows `(product + 0x4000) >> 15`. 
+ Xenia-rs matches. No code change needed. + +### PPCBUG-487 — vsumsws/vsum2sws/vsum4sbs/vsum4ubs/vsum4shs: VB operand mis-named as "c"/"VC" + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `interpreter.rs:3249-3307` +- **Symptom**: All five vsum* handlers use a VX-form instruction (two operands: VA and VB). + The code names the VB source `c` and the comment references "vC" — implying a non-existent + third register operand. Only `instr.ra()` and `instr.rb()` are valid for VX form; there is + no `rc()`. The arithmetic is correct (rb() is called), but the naming misleads maintainers + into thinking there is a VA-form three-operand encoding. +- **Fix**: Rename `c` → `b` and update comments to say `VB` instead of `vC` in all five + handler bodies. + +### PPCBUG-490 — Zero tests for all six vmsum* opcodes + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` section +- **Symptom**: No unit test for `vmsumubm`, `vmsummbm`, `vmsumuhm`, `vmsumuhs`, `vmsumshm`, + `vmsumshs`. Critical missing: saturation + VSCR.SAT for `vmsumuhs`/`vmsumshs`; mixed-sign + byte product for `vmsummbm`; modulo wrap for `vmsumshm`. + +### PPCBUG-491 — Zero tests for `vmhaddshs` and `vmhraddshs` + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` section +- **Symptom**: No test for either multiply-high-add instruction. Key cases: `VA = 0x8000`, + `VB = 0x8000` (minus-one-times-minus-one saturating case); `VA = VB = 0x7FFF, VC = 0x7FFF` + (add post-shift result to max accumulator). Verify VSCR.SAT is set on saturation and clear + on non-saturating inputs. 
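The key cases listed for PPCBUG-491 can be worked through on a single halfword lane. The sketch below follows the `>> 15` and `(product + 0x4000) >> 15` forms quoted from the spec snapshots in PPCBUG-482 and PPCBUG-483; the lane helpers and `saturate_i16` are hypothetical names, and the returned flag stands in for the VSCR.SAT update.

```rust
/// Clamp a 32-bit intermediate to i16 and report whether clamping occurred.
fn saturate_i16(v: i32) -> (i16, bool) {
    let clamped = v.clamp(i16::MIN as i32, i16::MAX as i32);
    (clamped as i16, clamped != v)
}

/// vmhaddshs-style lane: (a * b) >> 15, plus c, saturated to i16.
pub fn vmhaddshs_lane(a: i16, b: i16, c: i16) -> (i16, bool) {
    let prod = (a as i32 * b as i32) >> 15;
    saturate_i16(prod + c as i32)
}

/// vmhraddshs-style lane: same, with 0x4000 added before the shift (rounding).
pub fn vmhraddshs_lane(a: i16, b: i16, c: i16) -> (i16, bool) {
    let prod = ((a as i32 * b as i32) + 0x4000) >> 15;
    saturate_i16(prod + c as i32)
}
```

`0x8000 * 0x8000` gives 32768 after the shift, which does not fit in an `i16` and therefore saturates to `0x7FFF` with the flag set; a product of `0x4000 * 1` shows the rounding difference between the two forms (0 versus 1).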
+ +### PPCBUG-492 — Zero tests for `vmladduhm` + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` section + +### PPCBUG-493 — Zero tests for all eight `vmule*` / `vmulo*` opcodes + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` section +- **Symptom**: No test for `vmuleub`, `vmuloub`, `vmulesb`, `vmulosb`, `vmuleuh`, `vmulouh`, + `vmulesh`, `vmulosh`. Key: even vs odd lane distinction (`vmulesh` vs `vmulosh`) is untested. + +### PPCBUG-494 — Zero tests for all five vsum* opcodes + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `interpreter.rs` `#[cfg(test)]` section +- **Symptom**: No test for `vsumsws`, `vsum2sws`, `vsum4sbs`, `vsum4ubs`, `vsum4shs`. + Missing: zero-output-lanes verification for `vsumsws` (w[0..2] must be 0) and `vsum2sws` + (w[0], w[2] must be 0); VSCR.SAT on overflow for all signed/unsigned variants. + +### PPCBUG-495 — `vsumsws` comment says "vC[3]" should say "VB[3]" + +- **Severity**: LOW (cosmetic) +- **Status**: open +- **Location**: `interpreter.rs:3248` + +IDs PPCBUG-480, PPCBUG-481, PPCBUG-482 (withdrawn), PPCBUG-483 (withdrawn), PPCBUG-484, +PPCBUG-485, PPCBUG-486, PPCBUG-488, PPCBUG-489, PPCBUG-496, PPCBUG-497, PPCBUG-498 are +either withdrawn (no bug found after re-examination), informational, or references to +existing IDs. IDs PPCBUG-499 through PPCBUG-509 are unallocated — no further bugs found +in group 37. + +--- + +## Batch 8 — VMX load/store (group 38) + +Per-group report: `audit-out/group-38-vmx-loadstore.md`. + +**Opcodes**: lvebx, lvehx, lvewx, lvewx128, lvlx, lvlx128, lvlxl, lvlxl128, lvrx, lvrx128, +lvrxl, lvrxl128, lvsl, lvsl128, lvsr, lvsr128, lvx, lvx128, lvxl, lvxl128, stvebx, stvehx, +stvewx, stvewx128, stvlx, stvlx128, stvlxl, stvlxl128, stvrx, stvrx128, stvrxl, stvrxl128, +stvx, stvx128, stvxl, stvxl128. 
+ +Group 38 summary: The load family (lvx, lvxl, lvlx, lvrx, lvsl, lvsr, lvebx, lvehx, lvewx, +lvewx128 and all 128/LRU-hint variants) is arithmetically correct. EA computation, alignment +masking, big-endian byte ordering, RA=0 special cases, and lane indexing all match the ISA and +the `ea_indexed` helper. **5 HIGH bugs found** — the systemic `invalidate_for_write` gap +(PPCBUG-107 family) applies to ALL 16 VMX store opcodes, and `stvewx128` has an additional +severe memory-corruption bug (writes 16 bytes instead of 1 word). **1 MEDIUM** (behavioral +divergence between lvebx/lvehx/lvewx and canary's full-line simplification — xenia-rs is +architecturally more correct). **1 MEDIUM** (lvsr sh=0 edge-case correctness, documentation +gap). **3 LOW** test-coverage gaps. + +### PPCBUG-510 — `stvewx128` stores all 16 bytes instead of one word; 12-byte memory corruption (HIGH) + +- **Severity**: HIGH +- **Status**: open +- **Location**: interpreter.rs:2776-2781 +- **Symptom**: Uses `& !0xF` (16-byte alignment) then stores all 16 bytes of the vector. + ISA semantics: word-align EA, extract the word lane `(EA & 0xF) >> 2`, store 4 bytes only. + The non-128 `stvewx` (interpreter.rs:1675-1687) is correct — `stvewx128` was not updated + to match. Corrupts 12 adjacent bytes on every execution. +- **Canary reference**: `InstrEmit_stvewx_` (cc:170-185) — `ea & ~3`, extract lane, `ByteSwap`, + store 4 bytes only. `stvewx128` routes through the same helper as `stvewx`. +- **Fix**: mirror the `stvewx` body with `instr.vs128()` substituted for `instr.rs()`. + +### PPCBUG-511 — `stvx`, `stvx128`, `stvxl`, `stvxl128` missing `invalidate_for_write` (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Locations**: interpreter.rs:1598-1603 (stvx), 1605-1610 (stvx128), 1699-1705 (stvxl/stvxl128) +- **Root cause**: PPCBUG-107 (systemic) +- **Symptom**: Under `--parallel`, a 16-byte stvx to a reserved line does not clear the + reservation table slot. 
The reserving thread's `stwcx.` spuriously succeeds. +- **Fix**: per PPCBUG-107 pattern — add `invalidate_for_write(ea)` guard before the byte loop. + +### PPCBUG-512 — `stvebx`, `stvehx`, `stvewx`, `stvewx128` missing `invalidate_for_write` (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Locations**: interpreter.rs:1655 (stvebx), 1664 (stvehx), 1675 (stvewx), 2776 (stvewx128) +- **Root cause**: PPCBUG-107 (systemic) +- **Note**: `stvewx128` must also fix PPCBUG-510 before adding the invalidation call (or the + invalidation covers the wrong, over-wide address range). + +### PPCBUG-513 — `stvlx`, `stvlx128`, `stvlxl`, `stvlxl128` missing `invalidate_for_write` (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Locations**: interpreter.rs:2746-2749 (stvlx/stvlxl), 2751-2754 (stvlx128/stvlxl128) +- **Root cause**: PPCBUG-107 (systemic) +- **Note**: partial stores can span a 128-byte line boundary when `ea & 0xF != 0` and + `n = 16 - shift` crosses the line; two `invalidate_for_write` calls may be needed. + +### PPCBUG-514 — `stvrx`, `stvrx128`, `stvrxl`, `stvrxl128` missing `invalidate_for_write` (HIGH) + +- **Severity**: HIGH +- **Status**: applied (ca5b90b, 2026-05-01) +- **Locations**: interpreter.rs:2756-2759 (stvrx/stvrxl), 2761-2764 (stvrx128/stvrxl128) +- **Root cause**: PPCBUG-107 (systemic) +- **Note**: stvrx at shift=0 is a no-op (no bytes written); guard can skip the call in + that case. Otherwise invalidate `ea & !0xF` (the preceding aligned block). + +### PPCBUG-515 — `lvebx`, `lvehx`, `lvewx` implement element semantics; canary uses full-line load (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: open +- **Locations**: interpreter.rs:1613-1653 +- **Symptom**: xenia-rs places the loaded byte/halfword/word into the correct lane and preserves + other lanes from VD (ISA-correct for the "undefined" lanes). Canary does a full aligned + 16-byte `lvx`-style load that overwrites all lanes. 
Both are valid under the ISA's "undefined" + specification, but game code compiled against canary may observe the canary behavior. The + divergence is documented and no code change is required unless canary compatibility becomes + an explicit goal. + +### PPCBUG-516 — `lvsr` sh=0 produces {16,17,...,31}; correct per ISA but undocumented (MEDIUM) + +- **Severity**: MEDIUM (documentation gap — computation is correct) +- **Status**: open +- **Location**: interpreter.rs:2218-2226 +- **Symptom**: When EA is 16-byte aligned, `lvsr` produces byte values all >= 16 (the "select + entirely from VB" identity for `vperm`). The formula `(16 - sh) + i` cannot overflow u8 + because `sh <= 15` guarantees `(16 - sh) + 15 <= 31`. No computation bug — but there is no + comment explaining why values > 15 are correct. Add a comment and a `debug_assert!(sh <= 15)`. + +### PPCBUG-517 — Zero test coverage for lvlx/lvrx/stvlx/stvrx boundary edge cases (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: vmx.rs tests (lines 756-792); interpreter.rs test module +- **Missing**: shift=15 for lvlx (1 byte loaded), shift=1 for lvrx (15 bytes), stvlx/stvrx + round-trip, stvrx at shift=0 confirmed no-op, full lvlx+lvrx+vor unaligned memcpy idiom + verified byte-exact. + +### PPCBUG-518 — Zero interpreter-level execution tests for all 36 VMX load/store opcodes (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: interpreter.rs test module +- **Missing**: lvx alignment masking, stvx byte-order verification, lvebx lane placement, + lvsl/lvsr permute index values, stvewx128 after PPCBUG-510 fix. 17 recommended minimum tests + enumerated in per-group report. + +### PPCBUG-519 — `stvrx` aligned no-op is silent; no debug trace (LOW) + +- **Severity**: LOW +- **Status**: open +- **Location**: vmx.rs:284-292 (`store_vector_right`) +- **Symptom**: shift=0 returns immediately with no trace event. Confusing during memory- + visibility debugging.
Add `tracing::trace!` in debug builds. + +IDs PPCBUG-520 through PPCBUG-559 are unallocated — no further bugs found in group 38. + +--- + +## Phase C1 — Decoder field extractors + +Per-group report: `audit-out/phase-c1-decoder-fields.md`. + +Comprehensive audit of all `DecodedInstr` field accessors in `decoder.rs` lines 21-165, cross-checked against ISA form specs, Canary `FormatXxx` structs, and the interpreter's inline re-extraction. Phase B already found PPCBUG-040/046/275/315/360-363/420-422. Phase C1 adds 8 new findings (PPCBUG-560..567). + +**Confirmed-clean** (no new finding): `op`, `rd`/`rs`/`rt`, `ra`, `rb`, `rc`, `simm16`, `uimm16`, `d`, `ds`, `li`, `bd`, `bo`, `bi`, `aa`, `lk`, `oe`, `to`, `mb`/`me` (M-form only), `sh`, `spr`, `crm`, `crfd`/`crfs`, `l`, `crbd`/`crba`/`crbb`, `nb`, `va128`/`vb128`/`vd128`/`vs128`, `extract_vx128_uimm5`. + +### PPCBUG-560 — sh64() test helper wrong bit order; masks PPCBUG-040 from unit tests (HIGH) + +- **Severity**: HIGH +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `xenia-rs/crates/xenia-cpu/tests/disasm_goldens.rs:160-176` (function `rldicl`) +- **Symptom**: The `rldicl` test helper encodes `sh[5:1]` at PPC bits 16-20 and `sh[0]` at PPC bit 30. The ISA encodes `sh[4:0]` at PPC bits 16-20 and `sh[5]` at PPC bit 30. The wrong `sh64()` formula `(sh_lo << 1) | sh_hi` correctly inverts the wrong encoding, making the test pass — but fails on real binary code. + + **Counterexamples** (ISA-encoded input → `sh64()` output): + + | True shift | sh64() result | Error | + |-----------|--------------|-------| + | 1 | 2 | +1 | + | 16 | 32 | +16 | + | 32 | 1 | -31 | + | 33 | 3 | -30 | + | 63 | 63 | 0 (coincidence) | + + Only `sh=0` and `sh=63` decode correctly. All other shifts (1-62) are wrong against real code. 
+ +- **Fix for `sh64()`** (per PPCBUG-040): + ```rust + pub fn sh64(&self) -> u32 { + (extract_bits(self.raw, 30, 30) << 5) | extract_bits(self.raw, 16, 20) + } + ``` +- **Fix for test helper** (must be in same commit): + ```rust + // Correct: sh_lo = sh & 0x1F → PPC bits 16-20; sh_hi = sh >> 5 → PPC bit 30 + (30 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11) + | (mb_lo << 6) | (mb_hi << 5) | (0 << 2) | ((sh >> 5) << 1) | rc + ``` +- **Cross-reference**: PPCBUG-040 (primary finding). PPCBUG-560 is the test-infrastructure companion. + +### PPCBUG-561 — Missing `mb_md()` accessor on `DecodedInstr`; interpreter inlines wrong formula at 6 sites (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `decoder.rs` — accessor absent; `disasm.rs:1256` has correct local helper; `interpreter.rs` lines 696, 706, 716, 726, 736, 746 each inline the wrong formula +- **Symptom**: Interpreter uses `(instr.mb() << 1) | ((instr.raw >> 1) & 1)` which: (a) reads `SH5` (PPC bit 30, host bit 1) instead of `MB5` (PPC bit 26, host bit 5) as the high bit; (b) places the high bit at position 0 instead of position 5. `disasm.rs` has the correct version already — expose it as `DecodedInstr::mb_md()`. +- **Cross-reference**: PPCBUG-046 (primary finding). + +- **Fix**: + ```rust + // Add to decoder.rs: + #[inline] pub fn mb_md(&self) -> u32 { + extract_bits(self.raw, 21, 25) | (extract_bits(self.raw, 26, 26) << 5) + } + ``` + Replace all 6 inline sites in `interpreter.rs` with `instr.mb_md()`. + +### PPCBUG-562 — Missing `vc_rc_bit()` and `vx128r_rc_bit()` per-form Rc accessors (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `decoder.rs` — no per-form Rc accessors; `interpreter.rs` uses generic `rc_bit()` (bit 31) for both VC and VX128_R forms +- **Symptom**: Generic `rc_bit()` reads PPC bit 31 (LSB). VC-form Rc is at PPC bit 21 = `(raw >> 10) & 1`. 
VX128_R-form Rc is at PPC bit 27 = `(raw >> 4) & 1`. Using bit 31 for these forms means the CR6 update gate is permanently disabled for all dot-form VMX vector compares — root cause of PPCBUG-275/420/421/422. +- **Fix**: + ```rust + /// Rc for VC-form vector compare (vcmpeqfp, vcmpgefp, vcmpgtfp, vcmpbfp, etc.) — PPC bit 21. + #[inline] pub fn vc_rc_bit(&self) -> bool { extract_bits(self.raw, 21, 21) != 0 } + /// Rc for VX128_R-form compare (vcmpeqfp128, vcmpgefp128, etc.) — PPC bit 27. + #[inline] pub fn vx128r_rc_bit(&self) -> bool { extract_bits(self.raw, 27, 27) != 0 } + ``` +- **Cross-reference**: PPCBUG-275 / PPCBUG-420 / PPCBUG-421 / PPCBUG-422. + +### PPCBUG-563 — Missing `vx128_4_z()` and `vx128_4_imm()` for VX128_4 form (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `decoder.rs` — accessors absent; `interpreter.rs:3551-3552` (vrlimi128) reads wrong bit positions +- **Symptom**: VX128_4 form has `IMM` (5-bit) at PPC bits 11-15 (host bits 16-20) and `z` (2-bit) at PPC bits 24-25 (host bits 6-7). Interpreter `vrlimi128` uses `(raw >> 16) & 0x3` for shift (reads VB128l partial) and `(raw >> 2) & 0xF` for mask (reads VD128h region). +- **Fix**: + ```rust + #[inline] pub fn vx128_4_imm(&self) -> u32 { extract_bits(self.raw, 11, 15) } + #[inline] pub fn vx128_4_z(&self) -> u32 { extract_bits(self.raw, 24, 25) } + ``` +- **Cross-reference**: PPCBUG-315. + +### PPCBUG-564 — Missing `vx128_p_perm()` for VX128_P form; PERMh reads XO bits (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `decoder.rs` — accessor absent; `interpreter.rs:4089` (vpermwi128) uses `(raw >> 16) & 0xFF` which reads PERMl (correct) but uses XO/reserved bits 21-23 for PERMh instead of PPC bits 23-25 +- **Symptom**: Top 3 bits of the 8-bit PERM selector are wrong for every `vpermwi128` instruction. Lane selections for words 0 and 1 are garbage. 
+- **Fix**: + ```rust + #[inline] pub fn vx128_p_perm(&self) -> u32 { + extract_bits(self.raw, 11, 15) | (extract_bits(self.raw, 23, 25) << 5) + } + ``` +- **Cross-reference**: PPCBUG-362. + +### PPCBUG-565 — Missing `vx128_5_sh()` for VX128_5 form; vsldoi128 MSB reads reserved bit (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: applied (52b05b1, 2026-05-01) +- **Location**: `decoder.rs` — accessor absent; `interpreter.rs:2012` (vsldoi128) uses `(raw >> 4) & 0x1` for the shift MSB (reads PPC bit 27 = reserved) instead of PPC bit 22 = host bit 9 = `(raw >> 9) & 1` +- **Symptom**: vsldoi128 shift amounts ≥ 8 (where the 4th bit matters) use a garbage bit. The correct 4-bit SH is at PPC bits 22-25 (host bits 6-9) = `(raw >> 6) & 0xF`. +- **Fix**: + ```rust + #[inline] pub fn vx128_5_sh(&self) -> u32 { extract_bits(self.raw, 22, 25) } + ``` +- **Cross-reference**: PPCBUG-361. + +### PPCBUG-566 — Missing XER TBC field accessor documentation for lswx/stswx (LOW) + +- **Severity**: LOW +- **Status**: open +- **Location**: `decoder.rs` — XER[25:31] (7-bit transfer byte count) is runtime state, not an instruction field; no accessor exists and no documentation notes the gap +- **Symptom**: `lswx`/`stswx` use XER[25:31] as their byte count. The interpreter has no way to read this via the normal accessor pattern. Not a bit-position error, but a structural gap. +- **Recommendation**: add `ctx.xer_tbc() -> u8` to `PpcContext` returning `(ctx.xer() & 0x7F) as u8`. XER[25:31] is in PPC big-endian bit numbering, so the field occupies the low 7 bits of the 32-bit register (host bits 0-6), not bits 25 and up. Document that these are the only instructions that read XER as a count operand. + +### PPCBUG-567 — Zero unit tests pin any scalar field accessor (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `decoder.rs` unit tests; `tests/disasm_goldens.rs` +- **Symptom**: Phase 4 tests pin `va128`/`vb128`/`vd128`/`vs128` only.
No test verifies: `sh64()` against ISA-encoded instructions (existing test validates wrong round-trip — PPCBUG-560), `mb_md()` (absent), `vc_rc_bit()`/`vx128r_rc_bit()` (absent), `ds()` for negative displacement, `spr()` for LR/CTR/XER beyond DEC. +- **Recommended additions**: decoder-level unit tests using ISA-correct encodings for `sh64`, `mb_md`, the two new Rc accessors, `ds` negative, `spr` for LR=8 and CTR=9. See phase-c1-decoder-fields.md for concrete encoding examples. + +IDs PPCBUG-568 through PPCBUG-599 are unallocated — no further bugs found in Phase C1 scope. + +--- + +## Phase C2 — Decoder opcode-lookup tables + +Per-group report: `audit-out/phase-c2-decoder-lookup.md`. + +**Methodology**: complete line-by-line comparison of all `decode_opNN` functions in +`xenia-rs/crates/xenia-cpu/src/decoder.rs` against +`xenia-canary/src/xenia/cpu/ppc/ppc_opcode_lookup_gen.cc`, plus cross-reference of +`ppc-manual/forms/` for VC, VX128_R, VX128_5, VA, VX128_3, VX128_4 forms. + +**Overall verdict**: the decoder is structurally sound and entry-by-entry matches +Canary for all real Xbox 360 instructions, with one pre-known exception (PPCBUG-600 = +PPCBUG-423). Zero new wrong-entry bugs. One new medium-severity cross-reference bug +(dot-form gap), one medium maintainability risk (key-ordering dependency), four LOWs +(reserved-encoding misidentification, undocumented Invalid mapping for primary opcode 9, +test gaps, undocumented fast-path). + +### PPCBUG-600 — `decode_op6` key4: VMX128 compare dot-forms decode as Invalid (MEDIUM) + +- **Severity**: MEDIUM (cross-reference for PPCBUG-423; same root cause, Phase C2 ID) +- **Status**: applied (52b05b1, 2026-05-01) (dup-of:423 for the fix; this ID is for Phase C2 tracking) +- **Location**: `decoder.rs:640-648` (`decode_op6`, key4 match table) +- **Symptom**: The VX128_R form places `Rc` at PPC bit 27. The key4 formula is + `(bits 22-24 << 3) | bit27`. When Rc=1 (dot-form), bit27=1 and key4 is odd. + Only even key4 values are in the table.
Five dot-form encodings fall through to + `PpcOpcode::Invalid`: + - `vcmpeqfp128.` → key4=0b000001 (1), decodes as Invalid + - `vcmpgefp128.` → key4=0b001001 (9), decodes as Invalid + - `vcmpgtfp128.` → key4=0b010001 (17), decodes as Invalid + - `vcmpbfp128.` → key4=0b011001 (25), decodes as Invalid + - `vcmpequw128.` → key4=0b100001 (33), decodes as Invalid +- **Contrast**: standard VMX VC-form compares (op=4 key3) are correct because their + Rc bit (bit21) is outside the key3 window (bits 22-31). VX128_R uses a different + form where Rc is at bit27, which is inside the key4 window. +- **Fix**: Add 5 dot-form entries to key4 in `decode_op6`: + ```rust + 0b000001 => return PpcOpcode::vcmpeqfp128, + 0b001001 => return PpcOpcode::vcmpgefp128, + 0b010001 => return PpcOpcode::vcmpgtfp128, + 0b011001 => return PpcOpcode::vcmpbfp128, + 0b100001 => return PpcOpcode::vcmpequw128, + ``` + The decoder only needs to emit the right opcode. The interpreter's CR6 update must + then be gated on the VX128_R Rc bit (PPC bit 27, see PPCBUG-562), because the generic + `instr.rc_bit()` reads PPC bit 31. +- **See also**: PPCBUG-423 (Phase B original finding) for impact assessment and + full context. + +### PPCBUG-601 — `decode_op6` key ordering creates undocumented correctness dependency (MEDIUM) + +- **Severity**: MEDIUM (maintainability risk; no current wrong-decode for real code) +- **Status**: open +- **Location**: `decoder.rs:603-637` (`decode_op6`, key1/key2/key3 dispatch) +- **Symptom**: key1 (bits 21-22 << 5 | bits 26-27), key2 (bits 21-23 << 4 | bits 26-27), + and key3 (bits 21-27) all overlap. Correctness depends on an implicit invariant: + vpkd3d128 and vrlimi128 (matched by key2) always have bits 26-27 = `01`, while all + 15 key3 unary entries always have bits 26-27 = `11`. If a future instruction were + added to key2 with bits 26-27 = `11`, it would shadow a key3 entry. No comment in + the source documents this constraint.
+- **Fix**: Add a comment block above the key2/key3 dispatches explaining the invariant: + ``` + // key2 matches bits 26-27 == 01 only (vpkd3d128, vrlimi128). + // key3 entries all have bits 26-27 == 11. No overlap is possible + // for any currently-defined Xbox 360 instruction. + ``` + +### PPCBUG-602 — `decode_op4` vsldoi128 fallback: over-broad single-bit catch-all (LOW) + +- **Severity**: LOW (only fires for reserved/undefined encodings in practice) +- **Status**: open +- **Location**: `decoder.rs:558-561` +- **Symptom**: The VX128_5 form for vsldoi128 is identified by op=4, bit27=1. The + dispatch uses a bare `if extract_bits(code, 27, 27) == 1` after the other tables, + rather than an exact VX128_5-form check. Reserved VA extended opcodes that happen + to have their key4 bit4 (= word bit27) set decode as vsldoi128 instead of Invalid. + Example: VA XO=0b100011 (35, reserved gap between vmladduhm=34 and vmsumubm=36) + — key4 misses, bit27=1 fires → decoded as vsldoi128. ISA specifies reserved + encodings should trap; this silently assigns a meaning. +- **Fix (optional)**: Strengthen to an exact match: + ```rust + // VX128_5 form: SH@22-25, VA128h@26, XO=bit27. Bits 28-31 carry VD128h/VB128h. + // Only vsldoi128 uses this form. Verify the XO bit and absence of load/store marker. + if extract_bits(code, 27, 27) == 1 && extract_bits(code, 30, 31) != 0b11 { + return PpcOpcode::vsldoi128; + } + ``` + Alternatively, accept current behavior and add a comment. + +### PPCBUG-603 — Primary opcode 9 maps to Invalid; correct but undocumented (LOW) + +- **Severity**: LOW (test gap / documentation only) +- **Status**: open +- **Location**: `decoder.rs:369` (the `_ => PpcOpcode::Invalid` arm of `lookup_opcode`) +- **Symptom**: Primary opcode 9 (`dozi` in original POWER ISA) is undefined on + Xenon/750CL and correctly decodes as Invalid. Canary also returns `PPC_DECODER_MISS`. + No comment documents this intentional absence. 
+- **Fix**: Add `// 9 = dozi (POWER-only, not present on Xenon)` comment near the + match, or explicitly add `9 => PpcOpcode::Invalid` with a comment. + +### PPCBUG-604 — Zero decoder unit tests for decode_op5, decode_op6, decode_op30, decode_op63 (LOW) + +- **Severity**: LOW (test gap) +- **Status**: open +- **Location**: `decoder.rs:897-1107` (test module) +- **Symptom**: The 10 existing decoder tests cover addi, lwz, branch, stw, ori, and + cache mechanics. None exercise VMX128 (op=5, op=6), rotate-doubleword (op=30), or + FPU (op=63) opcode paths. In particular, no test would have caught PPCBUG-600 + (vcmpeqfp128 dot-form decodes as Invalid) before it caused a runtime trap. +- **Recommended minimum additions** (8 tests): + 1. `vcmpeqfp128` (Rc=0) → decodes as `vcmpeqfp128`. + 2. `vcmpeqfp128.` (Rc=1) → decodes as `vcmpeqfp128` (tests PPCBUG-600 fix). + 3. `vcmpeqfp` (op=4, Rc=0) → key3 check, bit21=0. + 4. `vcmpeqfp.` (op=4, Rc=1) → key3 check, bit21=1, same decode. + 5. `vsldoi128` (op=4, bit27=1) → fallback fires. + 6. `rldicl` (op=30) → decode_op30. + 7. `fadd` (op=63, Rc=0) → arithmetic table. + 8. `fadd.` (op=63, Rc=1) → same decode as fadd. + +### PPCBUG-605 — `decode_op31` sradix fast-path is correct but undocumented (LOW) + +- **Severity**: LOW (documentation gap only) +- **Status**: open +- **Location**: `decoder.rs:702-705` +- **Symptom**: The sradix pre-check uses bits 21-29 (9 bits). The subsequent main + table uses bits 21-30 (10 bits). Because no main-table entry has bits 21-29 = + 0b110011101, the fast-path cannot shadow a legitimate main-table entry. However, + this is not documented in the source, and a reader might worry that sradix (Rc=0, + bits 21-30 = 0b1100111010) or sradix. (Rc=1, same bits 21-30) could conflict with + a future entry at key 0b1100111010. 
+- **Fix**: Add a comment: `// sradix: XS-form, XO=413 (bits 21-29=0b110011101).` + `// No main-table entry uses bits 21-30 starting with 0b110011101x.` + +IDs PPCBUG-606 through PPCBUG-639 are unallocated — no further bugs found in Phase C2. + +--- + +## Phase C3 — Disassembler formatter parity + +Per-group report: `audit-out/phase-c3-disasm.md`. + +**Methodology**: Full line-by-line audit of `disasm.rs:format()` and all ~70 per-class helpers. +Cross-referenced against `xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm_gen.cc`, +`tests/golden/extended_mnemonics.json`, and `tests/golden/base_mnemonics.json`. +Checked: mnemonic correctness (Rc/OE/LK/AA/L-field), operand formatting (signed vs unsigned, +hex vs decimal), simplified-mnemonic priority, branch-condition extended forms, VMX register +naming, VX128 field extraction, and golden test coverage. + +**Overall verdict**: The formatter is structurally sound. All OE/Rc/LK/AA suffix handling, the +simplified mnemonic priority order, VMX 5-bit and VMX128 7-bit register naming, SPR mnemonics, +and CR-logical extended forms are correct. Two HIGH bugs found: the `bdnz`/`bdz` extended +mnemonic appends a spurious condition suffix, and the pre-existing `sync`/`lwsync` bug +(PPCBUG-088) is re-assessed as HIGH in disassembler scope. Three MEDIUM bugs: a missing +`fmt_bcctr` extended form for CTR-decrement BO values (PPCBUG-642), and decimal vs hex +for SIMM immediates and for D-form displacements (both diverge from every real PPC disassembler). +Several LOW findings for golden fixture correctness and edge cases. + +**Key finding**: the disassembler's VX128 field extraction (vperm128 VC, vsldoi128 SH, +vpermwi128 PERM) is CORRECT in all three cases where the interpreter (PPCBUG-360/361/362) +has the wrong extraction. The disassembler was written independently and got them right.
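The first HIGH finding in the verdict comes down to how the BO field is decoded. As a minimal sketch (not the project's `fmt_bc`; the helper name `bc_ext_stem` is hypothetical, and the `a`/`l` suffixes and operand formatting are left out), the stem of the extended mnemonic can be derived like this, with the condition name suppressed whenever BO says the CR bit is ignored:

```rust
/// Hypothetical stand-alone helper: derives the extended-mnemonic stem for a
/// `bc` instruction from its 5-bit BO field and the BI field.
/// The condition-name table is an assumption modelled on the report's text.
fn bc_ext_stem(bo: u32, bi: u32) -> String {
    let decr = bo & 0x04 == 0; // BO bit clear: decrement CTR and test it
    let uncond = bo & 0x10 != 0; // BO bit set: CR bit is ignored
    let cond_true = bo & 0x08 != 0; // branch when the selected CR bit is 1
    let cond = match (bi & 3, cond_true) {
        (0, true) => "lt",
        (0, false) => "ge",
        (1, true) => "gt",
        (1, false) => "le",
        (2, true) => "eq",
        (2, false) => "ne",
        (_, true) => "so",
        (_, false) => "ns",
    };
    if decr {
        let z = if bo & 0x02 != 0 { "z" } else { "nz" };
        // The guard that matters: a CTR-only branch carries no condition name.
        let cond_str = if uncond { "" } else { cond };
        format!("bd{z}{cond_str}")
    } else if uncond {
        "b".to_string()
    } else {
        format!("b{cond}")
    }
}
```

With this shape, `bc_ext_stem(16, 0)` yields `bdnz` and `bc_ext_stem(18, 0)` yields `bdz`, while a combined CTR-and-condition branch such as BO=0, BI=2 still keeps its suffix.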
+ +### PPCBUG-640 — `fmt_bc`: pure `bdnz`/`bdz` emits `bdnzge`/`bdzge` (spurious condition suffix) (HIGH) + +- **Severity**: HIGH +- **Status**: open +- **Location**: `disasm.rs:829-834` +- **Symptom**: For `bcx` with BO=16 (`bdnz`: decrement CTR, branch if CTR≠0, CR ignored): + - `decr = (16 & 4) == 0` = true + - `uncond = (16 & 16) != 0` = true + - Code falls into the `if decr` branch and computes `cond_name_opt` from `(cr_bit=0, cond_true=false)` → `Some("ge")` + - Emits: **`bdnzge`** — WRONG. ISA simplified form is `bdnz`. + + For BO=18 (`bdz`): same path → **`bdzge`** — WRONG. + + The bug is absent in `fmt_bclr` which has an explicit `if decr && uncond` guard at line 872 + producing `bdnzlr`/`bdzlr` correctly. `fmt_bc` lacks this guard. + + The golden fixture "bdnz 0x82000040" (PPCBUG-650 companion) pins the wrong output. + +- **Fix**: In `fmt_bc`, inside the `if decr` block, gate the condition string on `!uncond`: + ```rust + if decr { + let z = if bo & 0x02 != 0 { "z" } else { "nz" }; + let cond_str = if uncond { "" } else { cond_name_opt.unwrap_or("") }; + let ext_mnem = format!("bd{z}{cond_str}{a}{l}"); + let ext_ops = format!("{cr}0x{target:08X}"); + with_ext(&base_mnem, base_ops, 8, &ext_mnem, ext_ops, 8) + } + ``` + Also update golden fixtures PPCBUG-650. + +- **Impact**: All analysis-DB queries for `bdnz` loops (common in pixel-shader and vertex + processing loops) return zero rows; they are stored as `bdnzge`. Developers inspecting + loop structures see a misleading condition name on a CTR-only branch. + +### PPCBUG-641 — `sync` emits `"sync"` for `lwsync` (L=1) — re-assessment of PPCBUG-088 (HIGH) + +- **Severity**: HIGH (disassembler scope; PPCBUG-088 was LOW for interpreter scope) +- **Status**: open (see PPCBUG-088 for fix) +- **Location**: `disasm.rs:364` +- **Symptom**: `PpcOpcode::sync` always emits `"sync"`. The L-field at PPC bit 10 selects + `lwsync` (L=1, encoding `0x7C2004AC`). `lwsync` is the acquire barrier in every Xbox 360 + spinlock. 
Every `lwsync` in the disassembly DB is stored as `mnemonic='sync'`. + `SELECT * WHERE mnemonic='lwsync'` returns zero rows regardless of binary content. +- **Note**: the golden fixture for lwsync (PPCBUG-649) currently pins the wrong output. + +### PPCBUG-642 — `fmt_bcctr` missing extended form for CTR-decrement/ignore-CR BO values (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `disasm.rs:880-902` +- **Symptom**: `bcctrx` with BO=16 (decrement CTR, ignore CR) falls through to `base()` with + no extended form. `fmt_bclr` (the equivalent for bclrx) correctly handles the same case with + an explicit `decr && uncond` check at line 872, producing `bdnzlr`. + Note: `bcctr` with CTR-decrement is undefined by PowerISA; this encoding should never appear + in valid compiled code. The inconsistency is a maintenance concern rather than a runtime bug. +- **Fix**: Add a `decr && uncond` check before the `cond_branch_ext` call in `fmt_bcctr`, + mirroring lines 872-876 in `fmt_bclr`. Or add a comment explaining the ISA undefined status. + +### PPCBUG-643 — SIMM immediate display: decimal diverges from Canary and real disassemblers (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `disasm.rs:946` (addi), `976` (addic), `989` (subfic), `990` (mulli), + `1003` (cmpi), `1048-1061` (fmt_ld/fmt_st), and all similar SIMM sites +- **Symptom**: SIMM immediates are formatted via Rust's `{imm}` (decimal). Canary uses + `"-0x{:X}"` / `"0x{:X}"` (signed hex) for every SIMM field. GNU objdump, IDA Pro, + and all standard PPC disassemblers use hex. The inconsistency is internal to xenia-rs: + `addis`/`oris`/`xoris` use hex (`0x{imm_u:X}`), but `addi`/`addic`/`mulli` use decimal. + This misleads analysis-DB queries that mix instructions (e.g. `addi r3, r1, -4` vs + `addis r3, r0, 0x8000`). 
+- **Impact**: Medium — the output is not *wrong* (the value is correctly computed), but + cross-referencing with Canary output or objdump requires manual conversion. + +### PPCBUG-644 — D-form load/store displacement uses decimal instead of hex (MEDIUM) + +- **Severity**: MEDIUM +- **Status**: open +- **Location**: `disasm.rs:1053` (`fmt_ld`), `1061` (`fmt_st`), `1069` (`fmt_ds`) +- **Symptom**: `format!("{rn}, {d}({})", gpr(ra))` outputs decimal for the displacement. + Canary outputs `"-0x8(r1)"` not `"-8(r1)"`. Every standard PPC disassembler uses hex. + Affects 25+ D-form and DS-form opcodes. Negative displacements (-8, -16, etc.) are + especially confusing in decimal when reading stack frame accesses. +- **Fix**: + ```rust + let d_str = if d < 0 { format!("-0x{:X}", -d) } else { format!("0x{:X}", d) }; + base(mnem, format!("{rn}, {d_str}({})", gpr(ra)), 8) + ``` + Update all golden fixture rows with displacement values. + +### PPCBUG-645 — `cntlzdx` Rc suffix: moot for valid encodings, but WONTFIX (LOW) + +- **Severity**: LOW +- **Status**: wontfix +- **Location**: `disasm.rs:286` +- **Note**: `fmt_x_unary_rc` would emit `cntlzd.` for Rc=1, but valid `cntlzd` encodings + always have Rc=0. Canary emits `cntlzd` always. No impact for valid code. + +### PPCBUG-646 — `fmt_rlwimi` inslwi/insrwi priority overlap: confirmed correct (LOW) + +- **Severity**: LOW +- **Status**: wontfix +- **Note**: After careful analysis, the `inslwi` guard excludes `insrwi` overlap cases + (`sh != 31u32.wrapping_sub(me)`). Priority is correct. Informational only. + +### PPCBUG-647 — `fmt_rlwinm` `extrwi` uses `wrapping_sub` which can give misleading results for invalid encodings (LOW) + +- **Severity**: LOW +- **Status**: open +- **Location**: `disasm.rs:1137` +- **Symptom**: `let b = sh.wrapping_sub(n) % 32;` — for invalid `sh < n` encodings, + `wrapping_sub` gives a large u32, `% 32` gives a confusing value. For all compiler-emitted + encodings `sh >= n` holds. 
Add `&& sh >= 32 - mb` to the guard to avoid the fallthrough. + +### PPCBUG-648 — `fmt_mftb` TBR=268: ext mnemonic identical to base mnemonic (LOW) + +- **Severity**: LOW +- **Status**: open +- **Location**: `disasm.rs:1443` +- **Symptom**: `268 => with_ext("mftb", base_ops, 8, "mftb", gpr(rd), 8)` — base is `mftb`, + extended is also `mftb`. `display()` picks the extended form (omitting the `268` operand), + making it ambiguous vs. `mftbu`. Consider: either emit base-only (`mftb r3, 268`) or rename + the base to `mftb.raw` for disambiguation. + +### PPCBUG-649 — Golden fixture for `lwsync` pins wrong output (no ext_mnemonic) (LOW) + +- **Severity**: LOW (test coverage gap) +- **Status**: open +- **Location**: `tests/golden/extended_mnemonics.json`, entry "lwsync" +- **Symptom**: Fixture has `mnemonic: "sync"` and no `ext_mnemonic`. After PPCBUG-088/641 + fix, expected output is `mnemonic: "sync"`, `ext_mnemonic: "lwsync"`. Current fixture + defeats regression detection — the test passes with wrong output. + +### PPCBUG-650 — Golden fixtures for `bdnz`/`bdz` pin wrong extended mnemonic (LOW) + +- **Severity**: LOW (companion to PPCBUG-640) +- **Status**: open +- **Location**: `tests/golden/extended_mnemonics.json`, rows "bdnz 0x82000040" and "bdz 0x82000040" +- **Symptom**: Both rows have `ext_mnemonic: "bdnzge"` and `ext_mnemonic: "bdzge"`. + After PPCBUG-640 fix, correct values are `"bdnz"` and `"bdz"`. + +### PPCBUG-651 — `fmt_vmx128_pack_d3d` shared by `vpkd3d128` and `vrlimi128`: confirmed correct (LOW) + +- **Severity**: LOW +- **Status**: wontfix +- **Note**: Both opcodes use VX128_4 form. Shared formatter outputs identical operand lists + (`vd, vb, imm, z`) which is correct for both. Informational only. 
+ +### PPCBUG-652 — Zero golden fixtures for any VMX128 opcode disassembly (LOW) + +- **Severity**: LOW (test coverage gap) +- **Status**: open +- **Location**: `tests/golden/` — all three JSON files +- **Symptom**: No fixture pins the formatted output of any VMX128 instruction. Regressions + in VMX128 field extraction (e.g. a re-introduction of PPCBUG-360/361/362 in the disassembler) + would be invisible. Recommend adding at minimum: `vaddfp128`, `vperm128`, `vsldoi128`, + `vpkd3d128`, `vcmpeqfp128.`, `vmaddfp128`. + +### PPCBUG-653 — `fmt_trap_imm` unconditional trap extended form: confirmed not-a-bug (LOW) + +- **Severity**: LOW +- **Status**: wontfix +- **Note**: `twi 31, rA, IMM` (to=31) has no ISA simplified mnemonic unless RA=0 and IMM=0 + (which matches `tw 31, r0, r0 = trap`). The `fmt_trap_imm` correctly emits base-only for + `twi 31, rA, N`. Informational. + +### PPCBUG-654 — `fmt_rldimi` `insrdi` guard excludes valid `mb=0` (b=0) case (LOW) + +- **Severity**: LOW +- **Status**: open +- **Location**: `disasm.rs:1220` +- **Symptom**: Guard `if mb > 0` excludes `insrdi rA, rS, n, 0` (b=0 → mb=0). A valid + compiler-emitted `rldimi` with sh+mb+n=64 and mb=0 falls through to base form instead of + displaying the `insrdi` simplified mnemonic. +- **Fix**: Remove the `mb > 0` guard; the inner `n > 0` guard is sufficient to avoid + degenerate cases. + +IDs PPCBUG-655 through PPCBUG-679 are unallocated — no further bugs found in Phase C3. diff --git a/audit-report-2026-04-29.md b/audit-report-2026-04-29.md new file mode 100644 index 0000000..7ff7667 --- /dev/null +++ b/audit-report-2026-04-29.md @@ -0,0 +1,421 @@ +# PPC Instruction Audit — Triaged Report (2026-04-29) + +**Status**: audit complete. **No code modified.** This file is the fix-order plan for the follow-up session. +**Source of truth**: detailed bug entries (one heading per PPCBUG ID) live in `audit-findings.md`. 
This file references every entry by ID so nothing is lost — it does not duplicate the per-bug detail. + +## Counts + +- **Total findings**: 253 PPCBUG IDs, of which 5 are explicitly retracted/withdrawn (PPCBUG-220, 222, 226, 482, 483 — see Notes section). +- **Net findings**: ~248 actionable. +- **Severity breakdown** (rough): + - HIGH: ~55 (~22%) + - MEDIUM: ~75 (~30%) + - LOW (test gaps + cosmetic + informational): ~118 (~48%) + +## Headline findings (most likely Sylpheed-renderer-blockers) + +1. **PPCBUG-107 cascade** — `ReservationTable::invalidate_for_write` defined and unit-tested but never called from any of the **50+ store opcodes** in the interpreter. Under `--parallel`, every cross-thread atomic via `lwarx`/`stwcx.` is silently broken: spinlocks succeed without exclusion, atomic counters race, condition-variable handshakes never sync. Plausible direct cause of the 4-worker-thread renderer plateau (`project_xenia_rs_sylpheed_stage3_2026_04_29.md`). **Fix is mechanical**: one-line `if t.has_active_reservers() { t.invalidate_for_write(ea) }` before every `mem.write_*` in interpreter.rs. + +2. **PPCBUG-053+054 cascade** — `bcx`/`bclrx` CTR zero-test compares all 64 bits; `mtspr CTR` writes full 64-bit GPR. Combined with PPCBUG-006 (`negx` poisons GPR upper 32) → **`neg; mtctr; bdnz` loops run forever**. + +3. **8 decoder/field-extraction bugs collapse into 6 missing accessors** + 1 wrong sh64 formula + 1 missing decode_op6 dot-form entry. The disassembler already has correct local versions. Single mechanical sweep. + +4. **PPCBUG-046 (`clrldi r3, r4, 32`)** — the canonical zero-extend-low-32 idiom is currently a no-op. Emitted constantly by 32-bit-ABI compilers. + +5. **PPCBUG-510** — `stvewx128` corrupts 12 adjacent bytes per call. + +6. **PPCBUG-424/425** — `vmaddfp128`/`vmaddcfp128` operand swap. Every D3D vertex/pixel shader using FMA with non-aliased operands gets wrong arithmetic. + +7.
**PPCBUG-360/363** — `vperm128` uses wrong control vector (every D3D shader swizzle); `vpkd3d128` missing post-pack permutation (canonical D3D vertex-pack `pack=1` always wrong). + +8. **PPCBUG-275/420-422** — VC-form and VMX128_R-form `rc_bit()` reads bit 0 instead of bit 21/27 → **CR6 never updated for ANY VMX vector compare dot form**. Breaks every `vcmpequb. + bc CR6_all_true` early-exit loop in audio mixing, font rendering, string ops. + +## Recommended fix order + +The phases below are the recommended fix order for the follow-up session. Each phase is **independently mergeable**; later phases may reveal that earlier phases unblocked their symptoms (e.g. P1 by itself could be sufficient to break open the Sylpheed renderer plateau). + +After each phase: `cargo test --workspace --release` (must stay at 506+ pass) AND `xenia-rs check sylpheed.iso -n 100M` (must not regress against the 2026-04-29 addis-fix baseline of `swaps=2`). The acid test is whether `draws > 0` opens after P1 or P2. + +--- + +### Phase 1 — Cross-thread atomicity (PPCBUG-107 cascade) + +**Why first**: highest confidence smoking-gun for the renderer plateau. Single, mechanical, low-risk fix. Largest leverage relative to size. + +**Coupled — must land together**: +- PPCBUG-107 (root: missing call from stores) +- PPCBUG-130 (9 byte/halfword stores) +- PPCBUG-140, 141, 142, 143, 144 (5 word stores: stw/stwu/stwx/stwux/stwbrx) +- PPCBUG-150 (5 doubleword stores: std/stdu/stdx/stdux/stdbrx) +- PPCBUG-160 (3 multiple/string stores: stmw/stswi/stswx) +- PPCBUG-167 (9 FP stores) +- PPCBUG-511, 512, 513, 514 (16 VMX stores) + +**Independent but related**: +- PPCBUG-151 (stwcx/stdcx reservation width discriminator) — separate fix; add `reservation_width: u8` to PpcContext. +- PPCBUG-108 (legacy per-context path: cross-thread invalidation impossible) — informational; --reservations-table mode bypasses. 
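
The failure mode behind the PPCBUG-107 cascade can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyReservations`, `load_reserve`, `store_conditional`, `race`, and the 128-byte granule are hypothetical, and the toy `invalidate_for_write` merely borrows the name of the real `ReservationTable` method without claiming to match its signature or internals.

```rust
use std::collections::HashMap;

/// Assumed reservation granule for this sketch (128 bytes); the real table may differ.
const GRANULE: u64 = 128;

/// Toy stand-in for a reservation table: at most one reserved granule per thread.
#[derive(Default)]
pub struct ToyReservations {
    by_thread: HashMap<u32, u64>,
}

impl ToyReservations {
    fn granule(ea: u64) -> u64 {
        ea & !(GRANULE - 1)
    }

    /// lwarx: record a reservation for `thread` on the granule containing `ea`.
    pub fn load_reserve(&mut self, thread: u32, ea: u64) {
        self.by_thread.insert(thread, Self::granule(ea));
    }

    /// What every plain store has to do: drop all reservations on the written granule.
    pub fn invalidate_for_write(&mut self, ea: u64) {
        let g = Self::granule(ea);
        self.by_thread.retain(|_, reserved| *reserved != g);
    }

    /// stwcx.: succeeds only if `thread` still holds a reservation on `ea`'s granule.
    /// The reservation is consumed either way.
    pub fn store_conditional(&mut self, thread: u32, ea: u64) -> bool {
        self.by_thread.remove(&thread) == Some(Self::granule(ea))
    }
}

/// Thread 0 executes lwarx, thread 1 executes a plain store to the same word,
/// then thread 0 executes stwcx. Returns whether that stwcx. succeeded.
pub fn race(store_invalidates: bool) -> bool {
    let mut t = ToyReservations::default();
    let ea = 0x8200_1000;
    t.load_reserve(0, ea);
    if store_invalidates {
        // Thread 1's plain store as a fixed interpreter would perform it.
        t.invalidate_for_write(ea);
    }
    // Without the call above, thread 0's reservation survives thread 1's store.
    t.store_conditional(0, ea)
}
```

In this model `race(false)` reports a successful `stwcx.` even though another thread wrote the word in between, which is the "spinlocks succeed without exclusion" symptom; `race(true)` fails the `stwcx.`, matching the EQ=0 outcome a targeted concurrency test would expect.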
+ +**Approach** — one PR adds `if t.has_active_reservers() { t.invalidate_for_write(ea) }` before every `mem.write_*` call site. Scope: +``` +mem.write_u8 / write_u16 / write_u32 / write_u64 / write_f32 / write_f64 +mem.write_vec128 / write_vec128_aligned (for VMX) +``` +~38 sites total. Add 1+ targeted concurrency tests (lwarx + cross-thread plain store + stwcx., expect EQ=0). + +--- + +### Phase 2 — Decoder/field-extraction structural sweep + +**Why second**: single mechanical sweep, fixes 12 distinct HIGH-severity findings, unblocks correct execution of compiler-emitted code. Disassembler already has correct local extraction logic — promote/port. + +**Coupled — same commit**: +- PPCBUG-040 + PPCBUG-560 — fix `sh64()` bit order AND fix the test helper that was masking it +- PPCBUG-046 + PPCBUG-561 — promote `mb_md()` from `disasm.rs:1256` to `decoder.rs`; replace 6 inline-formula sites in interpreter.rs (rldicl/rldicr/rldic/rldimi/rldcl/rldcr) +- PPCBUG-275 + PPCBUG-276 + PPCBUG-420 + PPCBUG-421 + PPCBUG-422 + PPCBUG-562 — add `vc_rc_bit()` (PPC bit 21) and `vx128r_rc_bit()` (PPC bit 27); replace `instr.rc_bit()` at all VMX compare dot-form sites +- PPCBUG-315 + PPCBUG-563 — add `vx128_4_z()`, `vx128_4_imm()`; fix `vrlimi128` +- PPCBUG-361 + PPCBUG-565 — add `vx128_5_sh()`; fix `vsldoi128` +- PPCBUG-362 + PPCBUG-564 — add `vx128_p_perm()`; fix `vpermwi128` +- PPCBUG-423 + PPCBUG-600 — add 5 odd-key entries to `decode_op6` key4 for `vcmp*fp128.` dot forms + +**Independent in this phase**: +- PPCBUG-360 — `vperm128` reads VC from `vd128()` instead of VX128_2 VC field at integer bits 6-8. Fix at the call site (or add `vx128_2_vc()` accessor). +- PPCBUG-363 + PPCBUG-369 — `vpkd3d128` missing post-pack permutation; add the `pack`/`shift` field handling per Canary. + +**Test fixture updates required** (PPCBUG-560 lesson) — once `sh64()` is fixed, verify all `disasm_goldens.rs` test helpers encode shifts ISA-correctly. Don't trust the existing fixtures blindly. 
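
As a reference point for the `sh64()` / `mb_md()` items in this phase, the MD-form split fields can be sketched as follows. This is a minimal sketch, not the project's accessors: `md_sh`, `md_mb`, `rldicl`, and `exec_rldicl` are hypothetical names, the bit positions follow the Power ISA MD-form layout (SH low bits in PPC bits 16-20 with the high bit in PPC bit 30, MB low bits in PPC bits 21-25 with the high bit in PPC bit 26), and the encoding `0x78830020` for `rldicl r3, r4, 0, 32` was worked out by hand for this example.

```rust
/// 6-bit SH of an MD-form instruction: low five bits in PPC bits 16-20,
/// high bit in PPC bit 30 (integer bits 15..11 and bit 1).
pub fn md_sh(instr: u32) -> u32 {
    ((instr >> 11) & 0x1F) | (((instr >> 1) & 1) << 5)
}

/// 6-bit MB of an MD-form instruction: low five bits in PPC bits 21-25,
/// high bit in PPC bit 26 (integer bits 10..6 and bit 5).
pub fn md_mb(instr: u32) -> u32 {
    ((instr >> 6) & 0x1F) | (((instr >> 5) & 1) << 5)
}

/// rldicl semantics: rotate left by `sh`, then keep PPC bits mb..63,
/// i.e. the low `64 - mb` bits of the rotated value.
pub fn rldicl(rs: u64, sh: u32, mb: u32) -> u64 {
    rs.rotate_left(sh) & (u64::MAX >> mb)
}

/// Decode the split fields from `instr` and apply them to `rs`.
pub fn exec_rldicl(instr: u32, rs: u64) -> u64 {
    rldicl(rs, md_sh(instr), md_mb(instr))
}
```

With these definitions, `exec_rldicl(0x7883_0020, 0xDEAD_BEEF_1234_5678)` keeps only the low 32 bits, which is the `clrldi r3, r4, 32` behaviour that PPCBUG-046 describes as missing. A decoder that drops the high MB bit would see `mb = 0` and leave the value unchanged.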
+ +--- + +### Phase 3 — Other HIGH bugs (single targeted fixes) + +**Independent**: +- PPCBUG-510 — `stvewx128` corrupting 12 bytes per call. Direct fix: align EA to word, write only 4 bytes. +- PPCBUG-424 — `vmaddfp128` operand order: change `ai.mul_add(bi, di)` → `ai.mul_add(di, bi)`. +- PPCBUG-425 — `vmaddcfp128` operand order similarly. +- PPCBUG-053 + PPCBUG-054 — `bcx`/`bclrx` CTR zero-test (32-bit) + `mtspr CTR` truncation (defensive firewall). Coupled. +- PPCBUG-640 — `fmt_bc` spurious condition suffix on pure `bdnz`/`bdz`. Port the `fmt_bclr` pattern. +- PPCBUG-641 — `lwsync` shows as `sync` in disassembler (re-assessment of PPCBUG-088). Same fix. + +--- + +### Phase 4 — 32-bit ABI writeback truncation sweep + +**Why this phase**: cross-cutting, mechanical. Once ALL writebacks truncate via `as u32 as u64`, the systemic 32-bit-ABI invariant is restored and most CR0/CA helper-correctness concerns become moot. + +#### 4a — Active poisoning (every execution corrupts GPR upper bits) + +These bugs corrupt GPR upper bits **regardless** of whether upstream sources are clean — typically because the implementation applies Rust's `!u64` (full 64-bit NOT) somewhere: +- PPCBUG-006 (negx — `(!ra).wrapping_add(1)`) +- PPCBUG-008 (subfex — `(!ra).wrapping_add(rb).wrapping_add(ca)`) +- PPCBUG-018 (subfzex) +- PPCBUG-019 (subfmex) +- PPCBUG-028 (orcx — `rs | !rb`) +- PPCBUG-029 (norx — `!(rs | rb)` — the canonical `not` mnemonic, hot path) +- PPCBUG-030 (nandx) +- PPCBUG-031 (eqvx — `!(rs ^ rb)` — common `eqv rA, rA, rA` set-to-all-ones) +- PPCBUG-033 (andcx via `!rb`) +- PPCBUG-034 (extsbx — `as i8 as i64 as u64`) +- PPCBUG-035 (extshx) + +#### 4b — Same-shape-as-addis (latent under clean inputs, active when upstream is poisoned) + +- PPCBUG-001 (addi), PPCBUG-002 (addic), PPCBUG-003 (addicx), PPCBUG-005 (subficx), PPCBUG-007 (subfcx CA), PPCBUG-008 (subfex CA — also in 4a) +- PPCBUG-004 (mulli), PPCBUG-009 (mullwx) +- PPCBUG-010 + PPCBUG-011 (divwx writeback + CR0 — 
**must land together**, not independently) +- PPCBUG-041 + PPCBUG-042 + PPCBUG-043 (srawx/srawix writeback + CR0 coupling — **must land together**) +- PPCBUG-095, 096, 097, 098 (lha/lhax/lhau/lhaux halfword sign-extension) +- PPCBUG-105 (lwa/lwax/lwaux — note: 64-bit-mode-only; less common in 32-bit-ABI binaries) + +#### 4c — Latent writeback (only triggers if 4a/4b are unfixed) + +These can be fixed in the same sweep but won't fire under clean inputs: +- PPCBUG-012, 013, 014, 015, 016, 017 (addx/addcx/addex/addzex/addmex/subfx) +- PPCBUG-032 (andx/orx/xorx) + +#### 4d — CR0 32-bit-ABI compare (cross-cutting catch-all) + +PPCBUG-020 documents the catch-all; the per-opcode locations are referenced from there: +- PPCBUG-020 (catch-all in groups 2-5) +- PPCBUG-023 (andisx) +- PPCBUG-024 (rlwinmx), PPCBUG-025 (rlwimix), PPCBUG-026 (rlwnmx) +- PPCBUG-036 (extsbx), PPCBUG-037 (extshx) — **must land with PPCBUG-034/035** +- PPCBUG-044 (slwx/srwx) + +**Fix shape** — at every Rc=1 path, change `update_cr_signed(0, result as i64)` to `update_cr_signed(0, result as u32 as i32 as i64)`. Once 4a/4b/4c land, both forms become equivalent and 4d becomes belt-and-suspenders (still recommended for resilience). + +--- + +### Phase 5 — FPU correctness (graphics middleware impact) + +#### 5a — Round-to-int and FPSCR.RN + +- PPCBUG-221 + PPCBUG-227 (`round_to_i64` NearestEven broken near 2^52 — must land together; `round_to_i32` delegates) +- PPCBUG-201 (FPSCR.RN not honored for double arithmetic) +- PPCBUG-432 (vrfin/vrfin128 round-half-away-from-zero vs round-to-nearest-even) + +#### 5b — VXISI / NaN / SNaN handling for FMA family + +- PPCBUG-181, 182 (single fmaddsx/fmsubsx/fnmaddsx/fnmsubsx VXISI) +- PPCBUG-202, 203, 204 (double fmaddx/fmsubx/fnmaddx/fnmsubx VXISI — esp. 
203 hot for Newton-Raphson) +- PPCBUG-183, 205 (fnmadd/fnmsub Rust unary `-` flips NaN sign — fix: skip negation on NaN) +- PPCBUG-186 (SNaN priority for FMA) +- PPCBUG-128 (lfs SNaN quietening — bit-manipulation widening helper needed) + +#### 5c — Inexact / FPSCR exception bits + +- PPCBUG-180 (single XX/FR/FI never set), PPCBUG-200 (double XX/FR/FI never set) +- PPCBUG-223 (fcmpo VXSNAN/VXVC), PPCBUG-224 (fcfidx XX), PPCBUG-225 (frspx XX/FR/FI), PPCBUG-229 (fctidx/fctidzx XX/FX), PPCBUG-230 (fctiwx/fctiwzx XX/FX), PPCBUG-231 (frspx SNaN host dependency) +- PPCBUG-165 + PPCBUG-166 + PPCBUG-168 (stfs* FPSCR + RN + SNaN) + +#### 5d — Subnormal flush (FPSCR.NI / VSCR.NJ) + +- PPCBUG-185 (FPU NI subnormal flush not modeled) +- PPCBUG-435, 436, 437 (VMX NJ subnormal flush — vaddfp/vsubfp/vmulfp128, vmsum3fp128/vmsum4fp128 product intermediates, vmaddfp/vmaddfp128/vmaddcfp128/vnmsubfp128 outputs) + +#### 5e — Estimate precision (vs hardware ~12-bit) + +- PPCBUG-184 (fres) +- PPCBUG-428..431 (vrefp, vrsqrtefp, vexptefp, vlogefp — same shape as fres) + +#### 5f — VMX float compares + saturation + +- PPCBUG-426, 427 (vnmsubfp/vnmsubfp128 double-rounding) +- PPCBUG-433 (vctsxs/vcfpsxws128 NaN saturate to INT_MIN) + +--- + +### Phase 6 — Other MEDIUM correctness + +- PPCBUG-021 (overflow.rs OE checks at bit 63 — sub-register ops; partly covered by P4) +- PPCBUG-022 (`mulld_ov` missing INT_MIN × -1) +- PPCBUG-027 (rlwimix upper-32 ISA-deviation — auto-resolves once P4 lands) +- PPCBUG-039 (cntlzdx 32-bit-ABI counts upper-zero — only matters if emitted) +- PPCBUG-063 (trap pc-after-advance) +- PPCBUG-064 (sc LEV field) +- PPCBUG-065 (twi 31, r0, IMM typed-trap — relevant to Sylpheed C++ throw work, see `project_xenia_rs_sylpheed_throw_2026_04_28.md`) +- PPCBUG-068 (mcrfs VX summary recomputation) +- PPCBUG-078 (mtmsrd L=1 partial MSR-write) +- PPCBUG-080 (mfvscr zero upper 96 bits) +- PPCBUG-123 + PPCBUG-124 + PPCBUG-161 + PPCBUG-566 (XER TBC for lswx/stswx — coupled; add 
`xer_tbc: u8` to PpcContext, wire into xer()/set_xer(); enables lswx and stswx) +- PPCBUG-125 (lmw RA-in-destination skip) +- PPCBUG-126 + PPCBUG-162 (lswi/stswi `instr.rb()` → `instr.nb()`) +- PPCBUG-487 + PPCBUG-495 (vsum* operand naming) +- PPCBUG-515 (lvebx/lvehx/lvewx vs Canary divergence — document; xenia-rs is more ISA-faithful) +- PPCBUG-516 (lvsr sh=0 case — add comment + debug_assert) +- PPCBUG-601 (decode_op6 overlapping windows — document the invariant) +- PPCBUG-642 (fmt_bcctr extended forms) +- PPCBUG-643 + PPCBUG-644 (SIMM/D-form decimal vs hex — alignment with Canary disassembly) +- PPCBUG-367 (vupkhpx/vupklpx channel replication vs zero-extend) +- PPCBUG-368 (vpkpx pack_pixel_555 channel assignment unverified) +- PPCBUG-366 (vspltisb/vspltish sign-extension idiom — fragile, not wrong) + +--- + +### Phase 7 — Frozen-snapshot drift (separate sweep) + +8 opcodes' frozen snapshots in `ppc-manual//.md` differ from live code: +- PPCBUG-066 (td/tdi/tw/twi) +- PPCBUG-117 (ldarx) +- PPCBUG-145 (stwcx) +- PPCBUG-560 (already-listed: rldicl test helper bit-order) +- Plus the implicit drift in addicx (PPCBUG-003), andisx (PPCBUG-023), cmp/cmpi (PPCBUG-050), extsbx/extshx (PPCBUG-036/037, PPCBUG-032 in batch 1) + +**Recommendation**: regenerate frozen snapshots from current code for the entire ppc-manual after Phases 1-4 land. Add a CI check that compares snapshots vs live code on every PR. + +--- + +### Phase 8 — Test gap closure (broad) + +Single PR per group is overkill; recommend bundling test additions with each Phase 1-6 PR (test the bug being fixed). 
The remaining LOW IDs are pure-test-gap entries — list: + +- PPCBUG-045 (shift), 047 (rld), 055 (branch), 067 (trap+sc), 070 (CR logical) +- PPCBUG-081, 082, 083, 084, 085 (SPR/MSR/TB/FPSCR/VSCR moves), 089 (cache+sync) +- PPCBUG-091 (lbz), 100 (lha), 109, 110, 111 (lwa/lwbrx/lwarx), 118 (ld), 127 (lmw/lswi/lswx), 129 (lfs/lfd) +- PPCBUG-132 (stb/sth), 146, 147 (stw/stwcx), 153 (std/stdcx), 163 (stmw/stswi/stswx), 171 (stfs/stfd) +- PPCBUG-187 (FPU single), 208 (FPU double), 228 (FPU misc convert) +- PPCBUG-240 (VMX add/sub), 243 (VMX sat helpers) +- PPCBUG-277, 278, 279 (VMX compare/min/max/avg) +- PPCBUG-316, 317, 320, 321, 322, 323, 324, 325 (VMX shift/rotate/logical) +- PPCBUG-370, 371, 372, 373, 374, 375, 376, 377, 378 (VMX permute/pack) +- PPCBUG-438, 439, 440 (VMX float compare/round/convert) +- PPCBUG-490, 491, 492, 493, 494 (VMX multiply-sum) +- PPCBUG-517, 518, 519 (VMX load/store) +- PPCBUG-567 (decoder accessors) +- PPCBUG-604 (decoder dispatch tables) +- PPCBUG-649, 650, 652 (golden fixtures for branches/VMX128) + +--- + +## Notes & administrative + +### Withdrawn / retracted + +- **PPCBUG-220** — `fctiwx` strict-`>` threshold actually correct (`i32::MAX` exactly representable in f64). Retracted by group-31 subagent. +- **PPCBUG-222** — `fctidx` positive-overflow sentinel `0x7FFF_FFFF_FFFF_FFFF` is the correct ISA value. Retracted. +- **PPCBUG-226** — FPRF 5-bit codes for fcmpu/fcmpo are correct per PowerISA. Retracted. +- **PPCBUG-482** — `vmhaddshs` shift `>>15` is correct per spec snapshots. Retracted. +- **PPCBUG-483** — `vmhraddshs` shift `>>15` is correct per spec snapshots. Retracted. + +### Wontfix / informational (not retracted but no fix needed) + +- **PPCBUG-038** — extswx ISA-correct, intentional 64-bit sign-extension. Document the asymmetry with extsb/extsh after PPCBUG-034/035 land. +- **PPCBUG-090, 099, 152** — invalid-form (rD==rA) silently destroys load/store result. Per ISA: undefined behavior. No compiler emits these; matches Canary. 
Optional `debug_assert!`. +- **PPCBUG-106, 115, 131, 169, 170, 206, 207, 318, 319, 364, 365, 434, 651, 653, 645, 646, 648** — informational confirmations that the implementation is correct, no change needed. +- **PPCBUG-069** — test comment OX(so)=0 is wrong but the assert is correct. +- **PPCBUG-602, 603, 605** — undocumented decoder dispatch quirks; correct but should add comments. +- **PPCBUG-647, 654** — disassembler edge-case behavior on invalid encodings; not-a-bug for valid input. + +### Coupling matrix (must-land-together) + +| Group | IDs | Reason | +|---|---|---| +| divwx | 010, 011 | Quotient zero-extension changes the CR0 sign view | +| srawx/srawix | 041, 042, 043 | Writeback truncation invalidates the CR0 view | +| extsbx/extshx | 034+036, 035+037 | Same coupling shape as srawx | +| sh64 | 040, 560 | Test helper is wrong in the inverse direction | +| mb_md sweep | 046, 561 | Promote disasm.rs accessor first | +| VC-form Rc | 275, 276, 420, 421, 562 | All consume the same new accessor | +| VMX128_R Rc | 422, 562 | Same accessor sweep | +| vrlimi128 | 315, 563 | Field accessor + caller fix | +| vsldoi128 | 361, 565 | Field accessor + caller fix | +| vpermwi128 | 362, 564 | Field accessor + caller fix | +| vcmp*fp128. | 423, 600 | decode_op6 odd keys + opcode mapping | +| XER TBC | 123, 124, 161, 566 | Add field, wire xer()/set_xer(), enables lswx/stswx | +| round_to_i64 | 221, 227 | round_to_i32 delegates | +| stfs FPSCR | 165, 166, 168 | Single fix shape covers all three | + +### Dependency on the addis fix + +The addis fix (`project_xenia_rs_addis_signext_root_cause_2026_04_29.md`) is already in place. Phase 4 generalizes that fix systematically; without it, the writeback-truncation invariant would still be incomplete. + +### Anticipated impact on the Sylpheed renderer plateau + +Strong candidates for direct cause of the plateau: +- **PPCBUG-107** — broken atomics. Workers wait forever on never-signaled events; classical broken-spinlock symptom. 
+- **PPCBUG-053+054** — broken `bdnz` loops; could explain workers parked indefinitely. +- **PPCBUG-046 (`clrldi r3, r4, 32`)** — pollution propagation in 32-bit ABI; could break any pointer-clean-up sequence. + +After applying Phase 1 alone, run `xenia-rs check sylpheed.iso -n 4B --parallel` and check whether `draws > 0`. If yes, the plateau was atomics; if no, proceed to P2/P3. + +--- + +## Progress log + +### P1 — Cross-thread atomicity sweep (merged 2026-05-01, HEAD ca5b90b) + +**PPCBUGs fixed**: 107, 130, 140, 141, 142, 143, 144, 150, 160, 167, 511, 512, 513, 514, 151, 108. Plus review-fix additions: dcbz, dcbz128, stswi two-line, stswx two-line (merged in review-fix commit c9f194d). + +**Gate results**: +- `cargo test --workspace --release`: 449 passed, 0 failed +- `-n 100M` lockstep: swaps=2, clean +- `-n 100M --parallel --reservations-table`: swaps=2, clean +- **Acid test** `-n 4B --parallel --reservations-table`: swaps=2, draws=**0**, no RtlRaiseException, no panics + +**Conclusion**: P1 did NOT unblock the Sylpheed renderer. `draws` remains 0. The renderer plateau is not caused by broken cross-thread atomics alone. Proceeding to P2 (decoder/field-extraction sweep). The strongest remaining candidate per the plan is PPCBUG-046 (`clrldi r3, r4, 32` no-op). + +--- + +### P2 — Decoder/field-extraction structural sweep (merged 2026-05-01, HEAD see `git log master --oneline -1`) + +**PPCBUGs fixed**: 040, 046, 275, 276, 315, 360, 361, 362, 363, 369, 420, 421, 422, 423, 560, 561, 562, 563, 564, 565, 600. 
+ +**Batches**: +- Batch 1: PPCBUG-040+560 — sh64() bit-order fix (XS-form SH split) + rldicl test helper encoding +- Batch 2: PPCBUG-046+561 — mb_md() accessor; all 6 rld* MB fields corrected (clrldi was a no-op) +- Batch 3: PPCBUG-275+276+420+421+422+423+562+600 — vc_rc_bit()/vx128r_rc_bit() Rc accessors; 13 vcmp interpreter sites; 5 decode_op6 dot-form entries +- Batch 4: PPCBUG-315+563 — vrlimi128 vx128_4_z/imm field extraction +- Batch 5: PPCBUG-361+565 — vsldoi128 vx128_5_sh field extraction +- Batch 6: PPCBUG-362+564 — vpermwi128 vx128_p_perm field extraction +- Batch 7: PPCBUG-360 — vperm128 vc128_2() accessor (was erroneously vd128()) +- Batch 8: PPCBUG-363+369 — vpkd3d128 post-pack permutation (MakePermuteMask tables from canary) + +**Gate results**: +- `cargo test --workspace --release`: 201 (cpu) + 6 (disasm goldens) + 144 + 76 + 16 + 8 + … passed, 0 failed +- Independent code reviewer: all 9 check items OK +- `-n 100M` lockstep smoke: ISO not available in CI environment; last known good at P1 HEAD was swaps=2 +- **Acid test** `-n 4B --parallel --reservations-table`: pending (ISO not in CI environment) + +**Conclusion**: All P2 fixes applied and reviewed. Decoder field extraction is now correct for all audited VMX128 and MD/XS-form instructions. Whether P2 unblocks the renderer (`draws > 0`) requires the sylpheed.iso acid test on the user's machine. PPCBUG-046 (clrldi no-op fix) was the highest-probability P2 renderer-unblock candidate. Next: P3 — isolated HIGH bugs (PPCBUG-510, 424/425, 053+054, 640, 641). + +--- + +## Index — every PPCBUG referenced (in numerical order) + +This list intentionally includes every ID found in `audit-findings.md` so nothing is dropped. For each entry's full description / file:line / fix snippet / test recommendation, see the corresponding `### PPCBUG-NNN` heading in `audit-findings.md`. 
+ +001-022 (batch 1: integer ALU): 001, 002, 003, 004, 005, 006, 007, 008, 009, 010, 011, 012, 013, 014, 015, 016, 017, 018, 019, 020, 021, 022. + +023 (batch 2 group 6 logic immediate): 023. + +024-027 (batch 2 group 9 word rotate): 024, 025, 026, 027. + +028-033 (batch 2 group 7 logic register): 028, 029, 030, 031, 032, 033. + +034-039 (batch 2 group 8 sign-extend / count-leading-zeros): 034, 035, 036, 037, 038, 039. + +040-045 (batch 2 group 11 shift): 040, 041, 042, 043, 044, 045. + +046-047 (batch 2 group 10 doubleword rotate): 046, 047. + +048-052 reserved (group 12 compare): 048, 049, 050. + +053-055 (batch 3 group 13 branch): 053, 054, 055. + +063-067 (batch 3 group 14 trap+sc): 063, 064, 065, 066, 067. + +068-070 (batch 3 group 15 CR logical): 068, 069, 070. + +078-085 (batch 3 group 16 SPR/MSR/TB/FPSCR/VSCR): 078, 079, 080, 081, 082, 083, 084, 085. + +088-089 (batch 3 group 17 cache+sync): 088, 089. + +090-091 (batch 4 group 18 load byte): 090, 091. + +095-100 (batch 4 group 19 load halfword): 095, 096, 097, 098, 099, 100. + +105-111 (batch 4 group 20 load word + reservation): 105, 106, 107, 108, 109, 110, 111. + +115-118 (batch 4 group 21 load doubleword): 115, 116, 117, 118. + +123-127 (batch 4 group 22 load multiple/string): 123, 124, 125, 126, 127. + +128-129 (batch 4 group 23 load float): 128, 129. + +130-132 (batch 5 group 24 store byte/halfword): 130, 131, 132. + +140-147 (batch 5 group 25 store word + stwcx): 140, 141, 142, 143, 144, 145, 146, 147. + +150-153 (batch 5 group 26 store doubleword): 150, 151, 152, 153. + +160-163 (batch 5 group 27 store multiple/string): 160, 161, 162, 163. + +165-171 (batch 5 group 28 store float): 165, 166, 167, 168, 169, 170, 171. + +180-187 (batch 6 group 29 FPU single arithmetic): 180, 181, 182, 183, 184, 185, 186, 187. + +200-208 (batch 6 group 30 FPU double arithmetic): 200, 201, 202, 203, 204, 205, 206, 207, 208. 
+ +220-231 (batch 6 group 31 FPU sign/move/compare/convert): 220 [retracted], 221, 222 [retracted], 223, 224, 225, 226 [retracted], 227, 228, 229, 230, 231. + +240-243 (batch 7 group 32 VMX integer add/sub): 240, 241, 242, 243. + +275-279 (batch 7 group 33 VMX integer compare/min/max/avg): 275, 276, 277, 278, 279. + +315-325 (batch 7 group 34 VMX integer logical/shift/rotate): 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325. + +360-378 (batch 8 group 35 VMX permute/pack): 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378. + +420-440 (batch 8 group 36 VMX float arith+compare): 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440. + +482-495 (batch 8 group 37 VMX multiply-sum + special): 482 [retracted], 483 [retracted], 487, 490, 491, 492, 493, 494, 495. + +510-519 (batch 8 group 38 VMX load/store): 510, 511, 512, 513, 514, 515, 516, 517, 518, 519. + +560-567 (Phase C1 decoder field extractors): 560, 561, 562, 563, 564, 565, 566, 567. + +600-605 (Phase C2 decoder opcode-lookup): 600, 601, 602, 603, 604, 605. + +640-654 (Phase C3 disassembler formatter): 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654. + +**Counted IDs**: 253. **Retracted**: 220, 222, 226, 482, 483 (5). **Net actionable**: 248. + +**Counted by phase here**: P1 (~17 IDs), P2 (~17 IDs), P3 (~7 IDs), P4 (~30 IDs), P5 (~30 IDs), P6 (~25 IDs), P7 (~5 IDs), P8 (~50 IDs), Notes (~30 wontfix/informational/retracted). Total accounts for all 253 IDs — every ID is either in a fix phase, the wontfix/informational list, or retracted. **Nothing has been dropped.**