diff --git a/audit-findings.md b/audit-findings.md
new file mode 100644
index 0000000..26cbb97
--- /dev/null
+++ b/audit-findings.md
@@ -0,0 +1,3416 @@
+# PPC Instruction Audit — Findings Tracker
+
+**Started**: 2026-04-29 (single session, audit-only)
+**Trigger**: `addis` 32-bit-ABI sign-extension fix surfaced a likely systemic class of bugs.
+**Status**: in flight. Per-group reports live in `audit-out/`. This file is the consolidated, stable-ID index.
+**Workflow**: audit only this session; fix session(s) reference these IDs.
+
+## Conventions
+
+- Every finding has an ID `PPCBUG-NNN` for cross-referencing.
+- **Status**: `open` (audit found it, not yet fixed) | `applied` (fix landed) | `wontfix` (intentional) | `dup-of:NNN` (collapsed into another finding).
+- **Severity**:
+  - **HIGH** = wrong arithmetic / control flow on plausible Xbox 360 user code.
+  - **MEDIUM** = wrong status flag / latent under broken upstream invariants / edge case.
+  - **LOW** = test gap / cosmetic / dead-code-only.
+- All file:line refs are `xenia-rs/crates/xenia-cpu/src/interpreter.rs` unless otherwise noted.
+- Suggested fixes are written as one-line patches where possible; see the per-group report for full context.
+
+## Cross-cutting recommendation
+
+The single recurring root cause is **violating the 32-bit ABI invariant that all GPR writes truncate to 32 bits**. The cleanest fix is to systematically apply `as u32 as u64` at every GPR writeback in every integer ALU op. The existing CA/CR0/OE helpers will then be correct without further changes (because their inputs become guaranteed-clean). The audit reports list each fix individually; the fix session may choose to apply them as one sweep or one-at-a-time.
+
+A defensive secondary recommendation: even after the writeback truncation, instructions whose CA computation does its own internal arithmetic on 64-bit operands (`subfcx`, `subfex`, `addic`, `addicx`, `subficx`) should additionally truncate their compare operands. This guards against any future regression that re-pollutes the GPR file.
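+
+A minimal sketch of the invariant, assuming a hypothetical `write_gpr32` helper (nothing in this tracker confirms such a helper exists in `interpreter.rs`). It only illustrates why `as u32 as u64` at writeback keeps a register clean while raw 64-bit arithmetic does not:
+
+```rust
+/// Hypothetical helper: keep only the low 32 bits at GPR writeback.
+fn write_gpr32(gpr: &mut [u64; 32], rd: usize, value: u64) {
+    gpr[rd] = value as u32 as u64;
+}
+
+/// 64-bit two's-complement negate, the shape PPCBUG-006 describes.
+fn neg_untruncated(ra: u64) -> u64 {
+    (!ra).wrapping_add(1)
+}
+```
+
+With `ra = 5`, `neg_untruncated` returns `0xFFFFFFFF_FFFFFFFB`; routing that value through `write_gpr32` stores `0x00000000_FFFFFFFB`.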
+
+---
+
+## Batch 1 — integer ALU (groups 1-5)
+
+Per-group reports: `audit-out/group-01-add-imm.md`, `group-02-add-reg.md`, `group-03-sub-reg.md`, `group-04-multiply.md`, `group-05-divide.md`.
+
+### PPCBUG-001 — addi sign-extension, no truncation
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:114-118
+- **Symptom**: `addi rT, 0, -1` (= `li rT, -1`; an RA field of 0 means the literal value 0, not the contents of r0) writes `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`. Identical shape to addis.
+- **Fix**:
+  ```rust
+  ctx.gpr[instr.rd()] = ra_val.wrapping_add(instr.simm16() as i64 as u64) as u32 as u64;
+  ```
+- **Test gap**: existing `test_addi` only covers positive simm16. Add a test for `li rT, -1` and verify the upper 32 bits are zero.
+
+### PPCBUG-002 — addic untruncated writeback + 64-bit CA compare
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:133-140
+- **Symptom**: (a) GPR writeback not truncated (same shape as addi). (b) CA computed via 64-bit `result < ra` — Canary's `AddDidCarry` explicitly truncates both operands to int32 first.
+- **Fix**:
+  ```rust
+  let ra32 = ra as u32;
+  let imm = instr.simm16() as i32 as u32;
+  let result32 = ra32.wrapping_add(imm);
+  ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
+  ctx.gpr[instr.rd()] = result32 as u64;
+  ```
+- **Test gap**: zero unit tests for addic.
+
+### PPCBUG-003 — addicx untruncated writeback + 64-bit CA + CR0 regression
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:141-150
+- **Symptom**: same as PPCBUG-002 plus a CR0 regression: live code uses `update_cr_signed(0, result as i64)` (64-bit signed). The frozen snapshot in `ppc-manual/alu/addicx.md` shows the previously-correct `result as i32 as i64` form. Live code has drifted.
+- **Fix**: PPCBUG-002 fix plus `update_cr_signed(0, result32 as i32 as i64)`.
+- **Test gap**: zero unit tests.
+- **Note**: confirms the manual's frozen snapshots are useful drift detectors — see if other opcodes have similarly regressed.
+
+### PPCBUG-004 — mulli untruncated 64-bit signed product
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:159-164
+- **Symptom**: RA read as full `i64`, product stored as `u64` without truncation. In the 32-bit ABI, RA should be read as i32 and only the low 32 bits of the product kept (overflow wraps silently per ISA).
+- **Fix**:
+  ```rust
+  let ra = ctx.gpr[instr.ra()] as i32 as i64;
+  let imm = instr.simm16() as i64;
+  ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64;
+  ```
+- **Test gap**: zero unit tests.
+
+### PPCBUG-005 — subficx untruncated writeback + 64-bit CA compare
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:151-158
+- **Symptom**: (a) `imm.wrapping_sub(ra)` on 64-bit values writes poisoned upper bits; sign-extended `imm` for negative SIMM has bits 32-63 set. (b) CA `imm >= ra` is 64-bit unsigned compare; wrong relative to Canary's 32-bit form.
+- **Fix**:
+  ```rust
+  let ra32 = ra as u32;
+  let imm32 = instr.simm16() as i32 as u32;
+  let result32 = imm32.wrapping_sub(ra32);
+  ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 };
+  ctx.gpr[instr.rd()] = result32 as u64;
+  ```
+- **Test gap**: zero unit tests.
+
+### PPCBUG-006 — negx active GPR poisoning + 64-bit OE overflow check
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:319-330
+- **Symptom**: (a) `(!ra).wrapping_add(1)` unconditionally sets upper 32 bits to all-ones because `!ra` flips them. Even a clean `r3 = 5` produces `0xFFFFFFFF_FFFFFFFB` instead of `0x00000000_FFFFFFFB`. **This is active, not latent — every neg in 32-bit-ABI code poisons the GPR.** (b) `neg_ov_64` overflow predicate tests `ra == 0x8000_0000_0000_0000` (64-bit INT_MIN) instead of `ra == 0x0000_0000_8000_0000` (32-bit INT_MIN).
+- **Fix**:
+  ```rust
+  let result = (!(ra as u32)).wrapping_add(1);
+  ctx.gpr[instr.rd()] = result as u64;
+  if instr.oe() {
+      overflow::apply(ctx, (ra as u32) == 0x8000_0000);
+  }
+  if instr.rc_bit() { ctx.update_cr_signed(0, result as i32 as i64); }
+  ```
+- **Test gap**: existing `nego_sets_ov_only_on_int_min` tests 64-bit INT_MIN — add a 32-bit INT_MIN case.
+
+### PPCBUG-007 — subfcx CA via 64-bit unsigned compare
+- **Severity**: HIGH (defensive — same shape as the compare that broke addis)
+- **Status**: open
+- **Location**: interpreter.rs:258
+- **Symptom**: `if rb >= ra { 1 } else { 0 }` is the exact 64-bit unsigned compare that the addis bug exploited. Wrong CA when either operand has poisoned upper 32 bits. Apply defensively even if all upstream sources are cleaned, because a wrong CA bit is unrecoverable downstream.
+- **Fix**:
+  ```rust
+  let ra32 = ra as u32;
+  let rb32 = rb as u32;
+  let result32 = rb32.wrapping_sub(ra32);
+  ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 };
+  ctx.gpr[instr.rd()] = result32 as u64;
+  ```
+- **Test gap**: zero dedicated unit tests for subfcx — the most critical opcode in Group 3 had no coverage. Add 6+ tests including the exact 0x828F3F98 / 0x828F3F68 case from the addis incident.
+
+### PPCBUG-008 — subfex CA via 64-bit unsigned compare + `!ra` poisons writeback
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:268-284
+- **Symptom**: (a) CA `if rb > ra || (rb == ra && ca != 0)` is 64-bit; same shape as PPCBUG-007. (b) Writeback uses `(!ra).wrapping_add(rb).wrapping_add(ca)` — `!ra` always sets upper 32 bits, guaranteed GPR poison even with clean inputs (same shape as PPCBUG-006).
+- **Fix**:
+  ```rust
+  let ra32 = ra as u32;
+  let rb32 = rb as u32;
+  let ca = ctx.xer_ca as u32;
+  let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca);
+  ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 };
+  ctx.gpr[instr.rd()] = result32 as u64;
+  ```
+
+### PPCBUG-009 — mullwx untruncated 64-bit signed product
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:331-344
+- **Symptom**: 32x32 multiply produces 64-bit signed `i64` product, written to GPR via `as u64` without truncation. When product overflows i32 (which `mullw_ov` correctly detects), upper 32 bits are non-zero and corrupt downstream 64-bit unsigned compares — same class as addis.
+- **Fix** (one line; OE handler unchanged):
+  ```rust
+  ctx.gpr[instr.rd()] = product as u32 as u64;
+  ```
+
+### PPCBUG-010 — divwx quotient sign-extended to 64 bits
+- **Severity**: HIGH
+- **Status**: open (must be applied in same commit as PPCBUG-011)
+- **Location**: interpreter.rs:373
+- **Symptom**: `(ra / rb) as i64 as u64` sign-extends a negative i32 quotient. `-10 / 3 = -3` writes `0xFFFFFFFF_FFFFFFFD` instead of `0x00000000_FFFFFFFD`. Canary's `InstrEmit_divwx` uses `f.ZeroExtend(v, INT64_TYPE)` — explicit zero-extension.
+- **Fix**: `ctx.gpr[instr.rd()] = (ra / rb) as u32 as u64;`
+
+### PPCBUG-011 — divwx CR0 update breaks after PPCBUG-010 fix
+- **Severity**: MEDIUM (coupled to PPCBUG-010 — must land together)
+- **Status**: open
+- **Location**: interpreter.rs:379
+- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.rd()] as i64)` accidentally works today because the sign-extended GPR has consistent sign in i64 view. After PPCBUG-010, GPR holds `0x00000000_FFFFFFFD` for `-3` and `as i64` reads positive — CR0.LT will be wrong for negative quotients.
+- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64);`
+
+### PPCBUG-012 — addx writeback not truncated (latent)
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:167-179
+- **Symptom**: 64-bit `wrapping_add` result written to GPR untruncated. Latent: only triggers if upstream operands have poisoned upper 32 bits. With PPCBUG-001 etc. unfixed, that invariant is broken — addx amplifies the poison.
+- **Fix**: `ctx.gpr[instr.rd()] = result as u32 as u64;`
+
+### PPCBUG-013 — addcx writeback not truncated (latent)
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:180-193
+- **Fix**: same shape as PPCBUG-012.
+
+### PPCBUG-014 — addex writeback not truncated (latent)
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:194-209
+- **Fix**: same shape as PPCBUG-012.
+
+### PPCBUG-015 — addzex writeback not truncated (latent)
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:210-224
+- **Fix**: same shape as PPCBUG-012.
+
+### PPCBUG-016 — addmex writeback not truncated (latent + edge case)
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:225-240
+- **Symptom**: same writeback issue plus the `wrapping_sub(1)` produces all-ones upper 32 bits when low 32 bits underflow — guaranteed poison even if inputs are clean (same shape as PPCBUG-006/008).
+- **Fix**: truncate operands and result to 32 bits.
+
+### PPCBUG-017 — subfx writeback not truncated (latent)
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:241-253
+- **Fix**: same shape as PPCBUG-012.
+
+### PPCBUG-018 — subfzex writeback not truncated + `!ra` poisons
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:285-302
+- **Symptom**: `(!ra).wrapping_add(ca)` flips upper 32 bits — guaranteed poison.
+- **Fix**: truncate ra to u32, do arithmetic on u32, write `as u64`.
+
+### PPCBUG-019 — subfmex writeback poisoning + always-true CA edge
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:303-318
+- **Symptom**: (a) writeback poisoned via `(!ra)`. (b) CA predicate `(!ra) != 0` is always true when ra has clean upper 32 bits (because `!ra` flips them) — so CA is always 1, even in the documented edge case where 32-bit `ra == 0xFFFFFFFF && ca == 0` should yield CA=0.
+- **Fix**: operate on u32, then `xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 }`.
+
+### PPCBUG-020 — CR0 update uses 64-bit signed compare in all sub-register ops
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:250, 264, 281, 299, 315, 327, 341, 379, 396, 410, 419, 428, 445, 462 (every Rc=1 path in groups 2-5)
+- **Symptom**: `update_cr_signed(0, result as i64)` views result as 64-bit signed. In 32-bit ABI, bit 31 determines LT/GT, not bit 63. A result like `0x00000000_80000000` is negative in 32-bit but positive in 64-bit — CR0.LT inverted.
+- **Fix (catch-all)**: change to `result as u32 as i32 as i64` everywhere. This is not optional once PPCBUG-001..019 truncate writebacks: with the upper 32 bits of `result` zeroed, the plain `as i64` view is never negative, so CR0.LT would stop firing altogether (the same coupling as PPCBUG-011). Land this fix together with the writeback truncation.
+- **Note**: this is one logical fix duplicated across all rc paths; the fix session should grep `update_cr_signed(0, .* as i64)` to find them all.
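+
+The difference between the two views can be sketched in isolation. The `Cr0` enum and the function names below are illustrative only and do not reflect the real signature of `update_cr_signed`:
+
+```rust
+#[derive(Debug, PartialEq)]
+enum Cr0 {
+    Lt,
+    Gt,
+    Eq,
+}
+
+fn classify(v: i64) -> Cr0 {
+    if v < 0 {
+        Cr0::Lt
+    } else if v > 0 {
+        Cr0::Gt
+    } else {
+        Cr0::Eq
+    }
+}
+
+/// Current shape: sign taken from bit 63.
+fn cr0_view64(result: u64) -> Cr0 {
+    classify(result as i64)
+}
+
+/// Proposed shape: sign taken from bit 31.
+fn cr0_view32(result: u64) -> Cr0 {
+    classify(result as u32 as i32 as i64)
+}
+```
+
+For `0x00000000_80000000` the 64-bit view reports GT while the 32-bit view reports LT; for zero both report EQ.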
+
+### PPCBUG-021 — OE overflow checks at bit 63 in all sub-register ops
+- **Severity**: LOW
+- **Status**: open
+- **Locations**: throughout — `add_ov_64`, `sub_ov_64`, `sum_overflow_64`, `mullw_ov`, etc. (defined in `xenia-cpu/src/overflow.rs`)
+- **Symptom**: signed-overflow check operates on 64-bit boundary. For 32-bit-ABI ops (`addo`, `subfo`, `subfco`, etc.), should check at bit 31. With PPCBUG-006 a tighter form was given for `negx`. The pattern probably needs systematic review across overflow.rs.
+- **Fix**: open a follow-up audit of overflow.rs after batch B completes.
+
+### PPCBUG-022 — mulld_ov missing INT_MIN * -1 edge case
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `xenia-cpu/src/overflow.rs` (`mulld_ov` helper)
+- **Symptom**: 64-bit signed multiply overflow check doesn't handle `i64::MIN * -1`.
+- **Fix**: add the special case to the helper.
+
+### PPCBUG-023 — andisx CR0 update uses 64-bit signed compare; should use 32-bit
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:475
+- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` interprets the result as 64-bit signed. The `andisx` result is bounded by `0x0000_0000_FFFF_0000`, which is always non-negative in 64-bit view. In 32-bit ABI, bit 31 is the sign bit — results with bit 31 set (e.g. `andis. rA, rS, 0x8000` with rS=0x80000000 → result=0x80000000) should yield CR0.LT=1, but xenia-rs gives CR0.GT=1. The ppc-manual frozen snapshot for `andisx` shows the correct `as i32 as i64` form; the live code has drifted. Common trigger: `andis. rA, rS, 0x8000` to test the sign bit of a 32-bit word.
+- **Fix**:
+  ```rust
+  ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);
+  ```
+- **Test gap**: zero tests for `andisx`. Add at minimum: result with bit 31 set (expect LT=1), nonzero result with bit 31 clear (expect GT=1), result=0 (expect EQ=1).
+
+---
+
+## Batch 2 — logical immediate (group 6)
+
+Per-group report: `audit-out/group-06-logic-imm.md`.
+
+Group 6 summary: only 1 new bug found. The `simm16` sign-extension pattern does not apply (all ops use `uimm16`). `ori`, `oris`, `xori`, `xoris`, and `andix` are ISA-correct; `andisx` has a CR0 interpretation bug (PPCBUG-023). All 6 opcodes have inadequate test coverage (LOW gaps for 5 of them, MEDIUM gap for `andisx` tied to the bug).
+
+---
+
+## Batch 3 — word rotate-and-mask (group 9)
+
+Per-group report: `audit-out/group-09-word-rotate.md`.
+
+Group 9 summary: core arithmetic is clean — `rlw_mask`, rotate logic, and result write are all ISA-correct. The single recurring defect is the Rc=1 CR0 path using `as i64` instead of `as u32 as i32 as i64` (instances of PPCBUG-020 specific to these three opcodes). `rlwimix` zeroes the upper 32 bits of RA instead of preserving them per ISA, but this is safe under 32-bit ABI invariant and classified LOW. Test coverage is poor: 1 partial test for `rlwinmx`, zero for the other two.
+
+### PPCBUG-024 — rlwinmx CR0 update uses 64-bit signed compare; should use 32-bit
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:667
+- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` reads a zero-extended u32, so a result with bit 31 set is classified as positive: `0x80000000` reads as +2147483648 in the 64-bit signed view but is -2147483648 under the 32-bit ABI. CR0.LT/GT are inverted for every such result. `rlwinm.` is the most common dot-form instruction in compiler output (all `slwi.`, `srwi.`, `clrlwi.`, bitfield-test-and-branch idioms).
+- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
+- **Test gap**: `test_rlwinm` exists but non-Rc only, result has bit 31 clear. Add Rc=1 tests with bit 31 set in result.
+
+### PPCBUG-025 — rlwimix CR0 update uses 64-bit signed compare; should use 32-bit
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:679
+- **Symptom**: same class as PPCBUG-024. `rlwimi.` is compiler-generated for struct bitfield writes; when the inserted value occupies or sets bit 31 of RA, CR0.LT is wrong.
+- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
+- **Test gap**: zero tests for `rlwimix`. Add basic insert (non-Rc) + Rc=1 with bit-31-set case.
+
+### PPCBUG-026 — rlwnmx CR0 update uses 64-bit signed compare; should use 32-bit
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:690
+- **Symptom**: same class as PPCBUG-024. `rlwnm.` is less frequent but used in variable-shift normalisation patterns.
+- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
+- **Test gap**: zero tests for `rlwnmx`.
+
+### PPCBUG-027 — rlwimix zeroes upper 32 bits of RA instead of preserving them (ISA deviation, LOW)
+- **Severity**: LOW
+- **Status**: open (no fix action required for 32-bit ABI emulation)
+- **Location**: interpreter.rs:677-678
+- **Symptom**: `let ra = ctx.gpr[instr.ra()] as u32` discards upper 32 bits; result written as `as u64` zero-extends. Per ISA, `(RA) & ¬MASK(MB+32, ME+32)` preserves upper 32 bits of RA. Canary confirms: `f.And(f.LoadGPR(i.M.RA), f.LoadConstantUint64(~m))` with `~m` non-zero in upper half.
+- **Impact**: under 32-bit ABI, if the 32-bit GPR invariant holds, upper 32 bits of RA are already zero before `rlwimix`, so both behaviours are identical. The deviation is only observable if an upstream bug (PPCBUG-001..023) has leaked non-zero upper bits into RA — in which case `rlwimix` would silently clean them (beneficial side-effect). No isolated fix needed; resolves automatically when upstream bugs are fixed.
+- **Note**: if 64-bit mode support is ever added, this will become a HIGH bug.
+
+---
+
+## Batch 2 — logical register (group 7) [renumbered from collision]
+
+Per-group report: `audit-out/group-07-logic-reg.md` (note: report uses original IDs PPCBUG-023..029 from the subagent's local numbering; tracker uses PPCBUG-028..033 here to avoid collision with groups 6 and 9).
+
+The group 7 subagent also flagged a CR0 regression across all 8 opcodes — that is an extension of PPCBUG-020 (catch-all for CR0 64-bit-signed regressions). Adding andx, andcx, orx, orcx, xorx, norx, nandx, eqvx Rc=1 paths to PPCBUG-020's scope rather than creating a new ID.
+
+### PPCBUG-028 — orcx active GPR poisoning
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:509-513
+- **Symptom**: writes `rs | !rb`. Rust's `!` on `u64` flips all 64 bits — the upper 32 bits of `!rb` are unconditionally all-ones, OR'd into the result. With clean inputs `orc r5, r3, r4` writes `0xFFFFFFFF_xxxxxxxx`. Active poisoning, same shape as PPCBUG-006/008.
+- **Fix**: operate on u32, write `as u64`:
+  ```rust
+  let result = (ctx.gpr[instr.rs()] as u32) | !(ctx.gpr[instr.rb()] as u32);
+  ctx.gpr[instr.ra()] = result as u64;
+  ```
+- **Test gap**: zero tests.
+
+### PPCBUG-029 — norx active GPR poisoning (the `not` simplified mnemonic)
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:519-523
+- **Symptom**: writes `!(rs | rb)` — outer `!` flips upper 32 bits unconditionally. **`nor rA, rS, rS` is the canonical `not` simplified mnemonic** used pervasively in PPC code; every `not` in 32-bit-ABI Xbox 360 binaries actively poisons the GPR.
+- **Fix**: u32 arithmetic, write `as u64`.
+
+### PPCBUG-030 — nandx active GPR poisoning
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:524-528
+- **Symptom**: writes `!(rs & rb)`, same shape as norx. `nand rA, rS, rS` also computes the bitwise complement of rS (the same result as `nor rA, rS, rS`), so it is an alternative way to spell `not` and hits the same poisoning.
+- **Fix**: u32 arithmetic.
+
+### PPCBUG-031 — eqvx active GPR poisoning
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:529-533
+- **Symptom**: writes `!(rs ^ rb)` — same shape. The idiom `eqv rA, rS, rS` "set rA to all-ones (i.e. -1 in 32-bit ABI)" produces `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`.
+- **Fix**: u32 arithmetic.
+
+### PPCBUG-032 — andx / orx / xorx writeback not truncated (latent)
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:494-498 (andx), 504-508 (orx), 514-518 (xorx)
+- **Symptom**: 64-bit bitwise on full GPR values. Latent — clean if both operands are clean; pollutes if either is poisoned upstream.
+- **Fix**: `as u32 as u64` truncation at writeback. Once all upstream poison sources are fixed, these become unnecessary; until then, defensive truncation.
+
+### PPCBUG-033 — andcx active poisoning via `!rb` sub-expression
+- **Severity**: MEDIUM (the `!rb` always poisons; outer `&` masks it away when rs is clean — fully active when rs is poisoned)
+- **Status**: open
+- **Location**: interpreter.rs:499-503
+- **Symptom**: writes `rs & !rb`. With a clean rb, `!rb` has all-ones upper bits, so the upper 32 bits of the result are exactly the upper 32 bits of rs: clean if rs is clean, and passed through unmasked if rs is poisoned upstream. The op never adds poison on its own, but it never filters it either, which places it closer to active than latent.
+- **Fix**: `(rs as u32) & !(rb as u32)` then `as u64`.
+
+## Batch 2 — sign-extend / count-leading-zeros (group 8) [renumbered]
+
+Per-group report: `audit-out/group-08-extend-clz.md` (report uses local IDs PPCBUG-023..030; tracker uses PPCBUG-034..039).
+
+### PPCBUG-034 — extsbx writeback sign-extends to 64 bits
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:537
+- **Symptom**: `as i8 as i64 as u64` — a byte with high bit set (0x80) writes `0xFFFFFFFF_FFFFFF80` instead of `0x00000000_FFFFFF80`. Active poisoning on every negative byte. `extsb` is emitted by compilers to canonicalize signed-byte arguments — common code path.
+- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i8 as i32 as u32 as u64;`
+- **Test gap**: zero unit tests.
+- **Note**: Canary's JIT does the same sign-extension but is rescued by x86's 32-bit-write zeroing the upper 32 of host registers. Pure interpreter has no such escape.
+
+### PPCBUG-035 — extshx writeback sign-extends to 64 bits
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:542
+- **Symptom**: `as i16 as i64 as u64` — same shape as PPCBUG-034 for halfwords.
+- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i16 as i32 as u32 as u64;`
+
+### PPCBUG-036 — extsbx CR0 coupling
+- **Severity**: MEDIUM (must land in same commit as PPCBUG-034)
+- **Status**: open
+- **Location**: interpreter.rs:538
+- **Symptom**: `update_cr_signed(0, ra as i64)` — currently latent because the unfixed sign-extended value's i64 sign matches bit 7 of the byte. After PPCBUG-034 lands, the truncated value's i64 view becomes always non-negative — CR0.LT will never fire for negative byte results.
+- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);` — must land with PPCBUG-034.
+
+### PPCBUG-037 — extshx CR0 coupling
+- **Severity**: MEDIUM (must land with PPCBUG-035)
+- **Status**: open
+- **Location**: interpreter.rs:543
+- **Symptom**: same coupling shape as PPCBUG-036 for halfwords.
+
+### PPCBUG-038 — extswx ISA-correct, document asymmetry
+- **Severity**: LOW (informational / wontfix)
+- **Status**: wontfix
+- **Location**: interpreter.rs:547
+- **Symptom**: `as i32 as i64 as u64` produces full 64-bit sign-extension. This IS the documented purpose of extsw — argument-register canonicalization in 64-bit mode. Behavior is intentional. After PPCBUG-034/035 land, document the asymmetry with extsb/extsh in a comment.
+
+### PPCBUG-039 — cntlzdx counts upper 32 always-zero bits in 32-bit ABI
+- **Severity**: LOW
+- **Status**: open (probably dead code in Xbox 360 binaries)
+- **Location**: interpreter.rs:556-562
+- **Symptom**: counts leading zeros in full 64. If a 32-bit-ABI binary emits cntlzd, the result is `32 + cntlzw(low32)` not `cntlzw(low32)`. ISA-correct for 64-bit mode; only matters if the binary actually emits it.
+- **Test gap**: zero tests.
+
+#### Clean opcodes from group 8
+
+- `cntlzwx` (interpreter.rs:551-555) — `(rs as u32).leading_zeros()` reads only low 32 bits, result range 0..=32, upper 32 zero. CR0 path benign because result is small. **Test gap only**, LOW.
+- `extswx` CR0 path is correct per ISA (PPCBUG-038 wontfix).
+
+## Batch 2 — shift (group 11) [renumbered]
+
+Per-group report: `audit-out/group-11-shift.md` (uses local IDs PPCBUG-050..055; tracker uses PPCBUG-040..045).
+
+### PPCBUG-040 — DECODER BUG: `sh64()` wrong bit order for sradi (HIGH)
+- **Severity**: HIGH (this is a decoder-level bug, file:line is in `decoder.rs` not `interpreter.rs`)
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `xenia-rs/crates/xenia-cpu/src/decoder.rs:91-93` (the `sh64()` accessor method on `DecodedInstr`)
+- **Symptom**: the XS-form `sradix` (sradi) shift amount is assembled as `SH[4:0] << 1 | SH[5]` instead of the correct `SH[5] << 5 | SH[4:0]`. **Every `sradi rA, rS, N` instruction where N is not 0 or 63 executes with a completely wrong shift count.** Example: `sradi rA, rS, 32` shifts by 1 instead. This is a silent, structural mis-decoding — none of the interpreter changes can paper over it.
+- **Cross-reference**: Canary's `(i.XS.SH5 << 5) | i.XS.SH` pattern is the correct ISA encoding.
+- **Fix**: in `decoder.rs:sh64()` body, swap the bit order:
+  ```rust
+  pub fn sh64(&self) -> u32 {
+      // SH5 is at bit 30 of the encoded word; SH[4:0] is at bits 16-20.
+      let sh_lo = extract_bits(self.raw, 16, 20);
+      let sh_hi = extract_bits(self.raw, 30, 30);
+      (sh_hi << 5) | sh_lo
+  }
+  ```
+- **Impact**: `sradi` is used by compilers for arithmetic right shifts on 64-bit values. In Xbox 360 32-bit-ABI binaries it should not be common, but it's emitted by some compilers for sign-magnitude conversions and 64-bit fixed-point arithmetic. **This is the kind of silent decoder bug the user explicitly wanted the audit to catch.**
+- **Test gap**: no decoder unit test pins `sh64()` for non-trivial SH values. Add fixture cases in `disasm_goldens.rs` for `sradi rA, rS, 1`, `sradi rA, rS, 32`, `sradi rA, rS, 63`.
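+- **Note**: any other instruction that uses the same split SH encoding is suspect. `sradi` and `sradix` name the same opcode, so the Phase C decoder audit must enumerate every consumer of `sh64()`; group 10 lists `sh64()` among its accessors, which suggests the MD-form doubleword rotates call it as well.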
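+
+The bit-order mistake can be reproduced in isolation. The sketch below is not the `decoder.rs` code: it uses plain shifts on a raw `u32` in LSB-0 numbering (PPC bits 16-20 are raw bits 15..11, PPC bit 30 is raw bit 1), and all three function names are hypothetical.
+
+```rust
+/// Pack a 6-bit shift amount into the split SH field only
+/// (low five bits at raw bits 15..11, high bit at raw bit 1).
+fn encode_sh(n: u32) -> u32 {
+    ((n & 0x1F) << 11) | ((n >> 5) << 1)
+}
+
+/// Correct reassembly: the split-off bit becomes bit 5 of the count.
+fn sh64_fixed(raw: u32) -> u32 {
+    (((raw >> 1) & 1) << 5) | ((raw >> 11) & 0x1F)
+}
+
+/// Pre-fix ordering as described in the symptom above.
+fn sh64_buggy(raw: u32) -> u32 {
+    (((raw >> 11) & 0x1F) << 1) | ((raw >> 1) & 1)
+}
+```
+
+Round-tripping every count from 0 to 63 through `encode_sh` and `sh64_fixed` returns the input, whereas `sh64_buggy` agrees only for 0 and 63 and maps 32 to 1.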
+
+### PPCBUG-041 — srawx writeback sign-extends to 64 bits
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:583, 588 (two writeback paths for the count<32 and count>=32 branches)
+- **Symptom**: `result as i64 as u64` violates the 32-bit-ABI zero-extension convention. A negative shifted value writes `0xFFFFFFFF_xxxxxxxx` instead of `0x00000000_xxxxxxxx`.
+- **Fix**: `result as u32 as u64` in both writeback paths.
+- **Note**: subagent verified the CA computation is **independently correct** — uses `(rs as u32) << (32 - sh) != 0` which is the canonical ISA shifted-out-bits test on 32-bit operands. **Do not change CA logic.**
+
+### PPCBUG-042 — srawix writeback sign-extends to 64 bits
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:600, 605 (same shape as PPCBUG-041 for srawi)
+- **Fix**: `result as u32 as u64`.
+
+### PPCBUG-043 — srawx / srawix CR0 coupling
+- **Severity**: MEDIUM (must land with PPCBUG-041 and PPCBUG-042)
+- **Status**: open
+- **Locations**: interpreter.rs:593, 607
+- **Symptom**: currently masked by the sign-extended writeback (sign-extension makes the 64-bit and 32-bit sign agree). After truncating the writeback, `as i64` will misread the sign for negative results.
+- **Fix**: `as u32 as i32 as i64` in both Rc=1 paths, applied with PPCBUG-041/042.
+
+### PPCBUG-044 — slwx / srwx CR0 misclassifies negative 32-bit results
+- **Severity**: LOW (zero-extended results have bit 31 set in low 32, but always positive in i64 view → CR0.LT never fires for slw/srw with bit-31-set results)
+- **Status**: open
+- **Locations**: interpreter.rs:568, 576
+- **Fix**: `as u32 as i32 as i64`.
+
+### PPCBUG-045 — Zero unit tests for any shift opcode
+- **Severity**: LOW (test gap only)
+- **Status**: open
+- **Locations**: interpreter.rs:563-658 (entire shift group: slwx, srwx, srawx, srawix, sldx, srdx, sradx, sradix)
+- **Recommendation**: add at least one functional test per opcode. Priority cases: `srawix r3, r3, 1` with rs=0xFFFFFFFE (result=0xFFFFFFFF, CA should be 0); `srawix r3, r3, 1` with rs=0x80000001 (result=0xC0000000, CA should be 1); `sradix r3, r3, 32` (regression case for PPCBUG-040, which made it shift by 1).
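+
+A small reference model can generate expected values for the `srawix` cases above. This is a sketch written from the ISA rule (CA is set only when the source is negative and at least one 1 bit is shifted out); it does not call the interpreter, and `srawi32` is a hypothetical name.
+
+```rust
+/// Reference model for `srawi` on 32-bit operands: returns (result, CA).
+fn srawi32(rs: u32, sh: u32) -> (u32, bool) {
+    debug_assert!(sh < 32);
+    // Arithmetic shift on the signed view replicates the sign bit.
+    let result = ((rs as i32) >> sh) as u32;
+    // Bits that fall off the low end; a shift of 0 loses nothing.
+    let lost = if sh == 0 { 0 } else { rs << (32 - sh) };
+    (result, (rs as i32) < 0 && lost != 0)
+}
+```
+
+A unit test can compare the low 32 bits of the GPR and `xer_ca` against this model, and separately assert that the upper 32 bits of the GPR are zero.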
+
+#### Clean opcodes from group 11
+
+- `slwx` writeback at line 568 (zero-ext 32-bit result via `(rs as u32 << count) as u64`) — clean.
+- `srwx` writeback at line 576 — clean.
+- `sldx`, `srdx`, `sradx` — 64-bit ops, ISA-correct (probably dead in 32-bit-ABI binaries).
+- `sradix` body logic is structurally correct; failure is solely from PPCBUG-040 giving it a wrong shift count.
+
+## Batch 2 — doubleword rotate (group 10) [renumbered]
+
+Per-group report: `audit-out/group-10-dword-rotate.md` (uses local IDs PPCBUG-027/028; tracker uses PPCBUG-046/047).
+
+### PPCBUG-046 — DECODER BUG: wrong bit position for MB[5] in all 6 doubleword-rotate opcodes (HIGH)
+- **Severity**: HIGH (decoder-level; impacts the canonical zero-extend-to-32 idiom)
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Locations**: interpreter.rs — every arm of `rldiclx`, `rldicrx`, `rldicx`, `rldimix`, `rldclx`, `rldcrx` (lines 693-754)
+- **Symptom**: each arm computes `let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1)`. The bit at `(instr.raw >> 1) & 1` is **PPC bit 30**, which in MD form is `sh[5]` (the split-off high bit of the shift amount, the same bit PPCBUG-040 reads as `sh_hi`), NOT `mb[5]`. The high bit of the 6-bit MB field lives at PPC bit 26 = `(instr.raw >> 5) & 1`.
+
+  As written, the code computes `(mb[4:0] << 1) | sh[5]`. Ironically `disasm.rs:1256` (the `mb_md()` helper) has the correct formula. The interpreter was written independently with the wrong bit position, probably a copy-error from `sh64()` where bit 30 really is the split bit.
+- **Concrete impact**:
+  - `clrldi r3, r4, 32` is the canonical "zero-extend low 32 bits" idiom emitted constantly in 32-bit-ABI PPC code. Encoded as `rldicl r3, r4, 0, mb=32`. With mb=32, `mb[5]=1, mb[4:0]=0`. The interpreter decodes mb=0 → mask is all-ones → instruction becomes a no-op. Any downstream 64-bit compare (subfcx CA, cmpld) on that register sees a polluted 64-bit value instead of a clean 32-bit zero-extended one. **This is the same class of bug that caused the addis/BST incident.**
+  - For `rldcr` (MDS form), bit 30 is the LSB of the 4-bit XO field and is always 1 (XO=9), regardless of Rc. The buggy formula therefore computes `(me[4:0] << 1) | 1` on every invocation: the decoded `me` is always odd and generally unrelated to the encoded value.
+- **Fix** (one line per opcode):
+  ```rust
+  // Replace in all 6 arms:
+  let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1);
+  // With:
+  let mb = instr.mb() | (((instr.raw >> 5) & 1) << 5);
+  ```
+  Or, cleaner: expose `mb_md()` (currently in disasm.rs:1256) as a method on `DecodedInstr` in `decoder.rs` and have the interpreter call `instr.mb_md()` — single source of truth for MD-form mb extraction.
+- **Test gap**: zero execution tests for any of the 6 opcodes; only disasm-golden string-output tests.
+- **Note**: this is the second decoder bug found by the audit (PPCBUG-040 / `sh64()` for `sradi` is the first). Phase C decoder audit must verify whether other MD/MDS/XS form accessors have similar bit-position errors.
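+
+The `clrldi r3, r4, 32` case can be checked by hand. The sketch below assumes `instr.mb()` returns PPC bits 21-25 (raw bits 10..6 in LSB-0 numbering); the function names are hypothetical, and `0x7883_0020` is the encoding of `rldicl r3, r4, 0, 32` worked out from the MD-form field layout.
+
+```rust
+/// MD-form MB: low five bits at raw bits 10..6, high bit at raw bit 5.
+fn mb_md(raw: u32) -> u32 {
+    ((raw >> 6) & 0x1F) | (((raw >> 5) & 1) << 5)
+}
+
+/// Pre-fix formula from the symptom above.
+fn mb_buggy(raw: u32) -> u32 {
+    (((raw >> 6) & 0x1F) << 1) | ((raw >> 1) & 1)
+}
+
+/// Mask with ones from PPC bit `mb` through bit 63 (valid for mb < 64).
+fn mask_from(mb: u32) -> u64 {
+    u64::MAX >> mb
+}
+```
+
+For `0x7883_0020`, `mb_md` yields 32 and the mask is `0x00000000_FFFFFFFF`; `mb_buggy` yields 0 and the mask is all-ones, which is why the instruction degenerates into a plain register copy.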
+
+### PPCBUG-047 — Zero execution tests for any doubleword-rotate opcode
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Locations**: interpreter.rs:693-754 (all 6 opcodes)
+- **Recommendation**: at minimum, a `clrldi r3, r4, 32` test verifying the result is exactly the low 32 bits of r4. Had it existed, this test would have caught the MB-reconstruction bug (PPCBUG-046); it will only pass once that fix lands.
+
+#### What's correct in group 10
+
+- `sh64()` accessor — correctly reconstructs 6-bit shift from MD split encoding (cross-check: `disasm.rs` agrees).
+- `rld_mask_left()` / `rld_mask_right()` mask helpers — verified against Canary's XEMASK.
+- `rldicx`/`rldimix` mask formulas (`63 - sh` for right edge) — correct.
+- `rldimix` read-modify-write merge — correct 64-bit mask-insert.
+- CR0 `as i64` — correct here because these ARE genuine 64-bit ops (unlike word rotate).
+- `rldcl`/`rldcr` register-shift extraction (`gpr[rb] & 0x3F`) — correct.
+- No 32-bit writeback truncation needed: these are intentionally 64-bit; 32-bit-ABI compilers only emit them with masks that yield 32-bit-clean results.
+
+## Batch 3 — branch (group 13)
+
+Per-group report: `audit-out/group-13-branch.md`.
+
+Group 13 summary: the branch implementation is substantively correct. All BO/BI bit masks,
+CTR decrement-before-test ordering, AA absolute vs relative dispatch, LK unconditional write
+(including not-taken path in `bcx`), LR-read-before-LR-write atomicity in `bclrx`, and
+`get_cr_bit()` field indexing are all ISA-correct and match Canary. The only execution bugs
+are a latent 64-bit CTR zero-test (PPCBUG-053/054, active under current GPR-pollution environment)
+and severely thin test coverage (PPCBUG-055).
+
+### PPCBUG-053 — CTR zero-test uses 64-bit compare; should use 32-bit in `bcx`/`bclrx`
+- **Severity**: MEDIUM (effectively HIGH given unfixed PPCBUG-001..031 GPR pollution)
+- **Status**: open
+- **Locations**: `interpreter.rs:849` (`bcx` `ctr_ok`), `interpreter.rs:879` (`bclrx` `ctr_ok`)
+- **Symptom**: `ctx.ctr != 0` compares all 64 bits. In 32-bit ABI the CTR is logically 32-bit.
+  Canary explicitly truncates to 32 bits: `ctr = f.Truncate(ctr, INT32_TYPE)`. When CTR upper
+  32 bits are non-zero (due to upstream GPR pollution flowing through `mtspr CTR, rN`), the
+  64-bit test disagrees with the 32-bit ISA semantic. Most dangerous with `neg; mtctr; bdnz`:
+  `negx` (PPCBUG-006) always sets upper 32 bits, so the 32-bit CTR counter can reach zero
+  while the 64-bit CTR is still non-zero → infinite loop.
+- **Fix**:
+  ```rust
+  // Replace in both bcx and bclrx:
+  let ctr_ok = (bo & 0b00100) != 0
+      || (((ctx.ctr as u32) != 0) ^ ((bo & 0b00010) != 0));
+  ```
+  Or, alternatively, truncate at decrement:
+  ```rust
+  if bo & 0b00100 == 0 {
+      ctx.ctr = ctx.ctr.wrapping_sub(1) as u32 as u64;
+  }
+  ```
+- **Test gap**: zero tests for CTR-decrement branches (bdnz, bdz, bdnzt, bdnzf, bdzt, bdzf).
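+
+To make the disagreement concrete, the two zero-tests can be compared on a polluted CTR. This is a sketch under stated assumptions: `ctr_ok_32` and `ctr_ok_64` are hypothetical free functions (the interpreter computes `ctr_ok` inline), and the CTR value passed in is assumed to be the already-decremented one.
+
+```rust
+/// BO-field CTR condition with the zero-test done on the low 32 bits.
+fn ctr_ok_32(ctr: u64, bo: u32) -> bool {
+    (bo & 0b00100) != 0 || (((ctr as u32) != 0) ^ ((bo & 0b00010) != 0))
+}
+
+/// Same condition using the full 64-bit value, as described in the symptom.
+fn ctr_ok_64(ctr: u64, bo: u32) -> bool {
+    (bo & 0b00100) != 0 || ((ctr != 0) ^ ((bo & 0b00010) != 0))
+}
+
+/// BO encodings for the simple CTR-only branches.
+const BO_BDNZ: u32 = 0b10000; // decrement, branch if CTR != 0
+const BO_BDZ: u32 = 0b10010; // decrement, branch if CTR == 0
+
+/// A CTR whose low word has reached zero but whose upper word is polluted.
+const POLLUTED_CTR: u64 = 0xFFFF_FFFF_0000_0000;
+```
+
+With `POLLUTED_CTR`, the 32-bit test ends a `bdnz` loop while the 64-bit test keeps branching.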
+
+### PPCBUG-054 — `mtspr CTR` writeback not truncated to 32 bits
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:1411`
+- **Symptom**: `crate::context::spr::CTR => ctx.ctr = val` writes the full 64-bit GPR to CTR.
+  Acts as a firewall gap: any upstream 64-bit GPR pollution flows directly into CTR, where it
+  will be tested by PPCBUG-053's 64-bit comparison. Defensive fix prevents CTR from ever
+  acquiring non-zero upper 32 bits independently of the GPR-pollution fix.
+- **Note**: the `bcctrx` branch-target read (`(ctx.ctr as u32) & !3`) already truncates
+  correctly; the bug is confined to the `ctr != 0` zero-test in `bcx`/`bclrx`.
+- **Fix**: `crate::context::spr::CTR => ctx.ctr = val as u32 as u64,`
+- **Cross-reference**: Group 16 (SPR/MSR) subagent should verify this write-point.
+
+### PPCBUG-055 — Severely inadequate test coverage for all four branch opcodes
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Locations**: `interpreter.rs` test module (lines 4455–4491)
+- **Current coverage**: `bx` forward (1 test), `bl` LR update (1 test), `bcx` taken beq (1 test via `test_cmp_and_bc`). Zero tests for: `bclrx`, `bcctrx`, any CTR-decrement variant, not-taken path, backward branch, AA=1 absolute, `bcl` LR-write-on-not-taken.
+- **Recommended minimum**: blr, bctr, bdnz (taken and not-taken at boundary CTR=1), bclrl old-LR-as-target, bcl LK-write-on-not-taken. See per-group report for concrete encoding patterns.
+
+---
+
+## Batch 3 — trap + system call (group 14)
+
+Per-group report: `audit-out/group-14-trap-sc.md`.
+
+Group 14 summary: the core trap evaluation (`trap.rs`) is correct — TO bit constants, signed/unsigned
+comparison dispatch, and word-vs-doubleword width handling are all ISA-conformant. The live interpreter
+arm properly evaluates the TO field (replacing the old unconditional-trap stub). Three MEDIUM issues
+found: PC ordering on trap return, missing LEV dispatch for `sc`, and the Xbox 360 typed-trap
+convention (`twi 31, r0, IMM`) not handled. Two LOW findings for stale manual snapshots and test gaps.
+
+### PPCBUG-063 — `ctx.pc` already at CIA+4 when `StepResult::Trap` returns
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:1543 (`ctx.pc += 4`) before interpreter.rs:1549 (`return StepResult::Trap`)
+- **Symptom**: any trap handler that reads `ctx.pc` to find the faulting instruction sees CIA+4 instead
+  of CIA. The existing `tracing::warn!` compensates with `.wrapping_sub(4)`, confirming the asymmetry.
+  On real hardware, SRR0 = CIA (trapping instruction address). Current risk LOW (no handler inspects
+  pc), but HIGH if any SEH/exception-delivery path is added (critical for the C++ throw investigation).
+- **Fix**: save CIA before incrementing, restore it when firing the trap:
+  ```rust
+  let trap_pc = ctx.pc;
+  ctx.pc += 4;
+  if fired { ctx.pc = trap_pc; return StepResult::Trap; }
+  ```
+  Alternatively store CIA in a separate `ctx.srr0`-equivalent field and leave `ctx.pc` at NIA.
+- **Note**: `sc` correctly leaves `ctx.pc` at NIA (the return address) — that is a different and
+  correct design choice. The inconsistency between sc and trap is the bug.
+
+### PPCBUG-064 — `sc` ignores `LEV` field; `sc 2` (HVcall) silently misdispatched
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:915-918
+- **Symptom**: `sc 2` (Xbox 360 hypervisor call) returns `StepResult::SystemCall` identically to
+  `sc 0`. Canary dispatches LEV=0 to `syscall_handler` and LEV=2 to `f.function()` (the HVcall
+  path). For pure game-title code (LEV=0 only) this is invisible; XDK kernel-mode components and
+  some HV-aware titles may use `sc 2`.
+- **Fix**: decode the 7-bit LEV field (bits 20-26 of SC-form encoding), add a `HypervisorCall`
+  variant to `StepResult`, and dispatch accordingly.
+
+### PPCBUG-065 — `twi 31, r0, IMM` typed-trap not handled; SIMM type code discarded
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: interpreter.rs:1532-1551 (trap arm)
+- **Symptom**: `twi 31, r0, IMM` (TO=31=unconditional, RA=r0) is used by the Xbox 360 CRT/kernel
+  to encode typed C++ exceptions — the 16-bit SIMM carries the exception type discriminator. xenia-rs
+  fires the trap correctly but discards SIMM. The caller sees a generic `StepResult::Trap` with no
+  type information, preventing correct C++ SEH dispatch.
+- **Canary reference**: `ppc_emit_control.cc:611-616` special-cases `RA==0 && TO==31` and calls
+  `f.Trap(type)` with the SIMM as the type code.
+- **Fix**: add a `trap_type: Option<u16>` payload to `StepResult::Trap`. Detect `twi` with `to()==31`
+  and `ra()==0` and populate it with `instr.simm16() as u16`.
+- **Note**: directly relevant to the Sylpheed `std::runtime_error` throw investigation
+  (project_xenia_rs_sylpheed_throw_2026_04_28.md) — the typed-trap SIMM carries the CRT exception
+  class that the kernel uses to route to the correct handler.
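+
+A minimal sketch of the typed-trap detection follows. It assumes the standard D-form layout for `twi` (primary opcode 3, TO in PPC bits 6-10, RA in bits 11-15, SIMM in bits 16-31). The `StepResult` shown here is a hypothetical one-variant stand-in, not the real enum, and `twi_trap_type` / `fire_twi` are illustrative names.
+
+```rust
+/// Hypothetical stand-in for the interpreter's result type.
+#[derive(Debug, PartialEq)]
+enum StepResult {
+    Trap { trap_type: Option<u16> },
+}
+
+/// Returns the SIMM payload when `raw` is `twi 31, r0, IMM`, otherwise `None`.
+fn twi_trap_type(raw: u32) -> Option<u16> {
+    let opcode = raw >> 26; // primary opcode, 3 for twi
+    let to = (raw >> 21) & 0x1F; // TO field
+    let ra = (raw >> 16) & 0x1F; // RA field
+    let simm = (raw & 0xFFFF) as u16; // type discriminator
+    if opcode == 3 && to == 31 && ra == 0 {
+        Some(simm)
+    } else {
+        None
+    }
+}
+
+/// Builds the result for a `twi` that has already been determined to fire.
+fn fire_twi(raw: u32) -> StepResult {
+    StepResult::Trap { trap_type: twi_trap_type(raw) }
+}
+```
+
+For example, `0x0FE0_0016` encodes `twi 31, r0, 22` and would carry type code 22, while the same word with RA=r3 carries no type.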
+
+### PPCBUG-066 — Stale frozen snapshots in ppc-manual for td/tdi/tw/twi
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `ppc-manual/branch/td.md`, `tdi.md`, `tw.md`, `twi.md`
+- **Symptom**: all four show the old unconditional-trap stub (`// For now, just trace and continue`)
+  instead of the current TO-field-evaluating implementation.
+- **Fix**: regenerate after PPCBUG-063 and PPCBUG-065 are resolved.
+
+### PPCBUG-067 — Test gaps for trap and sc
+- **Severity**: LOW
+- **Status**: open
+- **Location**: interpreter.rs `#[cfg(test)] mod tests`
+- **Missing coverage**: `sc` smoke test (fires SystemCall, advances PC); `td` vs `tw` on 64-bit-clean
+  operands (width discrimination); `tdi`/`td` signed/unsigned LT/GT conditions; `tw 31, r0, r0`
+  unconditional `trap` encoding; `twi 31, r0, N` typed-trap; negative simm16 in `twi`.
+
+---
+
+## Batch 3 — SPR / MSR / TB / FPSCR / VSCR moves (group 16)
+
+Per-group report: `audit-out/group-16-spr-msr.md`.
+
+Group 16 summary: the core paths are clean — `mfcr`, `mtcrf`, `mfspr`, `mtspr`, `mftb`, `mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`, `mfvscr`, `mtvscr` are all functionally ISA-correct. The `spr()` decoder accessor correctly inverts the PPC XFX half-swap encoding. The one MEDIUM finding is `mtmsrd` silently ignoring the `L=1` partial-MSR-write semantics. Five LOW test-gap findings cover near-total absence of unit tests for this entire group.
+
+### PPCBUG-078 — `mtmsrd` L=1 partial-MSR-write not modelled
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:1458-1461`
+- **Symptom**: xenia-rs merges `mtmsr` and `mtmsrd` into a single body that unconditionally writes `ctx.msr = ctx.gpr[instr.rs()]`. PowerISA specifies that `mtmsrd` with instruction bit 15 (`L`) = 1 performs a partial update: only `MSR[EE]` (PPC bit 48, u64 bit 15) and `MSR[RI]` (PPC bit 62, u64 bit 1) are modified; all other MSR bits are preserved. Kernel code using `mtmsrd L=1` to re-enable external interrupts silently corrupts the entire MSR in xenia-rs. Canary acknowledges the same TODO.
+- **Fix**:
+  ```rust
+  PpcOpcode::mtmsrd => {
+      let l = (instr.raw >> (31 - 15)) & 1;
+      if l == 1 {
+          // EE = PPC bit 48 (1 << 15), RI = PPC bit 62 (1 << 1).
+          let mask: u64 = (1u64 << 15) | (1u64 << 1);
+          let rs = ctx.gpr[instr.rs()];
+          ctx.msr = (ctx.msr & !mask) | (rs & mask);
+      } else {
+          ctx.msr = ctx.gpr[instr.rs()];
+      }
+      ctx.pc += 4;
+  }
+  ```
+- **Test gap**: zero tests for `mtmsr` or `mtmsrd`.
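+
+One possible shape for the missing test is a pure helper that can be checked without a full context. This is a sketch: `mtmsrd_result` is a hypothetical function, the constants assume `MSR[EE]` at `1 << 15` and `MSR[RI]` at `1 << 1` (PowerISA MSR bits 48 and 62), and the L=0 path mirrors the full write the interpreter currently performs.
+
+```rust
+const MSR_EE: u64 = 1 << 15; // PowerISA MSR bit 48
+const MSR_RI: u64 = 1 << 1; // PowerISA MSR bit 62
+
+/// New MSR value after `mtmsrd` with the given raw instruction word.
+fn mtmsrd_result(msr: u64, rs: u64, raw: u32) -> u64 {
+    let l = (raw >> 16) & 1; // L is PPC bit 15
+    if l == 1 {
+        let mask = MSR_EE | MSR_RI;
+        (msr & !mask) | (rs & mask)
+    } else {
+        rs
+    }
+}
+
+/// `mtmsrd r3, 1` and `mtmsrd r3, 0` (primary opcode 31, XO 178).
+const MTMSRD_R3_L1: u32 = 0x7C61_0164;
+const MTMSRD_R3_L0: u32 = 0x7C60_0164;
+```
+
+With L=1 only the two masked bits follow `rs`; every other bit of the old MSR survives.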
+
+### PPCBUG-079 — `mtspr` silent drop of unknown-SPR writes without value logging
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:1430-1433`
+- **Symptom**: Unknown SPR writes are silently discarded with only a `tracing::warn!()` that omits the value being written. Reduces debuggability; no correctness impact for known Xbox 360 titles.
+- **Fix** (optional): `tracing::warn!("mtspr: unimplemented SPR {} <= 0x{:016x}", spr, val)`.
+
+### PPCBUG-080 — `mfvscr` does not zero the upper 96 bits of VD per ISA
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:2198-2201`
+- **Symptom**: ISA requires `mfvscr VD` to place VSCR in the rightmost word of VD and zero bytes 0-11. xenia-rs copies the full 128-bit `ctx.vscr` into `ctx.vr[VD]`, leaving stale data in bytes 0-11 if `ctx.vscr` was populated from a non-zeroed vector. Canary explicitly zero-extends.
+- **Fix**:
+  ```rust
+  PpcOpcode::mfvscr => {
+      let vscr_word = ctx.vscr.as_u32x4()[3];
+      ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array([0, 0, 0, vscr_word]);
+      ctx.pc += 4;
+  }
+  ```
+
+### PPCBUG-081 — Zero unit tests for `mfcr` / `mtcrf`
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:1436-1453`
+- **Recommended additions**: full mfcr round-trip; `mtcrf 0xFF`; `mtcrf 0x80` (CR0 only); `mtcrf 0x38` (ABI CR2|CR3|CR4 restore).
+
+### PPCBUG-082 — Minimal unit tests for `mfspr` / `mtspr`
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:1376-1435`
+- **Note**: only DEC and TBL_WRITE covered; add LR, CTR, XER, TBL/TBU, VRSAVE.
+
+### PPCBUG-083 — Zero unit tests for `mftb`
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:1462-1470`
+
+### PPCBUG-084 — Zero interpreter-level round-trip tests for FPSCR move instructions
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:2678-2720`
+- **Note**: `fpscr.rs` helper-level tests exist; interpreter dispatch (`mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`) is untested end-to-end.
+
+### PPCBUG-085 — Zero unit tests for `mfvscr` / `mtvscr`
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:2198-2205`
+
+IDs PPCBUG-086 and PPCBUG-087 are unallocated — reserved for group 16 follow-up findings.
+
+---
+
+## Batch 3 — cache + sync (group 17)
+
+Per-group report: `audit-out/group-17-cache-sync.md`.
+
+Group 17 summary: the cleanest group audited so far. Both `dcbz` and `dcbz128` have correct EA computation (ra=0 special case, 64-bit→u32 truncation, alignment masks `& !31` / `& !127`, byte counts 32/128). The nine no-op opcodes (dcbf, dcbi, dcbst, dcbt, dcbtst, icbi, sync, eieio, isync) are all listed in one arm and complete. The `dcbz128` Xbox 360 specific opcode (RT=1 bit distinguishes from dcbz) dispatches correctly. **0 HIGH, 0 MEDIUM, 2 LOW** findings.
+
+### PPCBUG-088 — sync disasm ignores L field; `lwsync` (L=1) shows as "sync"
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `xenia-rs/crates/xenia-cpu/src/disasm.rs:364`
+- **Symptom**: The `PpcOpcode::sync` disasm arm outputs `"sync"` unconditionally regardless of the L field (PPC bit 10). When L=1 (word `0x7C2004AC`), the instruction should disassemble as `"lwsync"`. The `extended_mnemonics.json` golden already accepts `"sync"` as output for the lwsync case, meaning the test currently passes with the wrong string.
+- **Impact**: Disassembly output for `lwsync` (very common in Xbox 360 acquire-barrier idioms) shows as `sync`. No interpreter impact; both L=0 and L=1 are correctly treated as no-op PC advance.
+- **Fix**:
+  ```rust
+  PpcOpcode::sync => {
+      // L field at PPC bit 10
+      if extract_bits(instr.raw, 10, 10) == 1 {
+          base("lwsync", String::new(), 0)
+      } else {
+          base("sync", String::new(), 0)
+      }
+  }
+  ```
+  Update `extended_mnemonics.json` golden to add `"ext_mnemonic": "lwsync"` for that entry.
+
+### PPCBUG-089 — Zero interpreter execution tests for group 17
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `xenia-rs/crates/xenia-cpu/src/interpreter.rs` (test module)
+- **Symptom**: No `#[test]` covers `dcbz`, `dcbz128`, or any no-op (sync/isync/eieio/dcbf/icbi). A regression in dcbz byte count or alignment would go undetected.
+- **Recommended additions**: `dcbz` with misaligned address (verifies 32-byte aligned zero), `dcbz128` with misaligned address (verifies 128-byte aligned zero), both ra=0 and ra!=0 cases, `sync`/`isync`/`dcbf` no-op PC-advance smoke tests.
+
+---
+
+## Batch 3 — CR logical + CR moves (group 15)
+
+Per-group report: `audit-out/group-15-cr-logical.md`.
+
+Group 15 summary: **cleanest group audited to date**. All 8 CR logical ops (`crand`, `crandc`,
+`creqv`, `crnand`, `crnor`, `cror`, `crorc`, `crxor`), `mcrf`, and `mcrxr` are ISA-correct.
+The `cr_logical` helper's use of `fn(bool, bool) -> bool` prevents the `!u64` bit-pollution class
+(PPCBUG-028–031 in group 7). CR bit indexing in `get_cr_bit`/`set_cr_bit` is correct (bit/4 =
+field, bit%4 = within-field sub-index matching PPC MSB-0 numbering, with sub `{0=LT, 1=GT, 2=EQ,
+3=SO}`). `mcrxr` correctly maps XER{SO,OV,CA} to CR{LT,GT,EQ} with SO=false and unconditionally
+clears the XER bits. `mcrfs` nibble extraction, field shift formula (`28 - crfs*4`), and
+CLEARABLE_MASK (all 14 ISA-clearable exception bits, no FEX/VX) are all correct. One MEDIUM ISA
+violation: `mcrfs` omits VX summary recomputation. Two LOW findings: a misleading test comment and
+zero coverage for all 8 CR logical ops + `mcrf`.
+
+### PPCBUG-068 — `mcrfs` does not recompute VX summary bit after clearing VX* exception bits
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:4250` (`ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK)`)
+- **Symptom**: When `mcrfs` clears VX* exception bits (VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ,
+  VXVC, VXSOFT, VXSQRT, VXCVI) from any source field, the VX summary bit (FPSCR[2], `fpscr::VX
+  = 1<<29`) is left stale. If those VX* bits were the only contributors to VX, it should become
+  0 but remains 1. A subsequent `mcrfs cr0, 0` will then report VX=1 in CR0.EQ, misleading the
+  caller into thinking an invalid-operation exception is still active.
+- **Fix**:
+  ```rust
+  // After ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK); add:
+  if (ctx.fpscr & fpscr::VX_ALL) != 0 {
+      ctx.fpscr |= fpscr::VX;
+  } else {
+      ctx.fpscr &= !fpscr::VX;
+  }
+  // FEX recomputation omitted — xenia doesn't model enabled-exception dispatch.
+  ```
+- **Test gap**: existing test only covers crfS=0 (FX+OX) — no VX* bits involved. Add a test
+  that sets only VXSNAN, runs `mcrfs cr0, 1`, then verifies VX is now 0.
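+
+The recomputation can be sketched on a bare 32-bit FPSCR value. The constants below follow the PowerISA FPSCR layout (PPC bit n maps to `1 << (31 - n)`); they are written out here as assumptions and may not match the names or widths in `fpscr.rs`, and `recompute_vx` is a hypothetical helper.
+
+```rust
+const VX: u32 = 1 << 29; // bit 2, invalid-operation summary
+const VXSNAN: u32 = 1 << 24; // bit 7
+const VXISI: u32 = 1 << 23; // bit 8
+const VXIDI: u32 = 1 << 22; // bit 9
+const VXZDZ: u32 = 1 << 21; // bit 10
+const VXIMZ: u32 = 1 << 20; // bit 11
+const VXVC: u32 = 1 << 19; // bit 12
+const VXSOFT: u32 = 1 << 10; // bit 21
+const VXSQRT: u32 = 1 << 9; // bit 22
+const VXCVI: u32 = 1 << 8; // bit 23
+
+/// Every individual invalid-operation cause bit.
+const VX_ALL: u32 =
+    VXSNAN | VXISI | VXIDI | VXZDZ | VXIMZ | VXVC | VXSOFT | VXSQRT | VXCVI;
+
+/// Sets VX when any cause bit remains, clears it otherwise.
+fn recompute_vx(fpscr: u32) -> u32 {
+    if (fpscr & VX_ALL) != 0 {
+        fpscr | VX
+    } else {
+        fpscr & !VX
+    }
+}
+```
+
+Clearing VXSNAN from a register holding only `VX | VXSNAN` and then recomputing leaves VX at 0; if another cause bit such as VXISI is still set, VX stays 1.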
+
+### PPCBUG-069 — `mcrfs` test comment claims OX(so)=0 but OX is set in the test
+
+- **Severity**: LOW (cosmetic; the assert is correct, only the comment is wrong)
+- **Status**: open
+- **Location**: `interpreter.rs:5402`
+- **Symptom**: Comment reads `"FX(lt)=1 and OX(so)=0"`. FPSCR was set to `(1<<31)|(1<<28)`,
+  which sets both FX and OX. The nibble is `0b1001`, so `so=true`. The assert `cr[2].as_u8()
+  == 0b1001` is correct; only the comment is wrong.
+- **Fix**: `// FX(lt)=1, FEX(gt)=0, VX(eq)=0, OX(so)=1 → 0b1001 = 9`
+
+### PPCBUG-070 — Zero execution tests for all 8 CR logical ops and `mcrf`
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Locations**: `interpreter.rs:1473–1484`
+- **Missing minimum**: `crclr` idiom (`crxor BT,BT,BT`, BT=1 → 0), `crset` idiom
+  (`creqv BT,BT,BT`, BT=0 → 1), `crmove` idiom (`cror BT,BA,BA`), `crnot` idiom
+  (`crnor BT,BA,BA`, BA=1 → 0), cross-field `crand`/`crandc`, and a full `mcrf
+  cr0, cr3` field-copy + source-field-intact test.
+
+---
+
+## Pre-pass hints REFUTED by audit
+
+These were flagged by the orchestrator's regex scan but the subagents found them to be safe:
+
+- **`divwux` writeback** (interpreter.rs:390) — both operands cast to `u32` before division, `as u64` zero-extends correctly. **Clean.**
+- **`mulhwx` intermediate cast** (interpreter.rs:349) — in `((result >> 32) as i32 as i64 as u64) & 0xFFFF_FFFF` the intermediate sign-extension is redundant, but the trailing mask keeps the result correct. Cosmetic only.
+- **`mulhwux` writeback** (interpreter.rs:359) — `(result >> 32) & 0xFFFF_FFFF` clean unsigned. Clean.
+- **CR0 stale-prepass-claim**: the pre-pass document described the CR0 compare as `result as i32 as i64`; live code actually uses `result as i64`. The pre-pass was right that the live form is a 64-bit compare, but wrong to imply that an i32-truncating form was already in place. PPCBUG-020 is the real finding.
+
+---
+
+## Batch 4 — load float (group 23)
+
+Per-group report: `audit-out/group-23-load-float.md`.
+
+Group 23 summary: the double-precision load family (`lfd`, `lfdu`, `lfdux`, `lfdx`) is fully
+ISA-correct — EA computation, endianness, update-form writeback, and bit-pattern fidelity are
+all clean. The single-precision family (`lfs`, `lfsu`, `lfsux`, `lfsx`) has one HIGH bug:
+Rust's `as f64` float cast compiles to x86 `CVTSS2SD`, which sets the IEEE quiet bit of every NaN
+result, silently converting f32 SNaN loads to f64 QNaN. The ISA requires the SNaN
+to pass through unchanged. FPSCR.NI does not apply to loads (correct by omission). One LOW
+test-gap finding. **2 IDs used (PPCBUG-128, PPCBUG-129). 8 IDs unallocated (PPCBUG-130..137).**
+
+### PPCBUG-128 — lfs/lfsu/lfsx/lfsux silently quieten SNaN via `as f64` Rust float cast
+
+- **Severity**: HIGH
+- **Status**: open
+- **Locations**: interpreter.rs:1064 (lfs), 1070 (lfsx), 1087 (lfsu), 1093 (lfsux)
+- **Symptom**: All four single-precision load arms use `mem.read_f32(ea) as f64` where
+  `read_f32` = `f32::from_bits(read_u32(ea))`. The `as f64` Rust float cast compiles to x86
+  `CVTSS2SD`, which unconditionally sets bit 51 of the f64 mantissa (the IEEE quiet/signalling
+  discriminator bit) for any NaN input. An f32 SNaN (e.g. `0x7F800001`) is loaded and written
+  to the FPR as the f64 QNaN `0x7FF8000020000000` instead of the SNaN `0x7FF0000020000000`.
+
+  **ISA requirement**: "A signalling NaN passes through unchanged into the FPR — it will signal
+  at the next FP arithmetic instruction." (lfs.md Special Cases). The FPR must hold the SNaN;
+  VXSNAN fires at the consuming arithmetic op, not at the load.
+
+  **Impact**: (a) Game code storing f32 SNaN sentinels (physics engines mark unset float slots
+  with SNaN) and then loading+inspecting them: `fpscr::is_snan(ctx.fpr[rd])` returns false
+  after the load, breaking sentinel detection. (b) Arithmetic ops consuming the loaded value
+  see a QNaN rather than SNaN, so VXSNAN is never set; games relying on VXSNAN to detect
+  uninitialized-read bugs get false negatives.
+
+- **Canary parity**: Canary's JIT also uses CVTSS2SD via `f.Convert()`. Both emulators share
+  this deviation. The bug is a structural consequence of using semantic float widening rather
+  than a bit-pattern-preserving widening routine.
+- **Fix**: replace the float cast with a bit-manipulation widening that preserves the SNaN bit:
+  ```rust
+  fn widen_f32_bits_to_f64(raw32: u32) -> u64 {
+      let sign = ((raw32 >> 31) as u64) << 63;
+      let exp32 = ((raw32 >> 23) & 0xFF) as i32;
+      let mant32 = (raw32 & 0x007F_FFFF) as u64;
+      if exp32 == 0xFF {
+          // NaN or Infinity — propagate mantissa left-shifted by 29 bits.
+          // SNaN (bit22=0) stays SNaN (bit51=0); QNaN (bit22=1) stays QNaN (bit51=1).
+          sign | (0x7FFu64 << 52) | (mant32 << 29)
+      } else if exp32 == 0 {
+          // ±Zero or subnormal f32.
+          if mant32 == 0 { return sign; } // ±zero
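+          // Subnormal: normalize by finding leading bit, then adjust exponent.
+          // `shift` = leading zeros within the 23-bit mantissa field (0..=22).
+          let shift = mant32.leading_zeros() - (64 - 23);
+          // Leading 1 at field bit p = 22 - shift has weight 2^(p - 149),
+          // so the biased f64 exponent is 1023 - 127 - shift.
+          let exp64 = (1023u64 - 127) - shift as u64;
+          let mant64 = (mant32 << (shift + 1 + 29)) & 0x000F_FFFF_FFFF_FFFF;
+          sign | (exp64 << 52) | mant64
+      } else {
+          // Normal f32 → normal f64.
+          // Rebias in one addition; `(exp32 as u64) - 127` would underflow for exp32 < 127.
+          let exp64 = (exp32 as u64) + (1023 - 127);
+          sign | (exp64 << 52) | (mant32 << 29)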
+      }
+  }
+  // In each lfs* arm:
+  ctx.fpr[instr.rd()] = f64::from_bits(widen_f32_bits_to_f64(mem.read_u32(ea)));
+  ```
+  This function also correctly handles subnormal f32 → normal f64 widening (which the `as f64`
+  cast already gets right numerically, but now goes through a consistent code path).
+- **Test gap**: add a test loading an f32 SNaN (`0x7F800001`) via `lfs` and asserting
+  `fpscr::is_snan(ctx.fpr[rd])` is `true` and bit 51 of `ctx.fpr[rd].to_bits()` is 0.
+
+### PPCBUG-129 — Zero interpreter execution tests for all 8 float-load opcodes
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Locations**: interpreter.rs test module; `tests/disasm_goldens.rs:249-250` (disasm-only)
+- **Symptom**: No `#[test]`-decorated function exercises any float-load interpreter arm.
+  A regression in EA computation, endianness, f32→f64 widening, or update-form writeback
+  would go undetected. The SNaN bug (PPCBUG-128) was undetected partly due to this gap.
+- **Recommended minimum**:
+  1. `lfs` normal: `0x3F800000` (1.0f32) → assert `fpr[rd] == 1.0f64` exact.
+  2. `lfs` negative displacement: base minus 4.
+  3. `lfs` ra=0 path (absolute addressing).
+  4. `lfd` normal: store PI bits, assert exact bit equality via `.to_bits()`.
+  5. `lfd` SNaN: store `0x7FF0_0000_0000_0001u64`, assert exact bit equality after load.
+  6. `lfsu` / `lfsux` / `lfdu` / `lfdux`: verify loaded FPR value AND rA update address.
+  7. After PPCBUG-128 fix: `lfs` SNaN round-trip test.
+
+IDs PPCBUG-130 through PPCBUG-137 are unallocated — no further bugs found in group 23.
+
+---
+
+## Files modified by the audit
+
+- `xenia-rs/audit-prepass-findings.md` — Phase A pre-pass red flags (orchestrator regex output).
+- `xenia-rs/audit-out/group-01-add-imm.md` — Group 1 report (Sonnet subagent).
+- `xenia-rs/audit-out/group-02-add-reg.md` — Group 2 report.
+- `xenia-rs/audit-out/group-03-sub-reg.md` — Group 3 report.
+- `xenia-rs/audit-out/group-04-multiply.md` — Group 4 report.
+- `xenia-rs/audit-out/group-05-divide.md` — Group 5 report.
+- `xenia-rs/audit-out/group-06-logic-imm.md` — Group 6 report.
+- `xenia-rs/audit-out/group-09-word-rotate.md` — Group 9 report.
+- `xenia-rs/audit-out/group-13-branch.md` — Group 13 report.
+- `xenia-rs/audit-out/group-14-trap-sc.md` — Group 14 report.
+- `xenia-rs/audit-out/group-15-cr-logical.md` — Group 15 report.
+- `xenia-rs/audit-out/group-16-spr-msr.md` — Group 16 report.
+- `xenia-rs/audit-out/group-17-cache-sync.md` — Group 17 report.
+- `xenia-rs/audit-out/group-18-load-byte.md` — Group 18 report.
+- `xenia-rs/audit-out/group-19-load-halfword.md` — Group 19 report.
+- `xenia-rs/audit-out/group-21-load-doubleword.md` — Group 21 report.
+- `xenia-rs/audit-out/group-22-load-mlsr.md` — Group 22 report.
+- `xenia-rs/audit-out/group-23-load-float.md` — Group 23 report.
+- `xenia-rs/audit-out/group-24-store-byte-half.md` — Group 24 report.
+- `xenia-rs/audit-out/group-26-store-doubleword.md` — Group 26 report.
+- `xenia-rs/audit-findings.md` — this consolidated tracker.
+
+**No source code under `xenia-rs/crates/` has been modified.**
+
+---
+
+## Batch 4 — load byte (group 18)
+
+Per-group report: `audit-out/group-18-load-byte.md`.
+
+Group 18 summary: **cleanest group audited to date — zero HIGH or MEDIUM bugs.** All four opcodes
+(`lbz`, `lbzu`, `lbzx`, `lbzux`) are ISA-correct: EA computation (rA=0 special case, D-field
+sign-extension, 32-bit EA truncation), zero-extension of the byte result to 64 bits, and
+update-form writeback all match the ISA spec and Canary cross-reference. Two LOW findings only.
+
+### PPCBUG-090 — lbzu/lbzux: rD==rA "invalid form" silently misloads rD
+
+- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
+- **Status**: open
+- **Location**: interpreter.rs:951-956 (lbzu), 963-968 (lbzux)
+- **Symptom**: When `rD == rA` (invalid form, UISA undefined), the byte load into `gpr[rD]` at
+  line 953/965 is immediately overwritten by the EA writeback at line 954/966. Net result:
+  `gpr[rD]` holds the EA, not the loaded byte. Canary has the same behaviour. No practical impact
+  under normal compiler output.
+- **Recommendation**: add `debug_assert!(instr.rd() != instr.ra())` in debug builds.
+
+### PPCBUG-091 — Zero interpreter execution tests for all four lbz* opcodes
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs test module; disasm_goldens.rs:247 (disasm-only, no execution)
+- **Symptom**: No `#[test]` exercises lines 945-968. A regression in EA computation,
+  zero-extension, or the update writeback would go undetected.
+- **Recommended minimum**: `lbz` with ra=0 + negative displacement; `lbzu` normal case (verify
+  both byte result and rA update); `lbzx` with ra=0; `lbzux` normal case. Each test should
+  assert `gpr[rD] <= 0xFF` to catch any future accidental sign-extension.
+
+IDs PPCBUG-092, PPCBUG-093, PPCBUG-094 are unallocated — no further bugs found in group 18.
+
+---
+
+## Batch 4 — load halfword (group 19)
+
+Per-group report: `audit-out/group-19-load-halfword.md`.
+
+Group 19 summary: **4 HIGH bugs confirmed — all pre-pass flags validated.** The four `lha*` opcodes
+(`lha`, `lhax`, `lhau`, `lhaux`) all use `as i16 as i64 as u64`, sign-extending a negative halfword
+to 64 bits in violation of the 32-bit ABI. Every negative halfword load (common for `int16_t` PCM
+samples, packed vertex deltas, `short[]` arrays) actively poisons the upper 32 bits of the
+destination GPR — identical shape to the `addis` bug. The four `lhz*` opcodes and `lhbrx` are all
+clean (`as u64` zero-extension; `swap_bytes() as u64` byte-reversal; correct endian handling; correct
+EA computation and update writebacks). Two LOW findings: rD==rA invalid-form in update variants,
+and zero unit tests for all nine opcodes.
+
+### PPCBUG-095 — `lha`: GPR writeback sign-extends to 64 bits
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:990
+- **Symptom**: `mem.read_u16(ea) as i16 as i64 as u64` — memory `0x8000` writes
+  `0xFFFFFFFF_FFFF8000` instead of `0x00000000_FFFF8000`. Active GPR poisoning for every
+  negative halfword. Common trigger: `int16_t` struct fields, PCM samples, packed vertex deltas.
+- **Fix**:
+  ```rust
+  ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64;
+  ```
+- **Test gap**: zero unit tests. Add: memory `0x8000` → `gpr[rD] == 0x00000000_FFFF8000`;
+  memory `0x7FFF` → `gpr[rD] == 0x00000000_00007FFF`.
+
+### PPCBUG-096 — `lhax`: GPR writeback sign-extends to 64 bits
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:996
+- **Symptom**: identical to PPCBUG-095. Indexed form emitted for array access with GPR index.
+- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
+- **Test gap**: zero unit tests.
+
+### PPCBUG-097 — `lhau`: GPR writeback sign-extends to 64 bits
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:1007
+- **Symptom**: identical to PPCBUG-095. Update form emitted for auto-incrementing `short[]` loops;
+  poison accumulates across all iterations.
+- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
+- **Test gap**: zero unit tests. Add: verify both `gpr[rD]` (upper-32 = 0) and `gpr[rA]` (EA update).
+
+### PPCBUG-098 — `lhaux`: GPR writeback sign-extends to 64 bits
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:1013
+- **Symptom**: identical to PPCBUG-095, update+indexed form.
+- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
+- **Test gap**: zero unit tests.
+- **Note**: PPCBUG-095..098 are the same one-line fix at four sites. Fix session sweep:
+  `rg -n 'as i16 as i64 as u64' interpreter.rs` finds exactly these four lines.
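+
+The effect of the two cast chains can be checked in isolation. This is a minimal sketch: `lha_writeback_64` and `lha_writeback_32` are hypothetical helpers that take the already-read halfword, standing in for the expression on the right-hand side of the GPR write.
+
+```rust
+/// Current behaviour: sign-extends the halfword all the way to 64 bits.
+fn lha_writeback_64(half: u16) -> u64 {
+    half as i16 as i64 as u64
+}
+
+/// Proposed behaviour: sign-extends to 32 bits, then zero-extends to 64.
+fn lha_writeback_32(half: u16) -> u64 {
+    half as i16 as i32 as u32 as u64
+}
+```
+
+The two agree for non-negative halfwords and differ only in the upper 32 bits for negative ones, which is exactly the pollution described in PPCBUG-095.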
+
+### PPCBUG-099 — `lhau`/`lhaux`: rD==rA invalid-form silently destroys load result
+- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
+- **Status**: open
+- **Location**: interpreter.rs:1005-1016
+- **Symptom**: same as PPCBUG-090 (`lbzu`/`lbzux`) — EA writeback overwrites `gpr[rD]` when
+  `rD == rA`. Net: `gpr[rD]` holds EA, not the loaded value.
+- **Recommendation**: `debug_assert!(instr.rd() != instr.ra())` in both arms.
+
+### PPCBUG-100 — Zero execution tests for all nine halfword-load opcodes
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs test module
+- **Symptom**: No `#[test]` exercises any of the 9 opcodes. The HIGH sign-extension bug would
+  have been caught by any test that loads a negative halfword and checks `gpr[rD] <= 0x0000_0000_FFFF_FFFF`.
+- **Recommended minimum**: `lha` with negative halfword (assert upper 32 zero), `lhz` same,
+  `lhau` verify both rD and rA, `lhzux` verify both rD and rA, `lhbrx` verify byte-swap.
+
+IDs PPCBUG-101, PPCBUG-102, PPCBUG-103, PPCBUG-104 are unallocated — no further bugs found in group 19.
+
+---
+
+## Batch 4 — load word (group 20)
+
+Per-group report: `audit-out/group-20-load-word.md`.
+
+Group 20 summary: **1 HIGH bug (reservation invalidation never called), 1 MEDIUM (cross-thread
+reservation isolation), 1 MEDIUM (lwa 64-bit sign-extension hazard), 3 LOW test gaps.** The
+zero-extending family (`lwz`/`lwzu`/`lwzx`/`lwzux`) is entirely correct — `mem.read_u32(ea) as u64`
+cleanly zero-extends; EA computation, update writebacks, and RA0 handling all match ISA and Canary.
+`lwbrx` is correct: the double-swap (`from_be_bytes` then `swap_bytes()`) correctly produces a
+little-endian word read, zero-extended. The sign-extending family (`lwa`/`lwax`/`lwaux`) is
+ISA-correct for 64-bit mode but a 32-bit-ABI hazard — classified MEDIUM because `lwa` is a
+64-bit-mode instruction unlikely to appear in Xbox 360 32-bit-ABI binaries. The HIGH finding is
+that `ReservationTable::invalidate_for_write` is defined and unit-tested but **never called** from
+any store instruction, breaking multi-threaded `lwarx`/`stwcx.` atomicity under `--parallel`.
+
+### PPCBUG-105 — lwa / lwax / lwaux sign-extend to 64 bits; 32-bit-ABI hazard
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:1032 (lwa), 1038 (lwax), 1043 (lwaux)
+- **Symptom**: `mem.read_u32(ea) as i32 as i64 as u64` — a word with high bit set (e.g. `0x8000_0000`)
+  writes `0xFFFF_FFFF_8000_0000` to rD. ISA-correct for 64-bit-mode `lwa`. In 32-bit ABI, the poisoned
+  upper 32 bits produce wrong CA / CR results in downstream 64-bit unsigned compares — same shape as
+  the `addis` bug.
+- **Likelihood**: LOW on real Xbox 360 32-bit-ABI binaries (compilers use `lwz` for word loads; `lwa`
+  is a 64-bit-mode instruction). Risk elevated if the binary contains 64-bit-mode kernel code.
+- **Note**: Canary also uses `SignExtend(..., INT64_TYPE)` — both are ISA-correct. Pre-pass flagged
+  HIGH; audit downgrades to MEDIUM because `lwa` is unlikely in 32-bit-ABI Xbox 360 code.
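+
+A minimal sketch of the hazard, using only the cast chains quoted above. The function names
+(`load_algebraic`, `load_zero_extended`, `unsigned_lt_64`) are hypothetical and do not exist in
+the interpreter; the compare helper stands in for any downstream 64-bit unsigned compare.
+
+```rust
+/// `lwa`-style load: sign-extend the 32-bit word to 64 bits.
+fn load_algebraic(word: u32) -> u64 {
+    word as i32 as i64 as u64
+}
+
+/// `lwz`-style load: zero-extend the 32-bit word to 64 bits.
+fn load_zero_extended(word: u32) -> u64 {
+    word as u64
+}
+
+/// Hypothetical downstream consumer: an unsigned compare on the full
+/// 64-bit register values.
+fn unsigned_lt_64(a: u64, b: u64) -> bool {
+    a < b
+}
+```
+
+With the word `0x8000_0000`, the zero-extended value compares below `0x9000_0000`, while the
+sign-extended value does not, because its upper 32 bits are all ones.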
+
+### PPCBUG-106 — lwa no-update-form undocumented (LOW / informational)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: interpreter.rs:1029-1034
+- **Symptom**: `lwa` arm has no RA writeback. Correct per ISA (no `lwau` in PowerISA). Undocumented.
+- **Fix**: add comment `// No lwau in PowerISA; lwa is DS-form non-update only.`
+
+### PPCBUG-107 — `invalidate_for_write` never called from stores; lwarx/stwcx. atomicity broken under `--parallel` (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Locations**: `reservation.rs:234` (definition, never called from interpreter); `interpreter.rs:1182-1278` (all store arms, none call it)
+- **Symptom**: `ReservationTable::invalidate_for_write(addr)` is defined and correctly unit-tested but
+  no interpreter store arm calls it. Under M3 `--parallel` with the table enabled, a plain `stw` by
+  thread B to a cache line reserved by thread A does NOT clear thread A's table slot. Thread A's
+  subsequent `stwcx.` calls `t.try_commit()`, which succeeds — spurious success, violating
+  store-conditional atomicity. All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic
+  counters) built on `lwarx`/`stwcx.` are broken in multi-threaded mode.
+- **Concrete scenario**: thread A: `lwarx r3, 0, r4` (reserves line). Thread B: `stw r5, 0(r4)`
+  (same address; should invalidate). Thread A: `stwcx. r6, 0, r4` → should fail (CR0.EQ=0) but
+  succeeds (CR0.EQ=1). Thread A's store silently overwrites thread B's store.
+- **Fix**: in every store arm, before `mem.write_*`, add:
+  ```rust
+  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
+      if t.has_active_reservers() { t.invalidate_for_write(ea); }
+  }
+  ```
+  `has_active_reservers()` is a single `Relaxed` atomic load — negligible cost for non-atomic code
+  (common case returns false immediately). Alternative: inject the table into the memory layer so
+  `write_u32`/`write_u64` call it automatically.
+- **Test gap**: add interpreter-level test: `lwarx` reserve a line, intervening `stw` to the same
+  line, `stwcx.` must fail (CR0.EQ=0).
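+
+The concrete scenario can be replayed against a toy model. This is a sketch under stated
+assumptions, not the real `ReservationTable`: the `ToyReservations` type, its slot layout, and the
+128-byte `LINE_MASK` granule are all hypothetical, and the model is single-threaded so that the
+interleaving is deterministic.
+
+```rust
+/// Assumed reservation granule for this sketch only (128-byte line).
+const LINE_MASK: u32 = 127;
+
+/// Toy stand-in for the reservation table: one optional reserved line per
+/// guest thread. The real table layout is not shown in this file.
+struct ToyReservations {
+    slots: Vec<Option<u32>>,
+}
+
+impl ToyReservations {
+    fn new(threads: usize) -> Self {
+        Self { slots: vec![None; threads] }
+    }
+
+    /// `lwarx`: record a reservation on the line containing `ea`.
+    fn reserve(&mut self, thread: usize, ea: u32) {
+        self.slots[thread] = Some(ea & !LINE_MASK);
+    }
+
+    /// Plain store: clear every reservation on the line containing `ea`.
+    fn invalidate_for_write(&mut self, ea: u32) {
+        let line = ea & !LINE_MASK;
+        for slot in self.slots.iter_mut() {
+            if *slot == Some(line) {
+                *slot = None;
+            }
+        }
+    }
+
+    /// `stwcx.`: succeeds only if the thread still holds the line. The
+    /// reservation is consumed either way.
+    fn try_commit(&mut self, thread: usize, ea: u32) -> bool {
+        self.slots[thread].take() == Some(ea & !LINE_MASK)
+    }
+}
+
+/// Thread 0 reserves, thread 1 stores to the same line, thread 0 attempts
+/// `stwcx.`. Returns the value CR0.EQ would take.
+fn scenario(store_invalidates: bool) -> bool {
+    let mut table = ToyReservations::new(2);
+    table.reserve(0, 0x1000);
+    if store_invalidates {
+        table.invalidate_for_write(0x1004);
+    }
+    table.try_commit(0, 0x1000)
+}
+```
+
+`scenario(false)` models the behavior described in the symptom (spurious success);
+`scenario(true)` models the behavior once the store arm calls the invalidation hook.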
+
+### PPCBUG-108 — Legacy per-ctx reservation path: cross-thread invalidation impossible (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Location**: interpreter.rs:1148-1153 (stwcx legacy path)
+- **Symptom**: When table is None/disabled, reservation state lives in per-thread `PpcContext` fields.
+  A store by thread B cannot clear `ctx_A.has_reservation`. Safe in strict lockstep (one host thread).
+  Broken under real parallelism with the table inadvertently disabled.
+- **Fix**: add a `debug_assert!` in `lwarx`/`stwcx.` that table is enabled when multiple host threads
+  are active. The M3 scheduler should always enable the table before spawning a second host thread.
+
+### PPCBUG-109 — Zero unit tests for lwa / lwax / lwaux
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs test module
+- **Recommended minimum**:
+  - `lwa` with `0x8000_0000` → `gpr[rD] == 0xFFFF_FFFF_8000_0000`.
+  - `lwa` with `0x7FFF_FFFF` → `gpr[rD] == 0x0000_0000_7FFF_FFFF`.
+  - `lwax` with ra=0.
+  - `lwaux`: verify loaded value and rA update.
+
+### PPCBUG-110 — Zero unit tests for lwbrx
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs test module
+- **Recommended minimum**: memory `[0x11, 0x22, 0x33, 0x44]` at EA → `gpr[rD] == 0x4433_2211`; ra=0;
+  assert `gpr[rD] <= 0xFFFF_FFFF`.
+
+### PPCBUG-111 — lwarx / stwcx test suite missing key cases
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs:5167-5207 (two tests exist)
+- **Missing**: `lwarx` ra=0; `stwcx.` without prior `lwarx` → CR0.EQ=0; second `lwarx` displaces
+  first; post-PPCBUG-107-fix store-invalidation test; `lwarx` zero-extension assertion.
+
+IDs PPCBUG-112, PPCBUG-113, PPCBUG-114 are unallocated — reserved for group 20 follow-up.
+
+---
+
+## Batch 4 — load doubleword (group 21)
+
+Per-group report: `audit-out/group-21-load-doubleword.md`.
+
+Group 21 summary: **cleanest load group audited — zero HIGH bugs.** All six instructions (`ld`,
+`ldu`, `ldux`, `ldx`, `ldbrx`, `ldarx`) are ISA-correct: 64-bit load, big-endian byte order,
+EA computation (RA=0, DS-form, u32 truncation), update-form writebacks, and reservation tracking
+all pass scrutiny against Canary and the ISA spec. `ldbrx`'s double-swap pattern was investigated
+and confirmed correct (PPCBUG-115 informational). One MEDIUM documentation finding, two LOW findings.
+
+### PPCBUG-115 — `ldbrx` byte-swap confirmed correct (informational)
+
+- **Severity**: LOW (confirmed clean, informational only)
+- **Status**: wontfix
+- **Location**: `interpreter.rs:4157-4159`
+- **Analysis**: `mem.read_u64` uses `u64::from_be_bytes` internally (confirmed in `heap.rs:404`
+  and interpreter's `TestMem`), so it returns the BE-decoded value. Calling `.swap_bytes()`
+  re-reverses to give the LE interpretation, which is exactly what `ldbrx` specifies. Canary
+  achieves the same result by skipping `ByteSwap` at the HIR level. Both approaches are correct.
+  See per-group report for full byte-level worked example.
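+
+A small byte-level sketch of the double-swap, assuming only what the analysis states (a
+big-endian decode followed by `swap_bytes()`). The helper names `read_u64_be` and `ldbrx_value`
+are hypothetical stand-ins for `mem.read_u64` and the `ldbrx` arm.
+
+```rust
+/// Model of `mem.read_u64`: decode eight guest bytes as big-endian.
+fn read_u64_be(bytes: [u8; 8]) -> u64 {
+    u64::from_be_bytes(bytes)
+}
+
+/// Model of the `ldbrx` arm: big-endian read followed by `swap_bytes()`.
+fn ldbrx_value(bytes: [u8; 8]) -> u64 {
+    read_u64_be(bytes).swap_bytes()
+}
+```
+
+For asymmetric input bytes `11 22 33 44 55 66 77 88`, the plain read gives
+`0x1122_3344_5566_7788` and the byte-reversed read gives `0x8877_6655_4433_2211`, which is the
+same value `u64::from_le_bytes` would produce.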
+
+### PPCBUG-116 — `ld`/`ldx`/`ldu`/`ldux` as 32-bit-ABI poison sources (documentation)
+
+- **Severity**: MEDIUM (awareness/documentation; no change to load instructions themselves)
+- **Status**: open
+- **Location**: `interpreter.rs:1017-1058`
+- **Symptom**: These instructions correctly write full 64-bit values to the destination GPR.
+  Xbox 360 32-bit-ABI binaries legitimately emit them for TOC loads, vtable loads, and kernel
+  structure accesses — all of which may have non-zero upper 32 bits. Until PPCBUG-001..089
+  arithmetic truncation fixes land, such values can flow into 64-bit compares and corrupt CA
+  bits and CR fields — the inverse of the `addis` bug (pollution from memory side vs. sign-ext).
+- **Key guard already in place**: PPCBUG-007's `subfcx` CA fix truncates operands to u32 before
+  the compare, correctly handling `ld`-originated 64-bit values. This is the most critical
+  downstream consumer and the fix is already specified.
+
+### PPCBUG-117 — Stale frozen snapshot in `ppc-manual/memory/ldarx.md`
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `ppc-manual/memory/ldarx.md` (frozen snapshot section)
+- **Symptom**: Snapshot uses old field name `ctx.reserved_addr`; live code uses
+  `ctx.reserved_line = ea & !RESERVATION_MASK` (M3 refactor). Cosmetic only.
+- **Fix**: Regenerate snapshot after M3 field names settle.
+
+### PPCBUG-118 — Zero functional tests for `ld`, `ldx`, `ldu`, `ldux`, `ldbrx`
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` test module
+- **Symptom**: `test_ldarx_stdcx_pair` covers `ldarx`/`stdcx` only. Five doubleword load
+  variants are untested. Recommended minimum: `ld` with positive DS, negative DS, and RA=0;
+  `ldx` basic; `ldu` with RA writeback check; `ldux` with RA writeback check; `ldbrx` with
+  asymmetric data to distinguish output from plain `ldx`.
+
+IDs PPCBUG-119 through PPCBUG-122 are unallocated — reserved for group 21 follow-up.
+
+---
+
+## Batch 4 — load multiple/string (group 22)
+
+Per-group report: `audit-out/group-22-load-mlsr.md`.
+
+Group 22 summary: one structural HIGH bug (`lswx` is always a no-op due to missing XER TBC field),
+one MEDIUM coupling bug (the write path discards TBC on `mtspr XER`), one MEDIUM ISA-form deviation
+(`lmw` does not skip the RA-in-range register write, unlike Canary), and two LOW findings. The `lswi` body itself
+is correct; `lmw` core logic (loop bound, zero-extension, byte-packing, register wraparound) is clean.
+Zero unit tests across all three opcodes.
+
+### PPCBUG-123 — `lswx` XER TBC field not modeled; always loads 0 bytes
+
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: `context.rs:235-237` (`xer()` method) + `interpreter.rs:4172`
+- **Symptom**: `ctx.xer()` assembles only SO[31], OV[30], CA[29] — bits 0–28 are always zero.
+  `lswx` reads `ctx.xer() & 0x7F` expecting the XER TBC byte-count field at bits 0–6, but always
+  gets 0. The `while bytes_left > 0` loop never executes; **`lswx` is permanently a no-op** —
+  no bytes are loaded, no destination registers are written. The companion `stswx` at
+  `interpreter.rs:4191` has the identical pattern and is equally broken.
+- **Root cause**: `PpcContext` has no `xer_tbc` field. Neither `xer()` nor `set_xer()` models the
+  byte-count field (bits 0–6 above; XER[25:31] in IBM MSB-0 numbering). Any `mtspr XER, rN` that
+  sets a non-zero byte count silently discards it (PPCBUG-124).
+- **Cross-reference**: Canary marks `lswx` as `XEINSTRNOTIMPLEMENTED()` — xenia-rs implemented the
+  body but left the XER infrastructure incomplete.
+- **Fix**:
+  1. Add `pub xer_tbc: u8` to `PpcContext`.
+  2. In `xer()`: `| (self.xer_tbc as u32)` for bits 0–6.
+  3. In `set_xer()`: `self.xer_tbc = (val & 0x7F) as u8`.
+  The `lswx` body is then correct as-is.
+- **Test gap**: zero unit tests. After fix: `mtspr XER, r3` (r3=4) then `lswx r5, 0, r4` should
+  write exactly 4 bytes into r5 (high byte = first byte at EA).
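+
+The three fix steps can be sketched on a cut-down model. `XerState` and its `so`/`ov`/`ca` field
+names are assumptions made for illustration (only `xer_tbc`, `xer()`, and `set_xer()` come from
+the text above), and `xer_round_trip` is a hypothetical test helper.
+
+```rust
+/// Cut-down model of the XER state; field names other than `xer_tbc` are
+/// hypothetical.
+#[derive(Default)]
+struct XerState {
+    so: bool,
+    ov: bool,
+    ca: bool,
+    xer_tbc: u8,
+}
+
+impl XerState {
+    /// Assemble the 32-bit XER image: SO, OV, CA in bits 31, 30, 29 and the
+    /// string byte count in bits 6..0 (LSB-0 numbering).
+    fn xer(&self) -> u32 {
+        ((self.so as u32) << 31)
+            | ((self.ov as u32) << 30)
+            | ((self.ca as u32) << 29)
+            | (self.xer_tbc as u32)
+    }
+
+    /// Split a 32-bit XER image back into the modeled fields. Bits that are
+    /// not modeled are dropped.
+    fn set_xer(&mut self, val: u32) {
+        self.so = (val & (1 << 31)) != 0;
+        self.ov = (val & (1 << 30)) != 0;
+        self.ca = (val & (1 << 29)) != 0;
+        self.xer_tbc = (val & 0x7F) as u8;
+    }
+}
+
+/// Write a value with `set_xer()` and read it back with `xer()`.
+fn xer_round_trip(val: u32) -> u32 {
+    let mut x = XerState::default();
+    x.set_xer(val);
+    x.xer()
+}
+```
+
+With this shape, a byte count written through `set_xer()` survives the round trip, so
+`xer() & 0x7F` no longer reads as zero.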
+
+### PPCBUG-124 — `set_xer()` discards TBC on `mtspr XER` (structural coupling to PPCBUG-123)
+
+- **Severity**: MEDIUM (must land with PPCBUG-123)
+- **Status**: open
+- **Location**: `context.rs:239-244`
+- **Symptom**: `set_xer()` writes only SO/OV/CA from the 32-bit value, silently discarding bits 0–28
+  (including the 7-bit TBC field). Any guest `mtspr XER, rN` with a non-zero byte count loses that
+  count; subsequent `lswx`/`stswx` see TBC=0. Fix is the same three-line change as PPCBUG-123.
+
+### PPCBUG-125 — `lmw` missing RA-in-destination-range skip
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:1515`
+- **Symptom**: PowerISA declares `lmw rT, D(rA)` invalid when `rA` is in `[rT..31]`. Canary skips
+  the write to `rA` in that case (`if (i.D.RT + j == i.D.RA) continue`). xenia-rs pre-computes EA
+  before the loop (so EA values remain correct), but overwrites `rA` with the loaded word instead of
+  preserving it. The result differs from Canary for this invalid encoding. Any program that relies
+  on RA surviving a nominally invalid `lmw` will see the wrong value.
+- **Fix**:
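+  ```rust
+  for r in instr.rd()..32 {
+      // Invalid form (rA in rT..=31): skip the write so rA survives, matching Canary.
+      if r != instr.ra() {
+          ctx.gpr[r] = mem.read_u32(ea as u32) as u64;
+      }
+      // EA advances for every register, including the skipped one.
+      ea = ea.wrapping_add(4);
+  }
+  ```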
+- **Test gap**: zero tests. Add: `lmw r28, 0(r28)` (RA=RT=28) — after fix, gpr[28] unchanged.
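+
+The proposed test can be prototyped against a self-contained toy before wiring it into the
+interpreter's test module. Everything here is hypothetical scaffolding (`lmw_skip_ra`, `lmw_demo`,
+a flat byte slice in place of guest memory); it only illustrates the skip-but-advance behavior.
+
+```rust
+/// Toy `lmw`: load big-endian words from `mem` starting at byte offset `ea`
+/// into `gpr[rt..=31]`, leaving `gpr[ra]` untouched if it falls in range.
+fn lmw_skip_ra(gpr: &mut [u64; 32], mem: &[u8], rt: usize, ra: usize, mut ea: usize) {
+    for r in rt..32 {
+        if r != ra {
+            let word = [mem[ea], mem[ea + 1], mem[ea + 2], mem[ea + 3]];
+            gpr[r] = u32::from_be_bytes(word) as u64;
+        }
+        // EA advances even when the register write is skipped.
+        ea += 4;
+    }
+}
+
+/// Models `lmw r28, 0(r28)` with r28 = 0 pointing at four known words.
+fn lmw_demo() -> [u64; 32] {
+    let mem: Vec<u8> = [0x1111_1111u32, 0x2222_2222, 0x3333_3333, 0x4444_4444]
+        .iter()
+        .flat_map(|w| w.to_be_bytes())
+        .collect();
+    let mut gpr = [0u64; 32];
+    lmw_skip_ra(&mut gpr, &mem, 28, 28, 0);
+    gpr
+}
+```
+
+After `lmw_demo()`, `gpr[28]` still holds the base address rather than the first loaded word,
+and r29 through r31 hold the second through fourth words.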
+
+### PPCBUG-126 — `lswi` uses `instr.rb()` instead of `instr.nb()` for the NB field
+
+- **Severity**: LOW (maintenance hazard, not a correctness bug)
+- **Status**: open
+- **Location**: `interpreter.rs:1340`
+- **Symptom**: `instr.rb()` and `instr.nb()` both extract bits 16–20 and return identical values.
+  Using `rb()` misrepresents the operand as a register reference rather than a 5-bit immediate count.
+  The companion `stswi` at line 1359 has the same pattern. A future `rb()` type-system refactor
+  could break `lswi`/`stswi` silently.
+- **Fix**: `instr.nb()` at both sites.
+
+### PPCBUG-127 — Zero execution tests for lmw, lswi, lswx
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` test module
+- **Symptom**: No `#[test]` exists for any of the three opcodes. A regression in loop bounds,
+  byte-packing, EA computation, or the NB=0 special case would go undetected.
+- **Recommended minimum**: `lmw r30, 0(r1)` (2-word load); `lswi r3, r4, 8` (2-word byte pack);
+  `lswi r31, r4, 8` (register wraparound → r31 and r0); `lswi r3, r4, 0` (NB=0→32 special case);
+  post-PPCBUG-123 fix: `lswx` with XER TBC=4 (1-word load), TBC=0 (no-op), TBC=5 (partial word).
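+
+The byte-packing and register-wraparound behavior these tests target can be sketched as follows.
+`load_string` is a hypothetical helper, not the interpreter's `lswi`/`lswx` body: it takes the
+bytes already read from memory and assumes the usual string-load rule that bytes fill each
+register's low word from the most significant byte down, with unfilled bytes zeroed.
+
+```rust
+/// Pack `bytes` into successive registers starting at `rt`, four bytes per
+/// register, most significant byte first, wrapping from r31 to r0.
+fn load_string(gpr: &mut [u64; 32], bytes: &[u8], rt: usize) {
+    for (i, chunk) in bytes.chunks(4).enumerate() {
+        // A short final chunk leaves the remaining low-order bytes zero.
+        let mut word = [0u8; 4];
+        word[..chunk.len()].copy_from_slice(chunk);
+        gpr[(rt + i) % 32] = u32::from_be_bytes(word) as u64;
+    }
+}
+```
+
+A 5-byte load starting at r31 writes one full word to r31 and one partial word to r0; a 0-byte
+load (the `lswx` TBC=0 case) writes no register at all.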
+
+---
+
+## Batch 5 — store byte/halfword (group 24)
+
+Per-group report: `audit-out/group-24-store-byte-half.md`.
+
+Group 24 summary: **3 findings: 1 HIGH (cross-cutting reservation invalidation), 1 LOW/informational
+(update-form zero-extension correct but undocumented), 1 LOW (zero test coverage).** EA computation,
+value truncation (`as u8`, `as u16`), RA=0 special cases, update-form writeback zero-extension,
+big-endian `mem.write_u16` path, and `sthbrx` byte-reverse logic are all ISA-correct. The single
+HIGH finding is the systemic absence of `invalidate_for_write` calls — same class as PPCBUG-107,
+now documented for all 9 byte/halfword store opcodes.
+
+### PPCBUG-130 — All 9 store-byte/halfword opcodes missing `invalidate_for_write` (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Locations**: `interpreter.rs:1207` (stb), `1213` (stbu), `1219` (stbx), `1225` (stbux),
+  `1231` (sth), `1237` (sthu), `1243` (sthx), `1249` (sthux), `1337` (sthbrx)
+- **Class**: same root cause as PPCBUG-107 (`invalidate_for_write` is never called from any store
+  arm).
+- **Symptom**: Under `--parallel`, a `stb`, `sth`, or `sthbrx` (or any variant in this group) to a
+  cache line reserved by another thread via `lwarx`/`ldarx` does NOT clear the table slot.
+  The reserving thread's subsequent `stwcx.`/`stdcx.` spuriously succeeds even though an
+  intervening sub-word store has modified the line — violating store-conditional atomicity. Affects
+  any lock-free protocol that uses byte or halfword stores adjacent to or inside a `lwarx`/`stwcx.`
+  loop (e.g. byte-level spinlocks, tagged-pointer updates, audio ring-buffer flags).
+- **Fix** (per PPCBUG-107 pattern): before each `mem.write_u8/u16`, add:
+  ```rust
+  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
+      if t.has_active_reservers() { t.invalidate_for_write(ea); }
+  }
+  ```
+- **Note**: PPCBUG-107 is the canonical parent finding. PPCBUG-130 documents that the byte/halfword
+  group must be included in the same fix sweep.
+
+### PPCBUG-131 — Update-form rA zero-extension correct but undocumented (LOW / informational)
+
+- **Severity**: LOW (informational — behavior is correct)
+- **Status**: open (documentation gap)
+- **Locations**: `interpreter.rs:1216` (stbu), `1228` (stbux), `1240` (sthu), `1252` (sthux)
+- **Symptom**: Each update-form arm writes `ctx.gpr[instr.ra()] = ea as u64` where `ea: u32`.
+  This zero-extends to 64 bits — correct in the 32-bit ABI (addresses are 32-bit; upper half must
+  be zero). No bug, but there is no comment explaining the deliberate zero-extension. A maintainer
+  who computes EA as `u64` throughout and drops the `as u32` intermediate would silently
+  sign-extend negative displacements into rA, mirroring the `addis` bug shape.
+- **Fix**: add comment `// EA is u32; zero-extend into rA (32-bit ABI: upper 32 bits must be 0).`
+  at each update-form writeback line.
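+
+A short sketch of why the `u32` intermediate matters. Both functions are hypothetical
+(`writeback_via_u32` mirrors the behavior described above; `writeback_via_u64` is the imagined
+regression) and take the base register value and a 16-bit displacement.
+
+```rust
+/// Update-form writeback as described above: compute EA in 32 bits, then
+/// zero-extend into the 64-bit register.
+fn writeback_via_u32(ra: u64, d: i16) -> u64 {
+    let ea: u32 = (ra as u32).wrapping_add(d as i32 as u32);
+    ea as u64
+}
+
+/// Hypothetical regression: EA kept as `u64` throughout, so a negative
+/// displacement can leave ones in the upper 32 bits.
+fn writeback_via_u64(ra: u64, d: i16) -> u64 {
+    ra.wrapping_add(d as i64 as u64)
+}
+```
+
+The two agree whenever the sum stays inside the 32-bit range, which is why the regression would
+be easy to miss; they diverge only when the 32-bit address wraps, for example base `0x10` with
+displacement `-0x20`.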
+
+### PPCBUG-132 — Zero unit tests for all 9 store-byte/halfword opcodes (LOW)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` test module
+- **Symptom**: No `test_stb*` or `test_sth*` functions exist. Any regression in EA computation,
+  value truncation, update-form writeback order, or `sthbrx` byte-swap logic would be invisible.
+- **Recommended minimum**: `stb` basic + ra=0; `stbu`/`stbux` with rA writeback check; `stbx`
+  ra=0; `sth` big-endian byte check (`0xDEAD` → `[0xDE, 0xAD]`); `sthu`/`sthux` writeback;
+  `sthbrx` byte-reversed check (`0xDEAD` → `[0xAD, 0xDE]`); post-PPCBUG-130 fix: `lwarx` + `stb`
+  to same line + `stwcx.` → CR0.EQ=0.
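+
+The two byte-order expectations can be checked in isolation. `sth_bytes` and `sthbrx_bytes` are
+hypothetical helpers that return the bytes expected at EA; they do not touch the interpreter or
+its memory layer.
+
+```rust
+/// Bytes a big-endian `sth` places at EA for the low halfword of `rs`.
+fn sth_bytes(rs: u64) -> [u8; 2] {
+    (rs as u16).to_be_bytes()
+}
+
+/// Bytes `sthbrx` places at EA: the same halfword, byte-reversed.
+fn sthbrx_bytes(rs: u64) -> [u8; 2] {
+    (rs as u16).swap_bytes().to_be_bytes()
+}
+```
+
+Passing a source register with non-zero upper bits also exercises the `as u16` truncation that
+the group summary lists as correct.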
+
+IDs PPCBUG-133 through PPCBUG-139 are unallocated — reserved for group 24 follow-up.
+
+---
+
+## Batch 5 — store word (group 25)
+
+Per-group report: `audit-out/group-25-store-word.md`.
+
+Group 25 summary: **8 findings: 5 HIGH (reservation invalidation per opcode), 0 MEDIUM, 3 LOW.**
+Core arithmetic and semantics are entirely clean for all 6 opcodes. EA computation (RA=0 guards,
+simm16 sign-extend, u32 truncation), value truncation (`as u32`), update-form writebacks
+(`ea as u64` zero-extension), big-endian `mem.write_u32`, `stwbrx` byte-reversal, and `stwcx`
+conditional-store logic (cache-line reservation check, CAS, CR0 update, reservation always
+cleared) all match the ISA and Canary exactly. The `stwcx` manual snapshot is stale (uses old
+`reserved_addr` field name; live code correctly uses `reserved_line` at cache-line granularity —
+actually MORE correct than the snapshot). Dominant finding is the same systemic miss as PPCBUG-107
+and PPCBUG-130: `invalidate_for_write` is never called from any plain store arm.
+
+### PPCBUG-140 — stw: missing `invalidate_for_write` call (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Location**: `interpreter.rs:1183-1188`
+- **Systemic root cause**: PPCBUG-107
+- **Symptom**: Under `--parallel` with the ReservationTable enabled, a plain `stw` by thread B
+  to a cache line reserved by thread A does not clear thread A's table slot. Thread A's
+  subsequent `stwcx.` spuriously succeeds (CR0.EQ=1) even though thread B has written the line.
+  All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic counters) built on
+  `lwarx`/`stwcx.` are broken in multi-threaded mode. `stw` is the most common store instruction —
+  every stack write, pointer store, and integer field write is affected.
+- **Fix**: Before `mem.write_u32(ea, ...)`:
+  ```rust
+  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
+      if t.has_active_reservers() { t.invalidate_for_write(ea); }
+  }
+  ```
+  `has_active_reservers()` is a single `Relaxed` load, so the cost is negligible in the common non-atomic case.
+
+### PPCBUG-141 — stwu: missing `invalidate_for_write` call (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Location**: `interpreter.rs:1189-1194`
+- **Systemic root cause**: PPCBUG-107
+- **Symptom**: Same class as PPCBUG-140. `stwu r1, -N(r1)` is the canonical function-prologue
+  stack-allocation idiom emitted by every compiled function. A thread holding a reservation on
+  the stack region would see spurious `stwcx.` success after any prologue store.
+- **Fix**: Same pattern as PPCBUG-140, inserted before `mem.write_u32`.
+
+### PPCBUG-142 — stwx: missing `invalidate_for_write` call (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Location**: `interpreter.rs:1195-1200`
+- **Systemic root cause**: PPCBUG-107
+- **Symptom**: Same class as PPCBUG-140. `stwx` is the indexed store used for array writes and
+  indirect dereferences — common in loops that may run concurrently with reservation holders.
+- **Fix**: Same pattern as PPCBUG-140.
+
+### PPCBUG-143 — stwux: missing `invalidate_for_write` call (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Location**: `interpreter.rs:1201-1206`
+- **Systemic root cause**: PPCBUG-107
+- **Symptom**: Same class as PPCBUG-140. Less common than stw/stwu but still a plain store
+  that must participate in reservation invalidation.
+- **Fix**: Same pattern as PPCBUG-140.
+
+### PPCBUG-144 — stwbrx: missing `invalidate_for_write` call (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Location**: `interpreter.rs:1568-1573`
+- **Systemic root cause**: PPCBUG-107
+- **Symptom**: Same class as PPCBUG-140. Byte-reversed stores (used for LE-payload GPU command
+  buffers, file format fields) are still plain stores with respect to the reservation protocol.
+- **Fix**: Same pattern as PPCBUG-140. `ea` is already a `u32` at this point (line 1570).
+
+### PPCBUG-145 — stwcx: stale manual snapshot uses `reserved_addr` (LOW)
+
+- **Severity**: LOW (documentation only; live code is correct)
+- **Status**: open
+- **Location**: `ppc-manual/memory/stwcx.md` (frozen snapshot section)
+- **Symptom**: The frozen snapshot shows `ctx.reserved_addr == ea` (exact-word comparison).
+  The live code at `interpreter.rs:1137-1153` uses `ctx.reserved_line == line` where
+  `line = ea & !RESERVATION_MASK` (cache-line comparison). The live code is MORE correct per
+  ISA (PowerISA 2.07B defines reservation at cache-line granularity). Snapshot reflects an
+  earlier implementation before M3 introduced `RESERVATION_MASK` and `reserved_line`.
+  Tests confirm live behavior is correct (`stwcx_succeeds_within_same_cache_line`).
+- **Fix**: Regenerate the `stwcx.md` snapshot to show current field names and add a note on
+  the ISA cache-line granule.
+
+### PPCBUG-146 — Zero unit tests for stwu / stwx / stwux / stwbrx (LOW)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` test module
+- **Symptom**: Four of the six group-25 opcodes have zero dedicated unit tests.
+- **Recommended minimum**:
+  - `stwu r3, -8(r1)`: verify memory at `r1-8` and `gpr[1]` updated to `old_r1 - 8`.
+  - `stwx ra=0`: store at `gpr[rb]`, verify memory and no RA writeback.
+  - `stwux`: indexed update — verify store and RA writeback.
+  - `stwbrx 0x11223344`: bytes at EA should be `[0x44, 0x33, 0x22, 0x11]`.
+
+### PPCBUG-147 — stwcx test suite missing key cases (LOW)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs:5167-5208` (two existing tests)
+- **Missing**:
+  - `stwcx.` without prior `lwarx` → CR0.EQ=0, memory not written.
+  - Post-PPCBUG-140-fix: `lwarx` then `stw` to same line then `stwcx.` → CR0.EQ=0.
+  - RA=0 form: `stwcx. rS, 0, rB`.
+  - Explicit memory check on failure path (assert memory unchanged).
+
+IDs PPCBUG-148 and PPCBUG-149 are unallocated — reserved for group 25 follow-up.
+
+---
+
+## Batch 5 (continued) — store multiple/string (group 27)
+
+Per-group report: `audit-out/group-27-store-mlsr.md`.
+
+Group 27 summary: **4 findings: 2 HIGH, 1 MEDIUM, 1 LOW.** `stswx` is a permanent no-op (same
+root cause as PPCBUG-123 for `lswx`: XER TBC field not modeled; fixed as a side effect of
+PPCBUG-123/124). `stmw`, `stswi`, and `stswx` all omit `invalidate_for_write`, aggravated vs.
+single-word stores because a single `stmw` can dirty multiple cache lines. `stswi` uses `instr.rb()`
+instead of `instr.nb()` (maintenance hazard, same shape as PPCBUG-126 for `lswi`). Zero unit tests
+across all three opcodes.
+
+### PPCBUG-160 — stmw, stswi, stswx missing `invalidate_for_write`; multi-line atomicity exposure (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Locations**: `interpreter.rs:1521` (stmw), `interpreter.rs:1357` (stswi), `interpreter.rs:4189` (stswx)
+- **Extends**: PPCBUG-107. The range previously stated there (`1182-1278`) does not cover these three arms.
+  Multi-word instructions (stmw up to 128 bytes = 2 lines; stswx up to 127 bytes = ~2 lines) make
+  the probability of missing a reservation invalidation much higher than single-word stores.
+- **Symptom**: thread B's `stmw` saves 18+ non-volatile registers across two cache lines. Thread A's
+  `lwarx` reservation on the second line is not cleared. Thread A's `stwcx.` spuriously succeeds.
+  Because `stmw` is the ABI-standard non-volatile register save, this is triggered constantly in
+  function prologues — any lock-free primitive inside a prologue/epilogue window is at risk.
+- **Fix** (same pattern as PPCBUG-107): before each `mem.write_u32`/`mem.write_u8` call, add the
+  `invalidate_for_write` guard. See group-27 report for per-opcode code snippets.
+- **Test gap**: `lwarx` reserve a line, `stmw` across that line, `stwcx.` must return CR0.EQ=0.
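+
+To see which lines a multi-word store must invalidate, the touched line bases can be enumerated.
+This is a sketch only: `lines_touched` is a hypothetical helper and the 128-byte `LINE_BYTES`
+granule is an assumption, not a value taken from the reservation code.
+
+```rust
+/// Assumed reservation granule for this sketch only (128-byte line).
+const LINE_BYTES: u32 = 128;
+
+/// Line base addresses touched by a store of `len` bytes starting at `ea`.
+fn lines_touched(ea: u32, len: u32) -> Vec<u32> {
+    if len == 0 {
+        return Vec::new();
+    }
+    let first = ea & !(LINE_BYTES - 1);
+    let last = ea.wrapping_add(len - 1) & !(LINE_BYTES - 1);
+    let mut lines = vec![first];
+    let mut line = first;
+    while line != last {
+        line = line.wrapping_add(LINE_BYTES);
+        lines.push(line);
+    }
+    lines
+}
+```
+
+An 18-register `stmw` (72 bytes) starting 8 bytes before a line boundary touches two lines, so
+a guard placed before each per-word write reaches both of them.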
+
+### PPCBUG-161 — `stswx` is a permanent no-op: XER TBC not modeled (HIGH)
+
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: `interpreter.rs:4189` (`stswx` arm) + `context.rs:235-243` (`xer()`/`set_xer()`)
+- **Companion**: PPCBUG-123 (lswx), PPCBUG-124 (mtspr XER). This finding covers the store side.
+- **Symptom**: `ctx.xer() & 0x7F` always returns 0 (no `xer_tbc` field). `stswx` unconditionally
+  stores zero bytes. The byte-loop body is otherwise correct and requires no further changes.
+- **Fix**: same three-line fix as PPCBUG-123 (add `xer_tbc: u8` to `PpcContext`; update `xer()`
+  and `set_xer()`). The `stswx` body is correct once TBC is live.
+- **Test gap**: `mtspr XER` (TBC=5) + `stswx r3, 0, r4` → 5 bytes written big-endian.
+
+### PPCBUG-162 — `stswi` uses `instr.rb()` instead of `instr.nb()` for NB field (MEDIUM)
+
+- **Severity**: MEDIUM (maintenance hazard; not a runtime correctness bug today)
+- **Status**: open
+- **Location**: `interpreter.rs:1359`
+- **Companion**: PPCBUG-126 (`lswi` identical pattern at line 1340).
+- **Symptom**: `instr.rb()` and `instr.nb()` extract the same bits 16-20, so values are equal now.
+  If `rb()` is ever given a newtype wrapper (e.g. `RegIdx`) to enforce register semantics, the cast
+  `instr.rb() as u32` will either fail or yield wrong semantics — silently treating a register index
+  as a byte count.
+- **Fix**: `let nb = if instr.nb() == 0 { 32 } else { instr.nb() };`
+
+### PPCBUG-163 — Zero unit tests for stmw, stswi, stswx (LOW)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` test module
+- **Symptom**: No `#[test]` exists for any of the three opcodes. Regressions in loop bounds, byte
+  order, EA computation, NB=0 handling, or register wraparound are invisible.
+- **Recommended minimum**: `stmw` 2-word and 32-word cases; `stswi` 4-byte, NB=0→32, register
+  wraparound, and partial-word cases; `stswx` (post PPCBUG-123 fix) TBC=4, TBC=0, TBC=5. See
+  group-27 report for full list.
+
+ID PPCBUG-164 is unallocated — reserved for group 27 follow-up.
+
+---
+
+## Batch 5 (continued) — store doubleword (group 26)
+
+Per-group report: `audit-out/group-26-store-doubleword.md`.
+
+Group 26 summary: **0 HIGH, 2 MEDIUM, 2 LOW.** The core semantics of all six opcodes are
+ISA-correct: `ds()` decoder extracts the DS-form displacement correctly; `mem.write_u64` handles
+big-endian byte ordering; update-form writebacks are zero-extended and in the right order; `stdcx.`
+CR0 encoding, reservation check, and table-path interaction all match the ISA. `stdbrx` correctly
+applies `swap_bytes()`. No 32-bit writeback truncation issues (these are store ops, not ALU ops).
+Two MEDIUM findings: (1) PPCBUG-150 extends PPCBUG-107 to the doubleword stores (same gap —
+`invalidate_for_write` never called); (2) PPCBUG-151 identifies that `stwcx.` and `stdcx.` share
+the same reservation slot without a width discriminator, allowing a `lwarx`+`stdcx.` or
+`ldarx`+`stwcx.` cross-pair to succeed when it should fail. Four IDs used (PPCBUG-150..153).
+
+### PPCBUG-150 — `std`/`stdu`/`stdx`/`stdux`/`stdbrx` do not call `invalidate_for_write` (scope extension of PPCBUG-107)
+
+- **Severity**: MEDIUM (same bug class as PPCBUG-107, which is itself rated HIGH)
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Locations**:
+  - `interpreter.rs:1258` (`std`)
+  - `interpreter.rs:1264` (`stdx`)
+  - `interpreter.rs:1269` (`stdu`)
+  - `interpreter.rs:1275` (`stdux`)
+  - `interpreter.rs:4163` (`stdbrx`)
+- **Symptom**: When `--parallel` is active and the `ReservationTable` is enabled, any of these
+  five stores to an address another HW thread has reserved via `ldarx` will NOT invalidate that
+  thread's reservation. The `ldarx`-holding thread's `stdcx.` can subsequently succeed even though
+  the memory was overwritten — a classic LL/SC ABA gap. Fix session for PPCBUG-107 must include
+  these five sites.
+- **Fix**: in each arm, after `mem.write_u64(ea, ...)`, add:
+  ```rust
+  if let Some(t) = &ctx.reservation_table {
+      if t.has_active_reservers() { t.invalidate_for_write(ea); }
+  }
+  ```
+
+### PPCBUG-151 — `stdcx.`/`stwcx.` reservation width not discriminated: cross-width pair silently succeeds
+
+- **Severity**: MEDIUM
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Location**: `interpreter.rs:4119-4155` (`stdcx`) vs `interpreter.rs:1134-1180` (`stwcx`)
+- **Symptom**: Both `stwcx.` and `stdcx.` match reservations using only `(has_reservation,
+  reserved_line)`. A `lwarx` reservation can be spuriously committed by `stdcx.`, or a `ldarx`
+  reservation by `stwcx.`, as long as the cache line matches. The ISA requires pairing — `lwarx`
+  must be committed by `stwcx.`, and `ldarx` by `stdcx.`. Cross-width commit reads the wrong width
+  from memory and writes back the wrong width, with no failure indication (CR0.EQ=1).
+- **Fix**: add a `reservation_width: u8` field (4 or 8) to `PpcContext`. `stwcx.` requires
+  `reservation_width==4`; `stdcx.` requires `reservation_width==8`. In the table path, pack the
+  1-bit width flag into one of the spare bits of the 64-bit slot (bits 39–32 are always zero for
+  line addresses in the 32-bit guest address space).
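+
+The per-context half of the fix might look like the sketch below. The `Reservation` struct is a
+cut-down, hypothetical stand-in for the relevant `PpcContext` fields, and `commit_after_reserve`
+is a test helper invented here; the table-path bit packing is not modeled.
+
+```rust
+/// Cut-down per-context reservation state. `reservation_width` is the field
+/// proposed above; the other two mirror the names quoted in this finding.
+#[derive(Default)]
+struct Reservation {
+    has_reservation: bool,
+    reserved_line: u32,
+    reservation_width: u8,
+}
+
+impl Reservation {
+    /// `lwarx` passes width 4, `ldarx` passes width 8.
+    fn reserve(&mut self, line: u32, width: u8) {
+        self.has_reservation = true;
+        self.reserved_line = line;
+        self.reservation_width = width;
+    }
+
+    /// `stwcx.` passes width 4, `stdcx.` passes width 8. The reservation is
+    /// cleared whether or not the commit succeeds.
+    fn try_commit(&mut self, line: u32, width: u8) -> bool {
+        let ok = self.has_reservation
+            && self.reserved_line == line
+            && self.reservation_width == width;
+        self.has_reservation = false;
+        ok
+    }
+}
+
+/// Reserve with one width, then attempt a commit with another.
+fn commit_after_reserve(reserve_width: u8, commit_width: u8) -> bool {
+    let mut r = Reservation::default();
+    r.reserve(0x2000, reserve_width);
+    r.try_commit(0x2000, commit_width)
+}
+```
+
+Matched pairs (4/4 and 8/8) commit; the cross-width pairs (4/8 and 8/4) fail even though the
+cache line matches.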
+
+### PPCBUG-152 — `stdu`/`stdux` no invalid-form guard for RS==RA (LOW)
+
+- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this)
+- **Status**: open
+- **Location**: `interpreter.rs:1267-1278`
+- **Symptom**: When `RA==RS`, the store writes the original RS value, then RA (==RS) is
+  overwritten with EA, destroying the source. ISA marks this invalid-form. Consistent with the
+  policy for update-form loads in groups 18-22 (e.g. PPCBUG-099).
+- **Fix**: `debug_assert!(instr.ra() != 0 && instr.ra() != instr.rs())` in debug builds.
+
+### PPCBUG-153 — Zero unit tests for std/stdu/stdx/stdux/stdbrx; stdcx. happy-path only (LOW)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` test module (only `test_ldarx_stdcx_pair` at line 4629)
+- **Missing coverage**: `std` with negative DS; `std` with RA=0; `stdu` update writeback; `stdx`
+  with RA=0; `stdux` indexed update; `stdbrx` byte-reversed output; `stdcx.` failure path (no
+  prior reservation or EA mismatch); `stdcx.` `has_reservation` cleared on failure.
+- **Recommended minimum**: 6 tests — see per-group report for encodings.
+
+IDs PPCBUG-154 through PPCBUG-159 are unallocated — reserved for group 26 follow-up.
+
+---
+
+## Batch 5 (continued) — store float (group 28)
+
+Per-group report: `audit-out/group-28-store-float.md`.
+
+Group 28 summary: **7 findings: 3 HIGH, 1 MEDIUM, 3 LOW.** EA computation, endianness, update-form
+writebacks, and `stfiwx` integer-word extraction are all correct. Critical bugs: (1) `stfs*` never
+raises FPSCR exception bits (VXSNAN, XX, OX, UX) required by PowerISA for double→single narrowing;
+(2) `stfs*` ignores FPSCR.RN rounding mode, always using round-to-nearest-even; (3) all 9 FP store
+arms omit `invalidate_for_write` (same class as PPCBUG-107). The `stfd*` family and `stfiwx` are
+clean (bit-pattern stores with no conversion). Zero unit tests across all 9 opcodes.
+**7 IDs used (PPCBUG-165..171). 3 IDs unallocated (PPCBUG-172..174).**
+
+### PPCBUG-165 — stfs* does not raise FPSCR exception bits (VXSNAN, XX, OX, UX)
+
+- **Severity**: HIGH
+- **Status**: open
+- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux)
+- **Symptom**: PowerISA requires that `stfs` double→single narrowing raises FPSCR[VXSNAN] for SNaN
+  input, FPSCR[OX] on overflow to ±∞, FPSCR[UX] on underflow to ±0/denormal, and FPSCR[XX] when the
+  result is inexact. None of these bits are ever set. The narrowing is done via `ctx.fpr[instr.rs()] as f32`
+  (x86 `CVTSD2SS`); no FPSCR inspection or update follows. Games that poll FPSCR[OX] to detect
+  overflow (physics engines clamping large velocities), or FPSCR[VXSNAN] after sentinel SNaN writes,
+  get false negatives.
+- **Canary parity**: Canary also omits these FPSCR updates for `stfs*`. Both share the deviation.
+- **Fix**: after the narrowing, check `fpscr::is_snan(src)` → set `VXSNAN`; compare source vs.
+  f64 round-trip of narrowed value for inexact; compare src.is_finite() && f32.is_infinite() for
+  overflow. See group-28 report for illustrative code sketch.
+
+### PPCBUG-166 — stfs* ignores FPSCR.RN; always uses round-to-nearest-even
+
+- **Severity**: HIGH
+- **Status**: open
+- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
+- **Symptom**: `ctx.fpr[instr.rs()] as f32` always rounds to nearest-even, never consulting
+  `ctx.fpscr & fpscr::RN_MASK`. Any game that configures FPSCR.RN to truncate/ceil/floor and then
+  stores via `stfs` gets the wrong f32 in memory (wrong by at most 1 ULP). The stfs.md spec
+  explicitly acknowledges this gap.
+- **Canary parity**: Canary also ignores FPSCR.RN for stfs. Both share the deviation.
+- **Fix**: read `ctx.fpscr & fpscr::RN_MASK` and set host MXCSR before narrowing, then restore.
+  Minimum viable: `debug_assert_eq!(ctx.fpscr & fpscr::RN_MASK, 0)` for debug-build visibility.
+
+### PPCBUG-167 — All 9 FP store arms missing `invalidate_for_write` (PPCBUG-107 class)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux),
+  1308 (stfd), 1313 (stfdu), 1320 (stfdx), 1325 (stfdux), 1333 (stfiwx)
+- **Symptom**: Same class as PPCBUG-107. Under M3 `--parallel`, a FP store by thread B to a
+  cache line reserved by thread A via `lwarx` does not clear thread A's reservation table slot.
+  Thread A's subsequent `stwcx.` spuriously succeeds. Rendering workers using FP stores to shared
+  transform/particle buffers co-located with spinlock sites are at risk.
+- **Fix**: before each `mem.write_f32`/`write_f64`/`write_u32` in every FP store arm:
+  ```rust
+  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
+      if t.has_active_reservers() { t.invalidate_for_write(ea); }
+  }
+  ```
+  Recommend a single sweep of all store groups (PPCBUG-107, 130, 160, 167) to avoid further drift.
+
+### PPCBUG-168 — stfs* SNaN narrowing: `as f32` quietens SNaN without raising FPSCR.VXSNAN
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
+- **Symptom**: When FRS holds an f64 SNaN (bit 51 = 0), `CVTSD2SS` sets the f32 quiet bit (bit 22),
+  producing a QNaN in memory, without raising FPSCR[VXSNAN]. The stored memory bytes are correct per
+  IEEE-754 (narrowing an SNaN produces a QNaN). The bug is the missing FPSCR signal, a subset of
+  PPCBUG-165. **Contrast with PPCBUG-128** (lfs stores wrong FPR bits — HIGH severity; here memory
+  bytes are right, only the flag is missing).
+- **Note**: will be fixed as a side effect of the PPCBUG-165 fix. No independent code change needed.
+
+### PPCBUG-169 — stfd* bit-pattern store: confirmed correct (informational)
+
+- **Severity**: LOW (confirmed clean, informational)
+- **Status**: wontfix
+- **Locations**: interpreter.rs:1305, 1311, 1317, 1323
+- **Analysis**: `write_f64(ea, fpr)` → `write_u64(ea, fpr.to_bits())` → `val.to_be_bytes()`. Pure
+  bit-pattern, correct big-endian. SNaN preserved. EA computation and update-form writebacks all
+  correct. Canary parity confirmed. No bugs.
+
+### PPCBUG-170 — stfiwx: confirmed correct (informational)
+
+- **Severity**: LOW (confirmed clean, informational)
+- **Status**: wontfix
+- **Location**: interpreter.rs:1329-1335
+- **Analysis**: `write_u32(ea, fpr.to_bits() as u32)` correctly extracts the low 32 bits of the
+  64-bit FPR as a raw bit pattern (the integer word produced by `fctiw`/`fctiwz`) and stores
+  big-endian. RA=0 handled correctly. No FPSCR effects required. Canary parity confirmed. No bugs.
+
+### PPCBUG-171 — Zero unit tests for all 9 store-float opcodes
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs test module
+- **Symptom**: No `#[test]` covers any of the 9 FP store arms. Regressions in EA computation,
+  endianness, update-form writeback order, or double→single narrowing are invisible.
+- **Recommended minimum** (10 tests): `stfd` normal + SNaN bit-exact; `stfdu` update writeback;
+  `stfs` round-trip (1.0); `stfs` overflow (→ ±∞); `stfsx` ra=0; `stfsux` update; `stfiwx` integer
+  word extract; post-PPCBUG-165 fix: SNaN → FPSCR.VXSNAN set; post-PPCBUG-166 fix: RN=truncate.
+
+IDs PPCBUG-172 through PPCBUG-174 are unallocated — reserved for group 28 follow-up.
+
+---
+
+## Batch 6 — FPU single-precision (group 29)
+
+Per-group report: `audit-out/group-29-fpu-single.md`.
+
+**Context**: The live implementation is substantially more capable than the frozen ppc-manual
+snapshots indicated. `to_single()` correctly dispatches on FPSCR.RN; `check_invalid_*` helpers
+correctly set VXSNAN, VXISI, VXIMZ, VXZDZ, VXIDI, ZX; `update_after_op` sets OX, UX, and
+FPRF. The remaining bugs are: (1) XX/FI/FR (inexact) never set anywhere; (2) the fmadd/fmsub
+*sx variants are missing the VXISI check for the add-phase infinity collision (of the *x double
+siblings only `fmaddx` performs it; see PPCBUG-203); (3) fnmadd/fnmsub NaN sign bit incorrectly flipped by Rust `-`;
+(4) fresx produces a full IEEE 1/b instead of the ~12-bit hardware estimate; (5) FPSCR.NI
+flush-to-zero not modelled; (6) SNaN→QNaN propagation relies on host SSE behavior rather than
+the ISA-canonical derivation.
+
+**8 IDs used (PPCBUG-180..187). 12 IDs unallocated (PPCBUG-188..199).**
+
+### PPCBUG-180 — XX / FI / FR bits never set across all FPU *sx opcodes (and double siblings)
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: `fpscr.rs:184-194` (`update_after_op`); affects interpreter.rs:2252-2494
+- **Symptom**: `FPSCR[XX]` (inexact) should be set whenever the mathematical result of an
+  FP operation cannot be represented exactly in the destination format (single or double) and
+  a rounding step occurs. `FPSCR[FI]` (fraction inexact) is the non-sticky per-operation copy of
+  that condition, and `FPSCR[FR]` (fraction rounded) records whether rounding incremented the
+  magnitude. `update_after_op` sets `OX` (overflow to ±∞) and `UX` (subnormal
+  result) but has no inexact-detection logic. Since most `*sx` operations on arbitrary inputs
+  require rounding to single precision, XX is almost always wrong (false zero). Games using
+  FPSCR polling to check exactness receive false "exact" results.
+- **Canary parity**: Canary's `UpdateFPSCR` also does not set XX/FI/FR. Both share this gap.
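+- **Fix**: In `update_after_op` (or a post-`to_single` helper), compare the pre-round f64
+  result with the post-round result widened back to f64. If they differ, set `FI` and the sticky
+  `XX`; set `FR` when rounding increased the magnitude (`|rounded| > |pre-round|`); otherwise
+  clear `FI` and `FR`. This only sees the f64→f32 step; inexactness already introduced by the
+  f64 operation itself needs separate detection (see PPCBUG-200).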
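+
+The pre-round/post-round comparison can be sketched as a pure helper. This is a minimal
+illustration, not the project's code: `RoundFlags` and `narrowing_flags` are hypothetical names,
+and mapping the two booleans onto the real `fpscr` constants is left to the caller.
+
+```rust
+/// Inexact-tracking flags for one f64 -> f32 rounding step (hypothetical type).
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct RoundFlags {
+    /// Result differs from the pre-round value (maps to FI, ORed into sticky XX).
+    pub fi: bool,
+    /// Rounding increased the magnitude (maps to FR).
+    pub fr: bool,
+}
+
+/// Classify the rounding from `pre` (f64) to `post` (the single-precision result).
+pub fn narrowing_flags(pre: f64, post: f32) -> RoundFlags {
+    if pre.is_nan() {
+        // NaN operands belong to the invalid-operation path, not the inexact path.
+        return RoundFlags { fi: false, fr: false };
+    }
+    let widened = post as f64; // f32 -> f64 widening is always exact
+    RoundFlags {
+        fi: widened != pre,
+        fr: widened.abs() > pre.abs(),
+    }
+}
+```
+
+For example, narrowing `0.1_f64` rounds up in magnitude (both flags set), while narrowing
+`1.0 + 2^-30` rounds down to `1.0` (only `fi` set).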
+
+### PPCBUG-181 — fmaddsx / fnmaddsx missing VXISI check for add-phase ±∞ collision
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:2339-2348 (fmaddsx), 2383-2392 (fnmaddsx)
+- **Symptom**: When `FRA × FRC = +∞` and `FRB = -∞` (or vice versa), PowerISA §4.3.4
+  requires `FPSCR[VXISI]` to be set and the result to be a QNaN. The double-precision sibling
+  `fmaddx` (line 2327) correctly calls `fpscr::check_invalid_add(ctx, a * c, b, false)` after
+  the multiply-check. `fmaddsx` omits this call entirely — only `check_invalid_mul` runs.
+  Games using fused-madd in dot-product accumulators that might overflow to ±∞ (e.g. lighting
+  accumulators with very large normals) lose the VXISI signal.
+- **Fix**:
+  ```rust
+  // inside fmaddsx arm, after check_invalid_mul:
+  fpscr::check_invalid_add(ctx, a * c, b, false);
+  ```
+  Same for fnmaddsx (same operand pair, same `false` sense for the add).
+
+### PPCBUG-182 — fmsubsx / fnmsubsx missing VXISI check for subtract-phase ±∞ collision
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:2361-2370 (fmsubsx), 2405-2414 (fnmsubsx)
+- **Symptom**: When `FRA × FRC = ±∞` and `FRB = ±∞` with the same sign, `(±∞) − (±∞)`
+  should fire `FPSCR[VXISI]`. Neither `fmsubsx` nor `fnmsubsx` calls `check_invalid_add`.
+- **Fix**:
+  ```rust
+  // inside fmsubsx arm, after check_invalid_mul:
+  fpscr::check_invalid_add(ctx, a * c, -b, false);
+  ```
+  Same for fnmsubsx. The negated `b` turns the subtract into the add-form so that
+  `check_invalid_add(..., false)` uses the correct infinity-sign comparison.
+
+### PPCBUG-183 — fnmaddsx / fnmsubsx NaN sign bit incorrectly flipped by Rust unary `-`
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:2388 (fnmaddsx), 2410 (fnmsubsx)
+- **Symptom**: `to_single(ctx, -(a.mul_add(c, b)))` — Rust's unary `-f64` always flips the
+  IEEE sign bit, including when the value is NaN. PowerISA §4.3.2 specifies that the final
+  negation in `fnmadd`/`fnmsub` is NOT applied to a QNaN result: if the fused computation
+  yields a NaN (due to SNaN input, VXIMZ, or VXISI), the negation is skipped and the NaN is
+  propagated with its canonical sign unchanged. xenia-rs flips the sign bit of any NaN result,
+  producing a QNaN with the wrong sign. Observable by storing via `stfd` and inspecting bits.
+  Games using sign-bit NaN tagging (e.g. `0xFFC00000` vs `0x7FC00000` as distinct sentinels)
+  are affected.
+- **Fix**:
+  ```rust
+  // fnmaddsx arm:
+  let inner = a.mul_add(c, b);
+  let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
+  // fnmsubsx arm:
+  let inner = a.mul_add(c, -b);
+  let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
+  ```
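+
+The guarded negation is small enough to factor out and test on raw bit patterns. A minimal
+sketch follows; `negate_unless_nan` is a hypothetical helper name, and `inner` stands for the
+already-computed `a.mul_add(c, ±b)`.
+
+```rust
+/// Final negation step of fnmadd/fnmsub: negate the fused result unless it is a NaN,
+/// so a NaN keeps whatever sign bit the fused computation produced.
+pub fn negate_unless_nan(inner: f64) -> f64 {
+    if inner.is_nan() {
+        inner
+    } else {
+        -inner
+    }
+}
+```
+
+A regression test can build a NaN with `f64::from_bits(0x7FF8_0000_0000_0000)` and compare
+`to_bits()` before and after, which also documents that a plain unary `-` flips bit 63.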
+
+### PPCBUG-184 — fresx produces full-precision IEEE 1/b instead of ~12-bit hardware estimate
+
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:2481-2494
+- **Symptom**: `fres` on Xenon hardware produces a reciprocal approximation via a 256-entry
+  LUT with linear interpolation, accurate to roughly 1/4096 relative error (~12 mantissa
+  bits). xenia-rs computes `to_single(1.0 / b)` — the fully IEEE-754 correctly-rounded
+  single-precision reciprocal. The result carries roughly 12 more correct bits than the hardware
+  estimate (relative error up to ~4096× smaller).
+  Newton-Raphson refinement code `x = fres(d); x = x*(2 - d*x)` is not broken by this (NR
+  converges even from an accurate seed), but code that inspects the seed's residual, for example
+  testing `fres(d)*d ≠ 1.0` or the size of `1 - fres(d)*d` to decide whether to refine or when to
+  stop, may take a different branch than on hardware, because the residual is far smaller under
+  xenia-rs.
+- **Canary parity**: Canary uses `f.Recip(f.Convert(frB, FLOAT32_TYPE))` — approximates by
+  first converting to f32 (quantizing the input), then applying the host reciprocal. Still
+  produces a fully-accurate IEEE single reciprocal rather than the 12-bit table estimate.
+  Both emulators share the deviation. Canary's conversion-first approach is slightly closer to
+  hardware (the input is quantized before the reciprocal), so if a future fix is desired,
+  Canary's approach is the better reference.
+- **Fix (minimal viable)**: Pre-convert input to f32 to match Canary's quantization:
+  `let b32 = b as f32; to_single(ctx, 1.0_f64 / b32 as f64)`. This matches Canary but still
+  does not emulate the 12-bit LUT. Full fix requires an `fres` LUT matching Xenon's hardware
+  table (documented in Xbox 360 SDK / GamePPCLisa docs).
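+
+To make the residual argument concrete, here is a small sketch of the refinement step and the
+residual a guest routine might test. The function names are illustrative only, and the ~1/4096
+seed error is the figure quoted in this entry, not a measured value.
+
+```rust
+/// One Newton-Raphson refinement step for a reciprocal estimate `x ≈ 1/d`.
+pub fn nr_step(d: f64, x: f64) -> f64 {
+    x * (2.0 - d * x)
+}
+
+/// Residual `|1 - d*x|` that guest code might compare against a threshold.
+pub fn residual(d: f64, x: f64) -> f64 {
+    (1.0 - d * x).abs()
+}
+```
+
+With `d = 4.0`, a hardware-like seed `0.25 * (1 + 1/4096)` has residual `1/4096` and one step
+squares it, whereas the exact seed `0.25` has residual `0.0` before any refinement. A guest test
+such as `residual > 1e-4` therefore branches differently on the two seeds.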
+
+### PPCBUG-185 — FPSCR.NI flush-to-zero not modelled; subnormal results propagate through *sx
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: All *sx arms in interpreter.rs; `fpscr.rs` defines no `NI` constant
+- **Symptom**: Xenon firmware sets `FPSCR.NI = 1` at boot. With NI=1, the Xenon FPU flushes
+  subnormal inputs and results to the appropriate signed zero before and after every
+  floating-point operation. xenia-rs inherits the host x86 IEEE-754 default (NI=0), which
+  propagates subnormals. Subnormal differences: (a) subnormal FPR inputs are used as-is by
+  xenia vs. treated as ±0 by hardware; (b) subnormal results are stored by xenia vs. flushed
+  to ±0 by hardware. `update_after_op` sets `UX` when the result is subnormal, but does NOT
+  flush it. Games with NI-dependent behavior (most Xbox 360 titles compiled with default
+  Xenon ABI settings) may see different float results in subnormal-touching paths.
+- **Canary parity**: Canary also inherits host IEEE NI=0 semantics. Both share this gap.
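+- **Fix**: After `to_single` (or the double-precision result), check `ctx.fpscr & fpscr::NI_BIT`
+  (the constant needs adding) and if set, flush subnormals to a same-signed zero:
+  `if result.is_subnormal() { result = 0.0_f64.copysign(result) }`. For the *sx arms the test
+  must be made in f32 range (`(result as f32).is_subnormal()`), because a single-precision
+  subnormal is a normal f64. Apply to inputs as well for strict correctness.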
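+
+A minimal sketch of the flush step, under the assumption that FPRs hold `f64` values and that
+single-precision results are stored widened. Both function names are hypothetical. The second
+helper exists because a single-precision subnormal is an ordinary normal number once widened to
+`f64`, so `f64::is_subnormal` alone would not catch it.
+
+```rust
+/// Flush an f64 subnormal to a zero of the same sign (double-precision results).
+pub fn flush_subnormal_f64(x: f64) -> f64 {
+    if x.is_subnormal() {
+        0.0_f64.copysign(x)
+    } else {
+        x
+    }
+}
+
+/// Flush a single-precision result held in an f64; the subnormal test is made in f32 range.
+/// Assumes `x` is already exactly representable as an f32 (i.e. it came out of `to_single`).
+pub fn flush_subnormal_single(x: f64) -> f64 {
+    if (x as f32).is_subnormal() {
+        0.0_f64.copysign(x)
+    } else {
+        x
+    }
+}
+```
+
+Normal values, zeros, infinities and NaNs pass through both helpers unchanged.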
+
+### PPCBUG-186 — SNaN → QNaN propagation relies on host SSE; not ISA-canonical for all *sx
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:2252-2414 (all arithmetic *sx arms without explicit SNaN guard)
+- **Symptom**: When an SNaN input reaches `faddsx`/`fsubsx`/`fmulsx`/`fdivsx`, the code calls
+  `check_invalid_add/mul/div` (correctly sets VXSNAN) but then performs the operation on the
+  raw SNaN value: `a + b`, `a * c`, etc. On x86-64 SSE2, the hardware `ADDSD`/`MULSD` ops
+  produce a QNaN from the first SNaN operand (bit 51 set, other mantissa bits preserved). This
+  matches ISA §4.3.2.2 for the common case. However, for `mul_add` (a `VFMADD*SD` instruction on FMA3-capable hosts, a libm `fma` call otherwise), the
+  SNaN propagation priority may differ: the ISA specifies FRA takes priority over FRB, but
+  hardware FMA may use a different priority for the three-operand form. The `fsqrtsx` and
+  `fresx` arms handle SNaN explicitly (via `is_snan` check) but do not synthesize the correct
+  QNaN result — they rely on `b.sqrt()` / `1.0/b` to produce a NaN, which the host does.
+  This is a latent risk; active wrong-result cases require bit-level NaN inspection.
+
+### PPCBUG-187 — Zero interpreter execution tests for all 10 group-29 opcodes
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs test module (no `#[test]` covers any *sx or fresx)
+- **Symptom**: Regressions in rounding, FPSCR side effects, or operand-field decoding are
+  invisible to CI. The existing fpscr unit tests cover helper functions in isolation; no test
+  exercises the full `step()` path for any single-precision FPU opcode.
+- **Recommended minimum** (12 tests — see group-29 report for encodings):
+  `fadds` exact; `fadds` VXISI; `fsubs` VXISI; `fmuls` 0×∞; `fdivs` ZX;
+  `fmadds` VXISI regression (PPCBUG-181); `fmsubs` VXISI regression (PPCBUG-182);
+  `fnmadds` NaN-sign (PPCBUG-183); `fnmsubs` NaN-sign (PPCBUG-183);
+  `fsqrts` negative input VXSQRT; `fsqrts` round-trip; `fres` basic reciprocal.
+
+IDs PPCBUG-188 through PPCBUG-199 are unallocated — reserved for group 29 follow-up.
+
+---
+
+## Batch 6 (continued) — FPU arithmetic double (group 30)
+
+Per-group report: `audit-out/group-30-fpu-double.md`.
+
+Group 30 summary: **9 findings (PPCBUG-200..208). 2 MEDIUM cross-cutting, 3 MEDIUM opcode-specific, 4 LOW.** Result arithmetic is correct for all 10 opcodes. FPSCR infrastructure is partially wired: VXSNAN, OX, UX, ZX, VXISI (add/sub), VXIMZ, VXZDZ, VXIDI, VXSQRT all set correctly for the opcodes that need them. Critical gaps: (1) XX/FR/FI bits never set by any opcode — same gap as PPCBUG-180 but now confirmed on the double-precision path; (2) FPSCR.RN not honored for double arithmetic — single-precision has `round_to_single` but double has no equivalent; (3) fmsubx/fnmaddx/fnmsubx omit the VXISI check for ∞-collision in the add step; (4) fnmaddx/fnmsubx flip NaN sign bit via Rust `-` operator but ISA requires NaN sign preserved. frsqrtex uses full-precision 1/sqrt(b) instead of the hardware estimate — acceptable. All FMA forms use `f64::mul_add` for correct single-rounding semantics.
+**9 IDs used (PPCBUG-200..208). 11 IDs unallocated (PPCBUG-209..219).**
+
+### PPCBUG-200 — All group-30 opcodes: XX, FR, FI bits never set
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `fpscr.rs:184-194` (`update_after_op`); `interpreter.rs:2248,2268,2289,2310,2335,2357,2379,2401,2463,2510`
+- **Symptom**: Same gap as PPCBUG-180 but confirmed for the double-precision path. `update_after_op` only tracks OX (overflow to infinity) and UX (subnormal). FPSCR[XX] (inexact sticky), FPSCR[FR] (round direction), and FPSCR[FI] (inexact for current op) are never updated by any group-30 opcode. Every double-precision arithmetic operation that rounds a non-representable result silently omits these bits.
+- **Fix**: Same goal as PPCBUG-180, but there is no `to_single` step to compare across. Either read the host MXCSR exception flags (PE for inexact) after each f64 operation and map them to FI/XX/FR, or detect the rounding error in software, for example with an error-free transformation (TwoSum for add/sub, an `mul_add`-based residual for multiply).
+- **Test gap**: Zero tests verify XX set after any inexact double-precision operation.
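+
+For the add/sub case, one software route that avoids touching MXCSR is Knuth's TwoSum, which recovers the exact rounding error of an f64 addition. The sketch below is illustrative only: the function names are hypothetical, it assumes round-to-nearest on the host and no overflow, and it covers addition only.
+
+```rust
+/// Knuth's TwoSum: returns `(sum, err)` where `sum` is the rounded `a + b`
+/// and `a + b == sum + err` holds exactly (no overflow, round-to-nearest).
+pub fn two_sum(a: f64, b: f64) -> (f64, f64) {
+    let sum = a + b;
+    let b_virtual = sum - a;
+    let a_virtual = sum - b_virtual;
+    let err = (a - a_virtual) + (b - b_virtual);
+    (sum, err)
+}
+
+/// Derive `(fi, fr)` for a finite f64 addition from the exact rounding error.
+pub fn add_inexact_flags(a: f64, b: f64) -> (bool, bool) {
+    let (sum, err) = two_sum(a, b);
+    let fi = err != 0.0;
+    // err = exact - sum, so the magnitude grew when `sum` overshoots away from zero.
+    let fr = (sum > 0.0 && err < 0.0) || (sum < 0.0 && err > 0.0);
+    (fi, fr)
+}
+```
+
+`1.0 + 2^-60` rounds down to `1.0` (inexact, magnitude not increased), while `1.0 + 1.5 * 2^-53` rounds up to `1.0 + 2^-52` (inexact, magnitude increased).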
+
+### PPCBUG-201 — All group-30 opcodes: FPSCR.RN not honored for double arithmetic
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:2242-2512` (all 10 arms)
+- **Symptom**: Host f64 operators always use nearest-even (host MXCSR default). `fpscr.rs` has a complete `rounding_mode(ctx)` helper and directed rounding helpers for single-precision (`round_to_single`), but no equivalent for double arithmetic. Guest `mtfsfi` RN changes have no effect on faddx/fsubx/fdivx/fsqrtx etc.
+- **Fix**: Wrap each double-precision arithmetic arm with an MXCSR round-mode set/restore when `ctx.fpscr & fpscr::RN_MASK != 0`. Fast path (RN=0) stays zero-cost.
+- **Test gap**: No test changes RN and verifies directed rounding on any double arithmetic opcode.
+
+### PPCBUG-202 — fmaddx: non-FMA `a * c` used in check_invalid_add can spuriously raise VXISI
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:2332`
+- **Symptom**: `check_invalid_add(ctx, a * c, b, false)` uses a separately rounded multiply to approximate the FMA intermediate product, which the fused operation keeps exact. When `a` and `c` are finite but the standalone product overflows to ±∞ (e.g. `1e200 * 1e200`) and `b` is the opposite-signed infinity, VXISI fires spuriously: the true result is simply `b`. The reverse miss cannot occur, because the exact product is infinite only when `a` or `c` is infinite, and then `a * c` is infinite as well.
+- **Fix**: Derive VXISI from input-value properties directly: the product is a true infinity only if `a.is_infinite() || c.is_infinite()` (with the other factor non-zero, otherwise VXIMZ applies); raise VXISI when that holds and `b` is an infinity whose sign opposes the product sign (`a.is_sign_negative() ^ c.is_sign_negative()`).
+- **Test gap**: No test covers the finite-overflow case (`a * c` overflows, `b` is the opposite-signed infinity) in fmaddx.
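+
+The input-derived check described in the fix can be sketched as a pure predicate. This is an assumption-laden illustration rather than the project's helper: `fma_vxisi` is a hypothetical name, NaN operands are deferred to the existing VXSNAN path, and 0 × ∞ is left to `check_invalid_mul`.
+
+```rust
+/// VXISI predicate for `a*c + b`, computed from the inputs instead of a rounded `a * c`.
+/// For the subtract forms, pass `-b`.
+pub fn fma_vxisi(a: f64, c: f64, b: f64) -> bool {
+    if a.is_nan() || c.is_nan() || b.is_nan() {
+        return false; // NaN operands take the VXSNAN / propagation path
+    }
+    // The exact product is infinite only if a factor is infinite and the other is non-zero.
+    let product_is_inf = (a.is_infinite() && c != 0.0) || (c.is_infinite() && a != 0.0);
+    if !product_is_inf || !b.is_infinite() {
+        return false; // includes 0 × ∞, which is VXIMZ rather than VXISI
+    }
+    let product_negative = a.is_sign_negative() ^ c.is_sign_negative();
+    product_negative != b.is_sign_negative()
+}
+```
+
+With this form, `fma_vxisi(1e200, 1e200, f64::NEG_INFINITY)` is `false` even though `1e200 * 1e200` evaluates to `+∞` on its own.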
+
+### PPCBUG-203 — fmsubx, fnmaddx, fnmsubx: VXISI never raised for ∞-collision in add/sub step
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: `interpreter.rs:2354` (fmsubx), `2376` (fnmaddx), `2398` (fnmsubx)
+- **Symptom**: Same pattern as PPCBUG-181/182 for the double-precision variants. These three arms call only `check_invalid_mul` and omit `check_invalid_add`. Per ISA, all four FMA variants must raise VXISI when the add step is `(±∞) + (∓∞)`. Example for fmsub: `A×C = +∞`, `B = +∞` → `+∞ − +∞` → VXISI. Currently the result NaN propagates silently with no FPSCR update. The fnmsub pattern is the canonical Newton-Raphson step, the most common FPU path in Xbox 360 graphics code.
+- **Fix**: Add `fpscr::check_invalid_add(ctx, a * c, b, true)` for `fmsubx`/`fnmsubx` and `fpscr::check_invalid_add(ctx, a * c, b, false)` for `fnmaddx` (apply PPCBUG-202 sign-fix simultaneously).
+- **Test gap**: Zero tests for VXISI on any of the three opcodes.
+
+### PPCBUG-204 — fmaddx check_invalid_add sub-issue (sign logic reliant on imprecise product)
+- **Severity**: LOW (sub-issue of PPCBUG-202)
+- **Status**: open
+- **Location**: `interpreter.rs:2332`
+- **Symptom**: VXISI logic is internally consistent with the passed `a * c` value, but that value can be a rounding-generated ±∞ (finite operands whose product overflows) rather than a true infinity. Resolve as part of PPCBUG-202.
+
+### PPCBUG-205 — fnmaddx / fnmsubx: Rust `-` flips NaN sign bit; ISA requires NaN sign preserved
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: `interpreter.rs:2377` (fnmaddx), `interpreter.rs:2399` (fnmsubx)
+- **Symptom**: Same pattern as PPCBUG-183 for the double-precision variants. Rust's unary `-` applied to a NaN result flips the IEEE-754 sign bit. PowerISA Book I §4.3.4 states the negation is not applied to NaN results. Title code using NaN sentinels (audio middleware, debug fills) receives sign-flipped NaN payloads.
+- **Fix**:
+  ```rust
+  let fma = a.mul_add(c, b);   // fnmaddx
+  let result = if fma.is_nan() { fma } else { -fma };
+  // and analogously for fnmsubx
+  ```
+- **Test gap**: No test exercises fnmaddx/fnmsubx with NaN-producing inputs to check sign of result NaN.
+
+### PPCBUG-206 — frsqrtex edge cases correct; no code change needed (informational)
+- **Severity**: LOW (confirmed clean, informational)
+- **Status**: wontfix
+- **Location**: `interpreter.rs:2496-2512`
+- **Analysis**: ZX fires for ±0. VXSQRT guard correctly excludes -0.0. frsqrte(+∞)=+0 correct. Full-precision is acceptable over-precision.
+- **Fix**: Add comment: `// Full-precision: hardware gives ~12-14 bit estimate. NR converges identically.`
+- **Test gap**: Zero frsqrtex unit tests — add 4 (±0 inputs, negative input+VXSQRT, SNaN, +∞).
+
+### PPCBUG-207 — FMA opcode OX logic correct, OX edge cases untested (informational)
+- **Severity**: LOW (confirmed clean, informational)
+- **Status**: wontfix
+- **Location**: `interpreter.rs:2335,2357,2379,2401`
+- **Analysis**: `inputs_were_finite` correctly suppresses OX when an input is already infinite. OX fires when all inputs are finite but the FMA result overflows — ISA-correct.
+- **Test gap**: Zero tests for OX scenario in any FMA opcode.
+
+### PPCBUG-208 — Zero tests for fsubx, fdivx, fmsubx, fnmaddx, fnmsubx, fsqrtx, frsqrtex
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` test module
+- **Symptom**: 7 of 10 group-30 opcodes have zero tests. `faddx` has 1 happy-path test; `fmulx` has 1; `fmaddx` has 1. None have FPSCR/Rc=1/edge-case coverage.
+- **Recommended minimum** (12 tests): `fsubx` normal; `fsubx` VXISI; `fdivx` normal; `fdivx` ZX; `fdivx` VXZDZ; `fmsubx` normal; `fnmaddx` normal; `fnmsubx` normal; `fnmaddx` NaN-sign regression (PPCBUG-205); `fsqrtx` normal; `fsqrtx` negative+VXSQRT; `frsqrtex` positive.
+
+IDs PPCBUG-209 through PPCBUG-219 are unallocated — reserved for group 30 follow-up.
+
+---
+
+## Pending batches
+
+- Batch 2: groups 6-11 — logical immediate, logical register, sign-extend/CLZ, word rotate, doubleword rotate, shift.
+- Batch 3: groups 12-17 — compare, branch, trap+sc, CR logical, SPR/MSR, cache+sync.
+- Batch 4: groups 18-23 — loads (byte, halfword, word, doubleword, multiple/string, float).
+- Batch 5 (partial): groups 24, 26, 27, 28 done; group 25 (store word) pending.
+- Batch 6 (partial): groups 29, 30 done; group 31 (FPU convert/compare) pending.
+- Batch 7: groups 32-34 — VMX integer (add/sub, compare/min/max, logical/shift).
+- Batch 8: groups 35-38 — VMX permute/pack, VMX float, VMX multiply-sum, VMX load/store.
+- Phase C: decoder field extractors, decoder opcode-lookup, disassembler formatter parity.
+- Phase D: this file gets re-sorted by severity and finalized.
+
+---
+
+## Batch 6 (continued) — FPU sign/move/compare/convert/round (group 31)
+
+Per-group report: `audit-out/group-31-fpu-misc.md`.
+
+Group 31 summary: **9 findings (PPCBUG-221..231; IDs 220/222/226 retracted after analysis).
+1 HIGH, 3 MEDIUM, 5 LOW.** The sign-bit manipulation family (`fabsx`, `fnegx`, `fnabsx`, `fmrx`)
+and `fselx` are all ISA-correct — Rust arithmetic maps to bit-level operations that preserve SNaN
+payloads. `fcmpu` is correct (FPRF and VXSNAN set; no spurious VXVC). The conversion group is
+mostly correct for result values and overflow sentinels; the main gaps are FPSCR inexact/FR/FI
+tracking (shared with groups 29/30) and one subtle NearestEven tie-breaking defect in
+`round_to_i64` that affects `fctidx`. `fcmpo` silently omits VXSNAN/VXVC despite having a
+comment acknowledging the gap.
+
+**9 IDs used (PPCBUG-221, 223, 224, 225, 227, 228, 229, 230, 231). IDs 220/222/226 retracted.
+IDs PPCBUG-232..239 unallocated.**
+
+### PPCBUG-221 — `fctidx` / `round_to_i64` NearestEven tie detection uses an f64::EPSILON tolerance; misclassifies near-ties
+
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: `fpscr.rs:220–238` (`round_to_i64`, `NearestEven` case)
+- **Symptom**: The tie-breaking code computes `diff = (v - v.trunc()).abs()` and tests
+  `(diff - 0.5).abs() < f64::EPSILON` to detect a half-integer; otherwise it falls through to
+  `v.round() as i64` (round-half-away-from-zero). `diff` is exact for every f64, so the tolerance
+  is unnecessary and wrong: below `|v| = 1` adjacent f64 values are closer together than
+  `EPSILON`, so `0.5 + 2^-53` (the next value above 0.5) is classified as a tie and, assuming the
+  tie branch then picks the even neighbor, becomes 0 instead of 1. Above `|v| = 2^52` every
+  representable f64 is already an integer (`v.trunc() == v`, `diff == 0.0`), so no half-integers
+  exist there and the fall-through is harmless. Inputs at or next to a .5 boundary are rare but
+  can appear in audio sample-count arithmetic and physics fixed-point pipelines.
+- **Fix**: replace the NearestEven arm with an exact fractional-part comparison (no tolerance).
+  `frac` is exact for |v| <= 2^52 and zero above 2^52, where the arm degenerates to truncation:
+  ```rust
+  RoundingMode::NearestEven => {
+      let t = v.trunc();
+      let frac = v - t; // exact for |v| <= 2^52; ==0 above (already integer)
+      let fa = frac.abs();
+      if fa > 0.5 { t as i64 + if v >= 0.0 { 1 } else { -1 } }
+      else if fa < 0.5 { t as i64 }
+      else {
+          // Exact 0.5 tie — round to even.
+          let fi = t as i64;
+          if fi & 1 == 0 { fi } else { fi + if v >= 0.0 { 1 } else { -1 } }
+      }
+  }
+  ```
+- **Test gap**: add `round_to_i64` tests in `fpscr.rs:tests`: 0.5→0, 1.5→2, 2.5→2, 3.5→4,
+  -0.5→0, -1.5→-2, plus the near-tie `0.5 + 2^-53`→1. Existing tests cover only 2.5→2 and 3.5→4.
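+
+A self-contained version of the exact-comparison approach, useful as a reference model for the
+tests listed above. It is a sketch under stated assumptions: `round_half_even_i64` is a
+hypothetical free function, and the input is taken to be finite and already within `i64` range
+(the real helper handles NaN and clamping separately).
+
+```rust
+/// Round to the nearest integer, ties to even, using an exact fractional-part comparison.
+/// Assumes `v` is finite and within i64 range.
+pub fn round_half_even_i64(v: f64) -> i64 {
+    let t = v.trunc();
+    let frac = (v - t).abs(); // exact; zero once |v| >= 2^52
+    let ti = t as i64;
+    let away = if v >= 0.0 { 1 } else { -1 };
+    if frac > 0.5 {
+        ti + away
+    } else if frac < 0.5 {
+        ti
+    } else if ti & 1 == 0 {
+        ti // exact tie and the truncated value is already even
+    } else {
+        ti + away // exact tie: step away from zero to reach the even neighbor
+    }
+}
+```
+
+Because no tolerance is involved, `0.5 + 2^-53` takes the `frac > 0.5` branch, and only true
+half-integers reach the tie arms.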
+
+### PPCBUG-223 — `fcmpo` omits FPSCR[VXSNAN] and FPSCR[VXVC] on NaN operands
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:2645–2675`
+- **Symptom**: The `fcmpo` body mirrors `fcmpu` for FPRF and the CR field, which are set
+  correctly, but it calls no `fpscr::set_exception`. PowerISA requires: QNaN → `FPSCR[VXVC, VX, FX]`;
+  SNaN → additionally `FPSCR[VXSNAN]`. `fcmpu` correctly sets VXSNAN for SNaN; `fcmpo` does
+  not. A comment in the source acknowledges "not modeled yet."
+- **Impact**: `fcmpo` has no record form, so the loss shows up in later FPSCR readers: `mffsx` or
+  `mcrfs` after a NaN compare will see FX=0 and VXVC=0 instead of 1, as will the CR1 copy made by
+  the next record-form FP instruction. Xbox 360 CRT comparison primitives
+  (`islessgreater`, ordered relational operators) use `fcmpo`.
+- **Fix**:
+  ```rust
+  if fra.is_nan() || frb.is_nan() {
+      ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: false, so: true };
+      if fpscr::is_snan(fra) || fpscr::is_snan(frb) {
+          fpscr::set_exception(ctx, fpscr::VXSNAN | fpscr::VXVC);
+      } else {
+          fpscr::set_exception(ctx, fpscr::VXVC);
+      }
+  }
+  ```
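+
+The NaN classification behind that fix can be isolated and unit-tested without an interpreter
+context. The sketch below follows the entry's fix as written (VXVC for any NaN, VXSNAN in
+addition for a signaling NaN) and does not model FPSCR[VE]. `CompareInvalid` and
+`fcmpo_invalid` are hypothetical names, and `is_snan` here is a local stand-in for
+`fpscr::is_snan`.
+
+```rust
+/// True if `x` is a signaling NaN: a NaN whose quiet bit (fraction bit 51) is clear.
+pub fn is_snan(x: f64) -> bool {
+    x.is_nan() && (x.to_bits() & (1u64 << 51)) == 0
+}
+
+/// Invalid-operation bits an ordered compare should raise (hypothetical result type).
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct CompareInvalid {
+    pub vxsnan: bool,
+    pub vxvc: bool,
+}
+
+/// Classify the operands of an ordered compare.
+pub fn fcmpo_invalid(fra: f64, frb: f64) -> CompareInvalid {
+    CompareInvalid {
+        vxsnan: is_snan(fra) || is_snan(frb),
+        vxvc: fra.is_nan() || frb.is_nan(),
+    }
+}
+```
+
+Test inputs are easiest to build with `f64::from_bits`, e.g. `0x7FF0_0000_0000_0001` for an
+SNaN and `0x7FF8_0000_0000_0000` for a QNaN.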
+
+### PPCBUG-224 — `fcfidx` does not set FPSCR[XX/FX] for inexact i64→f64 conversion
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:2528–2536`
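+- **Symptom**: Only FPRF is updated. Per ISA, `fcfid` sets `FPSCR[XX, FX]` (and FR/FI) when
+  the i64 value has more than 53 significant bits and precision is lost. An i64 with
+  `|v| > 2^53` is inexact whenever its low bits do not fit (e.g. `2^53 + 1`; `2^54` is still
+  exact). Common trigger: large frame/sample counters, address values.
+- **Fix**: after the conversion, test whether the round trip changes the value and call
+  `fpscr::set_exception(ctx, fpscr::XX)` if so. Compare in a wider type
+  (`result as i128 != (bits as i64) as i128`): a plain `result as i64` saturates, so `i64::MAX`
+  (which converts to 2^63) would wrongly compare equal.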
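+
+A minimal sketch of the round-trip check, written as a pure function so the boundary cases can
+be tested directly. `fcfid_with_inexact` is a hypothetical name. The comparison is done in
+`i128` because Rust's float-to-int `as` cast saturates, which would hide the `i64::MAX` case.
+The host conversion rounds to nearest-even and does not consult FPSCR.RN.
+
+```rust
+/// Convert like `fcfid` and report whether the conversion lost precision.
+/// Returns `(result, inexact)`.
+pub fn fcfid_with_inexact(v: i64) -> (f64, bool) {
+    let result = v as f64; // rounds when |v| needs more than 53 significant bits
+    // `result as i64` would saturate at i64::MAX; i128 holds 2^63 exactly.
+    let inexact = (result as i128) != (v as i128);
+    (result, inexact)
+}
+```
+
+`2^53 + 1` and `i64::MAX` report inexact, while `2^53`, `2^54` and `i64::MIN` report exact.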
+
+### PPCBUG-225 — `frspx` does not set FPSCR[XX/FX/FR/FI] on inexact rounding
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:2516–2527`
+- **Symptom**: `update_after_op` sets OX/UX only. The ISA requires FR/FI/XX/FX on any f64→f32
+  rounding that is not exact. `frsp` is the canonical double→single-precision narrowing idiom
+  in compiler output, and most calls on computed (non-constant) values are inexact.
+- **Fix**: after `to_single`, compare result vs b; if they differ and `b` is finite (this
+  includes overflow to ±∞), call
+  `fpscr::set_exception(ctx, fpscr::XX | fpscr::FI | ...)` with FR set if magnitude increased.
+
+### PPCBUG-227 — `fctiwx` rounding: `round_to_i32` inherits NearestEven defect via `round_to_i64`
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `fpscr.rs:241–243`
+- **Symptom**: `round_to_i32` calls `round_to_i64` then clamps, so it inherits the tie-detection
+  logic flagged in PPCBUG-221 rather than having a defect of its own. Fixing PPCBUG-221 cures
+  this.
+- **Recommendation**: add unit tests `round_to_i32(0.5)==0`, `round_to_i32(1.5)==2`,
+  `round_to_i32(2.5)==2` to verify correct round-to-even behavior.
+
+### PPCBUG-228 — Zero interpreter execution tests for fabsx/fnegx/fnabsx/fmrx/fselx/fcmpo/fcfidx/fctidx/fctidzx/frspx
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` module
+- **Symptom**: 10 of the 13 group-31 opcodes have zero dedicated tests. `test_fcmpu` covers
+  only the ordered comparison `5.0 > 3.0`. `test_fctiwzx` covers one positive truncation.
+  `test_fadd`/`test_fmul` are group-30 tests, not group-31.
+- **Recommended minimum**: SNaN-preservation test for fabsx/fnegx/fnabsx; fselx with NaN/−0/−1;
+  fcmpo QNaN→VXVC (after PPCBUG-223 fix); fcfidx exact and inexact; fctidx tie cases; frspx
+  inexact → XX set (after PPCBUG-225 fix); fctiwx nearest-even tie; fctiwzx NaN sentinel.
+
+### PPCBUG-229 — `fctidx` / `fctidzx` do not set FPSCR[XX/FX] for inexact inputs
+
+- **Severity**: LOW
+- **Status**: open
+- **Locations**: `interpreter.rs:2537–2574`
+- **Symptom**: Per ISA, float-to-integer conversions set `FPSCR[XX, FX]` when the source
+  value is not an integer (the fractional part is discarded). Neither opcode sets XX.
+  Shared root cause with PPCBUG-224/225.
+
+### PPCBUG-230 — `fctiwx` / `fctiwzx` do not set FPSCR[XX/FX] for inexact inputs
+
+- **Severity**: LOW
+- **Status**: open
+- **Locations**: `interpreter.rs:2575–2612`
+- **Symptom**: Same omission as PPCBUG-229 for the word-width conversion pair.
+
+### PPCBUG-231 — `frspx` SNaN input result written as QNaN (host platform dependency)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:2519–2524`
+- **Symptom**: Rust's `as f32` (CVTSD2SS) can set the quiet bit on SNaN input, producing a
+  QNaN in the FPR. Per ISA, `frsp` on SNaN should quieten it — so the QNaN result is
+  correct in kind. The risk is that the exact QNaN bit-pattern may differ from PPC's
+  canonical quietening (which ORs bit 22 into the f32 mantissa). Game code inspecting the
+  NaN payload after frsp may see a different payload. Same structural root cause as
+  PPCBUG-128 (`lfs` SNaN quietening), but lower severity because frsp IS arithmetic.
+
+IDs PPCBUG-232 through PPCBUG-239 are unallocated — no further bugs found in group 31.
+
+---
+
+## Batch 7 — VMX integer add/sub (group 32)
+
+Per-group report: `audit-out/group-32-vmx-int-addsub.md`.
+
+**Scope**: `vaddubm`, `vaddubs`, `vadduhm`, `vadduhs`, `vadduwm`, `vadduws`, `vaddsbs`, `vaddshs`,
+`vaddsws`, `vaddcuw`, `vsububm`, `vsububs`, `vsubuhm`, `vsubuhs`, `vsubuwm`, `vsubuws`, `vsubsbs`,
+`vsubshs`, `vsubsws`, `vsubcuw`.
+
+**Overall verdict**: All 20 opcodes are arithmetically correct. No HIGH-severity bugs found.
+Lane indexing (big-endian, PPC element 0 = `Vec128::bytes[0]`), saturation arithmetic, VSCR.SAT
+sticky-set, and vaddcuw/vsubcuw carry/borrow semantics are all implemented correctly.
+4 LOW-severity findings (2 test gaps, 1 code organization, 1 API hazard).
+
+### PPCBUG-240 — 18 of 20 group-32 opcodes have zero interpreter-level tests
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` module
+- **Symptom**: Only `test_vaddubs_saturates_and_sets_vscr_sat` covers any group-32 opcode.
+  `vaddubm`, `vsububm`, `vadduhm`, `vsubuhm`, `vadduwm`, `vsubuwm`, `vaddsbs`, `vsubsbs`,
+  `vadduhs`, `vsubuhs`, `vaddshs`, `vsubshs`, `vadduws`, `vsubuws`, `vaddsws`, `vsubsws`,
+  `vaddcuw`, `vsubcuw` — all 18 have no tests. No high risk today but no regression guard.
+- **Recommended minimum**: wrap-around test (byte, halfword, word); sat-at-max and sat-at-min tests;
+  VSCR.SAT sticky-set across two successive saturating instructions; vaddcuw carry lane; vsubcuw
+  no-borrow lane.
+
+### PPCBUG-241 — `vadduwm` / `vsubuwm` stranded in a separate section from the rest of group-32
+
+- **Severity**: LOW (maintenance hazard)
+- **Status**: open
+- **Location**: `interpreter.rs:2090–2104` (stranded) vs. `interpreter.rs:2784` (§4a group-32 section)
+- **Symptom**: The two word-modulo opcodes are matched 700 lines above the rest of the group, with
+  only a comment at line 2819 as a cross-reference. A future sweep of §4a for group-32 changes
+  would miss them.
+- **Fix**: Move both arms into §4a and remove the comment at line 2819.
+
+### PPCBUG-242 — `set_vscr_sat(false)` can non-stickily clear SAT from arithmetic handlers
+
+- **Severity**: LOW (API hazard)
+- **Status**: open
+- **Location**: `context.rs:252–259`
+- **Symptom**: `set_vscr_sat(bool)` accepts `false`, which would clear the sticky SAT bit. All
+  current arithmetic callers pass `true` only (inside `if sat { ... }` guards), so no mis-clear
+  occurs today. But the API is misleading — a future saturating handler that writes
+  `set_vscr_sat(lane_sat)` with `lane_sat = false` would silently clear a previously-set bit.
+- **Fix**: Rename to `sticky_set_vscr_sat()` (no bool argument, always ORs). Retain
+  `force_vscr_sat(bool)` for `mtvscr`.
+
+### PPCBUG-243 — `vmx.rs` saturation helpers: u16/i16/u32/i32 variants have zero unit tests
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `crates/xenia-cpu/src/vmx.rs:705–799`
+- **Symptom**: `vmx.rs` tests cover 5 cases of `sat_add/sub_i8/u8`. The 8 helpers for wider
+  types (`sat_add_u16`, `sat_sub_u16`, `sat_add_i16`, `sat_sub_i16`, `sat_add_u32`, `sat_sub_u32`,
+  `sat_add_i32`, `sat_sub_i32`) are mathematically correct but unguarded by any test. Recommended
+  additions listed in the per-group report.
+
+IDs PPCBUG-244 through PPCBUG-274 are unallocated — no further bugs found in group 32.
+
+---
+
+## Batch 7 — VMX integer compare / min / max / avg (group 33)
+
+Per-group report: `audit-out/group-33-vmx-int-compare.md`.
+
+### PPCBUG-275 — All VC-form vector compare dot forms: `rc_bit()` reads wrong bit; CR6 never updated
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Affected opcodes**: `vcmpequb.`, `vcmpequh.`, `vcmpgtsb.`, `vcmpgtsh.`, `vcmpgtub.`, `vcmpgtuh.`
+- **Location**: `decoder.rs:75` + `interpreter.rs:3318`, `3331`, `3344`, `3357`, `3370`, `3383`
+- **Symptom**: `rc_bit()` is implemented as `self.raw & 1 != 0` (reads LSB = bit 0 of the word).
+  For VC-form instructions the Rc flag is at **PPC bit 21 = LSB bit 10**, not bit 0. Bit 0 is
+  the LSB of the 10-bit XO field. All integer compare XO values are even (XO=6, 70, 518, 774, 582, 838),
+  so their bit 0 is always 0. The CR6 update block is **unconditionally dead** regardless of
+  whether the programmer wrote the dot form. `vcmpequb. vMask, vData, vNeedle` + `bc 12,24`
+  (branch on CR bit 24 = CR6.LT = all-true) is the canonical AltiVec memchr idiom; it will always fall through.
+- **Fix**:
+  ```rust
+  // decoder.rs — add:
+  /// Rc bit for VC-form vector compare instructions (PPC bit 21 = LSB bit 10).
+  #[inline] pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }
+  ```
+  Replace `instr.rc_bit()` with `instr.vc_rc_bit()` at interpreter.rs:3318, 3331, 3344, 3357,
+  3370, 3383.
+
+### PPCBUG-276 — `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`: same VC-form Rc bug
+
+- **Severity**: MEDIUM
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Affected opcodes**: `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`
+- **Location**: `interpreter.rs:2237`, `3396`, `3406`
+- **Symptom**: Same root cause as PPCBUG-275. XO for vcmpequw=134, vcmpgtuw=646, vcmpgtsw=902 —
+  all even, bit 0 always 0. Word-compare dot forms never update CR6. `vcmpequw128` uses the
+  VMX128_R Rc encoding, for which `rc_bit()` likely also reads the wrong bit.
+- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:2237, 3396, 3406. Separately verify
+  VMX128_R Rc bit position for `vcmpequw128` (may require its own extractor).
+
+### PPCBUG-277 — Zero tests for all `vcmp*` dot forms and CR6 correctness
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` module
+- **Symptom**: No test exercises any of the 10 integer vector compare opcodes. Critical missing:
+  `vcmpequb.` all-true → CR6.LT=1; `vcmpequb.` all-false → CR6.EQ=1; `vcmpgtsb` signed
+  boundary (0x80 vs 0x7F must yield false, not true); `vcmpgtsh` at 0x8000 vs 0x7FFF.
+
+### PPCBUG-278 — Zero tests for all 12 `vmax*` / `vmin*` opcodes
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` module
+- **Symptom**: None of vmaxub/uh/uw/sb/sh/sw, vminub/uh/uw/sb/sh/sw are tested. Critical missing:
+  `vmaxsb(0x80, 0x7F)` = 0x7F (signed max of -128 and +127); `vminsb(0x80, 0x7F)` = 0x80.
+  Without these, signed vs unsigned confusion in min/max would not be caught.
+
+### PPCBUG-279 — Zero tests for all 6 `vavg*` opcodes; no signed-boundary or rounding coverage
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` module; `vmx.rs` test module
+- **Symptom**: `avg_u8` through `avg_i32` helpers have no unit tests. Key rounding case:
+  `avg_u8(0, 1)` must be 1 (round up), not 0 (truncation). `avg_i32(i32::MIN, i32::MIN)` must
+  be `i32::MIN` without overflow.
+
+IDs PPCBUG-280 through PPCBUG-314 are unallocated — no further bugs found in group 33.
+
+---
+
+## Batch 6 — VMX integer logical / shift / rotate / select (group 34)
+
+Per-group report: `audit-out/group-34-vmx-logic-shift.md`.
+
+Group 34 summary: the bitwise logical ops (vand/vandc/vor/vxor/vnor and their 128 variants)
+are all ISA-correct — Vec128 is `[u8; 16]` with no padding bits, so `!(u32)` flips exactly
+32 bits per lane with no upper-bit pollution (the PPCBUG-029/030/031 class does not apply to
+VMX register files). The per-lane shifts (vslb/vsrb/vsrab, vslh/vsrh/vsrah, vslw/vsrw/vsraw
+and their 128 variants) all correctly mask the shift count to the lane width before shifting;
+vsraw uses i32 arithmetic right shift which is correctly defined in Rust for shift-by-31.
+The per-lane rotates (vrlb/vrlh/vrlw and 128 variants) are correct. The whole-register bit
+shifts (vsl/vsr) and whole-register byte shifts (vslo/vsro and 128 variants) correctly
+extract the shift count from VB.b[15] with the proper bit masks. vsel and vsel128 are correct
+including the read-before-write ordering on vsel128's vc=vd aliasing.
+
+**One HIGH bug found**: vrlimi128 extracts both the rotate-amount (z) field and the
+blend-mask (IMM) field from the wrong bit positions of the instruction word.
+
+**0 MEDIUM bugs with code change needed. 1 HIGH. 10 LOW (test gaps and informational).**
+
+### PPCBUG-315 — vrlimi128 z and IMM fields extracted from wrong bit positions
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: interpreter.rs:3551–3552
+- **Symptom**: `shift = ((instr.raw >> 16) & 0x3)` reads integer bits 16–17 — the low 2 bits
+  of the 5-bit IMM (blend-mask) field — instead of the 2-bit `z` (rotate) field at integer
+  bits 6–7. `mask = (instr.raw >> 2) & 0xF` reads integer bits 2–5 — VD128h extension bits
+  and a reserved field — instead of the low 4 bits of IMM at integer bits 16–19.
+  **Every `vrlimi128` executes with a wrong rotate amount and a wrong per-word select mask.**
+  The only benign case is the degenerate encoding where `z == IMM[1:0]` and the garbage mask
+  happens to equal the intended mask — unlikely in real code.
+- **VX128_4 field layout** (LSB-0 integer bit numbering after PPC big-endian byte-swap to host):
+  - `VD128l : 5` at integer bits 21–25 (PPC bits 6–10)
+  - `IMM : 5` at integer bits 16–20 (PPC bits 11–15) — blend mask, 4 bits used
+  - `VB128l : 5` at integer bits 11–15 (PPC bits 16–20)
+  - `z : 2` at integer bits 6–7 (PPC bits 24–25) — rotate amount 0..3
+  - `VD128h : 2` at integer bits 2–3 (PPC bits 28–29)
+- **Fix**:
+  ```rust
+  let shift = ((instr.raw >> 6) & 0x3) as usize;  // z field: integer bits 6-7
+  let mask  = (instr.raw >> 16) & 0xF;             // IMM low 4 bits: integer bits 16-19
+  ```
+- **Canary reference**: `ppc_decode_data.h:585–608` `FormatVX128_4`; `ppc_emit_altivec.cc:1318,1324`.
+- **Note**: the rotate logic (`b[(shift + i) % 4]`) and mask-select logic (`(mask >> (3-i)) & 1`)
+  in the interpreter body are ISA-correct — only the field extraction is wrong.
+- **Test gap**: no interpreter execution test for vrlimi128 (PPCBUG-325).
+
+### PPCBUG-316 — Zero interpreter execution tests for vslb/vsrb/vsrab
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs:3440–3463
+
+### PPCBUG-317 — Zero interpreter execution tests for vslh/vsrh/vsrah
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs:3472–3503
+
+### PPCBUG-318 — vslo/vsro byte-shift count max is 15 (correct; informational)
+
+- **Severity**: LOW (informational / wontfix)
+- **Status**: wontfix
+- `N` is a 4-bit field; max shift is 15 bytes = 120 bits (not 128). VD retains
+  the 8 LSBs of VA in position [127:120] at N=15. ISA-correct.
+
+### PPCBUG-319 — vsel128 vc=vd read-before-write ordering (correct; informational)
+
+- **Severity**: LOW (informational / wontfix)
+- **Status**: wontfix
+- `c = ctx.vr[vc]` is read before `ctx.vr[vd] = result`. Correctly sequenced.
+
+### PPCBUG-320 — Zero interpreter execution tests for vslw/vsrw/vsraw/vrlw (+128 variants)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs:2108–2155
+
+### PPCBUG-321 — Zero interpreter execution tests for vsl/vsr
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs:3508–3521
+
+### PPCBUG-322 — Zero interpreter execution tests for vslo/vsro (+128 variants)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs:3523–3541
+
+### PPCBUG-323 — Zero interpreter execution tests for vand/vandc/vor/vxor/vnor (+128 variants)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs:1900–1944
+
+### PPCBUG-324 — Zero interpreter execution tests for vsel/vsel128
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs:1945–1967
+
+### PPCBUG-325 — Zero interpreter execution tests for vrlb/vrlh/vrlw/vrlimi128 (+128 variants)
+
+- **Severity**: LOW (test gap; fix PPCBUG-315 before writing vrlimi128 tests)
+- **Status**: open
+- **Location**: interpreter.rs:3464–3503, 2144–2155, 3550–3565
+
+IDs PPCBUG-326 through PPCBUG-354 are unallocated — no further bugs found in group 34.
+
+---
+
+## Batch 8 — VMX permute / merge / splat / pack / unpack (group 35)
+
+Per-group report: `audit-out/group-35-vmx-permute.md`.
+
+**Summary**: 5 HIGH, 3 MEDIUM, 9 LOW. Four VX128_* field-extraction bugs; one missing post-pack permutation logic; VSCR.SAT and pack saturation semantics are all correct. Zero interpreter tests for any group-35 opcode.
+
+### PPCBUG-360 — vperm128: VC register read from wrong field (vd128() instead of VX128_2 VC bits 23-25)
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `interpreter.rs:1979`
+- **Symptom**: `vperm128` uses the VX128_2 instruction form. The permute-control register VC is a 3-bit field at PPC bits 23-25 (LSB integer bits 6-8). The code does `vc = instr.vd128()` which reads PPC bits 6-10 plus 28-29 (VD128l and VD128h), a completely different set of bits. Every `vperm128` therefore permutes with a control vector read from the wrong register, producing garbage output. `vperm128` is one of the most-used VMX128 ops in Xbox 360 graphics code (texture/vertex data layout).
+- **Fix**:
+  ```rust
+  // decoder.rs — add accessor:
+  #[inline] pub fn vc128_2(&self) -> usize { ((self.raw >> 6) & 0x7) as usize }
+  // interpreter.rs:1979 — replace:
+  vc = instr.vc128_2(); // VX128_2 VC field at PPC bits 23-25
+  ```
+- **ISA ref**: `ppc-manual/vmx/vperm.md`, VX128_2 encoding; `ppc_decode_data.h:541-561`; `ppc_emit_altivec.cc:1203-1204` (`VX128_2_VC`).
+
+### PPCBUG-361 — vsldoi128: SH field MSB reads bit 4 (reserved) instead of bit 9
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `interpreter.rs:2012`
+- **Symptom**: VX128_5 SH is a 4-bit field at LSB integer bits 6-9. Code does `((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)`. This reads bit 4 (a reserved field, always 0 in valid encodings) as the MSB of SH instead of bit 9. Shifts of 8-15 bytes silently resolve as shifts of 0-7 bytes. `vsldoi128` with `SH >= 8` (common in vector rotation patterns) always produces the wrong result.
+- **Fix**:
+  ```rust
+  let sh = ((instr.raw >> 6) & 0xF) as usize; // SH field: integer bits 6-9
+  ```
+- **ISA ref**: `ppc-manual/vmx/vsldoi.md`, VX128_5 encoding; `ppc_decode_data.h:609-634`; canary `VX128_5_SH`.
+
+### PPCBUG-362 — vpermwi128: PERMh (high 3 bits of 8-bit PERM immediate) read from VD128l bits instead of bits 6-8
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `interpreter.rs:4089`
+- **Symptom**: VX128_P PERM = `PERMl[4:0] | (PERMh[2:0] << 5)` where PERMl is at integer bits 16-20 and PERMh is at integer bits 6-8. Code does `(raw >> 16) & 0xFF` which reads bits 16-23. Bits 21-23 are VD128l[2:0] (VD128l occupies integer bits 21-25), not PERMh. The top 3 bits of the 8-bit PERM immediate are wrong; output word lane selections for lanes 0 and 1 are controlled by garbage bits. Same pattern as PPCBUG-315.
+- **Fix**:
+  ```rust
+  let imm = ((instr.raw >> 16) & 0x1F) | (((instr.raw >> 6) & 0x7) << 5); // VX128_P PERM
+  ```
+- **ISA ref**: `ppc_decode_data.h:664-686`; `ppc_emit_altivec.cc:1214`.
+
+### PPCBUG-363 — vpkd3d128: post-pack permutation (pack + z fields) entirely absent; output always placed in wrong lane when pack != 0
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `interpreter.rs:3783-3808`
+- **Symptom**: Canary's `vpkd3d128` does three things: (1) pack VB by type, (2) permute the result with the existing VD register using a control determined by `pack` (IMM[1:0]) and `shift` (z field at integer bits 6-7), (3) store. Xenia-rs does only (1) and (3), skipping the entire lane-placement permutation. When `pack != 0`, the packed value must be merged into a specific 32-bit or 64-bit slot of VD — this merge never happens. `pack=0` is the only safe case. Most D3D vertex pack sequences use `pack=1` (32-bit slot) with varying `shift`.
+- **Fix**: Extract `pack = uimm & 3` and `shift = (instr.raw >> 6) & 3` (z field), read existing `ctx.vr[vd]`, apply the permutation table from `ppc_emit_altivec.cc:2125-2188`, write back.
+- **ISA ref**: `ppc_emit_altivec.cc:2088-2191`.
+
+### PPCBUG-364 — vsldoi (non-128): correct; PPCBUG-365 — vsplt*: correct; informational
+
+- **Severity**: LOW (wontfix)
+- **Status**: wontfix
+- `vsldoi` correctly extracts SH as `(raw >> 6) & 0xF`. `vspltb/vsplth/vspltw` correctly read UIMM from the VA position (integer bits 16-20, masked to lane width). No bugs.
+
+### PPCBUG-366 — vspltisb / vspltish: sign-extension idiom is correct but non-obvious; future regression risk
+
+- **Severity**: MEDIUM
+- **Status**: open (clarity fix recommended)
+- **Location**: `interpreter.rs:2059-2060`, `2064-2066`
+- **Symptom**: `simm | !0x1F` where `simm` is typed `i8`/`i16` is functionally correct (Rust infers the literal `0x1F` as the target type, so `!0x1F` sets every bit above bit 4), but the pattern is fragile under refactoring. Recommend:
+  ```rust
+  let simm = (((instr.raw >> 16) & 0x1F) as i32).wrapping_shl(27).wrapping_shr(27) as i8;
+  ```
+
+### PPCBUG-367 — vupkhpx / vupklpx: channel replication vs zero-extend divergence; canary is unimplemented
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `vmx.rs:318-330`
+- **Symptom**: `unpack_pixel_555` replicates 5-bit RGB channels (`r << 3 | r >> 2`) to fill 8 bits. ISA specifies zero-extension into bits 7:3, leaving bits 2:0 as zero. The replicate approach produces slightly different values (and slightly higher values), diverging from hardware.
+- **Fix**: `let r8 = r << 3;` (drop the `| r >> 2` replication term).
+
+### PPCBUG-368 — vpkpx: pack_pixel_555 channel assignment unverified against hardware; canary comparison inconclusive
+
+- **Severity**: MEDIUM
+- **Status**: open (needs hardware trace or more detailed canary analysis)
+- **Location**: `vmx.rs:310-316`
+- **Symptom**: The xenia-rs layout comment says R=bits 8-15, G=16-23, B=24-31. Canary's `vkpkx_in_low` uses different shift amounts (`>> 9` for R, `>> 6` for G, `>> 3` for B), suggesting either a different input layout assumption or the channels are swapped. Without a hardware reference, cannot determine which is authoritative.
+
+### PPCBUG-369 — vpkd3d128 z-field not extracted (sub-issue of PPCBUG-363)
+
+- **Severity**: LOW (tracked under PPCBUG-363)
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `interpreter.rs:3785`
+- The `z` field (VX128_4, integer bits 6-7) is never extracted. Correct extraction: `(instr.raw >> 6) & 0x3`.
+
+### PPCBUG-370 — Zero interpreter tests for vperm / vperm128 (test gap)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:1970-1995`
+
+### PPCBUG-371 — Zero interpreter tests for vsldoi / vsldoi128 (test gap)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:1997-2020`
+
+### PPCBUG-372 — Zero interpreter tests for vpermwi128 (test gap)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:4087-4099`
+
+### PPCBUG-373 — Zero interpreter tests for vmrghb / vmrglb / vmrghh / vmrglh (test gap)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:3570-3600`
+
+### PPCBUG-374 — Zero interpreter tests for vspltb / vsplth / vspltw / vspltw128 (test gap)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:2022-2048`
+
+### PPCBUG-375 — Zero interpreter tests for vspltisb / vspltish / vspltisw / vspltisw128 (test gap)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:2050-2068`
+
+### PPCBUG-376 — Zero interpreter tests for all vpk* (16 ops) + VSCR.SAT coverage (test gap)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:3607-3718`
+
+### PPCBUG-377 — Zero interpreter tests for vupkhsb / vupklsb / vupkhsh / vupklsh (test gap)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:3722-3754`
+
+### PPCBUG-378 — Zero interpreter tests for vpkd3d128 / vupkd3d128 (test gap; blocked on PPCBUG-363)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `interpreter.rs:3783-3835`
+
+IDs PPCBUG-379 through PPCBUG-419 are unallocated — no further bugs found in group 35.
+
+---
+
+## Batch 9 — VMX float arithmetic / compare / convert / estimate (group 36)
+
+Per-group report: `audit-out/group-36-vmx-float.md`.
+
+Group 36 summary: **21 findings (PPCBUG-420..440). 6 HIGH, 8 MEDIUM, 7 LOW.** The most
+critical bugs are: (1) four VMX float compare VC-form opcodes use `rc_bit()` (bit 0) instead
+of the correct VC-form Rc bit (bit 10) — CR6 is never updated, same root cause as PPCBUG-275;
+(2) vmaddfp128 and vmaddcfp128 have their multiplicand and accumulator operands swapped —
+every matrix multiply / Newton-Raphson step using these opcodes produces the wrong result;
+(3) VMX128_R dot-form compares (vcmpeqfp128. etc.) decode as Invalid due to missing key4
+entries in decode_op6.
+
+**6 HIGH, 8 MEDIUM, 7 LOW. 21 IDs used (PPCBUG-420..440). 39 IDs unallocated (PPCBUG-441..479).**
+
+### PPCBUG-420 — vcmpeqfp / vcmpgefp / vcmpgtfp: `rc_bit()` reads wrong bit; CR6 never updated
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Affected opcodes**: `vcmpeqfp.`, `vcmpgefp.`, `vcmpgtfp.`
+- **Location**: `interpreter.rs:1875`, `1885`, `1895`
+- **Symptom**: `rc_bit()` = `self.raw & 1` reads LSB bit 0. For VC-form the Rc flag is at
+  PPC bit 21 = LSB bit 10. All XO values (vcmpeqfp=198, vcmpgefp=454, vcmpgtfp=710) have
+  bit 0 = 0, so CR6 is never updated for any float compare dot form. `vcmpeqfp.` + `bc 12,24`
+  (branch all-equal) always falls through.
+- **Cross-reference**: PPCBUG-275 (identical root cause for integer vcmp). Canary reads
+  `i.VXR.Rc` (ppc_emit_altivec.cc:625, 633, 641).
+- **Fix**: Add `pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }` to
+  `decoder.rs` and replace `instr.rc_bit()` at interpreter.rs:1875, 1885, 1895.
+
+### PPCBUG-421 — vcmpbfp: `rc_bit()` reads wrong bit (VC-form); Rc gate permanently dead
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `interpreter.rs:3428`
+- **Symptom**: Same root cause as PPCBUG-420. XO=966, bit 0 = 0; CR6 update never fires
+  for `vcmpbfp.`. The CR6 value logic (`eq = !any_out`) is correct; only the gate is wrong.
+- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:3428.
+
+### PPCBUG-422 — vcmpeqfp128 / vcmpgefp128 / vcmpgtfp128 / vcmpbfp128: `rc_bit()` reads wrong bit (VX128_R-form)
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `interpreter.rs:1875`, `1885`, `1895`, `3428` (shared arms with non-128 forms)
+- **Symptom**: For VX128_R-form, Rc is at PPC bit 27 = LSB bit 4 (confirmed from canary's
+  `VX128_R` bitfield: `uint32_t Rc : 1` at bit 4 from LSB). `rc_bit()` reads bit 0. Fix
+  PPCBUG-423 first (dot forms decode as Invalid before this even matters).
+- **Fix**: Add `pub fn vx128r_rc_bit(&self) -> bool { (self.raw >> 4) & 1 != 0 }` and use
+  it in the VX128_R compare arms.
+
+### PPCBUG-423 — vcmpeqfp128. / vcmpgefp128. / vcmpgtfp128. / vcmpbfp128.: dot forms decode as `Invalid`
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `decoder.rs:640-648` (decode_op6 VMX128 compare key4 table)
+- **Symptom**: decode_op6 extracts `key4 = (bits22-24 << 3) | bit27`. When Rc=1, PPC bit 27
+  is set, making key4 = non-dot value + 1. Dot-form key4 values (1, 9, 17, 25, 33) are all
+  absent from the match table. Decoder returns `PpcOpcode::Invalid`. Any game shader using a
+  VMX128-form float compare dot form traps with unimplemented opcode.
+- **Fix**: Add dot-form entries to the key4 match table mapping to the same opcodes (the
+  interpreter arm uses `instr.vx128r_rc_bit()` to conditionally update CR6):
+  ```rust
+  0b000001 => return PpcOpcode::vcmpeqfp128,
+  0b001001 => return PpcOpcode::vcmpgefp128,
+  0b010001 => return PpcOpcode::vcmpgtfp128,
+  0b011001 => return PpcOpcode::vcmpbfp128,
+  0b100001 => return PpcOpcode::vcmpequw128,
+  ```
+
+### PPCBUG-424 — vmaddfp128: operand swap — computes VA×VB+VD instead of VA×VD+VB
+
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: `interpreter.rs:1771` (`r[i] = ai.mul_add(bi, di)`)
+- **Symptom**: Canary (ppc_emit_altivec.cc:806-809) documents `(VD) <- (VA × VD) + VB` and
+  routes as `MulAdd(VA, VD, VB)`. Xenia-rs reads VA, VB, VD then computes
+  `ai.mul_add(bi, di)` = `VA × VB + VD` — VB and VD roles swapped. Every shader using
+  vmaddfp128 for matrix multiply or Newton-Raphson accumulation accumulates the wrong value.
+  The existing denorm-flush test aliases vA=vD=v2, making the swap invisible.
+- **Fix**: `r[i] = ai.mul_add(di, bi);`
+
+### PPCBUG-425 — vmaddcfp128: operand swap — computes VD×VB+VA instead of VA×VD+VB
+
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: `interpreter.rs:4065` (`r[i] = di.mul_add(bi, ai)`)
+- **Symptom**: Canary (ppc_emit_altivec.cc:819) documents `(VD) <- (VA × VD) + VB`.
+  Xenia-rs computes `VD × VB + VA`. Both the first multiplicand and the addend are wrong.
+- **Fix**: `r[i] = ai.mul_add(di, bi);`
+- **Test gap**: zero tests for `vmaddcfp128`. Add test with distinct VA, VB, VD registers.
+
+### PPCBUG-426 — vnmsubfp: two rounding steps instead of fused FMA; NaN sign may be flipped
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:1786` (`r[i] = bi - ai * ci`)
+- **Symptom**: `vmaddfp` uses single-rounded `ai.mul_add(ci, bi)`, but `vnmsubfp` uses
+  `bi - ai * ci` (two operations, two rounding steps). ISA specifies a single fused operation.
+  Canary acknowledges the same limitation (ppc_emit_altivec.cc:1136). Additionally, the
+  implicit negation in subtraction may flip the sign bit of a NaN result (see PPCBUG-183).
+- **Fix**: `r[i] = -ai.mul_add(ci, -bi);` — single FMA rounding: `-(ai*ci + (-bi))` = `bi - ai*ci`.
+
+### PPCBUG-427 — vnmsubfp128: same two-rounding form as vnmsubfp
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:1803` (`r[i] = di - ai * bi`)
+- **Symptom**: Same class as PPCBUG-426 for the VMX128 form.
+- **Fix**: `r[i] = -ai.mul_add(bi, -di);`
+
+### PPCBUG-428 — vrefp / vrefp128: full-precision 1/x instead of ~12-bit hardware estimate
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:1853` (`r[i] = 1.0 / b[i]`)
+- **Symptom**: Same class as PPCBUG-184 (fresx). Xenon vrefp provides ~12-bit accuracy;
+  xenia-rs computes full IEEE-754 division. Canary also uses full precision in practice.
+
+### PPCBUG-429 — vrsqrtefp / vrsqrtefp128: full-precision 1/sqrt(x) instead of ~12-bit estimate
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:1862` (`r[i] = 1.0 / b[i].sqrt()`)
+- **Symptom**: Same class as PPCBUG-428 for reciprocal square root.
+
+### PPCBUG-430 — vexptefp / vexptefp128: full-precision exp2(x) instead of ~12-bit estimate
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:3934` (`r[i] = b[i].exp2()`)
+- **Symptom**: Same class as PPCBUG-428. NaN/Inf edge cases may diverge.
+
+### PPCBUG-431 — vlogefp / vlogefp128: full-precision log2(x) instead of ~12-bit estimate
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:3944` (`r[i] = b[i].log2()`)
+- **Symptom**: Same class as PPCBUG-428.
+
+### PPCBUG-432 — vrfin / vrfin128: Rust `round()` is round-half-away-from-zero; ISA requires round-to-nearest-even
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:2172` (`r[i] = b[i].round()`)
+- **Symptom**: `vrfin(0.5)` → ISA = 0.0; Rust = 1.0. `vrfin(2.5)` → ISA = 2.0; Rust = 3.0.
+  Canary uses SSE4.1 `ROUNDPS` which is round-to-nearest-even.
+- **Fix**: Use `f32::round_ties_even()` (stable since Rust 1.77).
+
+### PPCBUG-433 — vctsxs / vcfpsxws128: NaN input returns 0 instead of saturating to INT_MIN (0x80000000)
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `vmx.rs:217` (`if x.is_nan() { return (0, true); }`)
+- **Symptom**: AltiVec ISA: NaN in vctsxs saturates to INT_MIN (0x80000000). Xenia-rs returns 0.
+- **Fix**: `if x.is_nan() { return (i32::MIN, true); }`
+
+### PPCBUG-434 — vctuxs NaN → 0 is correct; informational
+
+- **Severity**: LOW (wontfix)
+- **Status**: wontfix
+- **Location**: `vmx.rs:225`
+- **Note**: Unsigned NaN saturates to 0 per ISA. Xenia-rs is correct. Add a comment.
+
+### PPCBUG-435 — vaddfp / vsubfp / vmulfp128: subnormal inputs not flushed when VSCR.NJ=1
+
+- **Severity**: MEDIUM (latent — Xbox 360 always boots with NJ=1)
+- **Status**: open
+- **Location**: `interpreter.rs:1713`, `1729`, `1812`
+- **Symptom**: VSCR.NJ=1 requires flush-to-zero for subnormal inputs. vmaddfp family correctly
+  calls `vmx::flush_denorm()`; plain add/sub/mul do not check VSCR.
+
+### PPCBUG-436 — vmsum3fp128 / vmsum4fp128: per-product intermediates not individually flushed
+
+- **Severity**: MEDIUM (latent)
+- **Status**: open
+- **Location**: `interpreter.rs:4076`, `4083`
+- **Symptom**: `flush_denorm` on final sum only. Per-lane products can be subnormal and
+  accumulate before the final flush.
+
+### PPCBUG-437 — vmaddfp / vmaddfp128 / vmaddcfp128 / vnmsubfp128: subnormal output not flushed
+
+- **Severity**: MEDIUM (latent)
+- **Status**: open
+- **Location**: `interpreter.rs:1752–1754`, `1771–1773`, `4064–4067`, `1803–1805`
+- **Symptom**: VSCR.NJ=1 requires flushing subnormal results. Inputs flushed; outputs are not.
+
+### PPCBUG-438 — Zero tests for vcmpeqfp / vcmpgefp / vcmpgtfp / vcmpbfp and dot forms
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` test module
+
+### PPCBUG-439 — Zero tests for vrfiz / vrfin / vrfip / vrfim and 128-bit variants
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs:2158–2192`
+
+### PPCBUG-440 — Zero tests for vctsxs / vctuxs / vcfsx / vcfux and 128-bit variants
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs:3842–3923`
+
+IDs PPCBUG-441 through PPCBUG-479 are unallocated — no further bugs found in group 36.
+
+---
+
+## Batch 8 — VMX integer multiply-sum / multiply-half / sums / special (group 37)
+
+Per-group report: `audit-out/group-37-vmx-mulsum.md`.
+
+**Note**: All opcodes in this group are `XEINSTRNOTIMPLEMENTED()` stubs in xenia-canary; correctness is derived from the IBM ISA and `ppc-manual/vmx/` snapshots. `vrlimi128` is already tracked as PPCBUG-315.
+
+### PPCBUG-482 — `vmhaddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)
+
+- **Severity**: WITHDRAWN
+- **Status**: no bug
+- **Note**: Draft analysis suggested >>16; the spec snapshot `ppc-manual/vmx/vmhaddshs.md`
+  explicitly shows `prod = (VA[i]*VB[i]) >> 15` and the pathological-case example confirms
+  `0x8000*0x8000 >> 15 = 32768`. Xenia-rs matches the spec exactly. No code change.
+
+### PPCBUG-483 — `vmhraddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)
+
+- **Severity**: WITHDRAWN
+- **Status**: no bug
+- **Note**: `ppc-manual/vmx/vmhraddshs.md` explicitly shows `(product + 0x4000) >> 15`.
+  Xenia-rs matches. No code change needed.
+
+### PPCBUG-487 — vsumsws/vsum2sws/vsum4sbs/vsum4ubs/vsum4shs: VB operand mis-named as "c"/"VC"
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `interpreter.rs:3249-3307`
+- **Symptom**: All five vsum* opcodes are VX-form instructions (two source operands: VA and VB).
+  The code names the VB source `c` and the comment references "vC", implying a non-existent
+  third register operand. Only `instr.ra()` and `instr.rb()` are valid for VX form; there is
+  no `rc()`. The arithmetic is correct (`rb()` is called), but the naming misleads maintainers
+  into thinking there is a VA-form three-operand encoding.
+- **Fix**: Rename `c` → `b` and update comments to say `VB` instead of `vC` in all five
+  handler bodies.
+
+### PPCBUG-490 — Zero tests for all six vmsum* opcodes
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` section
+- **Symptom**: No unit test for `vmsumubm`, `vmsummbm`, `vmsumuhm`, `vmsumuhs`, `vmsumshm`,
+  `vmsumshs`. Critical missing: saturation + VSCR.SAT for `vmsumuhs`/`vmsumshs`; mixed-sign
+  byte product for `vmsummbm`; modulo wrap for `vmsumshm`.
+
+### PPCBUG-491 — Zero tests for `vmhaddshs` and `vmhraddshs`
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` section
+- **Symptom**: No test for either multiply-high-add instruction. Key cases: `VA = 0x8000`,
+  `VB = 0x8000` (minus-one-times-minus-one saturating case); `VA = VB = 0x7FFF, VC = 0x7FFF`
+  (add post-shift result to max accumulator). Verify VSCR.SAT is set on saturation and clear
+  on non-saturating inputs.
+
+### PPCBUG-492 — Zero tests for `vmladduhm`
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` section
+
+### PPCBUG-493 — Zero tests for all eight `vmule*` / `vmulo*` opcodes
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` section
+- **Symptom**: No test for `vmuleub`, `vmuloub`, `vmulesb`, `vmulosb`, `vmuleuh`, `vmulouh`,
+  `vmulesh`, `vmulosh`. Key: even vs odd lane distinction (`vmulesh` vs `vmulosh`) is untested.
+
+### PPCBUG-494 — Zero tests for all five vsum* opcodes
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `interpreter.rs` `#[cfg(test)]` section
+- **Symptom**: No test for `vsumsws`, `vsum2sws`, `vsum4sbs`, `vsum4ubs`, `vsum4shs`.
+  Missing: zero-output-lanes verification for `vsumsws` (w[0], w[1], w[2] must be 0) and
+  `vsum2sws` (w[0], w[2] must be 0); VSCR.SAT on overflow for all signed/unsigned variants.
+
+### PPCBUG-495 — `vsumsws` comment says "vC[3]" should say "VB[3]"
+
+- **Severity**: LOW (cosmetic)
+- **Status**: open
+- **Location**: `interpreter.rs:3248`
+
+IDs PPCBUG-480, PPCBUG-481, PPCBUG-482 (withdrawn), PPCBUG-483 (withdrawn), PPCBUG-484,
+PPCBUG-485, PPCBUG-486, PPCBUG-488, PPCBUG-489, PPCBUG-496, PPCBUG-497, PPCBUG-498 are
+either withdrawn (no bug found after re-examination), informational, or references to
+existing IDs. IDs PPCBUG-499 through PPCBUG-509 are unallocated — no further bugs found
+in group 37.
+
+---
+
+## Batch 8 — VMX load/store (group 38)
+
+Per-group report: `audit-out/group-38-vmx-loadstore.md`.
+
+**Opcodes**: lvebx, lvehx, lvewx, lvewx128, lvlx, lvlx128, lvlxl, lvlxl128, lvrx, lvrx128,
+lvrxl, lvrxl128, lvsl, lvsl128, lvsr, lvsr128, lvx, lvx128, lvxl, lvxl128, stvebx, stvehx,
+stvewx, stvewx128, stvlx, stvlx128, stvlxl, stvlxl128, stvrx, stvrx128, stvrxl, stvrxl128,
+stvx, stvx128, stvxl, stvxl128.
+
+Group 38 summary: The load family (lvx, lvxl, lvlx, lvrx, lvsl, lvsr, lvebx, lvehx, lvewx,
+lvewx128 and all 128/LRU-hint variants) is arithmetically correct. EA computation, alignment
+masking, big-endian byte ordering, RA=0 special cases, and lane indexing all match the ISA and
+the `ea_indexed` helper. **5 HIGH bugs found**: the systemic `invalidate_for_write` gap
+(PPCBUG-107 family) applies to ALL 16 VMX store opcodes (PPCBUG-511..514), and `stvewx128`
+has an additional severe memory-corruption bug (PPCBUG-510: writes 16 bytes instead of 1 word).
+**1 MEDIUM** (PPCBUG-515: behavioral divergence between lvebx/lvehx/lvewx and canary's
+full 16-byte load simplification; xenia-rs is architecturally more correct). **1 MEDIUM**
+(PPCBUG-516: lvsr sh=0 edge case; computation is correct, documentation gap). **3 LOW**: two
+test-coverage gaps (PPCBUG-517, PPCBUG-518) and one missing debug trace (PPCBUG-519).
+
+### PPCBUG-510 — `stvewx128` stores all 16 bytes instead of one word; 12-byte memory corruption (HIGH)
+
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: interpreter.rs:2776-2781
+- **Symptom**: Uses `& !0xF` (16-byte alignment) then stores all 16 bytes of the vector.
+  ISA semantics: word-align EA, extract the word lane `(EA & 0xF) >> 2`, store 4 bytes only.
+  The non-128 `stvewx` (interpreter.rs:1675-1687) is correct — `stvewx128` was not updated
+  to match. Corrupts 12 adjacent bytes on every execution.
+- **Canary reference**: `InstrEmit_stvewx_` (cc:170-185) — `ea & ~3`, extract lane, `ByteSwap`,
+  store 4 bytes only. `stvewx128` routes through the same helper as `stvewx`.
+- **Fix**: mirror the `stvewx` body with `instr.vs128()` substituted for `instr.rs()`.
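+
+A minimal sketch of the word-lane store described above, assuming a flat byte-slice memory
+and a vector register held as 16 bytes in guest (big-endian) order. `store_vector_word` and
+its parameters are hypothetical names for illustration, not the interpreter's actual API.
+
+```rust
+/// Hypothetical helper: store one 32-bit word lane of `vec` at the word-aligned EA.
+/// `vec` holds the register bytes in guest (big-endian) order, so lane `n`
+/// occupies bytes `4*n .. 4*n + 4`.
+fn store_vector_word(mem: &mut [u8], ea: usize, vec: &[u8; 16]) {
+    let ea = ea & !0x3; // word-align (not 16-byte align)
+    let lane = (ea & 0xF) >> 2; // which of the four words to store
+    // Exactly four bytes are written; the other twelve bytes of the block stay untouched.
+    mem[ea..ea + 4].copy_from_slice(&vec[lane * 4..lane * 4 + 4]);
+}
+```
+
+Applied to `stvewx128`, the same shape would take its source from `instr.vs128()`; only the
+four bytes at the word-aligned EA change.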
+
+### PPCBUG-511 — `stvx`, `stvx128`, `stvxl`, `stvxl128` missing `invalidate_for_write` (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Locations**: interpreter.rs:1598-1603 (stvx), 1605-1610 (stvx128), 1699-1705 (stvxl/stvxl128)
+- **Root cause**: PPCBUG-107 (systemic)
+- **Symptom**: Under `--parallel`, a 16-byte stvx to a reserved line does not clear the
+  reservation table slot. The reserving thread's `stwcx.` spuriously succeeds.
+- **Fix**: per PPCBUG-107 pattern — add `invalidate_for_write(ea)` guard before the byte loop.
+
+### PPCBUG-512 — `stvebx`, `stvehx`, `stvewx`, `stvewx128` missing `invalidate_for_write` (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Locations**: interpreter.rs:1655 (stvebx), 1664 (stvehx), 1675 (stvewx), 2776 (stvewx128)
+- **Root cause**: PPCBUG-107 (systemic)
+- **Note**: `stvewx128` must also fix PPCBUG-510 before adding the invalidation call (or the
+  invalidation covers the wrong, over-wide address range).
+
+### PPCBUG-513 — `stvlx`, `stvlx128`, `stvlxl`, `stvlxl128` missing `invalidate_for_write` (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Locations**: interpreter.rs:2746-2749 (stvlx/stvlxl), 2751-2754 (stvlx128/stvlxl128)
+- **Root cause**: PPCBUG-107 (systemic)
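+- **Note**: per the ISA, a partial store writes `n = 16 - shift` bytes starting at `ea` and
+  ends at the next 16-byte boundary, so it stays inside one 16-byte block and therefore one
+  128-byte line; a single `invalidate_for_write(ea)` call should suffice. A second call is
+  only needed if the store helper ever writes past that boundary.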
+
+### PPCBUG-514 — `stvrx`, `stvrx128`, `stvrxl`, `stvrxl128` missing `invalidate_for_write` (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (ca5b90b, 2026-05-01)
+- **Locations**: interpreter.rs:2756-2759 (stvrx/stvrxl), 2761-2764 (stvrx128/stvrxl128)
+- **Root cause**: PPCBUG-107 (systemic)
+- **Note**: stvrx at shift=0 is a no-op (no bytes written); guard can skip the call in
+  that case. Otherwise invalidate `ea & !0xF` (the aligned block containing `ea`, whose
+  first `shift` bytes are written).
+
+### PPCBUG-515 — `lvebx`, `lvehx`, `lvewx` implement element semantics; canary uses full-line load (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Locations**: interpreter.rs:1613-1653
+- **Symptom**: xenia-rs places the loaded byte/halfword/word into the correct lane and preserves
+  other lanes from VD (ISA-correct for the "undefined" lanes). Canary does a full aligned
+  16-byte `lvx`-style load that overwrites all lanes. Both are valid under the ISA's "undefined"
+  specification, but game code that happens to work under canary may depend on the canary
+  behavior. The divergence is documented and no code change is required unless canary
+  compatibility becomes an explicit goal.
+
+### PPCBUG-516 — `lvsr` sh=0 produces {16,17,...,31}; correct per ISA but undocumented (MEDIUM)
+
+- **Severity**: MEDIUM (documentation gap — computation is correct)
+- **Status**: open
+- **Location**: interpreter.rs:2218-2226
+- **Symptom**: When EA is 16-byte aligned, `lvsr` produces byte values all >= 16 (the "select
+  entirely from VB" identity for `vperm`). The formula `(16 - sh) + i` can neither underflow
+  nor overflow u8: `sh <= 15` keeps `16 - sh >= 1`, and `sh >= 0` bounds `(16 - sh) + 15 <= 31`.
+  No computation bug, but there is no comment explaining why values > 15 are correct.
+- **Fix**: Add a comment and a `debug_assert!(sh <= 15)`.
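+
+The control vector can be sketched as a standalone function. `lvsr_control` is a hypothetical
+name, and the sketch assumes the vector is represented as 16 bytes in guest order.
+
+```rust
+/// Hypothetical helper: build the `lvsr` permute-control vector for an effective address.
+fn lvsr_control(ea: u64) -> [u8; 16] {
+    let sh = (ea & 0xF) as u8;
+    debug_assert!(sh <= 15); // keeps `16 - sh` in 1..=16
+    // Byte i is (16 - sh) + i, so all values lie in 1..=31. Entries >= 16 make
+    // `vperm` pick from its second source, which is why sh == 0 yields 16..=31.
+    std::array::from_fn(|i| (16 - sh) + i as u8)
+}
+```
+
+For an aligned EA the result starts at 16 and ends at 31; for `ea & 0xF == 15` it starts at 1
+and ends at 16.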
+
+### PPCBUG-517 — Zero test coverage for lvlx/lvrx/stvlx/stvrx boundary edge cases (LOW)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: vmx.rs tests (lines 756-792); interpreter.rs test module
+- **Missing**: shift=15 for lvlx (1 byte loaded), shift=1 for lvrx (1 byte loaded), stvlx/stvrx
+  round-trip, stvrx at shift=0 confirmed no-op, full lvlx+lvrx+vor unaligned memcpy idiom
+  verified byte-exact.
+
+### PPCBUG-518 — Zero interpreter-level execution tests for all 36 VMX load/store opcodes (LOW)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: interpreter.rs test module
+- **Missing**: lvx alignment masking, stvx byte-order verification, lvebx lane placement,
+  lvsl/lvsr permute index values, stvewx128 after PPCBUG-510 fix. 17 recommended minimum tests
+  enumerated in per-group report.
+
+### PPCBUG-519 — `stvrx` aligned no-op is silent; no debug trace (LOW)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: vmx.rs:284-292 (`store_vector_right`)
+- **Symptom**: shift=0 returns immediately with no trace event, which is confusing during
+  memory-visibility debugging. Add `tracing::trace!` in debug builds.
+
+IDs PPCBUG-520 through PPCBUG-559 are unallocated — no further bugs found in group 38.
+
+---
+
+## Phase C1 — Decoder field extractors
+
+Per-group report: `audit-out/phase-c1-decoder-fields.md`.
+
+Comprehensive audit of all `DecodedInstr` field accessors in `decoder.rs` lines 21-165, cross-checked against ISA form specs, Canary `FormatXxx` structs, and the interpreter's inline re-extraction. Phase B already found PPCBUG-040/046/275/315/360-363/420-422. Phase C1 adds 8 new findings (PPCBUG-560..567).
+
+**Confirmed-clean** (no new finding): `op`, `rd`/`rs`/`rt`, `ra`, `rb`, `rc`, `simm16`, `uimm16`, `d`, `ds`, `li`, `bd`, `bo`, `bi`, `aa`, `lk`, `oe`, `to`, `mb`/`me` (M-form only), `sh`, `spr`, `crm`, `crfd`/`crfs`, `l`, `crbd`/`crba`/`crbb`, `nb`, `va128`/`vb128`/`vd128`/`vs128`, `extract_vx128_uimm5`.
+
+### PPCBUG-560 — sh64() test helper wrong bit order; masks PPCBUG-040 from unit tests (HIGH)
+
+- **Severity**: HIGH
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `xenia-rs/crates/xenia-cpu/tests/disasm_goldens.rs:160-176` (function `rldicl`)
+- **Symptom**: The `rldicl` test helper encodes `sh[5:1]` at PPC bits 16-20 and `sh[0]` at PPC bit 30. The ISA encodes `sh[4:0]` at PPC bits 16-20 and `sh[5]` at PPC bit 30. The wrong `sh64()` formula `(sh_lo << 1) | sh_hi` correctly inverts the wrong encoding, making the test pass — but fails on real binary code.
+
+  **Counterexamples** (ISA-encoded input → `sh64()` output):
+
+  | True shift | sh64() result | Error |
+  |-----------|--------------|-------|
+  | 1 | 2 | +1 |
+  | 16 | 32 | +16 |
+  | 32 | 1 | -31 |
+  | 33 | 3 | -30 |
+  | 63 | 63 | 0 (coincidence) |
+
+  Only `sh=0` and `sh=63` decode correctly. All other shifts (1-62) are wrong against real code.
+
+- **Fix for `sh64()`** (per PPCBUG-040):
+  ```rust
+  pub fn sh64(&self) -> u32 {
+      (extract_bits(self.raw, 30, 30) << 5) | extract_bits(self.raw, 16, 20)
+  }
+  ```
+- **Fix for test helper** (must be in same commit):
+  ```rust
+  // Correct: sh_lo = sh & 0x1F → PPC bits 16-20; sh_hi = sh >> 5 → PPC bit 30
+  (30 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11)
+      | (mb_lo << 6) | (mb_hi << 5) | (0 << 2) | ((sh >> 5) << 1) | rc
+  ```
+- **Cross-reference**: PPCBUG-040 (primary finding). PPCBUG-560 is the test-infrastructure companion.
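+
+The counterexample table can be reproduced with a small standalone sketch. The functions
+below are hypothetical stand-ins that operate on a raw `u32` (they are not the `DecodedInstr`
+methods) and model only the `sh` field bits.
+
+```rust
+/// Encode a 6-bit shift per the ISA layout: sh[4:0] at PPC bits 16-20
+/// (host bits 11-15), sh[5] at PPC bit 30 (host bit 1).
+fn encode_sh(sh: u32) -> u32 {
+    ((sh & 0x1F) << 11) | ((sh >> 5) << 1)
+}
+
+/// The pre-fix formula: treats PPC bit 30 as the low bit.
+fn sh64_wrong(raw: u32) -> u32 {
+    (((raw >> 11) & 0x1F) << 1) | ((raw >> 1) & 1)
+}
+
+/// The corrected formula: PPC bit 30 is the high bit.
+fn sh64_fixed(raw: u32) -> u32 {
+    (((raw >> 1) & 1) << 5) | ((raw >> 11) & 0x1F)
+}
+
+/// Shifts in 0..64 that the wrong formula still happens to decode correctly.
+fn surviving_shifts() -> Vec<u32> {
+    (0..64).filter(|&sh| sh64_wrong(encode_sh(sh)) == sh).collect()
+}
+```
+
+A decoder-level test built on ISA-correct encodings like `encode_sh` would have failed under
+the old formula for every shift from 1 to 62.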
+
+### PPCBUG-561 — Missing `mb_md()` accessor on `DecodedInstr`; interpreter inlines wrong formula at 6 sites (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `decoder.rs` — accessor absent; `disasm.rs:1256` has correct local helper; `interpreter.rs` lines 696, 706, 716, 726, 736, 746 each inline the wrong formula
+- **Symptom**: Interpreter uses `(instr.mb() << 1) | ((instr.raw >> 1) & 1)` which: (a) reads `SH5` (PPC bit 30, host bit 1) instead of `MB5` (PPC bit 26, host bit 5) as the high bit; (b) places the high bit at position 0 instead of position 5. `disasm.rs` has the correct version already — expose it as `DecodedInstr::mb_md()`.
+- **Cross-reference**: PPCBUG-046 (primary finding).
+
+- **Fix**:
+  ```rust
+  // Add to decoder.rs:
+  #[inline] pub fn mb_md(&self) -> u32 {
+      extract_bits(self.raw, 21, 25) | (extract_bits(self.raw, 26, 26) << 5)
+  }
+  ```
+  Replace all 6 inline sites in `interpreter.rs` with `instr.mb_md()`.
+
+### PPCBUG-562 — Missing `vc_rc_bit()` and `vx128r_rc_bit()` per-form Rc accessors (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `decoder.rs` — no per-form Rc accessors; `interpreter.rs` uses generic `rc_bit()` (bit 31) for both VC and VX128_R forms
+- **Symptom**: Generic `rc_bit()` reads PPC bit 31 (LSB). VC-form Rc is at PPC bit 21 = `(raw >> 10) & 1`. VX128_R-form Rc is at PPC bit 27 = `(raw >> 4) & 1`. Using bit 31 for these forms means the CR6 update gate is permanently disabled for all dot-form VMX vector compares — root cause of PPCBUG-275/420/421/422.
+- **Fix**:
+  ```rust
+  /// Rc for VC-form vector compare (vcmpeqfp, vcmpgefp, vcmpgtfp, vcmpbfp, etc.) — PPC bit 21.
+  #[inline] pub fn vc_rc_bit(&self) -> bool { extract_bits(self.raw, 21, 21) != 0 }
+  /// Rc for VX128_R-form compare (vcmpeqfp128, vcmpgefp128, etc.) — PPC bit 27.
+  #[inline] pub fn vx128r_rc_bit(&self) -> bool { extract_bits(self.raw, 27, 27) != 0 }
+  ```
+- **Cross-reference**: PPCBUG-275 / PPCBUG-420 / PPCBUG-421 / PPCBUG-422.
+
+### PPCBUG-563 — Missing `vx128_4_z()` and `vx128_4_imm()` for VX128_4 form (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `decoder.rs` — accessors absent; `interpreter.rs:3551-3552` (vrlimi128) reads wrong bit positions
+- **Symptom**: VX128_4 form has `IMM` (5-bit) at PPC bits 11-15 (host bits 16-20) and `z` (2-bit) at PPC bits 24-25 (host bits 6-7). Interpreter `vrlimi128` uses `(raw >> 16) & 0x3` for shift (reads VB128l partial) and `(raw >> 2) & 0xF` for mask (reads VD128h region).
+- **Fix**:
+  ```rust
+  #[inline] pub fn vx128_4_imm(&self) -> u32 { extract_bits(self.raw, 11, 15) }
+  #[inline] pub fn vx128_4_z(&self) -> u32 { extract_bits(self.raw, 24, 25) }
+  ```
+- **Cross-reference**: PPCBUG-315.
+
+### PPCBUG-564 — Missing `vx128_p_perm()` for VX128_P form; PERMh reads XO bits (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `decoder.rs` (accessor absent); `interpreter.rs:4089` (vpermwi128) uses `(raw >> 16) & 0xFF`, which reads PERMl correctly (host bits 16-20) but takes PERMh from XO/reserved host bits 21-23 instead of PPC bits 23-25 (host bits 6-8)
+- **Symptom**: Top 3 bits of the 8-bit PERM selector are wrong for every `vpermwi128` instruction. Lane selections for words 0 and 1 are garbage.
+- **Fix**:
+  ```rust
+  #[inline] pub fn vx128_p_perm(&self) -> u32 {
+      extract_bits(self.raw, 11, 15) | (extract_bits(self.raw, 23, 25) << 5)
+  }
+  ```
+- **Cross-reference**: PPCBUG-362.
+
+### PPCBUG-565 — Missing `vx128_5_sh()` for VX128_5 form; vsldoi128 MSB reads reserved bit (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: applied (52b05b1, 2026-05-01)
+- **Location**: `decoder.rs` — accessor absent; `interpreter.rs:2012` (vsldoi128) uses `(raw >> 4) & 0x1` for the shift MSB (reads PPC bit 27 = reserved) instead of PPC bit 22 = host bit 9 = `(raw >> 9) & 1`
+- **Symptom**: vsldoi128 shift amounts ≥ 8 (where the 4th bit matters) use a garbage bit. The correct 4-bit SH is at PPC bits 22-25 (host bits 6-9) = `(raw >> 6) & 0xF`.
+- **Fix**:
+  ```rust
+  #[inline] pub fn vx128_5_sh(&self) -> u32 { extract_bits(self.raw, 22, 25) }
+  ```
+- **Cross-reference**: PPCBUG-361.
+
+### PPCBUG-566 — Missing XER TBC field accessor documentation for lswx/stswx (LOW)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `decoder.rs` — XER[25:31] (7-bit transfer byte count) is runtime state, not an instruction field; no accessor exists and no documentation notes the gap
+- **Symptom**: `lswx`/`stswx` use XER[25:31] as their byte count. The interpreter has no way to read this via the normal accessor pattern. Not a bit-position error, but a structural gap.
+- **Recommendation**: add `ctx.xer_tbc() -> u8` to `PpcContext` returning `(ctx.xer() & 0x7F) as u8` (in PPC MSB-0 numbering, XER[25:31] is the low 7 bits of the 32-bit XER, so no shift is needed). Document that these are the only instructions that read XER as a count operand.
+
+### PPCBUG-567 — Zero unit tests pin any scalar field accessor (LOW)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `decoder.rs` unit tests; `tests/disasm_goldens.rs`
+- **Symptom**: Phase 4 tests pin `va128`/`vb128`/`vd128`/`vs128` only. No test verifies: `sh64()` against ISA-encoded instructions (existing test validates wrong round-trip — PPCBUG-560), `mb_md()` (absent), `vc_rc_bit()`/`vx128r_rc_bit()` (absent), `ds()` for negative displacement, `spr()` for LR/CTR/XER beyond DEC.
+- **Recommended additions**: decoder-level unit tests using ISA-correct encodings for `sh64`, `mb_md`, the two new Rc accessors, `ds` negative, `spr` for LR=8 and CTR=9. See phase-c1-decoder-fields.md for concrete encoding examples.
+
+IDs PPCBUG-568 through PPCBUG-599 are unallocated — no further bugs found in Phase C1 scope.
+
+---
+
+## Phase C2 — Decoder opcode-lookup tables
+
+Per-group report: `audit-out/phase-c2-decoder-lookup.md`.
+
+**Methodology**: complete line-by-line comparison of all `decode_opNN` functions in
+`xenia-rs/crates/xenia-cpu/src/decoder.rs` against
+`xenia-canary/src/xenia/cpu/ppc/ppc_opcode_lookup_gen.cc`, plus cross-reference of
+`ppc-manual/forms/` for VC, VX128_R, VX128_5, VA, VX128_3, VX128_4 forms.
+
+**Overall verdict**: the decoder is structurally sound and entry-by-entry matches
+Canary for all real Xbox 360 instructions, with one pre-known exception (PPCBUG-600 =
+PPCBUG-423). Zero new wrong-entry bugs. One medium-severity cross-reference entry for that
+exception (dot-form gap), one medium maintainability risk (key-ordering dependency), four
+LOWs (reserved-encoding misidentification, undocumented Invalid mapping for primary opcode 9,
+test gaps, undocumented fast-path).
+
+### PPCBUG-600 — `decode_op6` key4: VMX128 compare dot-forms decode as Invalid (MEDIUM)
+
+- **Severity**: MEDIUM (cross-reference for PPCBUG-423; same root cause, Phase C2 ID)
+- **Status**: applied (52b05b1, 2026-05-01) (dup-of:423 for the fix; this ID is for Phase C2 tracking)
+- **Location**: `decoder.rs:640-648` (`decode_op6`, key4 match table)
+- **Symptom**: The VX128_R form places `Rc` at PPC bit 27. The key4 formula is
+  `(bits 22-24 << 3) | bit27`. When Rc=1 (dot-form), bit27=1 and key4 is odd.
+  Only even key4 values are in the table. Five dot-form encodings fall through to
+  `PpcOpcode::Invalid`:
+  - `vcmpeqfp128.` → key4=0b000001 (1), decodes as Invalid
+  - `vcmpgefp128.` → key4=0b001001 (9), decodes as Invalid
+  - `vcmpgtfp128.` → key4=0b010001 (17), decodes as Invalid
+  - `vcmpbfp128.`  → key4=0b011001 (25), decodes as Invalid
+  - `vcmpequw128.` → key4=0b100001 (33), decodes as Invalid
+- **Contrast**: standard VMX VC-form compares (op=4 key3) are correct because their
+  Rc bit (bit21) is outside the key3 window (bits 22-31). The VX128_R form places Rc
+  at bit27, which is inside the key4 window.
+- **Fix**: Add 5 dot-form entries to key4 in `decode_op6`:
+  ```rust
+  0b000001 => return PpcOpcode::vcmpeqfp128,
+  0b001001 => return PpcOpcode::vcmpgefp128,
+  0b010001 => return PpcOpcode::vcmpgtfp128,
+  0b011001 => return PpcOpcode::vcmpbfp128,
+  0b100001 => return PpcOpcode::vcmpequw128,
+  ```
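+  With these entries the decoder emits the right opcode for dot-forms. The interpreter's
+  CR6 update must then be gated on `instr.vx128r_rc_bit()` (PPCBUG-562); the generic
+  `instr.rc_bit()` reads PPC bit 31 and does not see the VX128_R Rc bit.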
+- **See also**: PPCBUG-423 (Phase B original finding) for impact assessment and
+  full context.
+
+### PPCBUG-601 — `decode_op6` key ordering creates undocumented correctness dependency (MEDIUM)
+
+- **Severity**: MEDIUM (maintainability risk; no current wrong-decode for real code)
+- **Status**: open
+- **Location**: `decoder.rs:603-637` (`decode_op6`, key1/key2/key3 dispatch)
+- **Symptom**: key1 (bits 21-22 << 5 | bits 26-27), key2 (bits 21-23 << 4 | bits 26-27),
+  and key3 (bits 21-27) all overlap. Correctness depends on an implicit invariant:
+  vpkd3d128 and vrlimi128 (matched by key2) always have bits 26-27 = `01`, while all
+  15 key3 unary entries always have bits 26-27 = `11`. If a future instruction were
+  added to key2 with bits 26-27 = `11`, it would shadow a key3 entry. No comment in
+  the source documents this constraint.
+- **Fix**: Add a comment block above the key2/key3 dispatches explaining the invariant:
+  ```rust
+  // key2 matches bits 26-27 == 01 only (vpkd3d128, vrlimi128).
+  // key3 entries all have bits 26-27 == 11. No overlap is possible
+  // for any currently-defined Xbox 360 instruction.
+  ```
+
+### PPCBUG-602 — `decode_op4` vsldoi128 fallback: over-broad single-bit catch-all (LOW)
+
+- **Severity**: LOW (only fires for reserved/undefined encodings in practice)
+- **Status**: open
+- **Location**: `decoder.rs:558-561`
+- **Symptom**: The VX128_5 form for vsldoi128 is identified by op=4, bit27=1. The
+  dispatch uses a bare `if extract_bits(code, 27, 27) == 1` after the other tables,
+  rather than an exact VX128_5-form check. Reserved VA extended opcodes that happen
+  to have their key4 bit4 (= word bit27) set decode as vsldoi128 instead of Invalid.
+  Example: a reserved VA XO of the form 0bx1xxxx, such as 0b110011 (51; every defined
+  VA XO lies in 32..=47 and so has bit27=0): key4 misses, the bit27=1 test fires, and
+  the word is decoded as vsldoi128 unless an earlier op=4 table claims it. ISA specifies
+  reserved encodings should trap; this silently assigns a meaning.
+- **Fix (optional)**: Strengthen to an exact match:
+  ```rust
+  // VX128_5 form: SH@22-25, VA128h@26, XO=bit27. Bits 28-31 carry VD128h/VB128h.
+  // Only vsldoi128 uses this form. Verify the XO bit and absence of load/store marker.
+  if extract_bits(code, 27, 27) == 1 && extract_bits(code, 30, 31) != 0b11 {
+      return PpcOpcode::vsldoi128;
+  }
+  ```
+  Alternatively, accept current behavior and add a comment.
+
+### PPCBUG-603 — Primary opcode 9 maps to Invalid; correct but undocumented (LOW)
+
+- **Severity**: LOW (test gap / documentation only)
+- **Status**: open
+- **Location**: `decoder.rs:369` (the `_ => PpcOpcode::Invalid` arm of `lookup_opcode`)
+- **Symptom**: Primary opcode 9 (`dozi` in original POWER ISA) is undefined on
+  Xenon/750CL and correctly decodes as Invalid. Canary also returns `PPC_DECODER_MISS`.
+  No comment documents this intentional absence.
+- **Fix**: Add `// 9 = dozi (POWER-only, not present on Xenon)` comment near the
+  match, or explicitly add `9 => PpcOpcode::Invalid` with a comment.
+
+### PPCBUG-604 — Zero decoder unit tests for decode_op5, decode_op6, decode_op30, decode_op63 (LOW)
+
+- **Severity**: LOW (test gap)
+- **Status**: open
+- **Location**: `decoder.rs:897-1107` (test module)
+- **Symptom**: The 10 existing decoder tests cover addi, lwz, branch, stw, ori, and
+  cache mechanics. None exercise VMX128 (op=5, op=6), rotate-doubleword (op=30), or
+  FPU (op=63) opcode paths. In particular, no test would have caught PPCBUG-600
+  (vcmpeqfp128 dot-form decodes as Invalid) before it caused a runtime trap.
+- **Recommended minimum additions** (8 tests):
+  1. `vcmpeqfp128` (Rc=0) → decodes as `vcmpeqfp128`.
+  2. `vcmpeqfp128.` (Rc=1) → decodes as `vcmpeqfp128` (tests PPCBUG-600 fix).
+  3. `vcmpeqfp` (op=4, Rc=0) → key3 check, bit21=0.
+  4. `vcmpeqfp.` (op=4, Rc=1) → key3 check, bit21=1, same decode.
+  5. `vsldoi128` (op=4, bit27=1) → fallback fires.
+  6. `rldicl` (op=30) → decode_op30.
+  7. `fadd` (op=63, Rc=0) → arithmetic table.
+  8. `fadd.` (op=63, Rc=1) → same decode as fadd.
+
+### PPCBUG-605 — `decode_op31` sradix fast-path is correct but undocumented (LOW)
+
+- **Severity**: LOW (documentation gap only)
+- **Status**: open
+- **Location**: `decoder.rs:702-705`
+- **Symptom**: The sradix pre-check uses bits 21-29 (9 bits). The subsequent main
+  table uses bits 21-30 (10 bits). Because no main-table entry has bits 21-29 =
+  0b110011101, the fast-path cannot shadow a legitimate main-table entry. However,
+  this is not documented in the source. In XS-form, bit 30 carries sh[5] rather than
+  XO, so sradix occupies both 10-bit keys 0b1100111010 (sh[5]=0) and 0b1100111011
+  (sh[5]=1), for Rc=0 and Rc=1 alike; a reader might worry about a conflict with a
+  future main-table entry at either key.
+- **Fix**: Add a comment: `// sradix: XS-form, XO=413 (bits 21-29=0b110011101).`
+  `// No main-table entry uses bits 21-30 starting with 0b110011101x.`
+
+IDs PPCBUG-606 through PPCBUG-639 are unallocated — no further bugs found in Phase C2.
+
+---
+
+## Phase C3 — Disassembler formatter parity
+
+Per-group report: `audit-out/phase-c3-disasm.md`.
+
+**Methodology**: Full line-by-line audit of `disasm.rs:format()` and all ~70 per-class helpers.
+Cross-referenced against `xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm_gen.cc`,
+`tests/golden/extended_mnemonics.json`, and `tests/golden/base_mnemonics.json`.
+Checked: mnemonic correctness (Rc/OE/LK/AA/L-field), operand formatting (signed vs unsigned,
+hex vs decimal), simplified-mnemonic priority, branch-condition extended forms, VMX register
+naming, VX128 field extraction, and golden test coverage.
+
+**Overall verdict**: The formatter is structurally sound. All OE/Rc/LK/AA suffix handling, the
+simplified mnemonic priority order, VMX 5-bit and VMX128 7-bit register naming, SPR mnemonics,
+and CR-logical extended forms are correct. Two HIGH bugs found: the `bdnz`/`bdz` extended
+mnemonic appends a spurious condition suffix, and the pre-existing `sync`/`lwsync` bug
+(PPCBUG-088) is re-assessed as HIGH in disassembler scope. Three MEDIUM bugs: a missing
+`fmt_bcctr` extended form, and decimal vs hex for SIMM immediates and D-form displacements
+(both diverge from Canary). Several LOW findings for golden fixture correctness and edge cases.
+
+**Key finding**: the disassembler's VX128 field extraction (vperm128 VC, vsldoi128 SH,
+vpermwi128 PERM) is CORRECT in all three cases where the interpreter (PPCBUG-360/361/362)
+has the wrong extraction. The disassembler was written independently and got them right.
+
+### PPCBUG-640 — `fmt_bc`: pure `bdnz`/`bdz` emits `bdnzge`/`bdzge` (spurious condition suffix) (HIGH)
+
+- **Severity**: HIGH
+- **Status**: open
+- **Location**: `disasm.rs:829-834`
+- **Symptom**: For `bcx` with BO=16 (`bdnz`: decrement CTR, branch if CTR≠0, CR ignored):
+  - `decr = (16 & 4) == 0` = true
+  - `uncond = (16 & 16) != 0` = true
+  - Code falls into the `if decr` branch and computes `cond_name_opt` from `(cr_bit=0, cond_true=false)` → `Some("ge")`
+  - Emits: **`bdnzge`** — WRONG. ISA simplified form is `bdnz`.
+
+  For BO=18 (`bdz`): same path → **`bdzge`** — WRONG.
+
+  The bug is absent in `fmt_bclr` which has an explicit `if decr && uncond` guard at line 872
+  producing `bdnzlr`/`bdzlr` correctly. `fmt_bc` lacks this guard.
+
+  The golden fixture "bdnz 0x82000040" (PPCBUG-650 companion) pins the wrong output.
+
+- **Fix**: In `fmt_bc`, inside the `if decr` block, gate the condition string on `!uncond`:
+  ```rust
+  if decr {
+      let z = if bo & 0x02 != 0 { "z" } else { "nz" };
+      let cond_str = if uncond { "" } else { cond_name_opt.unwrap_or("") };
+      let ext_mnem = format!("bd{z}{cond_str}{a}{l}");
+      let ext_ops = format!("{cr}0x{target:08X}");
+      with_ext(&base_mnem, base_ops, 8, &ext_mnem, ext_ops, 8)
+  }
+  ```
+  Also update the golden fixtures (PPCBUG-650).
+
+- **Impact**: All analysis-DB queries for `bdnz` loops (common in pixel-shader and vertex
+  processing loops) return zero rows; they are stored as `bdnzge`. Developers inspecting
+  loop structures see a misleading condition name on a CTR-only branch.
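+
+The BO decoding at issue can be modelled in isolation. The sketch below is a simplified,
+hypothetical stand-in for the `if decr` branch of `fmt_bc` (the function name and signature
+are assumptions); it ignores the AA/LK suffixes and the operand string.
+
+```rust
+/// Hypothetical model of the extended mnemonic for a `bc` whose BO requests a
+/// CTR decrement. Returns None when BO does not decrement CTR.
+fn bd_ext_mnemonic(bo: u32, cond_name: &str) -> Option<String> {
+    let decr = bo & 0x04 == 0; // BO bit 0b00100 clear: decrement CTR
+    let uncond = bo & 0x10 != 0; // BO bit 0b10000 set: ignore the CR bit
+    if !decr {
+        return None;
+    }
+    let z = if bo & 0x02 != 0 { "z" } else { "nz" };
+    // Append the condition only when the CR bit actually participates.
+    let cond = if uncond { "" } else { cond_name };
+    Some(format!("bd{z}{cond}"))
+}
+```
+
+With BO=16 and BO=18 the condition name is dropped; with BO=0 (decrement and test the CR bit)
+it is kept.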
+
+### PPCBUG-641 — `sync` emits `"sync"` for `lwsync` (L=1) — re-assessment of PPCBUG-088 (HIGH)
+
+- **Severity**: HIGH (disassembler scope; PPCBUG-088 was LOW for interpreter scope)
+- **Status**: open (see PPCBUG-088 for fix)
+- **Location**: `disasm.rs:364`
+- **Symptom**: `PpcOpcode::sync` always emits `"sync"`. The L-field at PPC bit 10 selects
+  `lwsync` (L=1, encoding `0x7C2004AC`). `lwsync` is the acquire barrier in every Xbox 360
+  spinlock. Every `lwsync` in the disassembly DB is stored as `mnemonic='sync'`.
+  `SELECT * WHERE mnemonic='lwsync'` returns zero rows regardless of binary content.
+- **Note**: the golden fixture for lwsync (PPCBUG-649) currently pins the wrong output.
+
+### PPCBUG-642 — `fmt_bcctr` missing extended form for CTR-decrement/ignore-CR BO values (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `disasm.rs:880-902`
+- **Symptom**: `bcctrx` with BO=16 (decrement CTR, ignore CR) falls through to `base()` with
+  no extended form. `fmt_bclr` (the equivalent for bclrx) correctly handles the same case with
+  an explicit `decr && uncond` check at line 872, producing `bdnzlr`.
+  Note: `bcctr` with CTR-decrement is undefined by PowerISA; this encoding should never appear
+  in valid compiled code. The inconsistency is a maintenance concern rather than a runtime bug.
+- **Fix**: Add a `decr && uncond` check before the `cond_branch_ext` call in `fmt_bcctr`,
+  mirroring lines 872-876 in `fmt_bclr`. Or add a comment explaining the ISA undefined status.
+
+### PPCBUG-643 — SIMM immediate display: decimal diverges from Canary and real disassemblers (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `disasm.rs:946` (addi), `976` (addic), `989` (subfic), `990` (mulli),
+  `1003` (cmpi), `1048-1061` (fmt_ld/fmt_st), and all similar SIMM sites
+- **Symptom**: SIMM immediates are formatted via Rust's `{imm}` (decimal). Canary uses
+  `"-0x{:X}"` / `"0x{:X}"` (signed hex) for every SIMM field, and IDA Pro also shows hex;
+  GNU objdump prints these immediates in decimal. The inconsistency is also internal to
+  xenia-rs: `addis`/`oris`/`xoris` use hex (`0x{imm_u:X}`), but `addi`/`addic`/`mulli` use
+  decimal. This misleads analysis-DB queries that mix instructions (e.g. `addi r3, r1, -4` vs
+  `addis r3, r0, 0x8000`).
+- **Impact**: Medium. The output is not *wrong* (the value is correctly computed), but
+  cross-referencing with Canary output requires manual conversion.
+
+### PPCBUG-644 — D-form load/store displacement uses decimal instead of hex (MEDIUM)
+
+- **Severity**: MEDIUM
+- **Status**: open
+- **Location**: `disasm.rs:1053` (`fmt_ld`), `1061` (`fmt_st`), `1069` (`fmt_ds`)
+- **Symptom**: `format!("{rn}, {d}({})", gpr(ra))` outputs decimal for the displacement.
+  Canary outputs `"-0x8(r1)"` not `"-8(r1)"`, so the two outputs cannot be compared textually.
+  Affects 25+ D-form and DS-form opcodes, most visibly the negative displacements (-8, -16,
+  etc.) of stack frame accesses.
+- **Fix**:
+  ```rust
+  // `unsigned_abs` avoids the overflow that `-d` would hit for the minimum value.
+  let d_str = if d < 0 { format!("-0x{:X}", d.unsigned_abs()) } else { format!("0x{:X}", d) };
+  base(mnem, format!("{rn}, {d_str}({})", gpr(ra)), 8)
+  ```
+  Update all golden fixture rows with displacement values.
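+
+A minimal sketch of signed-hex formatting for a 16-bit displacement. `signed_hex` is a
+hypothetical helper name and `i16` is an assumed operand type; the real `d` in
+`fmt_ld`/`fmt_st` may be wider.
+
+```rust
+/// Hypothetical helper: format a signed 16-bit displacement as signed hex,
+/// e.g. -8 becomes "-0x8" and 16 becomes "0x10".
+fn signed_hex(d: i16) -> String {
+    if d < 0 {
+        // `unsigned_abs` is used because `-d` overflows for i16::MIN.
+        format!("-0x{:X}", d.unsigned_abs())
+    } else {
+        format!("0x{:X}", d)
+    }
+}
+```
+
+Routing every SIMM and displacement site through one helper of this kind would also remove
+the mixed decimal/hex output noted under PPCBUG-643.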
+
+### PPCBUG-645 — `cntlzdx` Rc suffix: moot for valid encodings, but WONTFIX (LOW)
+
+- **Severity**: LOW
+- **Status**: wontfix
+- **Location**: `disasm.rs:286`
+- **Note**: `fmt_x_unary_rc` would emit `cntlzd.` for Rc=1, but valid `cntlzd` encodings
+  always have Rc=0. Canary emits `cntlzd` always. No impact for valid code.
+
+### PPCBUG-646 — `fmt_rlwimi` inslwi/insrwi priority overlap: confirmed correct (LOW)
+
+- **Severity**: LOW
+- **Status**: wontfix
+- **Note**: After careful analysis, the `inslwi` guard excludes `insrwi` overlap cases
+  (`sh != 31u32.wrapping_sub(me)`). Priority is correct. Informational only.
+
+### PPCBUG-647 — `fmt_rlwinm` `extrwi` uses `wrapping_sub` which can give misleading results for invalid encodings (LOW)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `disasm.rs:1137`
+- **Symptom**: `let b = sh.wrapping_sub(n) % 32;` — for invalid `sh < n` encodings,
+  `wrapping_sub` gives a large u32, `% 32` gives a confusing value. For all compiler-emitted
+  encodings `sh >= n` holds. Add `&& sh >= 32 - mb` to the guard to avoid the fallthrough.
+
+### PPCBUG-648 — `fmt_mftb` TBR=268: ext mnemonic identical to base mnemonic (LOW)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `disasm.rs:1443`
+- **Symptom**: `268 => with_ext("mftb", base_ops, 8, "mftb", gpr(rd), 8)` — base is `mftb`,
+  extended is also `mftb`. `display()` picks the extended form (omitting the `268` operand),
+  making it ambiguous vs. `mftbu`. Consider: either emit base-only (`mftb r3, 268`) or rename
+  the base to `mftb.raw` for disambiguation.
+
+### PPCBUG-649 — Golden fixture for `lwsync` pins wrong output (no ext_mnemonic) (LOW)
+
+- **Severity**: LOW (test coverage gap)
+- **Status**: open
+- **Location**: `tests/golden/extended_mnemonics.json`, entry "lwsync"
+- **Symptom**: Fixture has `mnemonic: "sync"` and no `ext_mnemonic`. After PPCBUG-088/641
+  fix, expected output is `mnemonic: "sync"`, `ext_mnemonic: "lwsync"`. Current fixture
+  defeats regression detection — the test passes with wrong output.
+
+### PPCBUG-650 — Golden fixtures for `bdnz`/`bdz` pin wrong extended mnemonic (LOW)
+
+- **Severity**: LOW (companion to PPCBUG-640)
+- **Status**: open
+- **Location**: `tests/golden/extended_mnemonics.json`, rows "bdnz 0x82000040" and "bdz 0x82000040"
+- **Symptom**: The rows pin `ext_mnemonic: "bdnzge"` and `ext_mnemonic: "bdzge"` respectively.
+  After PPCBUG-640 fix, correct values are `"bdnz"` and `"bdz"`.
+
+### PPCBUG-651 — `fmt_vmx128_pack_d3d` shared by `vpkd3d128` and `vrlimi128`: confirmed correct (LOW)
+
+- **Severity**: LOW
+- **Status**: wontfix
+- **Note**: Both opcodes use VX128_4 form. Shared formatter outputs identical operand lists
+  (`vd, vb, imm, z`) which is correct for both. Informational only.
+
+### PPCBUG-652 — Zero golden fixtures for any VMX128 opcode disassembly (LOW)
+
+- **Severity**: LOW (test coverage gap)
+- **Status**: open
+- **Location**: `tests/golden/` — all three JSON files
+- **Symptom**: No fixture pins the formatted output of any VMX128 instruction. Regressions
+  in VMX128 field extraction (e.g. a re-introduction of PPCBUG-360/361/362 in the disassembler)
+  would be invisible. Recommend adding at minimum: `vaddfp128`, `vperm128`, `vsldoi128`,
+  `vpkd3d128`, `vcmpeqfp128.`, `vmaddfp128`.
+
+### PPCBUG-653 — `fmt_trap_imm` unconditional trap extended form: confirmed not-a-bug (LOW)
+
+- **Severity**: LOW
+- **Status**: wontfix
+- **Note**: `twi 31, rA, IMM` (TO=31) has no ISA simplified mnemonic; the unconditional
+  `trap` form is defined only for the register variant (`trap` = `tw 31, r0, r0`).
+  `fmt_trap_imm` correctly emits base-only for `twi 31, rA, N`. Informational.
+
+### PPCBUG-654 — `fmt_rldimi` `insrdi` guard excludes valid `mb=0` (b=0) case (LOW)
+
+- **Severity**: LOW
+- **Status**: open
+- **Location**: `disasm.rs:1220`
+- **Symptom**: Guard `if mb > 0` excludes `insrdi rA, rS, n, 0` (b=0 → mb=0). A valid
+  compiler-emitted `rldimi` with sh+mb+n=64 and mb=0 falls through to base form instead of
+  displaying the `insrdi` simplified mnemonic.
+- **Fix**: Remove the `mb > 0` guard; the inner `n > 0` guard is sufficient to avoid
+  degenerate cases.
+
+IDs PPCBUG-655 through PPCBUG-679 are unallocated — no further bugs found in Phase C3.
diff --git a/audit-report-2026-04-29.md b/audit-report-2026-04-29.md
new file mode 100644
index 0000000..7ff7667
--- /dev/null
+++ b/audit-report-2026-04-29.md
@@ -0,0 +1,421 @@
+# PPC Instruction Audit — Triaged Report (2026-04-29)
+
+**Status**: audit complete. **No code modified.** This file is the fix-order plan for the follow-up session.
+**Source of truth**: detailed bug entries (one heading per PPCBUG ID) live in `audit-findings.md`. This file references every entry by ID so nothing is lost — it does not duplicate the per-bug detail.
+
+## Counts
+
+- **Total findings**: 253 PPCBUG IDs, of which 5 are explicitly retracted/withdrawn (PPCBUG-220, 222, 226, 482, 483 — see Notes section).
+- **Net findings**: ~248 actionable.
+- **Severity breakdown** (rough):
+  - HIGH: ~55 (~22%)
+  - MEDIUM: ~75 (~30%)
+  - LOW (test gaps + cosmetic + informational): ~118 (~48%)
+
+## Headline findings (most likely Sylpheed-renderer-blockers)
+
+1. **PPCBUG-107 cascade** — `ReservationTable::invalidate_for_write` defined and unit-tested but never called from any of the **50+ store opcodes** in the interpreter. Under `--parallel`, every cross-thread atomic via `lwarx`/`stwcx.` is silently broken: spinlocks succeed without exclusion, atomic counters race, condition-variable handshakes never sync. Plausible direct cause of the 4-worker-thread renderer plateau (`project_xenia_rs_sylpheed_stage3_2026_04_29.md`). **Fix is mechanical**: one-line `if t.has_active_reservers() { t.invalidate_for_write(ea) }` before every `mem.write_*` in interpreter.rs.
+
+2. **PPCBUG-053+054 cascade** — `bcx`/`bclrx` CTR zero-test compares all 64 bits; `mtspr CTR` writes full 64-bit GPR. Combined with PPCBUG-006 (`negx` poisons GPR upper 32) → **`neg; mtctr; bdnz` loops run forever**.
+
+3. **8 decoder/field-extraction bugs collapse into 6 missing accessors** + 1 wrong sh64 formula + 1 missing decode_op6 dot-form entry. The disassembler already has correct local versions. Single mechanical sweep.
+
+4. **PPCBUG-046 (`clrldi r3, r4, 32`)** — the canonical zero-extend-low-32 idiom is currently a no-op. Emitted constantly by 32-bit-ABI compilers.
+
+5. **PPCBUG-510** — `stvewx128` corrupts 12 adjacent bytes per call.
+
+6. **PPCBUG-424/425** — `vmaddfp128`/`vmaddcfp128` operand swap. Every D3D vertex/pixel shader using FMA with non-aliased operands gets wrong arithmetic.
+
+7. **PPCBUG-360/363** — `vperm128` uses wrong control vector (every D3D shader swizzle); `vpkd3d128` missing post-pack permutation (canonical D3D vertex-pack `pack=1` always wrong).
+
+8. **PPCBUG-275/420-422** — VC-form and VMX128_R-form `rc_bit()` reads bit 0 instead of bit 21/27 → **CR6 never updated for ANY VMX vector compare dot form**. Breaks every `vcmpequb. + bc CR6_all_true` early-exit loop in audio mixing, font rendering, string ops.
+
+## Recommended fix order
+
+The phases below are the recommended fix order for the follow-up session. Each phase is **independently mergeable**; an earlier phase may turn out to resolve symptoms attributed to later ones (e.g. P1 by itself could be sufficient to break open the Sylpheed renderer plateau).
+
+After each phase: `cargo test --workspace --release` (must stay at 506+ pass) AND `xenia-rs check sylpheed.iso -n 100M` (must not regress against the 2026-04-29 addis-fix baseline of `swaps=2`). The acid test is whether `draws > 0` opens after P1 or P2.
+
+---
+
+### Phase 1 — Cross-thread atomicity (PPCBUG-107 cascade)
+
+**Why first**: highest-confidence smoking gun for the renderer plateau. Single, mechanical, low-risk fix. Largest leverage relative to size.
+
+**Coupled — must land together**:
+- PPCBUG-107 (root: missing call from stores)
+- PPCBUG-130 (9 byte/halfword stores)
+- PPCBUG-140, 141, 142, 143, 144 (5 word stores: stw/stwu/stwx/stwux/stwbrx)
+- PPCBUG-150 (5 doubleword stores: std/stdu/stdx/stdux/stdbrx)
+- PPCBUG-160 (3 multiple/string stores: stmw/stswi/stswx)
+- PPCBUG-167 (9 FP stores)
+- PPCBUG-511, 512, 513, 514 (16 VMX stores)
+
+**Independent but related**:
+- PPCBUG-151 (stwcx/stdcx reservation width discriminator) — separate fix; add `reservation_width: u8` to PpcContext.
+- PPCBUG-108 (legacy per-context path: cross-thread invalidation impossible) — informational; `--reservations-table` mode bypasses it.
+
+**Approach** — one PR adds `if t.has_active_reservers() { t.invalidate_for_write(ea) }` before every `mem.write_*` call site. Scope:
+```
+mem.write_u8 / write_u16 / write_u32 / write_u64 / write_f32 / write_f64
+mem.write_vec128 / write_vec128_aligned (for VMX)
+```
+~38 sites total. Add 1+ targeted concurrency tests (lwarx + cross-thread plain store + stwcx., expect EQ=0).
+
+---
+
+### Phase 2 — Decoder/field-extraction structural sweep
+
+**Why second**: single mechanical sweep, fixes 12 distinct HIGH-severity findings, unblocks correct execution of compiler-emitted code. Disassembler already has correct local extraction logic — promote/port.
+
+**Coupled — same commit**:
+- PPCBUG-040 + PPCBUG-560 — fix `sh64()` bit order AND fix the test helper that was masking it
+- PPCBUG-046 + PPCBUG-561 — promote `mb_md()` from `disasm.rs:1256` to `decoder.rs`; replace 6 inline-formula sites in interpreter.rs (rldicl/rldicr/rldic/rldimi/rldcl/rldcr)
+- PPCBUG-275 + PPCBUG-276 + PPCBUG-420 + PPCBUG-421 + PPCBUG-422 + PPCBUG-562 — add `vc_rc_bit()` (PPC bit 21) and `vx128r_rc_bit()` (PPC bit 27); replace `instr.rc_bit()` at all VMX compare dot-form sites
+- PPCBUG-315 + PPCBUG-563 — add `vx128_4_z()`, `vx128_4_imm()`; fix `vrlimi128`
+- PPCBUG-361 + PPCBUG-565 — add `vx128_5_sh()`; fix `vsldoi128`
+- PPCBUG-362 + PPCBUG-564 — add `vx128_p_perm()`; fix `vpermwi128`
+- PPCBUG-423 + PPCBUG-600 — add 5 odd-key entries to `decode_op6` key4 for `vcmp*fp128.` dot forms
+
+**Independent in this phase**:
+- PPCBUG-360 — `vperm128` reads VC from `vd128()` instead of VX128_2 VC field at integer bits 6-8. Fix at the call site (or add `vx128_2_vc()` accessor).
+- PPCBUG-363 + PPCBUG-369 — `vpkd3d128` missing post-pack permutation; add the `pack`/`shift` field handling per Canary.
+
+**Test fixture updates required** (PPCBUG-560 lesson) — once `sh64()` is fixed, verify all `disasm_goldens.rs` test helpers encode shifts ISA-correctly. Don't trust the existing fixtures blindly.
+
+---
+
+### Phase 3 — Other HIGH bugs (single targeted fixes)
+
+**Independent**:
+- PPCBUG-510 — `stvewx128` corrupting 12 bytes per call. Direct fix: align EA to word, write only 4 bytes.
+- PPCBUG-424 — `vmaddfp128` operand order: change `ai.mul_add(bi, di)` → `ai.mul_add(di, bi)`.
+- PPCBUG-425 — `vmaddcfp128` operand order similarly.
+- PPCBUG-053 + PPCBUG-054 — `bcx`/`bclrx` CTR zero-test (32-bit) + `mtspr CTR` truncation (defensive firewall). Coupled.
+- PPCBUG-640 — `fmt_bc` spurious condition suffix on pure `bdnz`/`bdz`. Port the `fmt_bclr` pattern.
+- PPCBUG-641 — `lwsync` shows as `sync` in disassembler (re-assessment of PPCBUG-088; a single fix closes both IDs).
+
+---
+
+### Phase 4 — 32-bit ABI writeback truncation sweep
+
+**Why this phase**: cross-cutting, mechanical. Once ALL writebacks truncate via `as u32 as u64`, the systemic 32-bit-ABI invariant is restored and most CR0/CA helper-correctness concerns become moot.
+
+#### 4a — Active poisoning (every execution corrupts GPR upper bits)
+
+These bugs corrupt GPR upper bits **regardless** of whether upstream sources are clean — typically because the implementation applies Rust's `!u64` (full 64-bit NOT) somewhere:
+- PPCBUG-006 (negx — `(!ra).wrapping_add(1)`)
+- PPCBUG-008 (subfex — `(!ra).wrapping_add(rb).wrapping_add(ca)`)
+- PPCBUG-018 (subfzex)
+- PPCBUG-019 (subfmex)
+- PPCBUG-028 (orcx — `rs | !rb`)
+- PPCBUG-029 (norx — `!(rs | rb)` — the canonical `not` mnemonic, hot path)
+- PPCBUG-030 (nandx)
+- PPCBUG-031 (eqvx — `!(rs ^ rb)` — common `eqv rA, rA, rA` set-to-all-ones)
+- PPCBUG-033 (andcx via `!rb`)
+- PPCBUG-034 (extsbx — `as i8 as i64 as u64`)
+- PPCBUG-035 (extshx)
+
+#### 4b — Same-shape-as-addis (latent under clean inputs, active when upstream is poisoned)
+
+- PPCBUG-001 (addi), PPCBUG-002 (addic), PPCBUG-003 (addicx), PPCBUG-005 (subficx), PPCBUG-007 (subfcx CA), PPCBUG-008 (subfex CA — also in 4a)
+- PPCBUG-004 (mulli), PPCBUG-009 (mullwx)
+- PPCBUG-010 + PPCBUG-011 (divwx writeback + CR0 — **must land together**, not independently)
+- PPCBUG-041 + PPCBUG-042 + PPCBUG-043 (srawx/srawix writeback + CR0 coupling — **must land together**)
+- PPCBUG-095, 096, 097, 098 (lha/lhax/lhau/lhaux halfword sign-extension)
+- PPCBUG-105 (lwa/lwax/lwaux — note: 64-bit-mode-only; less common in 32-bit-ABI binaries)
+
+#### 4c — Latent writeback (only triggers if 4a/4b are unfixed)
+
+These can be fixed in the same sweep but won't fire under clean inputs:
+- PPCBUG-012, 013, 014, 015, 016, 017 (addx/addcx/addex/addzex/addmex/subfx)
+- PPCBUG-032 (andx/orx/xorx)
+
+#### 4d — CR0 32-bit-ABI compare (cross-cutting catch-all)
+
+PPCBUG-020 documents the catch-all; the per-opcode locations are referenced from there:
+- PPCBUG-020 (catch-all in groups 2-5)
+- PPCBUG-023 (andisx)
+- PPCBUG-024 (rlwinmx), PPCBUG-025 (rlwimix), PPCBUG-026 (rlwnmx)
+- PPCBUG-036 (extsbx), PPCBUG-037 (extshx) — **must land with PPCBUG-034/035**
+- PPCBUG-044 (slwx/srwx)
+
+**Fix shape** — at every Rc=1 path, change `update_cr_signed(0, result as i64)` to `update_cr_signed(0, result as u32 as i32 as i64)`. Once 4a/4b/4c land, both forms become equivalent and 4d becomes belt-and-suspenders (still recommended for resilience).
+
+---
+
+### Phase 5 — FPU correctness (graphics middleware impact)
+
+#### 5a — Round-to-int and FPSCR.RN
+
+- PPCBUG-221 + PPCBUG-227 (`round_to_i64` NearestEven broken near 2^52 — must land together; `round_to_i32` delegates)
+- PPCBUG-201 (FPSCR.RN not honored for double arithmetic)
+- PPCBUG-432 (vrfin/vrfin128 round-half-away-from-zero vs round-to-nearest-even)
+
+#### 5b — VXISI / NaN / SNaN handling for FMA family
+
+- PPCBUG-181, 182 (single fmaddsx/fmsubsx/fnmaddsx/fnmsubsx VXISI)
+- PPCBUG-202, 203, 204 (double fmaddx/fmsubx/fnmaddx/fnmsubx VXISI — esp. 203 hot for Newton-Raphson)
+- PPCBUG-183, 205 (fnmadd/fnmsub Rust unary `-` flips NaN sign — fix: skip negation on NaN)
+- PPCBUG-186 (SNaN priority for FMA)
+- PPCBUG-128 (lfs SNaN quietening — bit-manipulation widening helper needed)
+
+#### 5c — Inexact / FPSCR exception bits
+
+- PPCBUG-180 (single XX/FR/FI never set), PPCBUG-200 (double XX/FR/FI never set)
+- PPCBUG-223 (fcmpo VXSNAN/VXVC), PPCBUG-224 (fcfidx XX), PPCBUG-225 (frspx XX/FR/FI), PPCBUG-229 (fctidx/fctidzx XX/FX), PPCBUG-230 (fctiwx/fctiwzx XX/FX), PPCBUG-231 (frspx SNaN host dependency)
+- PPCBUG-165 + PPCBUG-166 + PPCBUG-168 (stfs* FPSCR + RN + SNaN)
+
+#### 5d — Subnormal flush (FPSCR.NI / VSCR.NJ)
+
+- PPCBUG-185 (FPU NI subnormal flush not modeled)
+- PPCBUG-435, 436, 437 (VMX NJ subnormal flush — vaddfp/vsubfp/vmulfp128, vmsum3fp128/vmsum4fp128 product intermediates, vmaddfp/vmaddfp128/vmaddcfp128/vnmsubfp128 outputs)
+
+#### 5e — Estimate precision (vs hardware ~12-bit)
+
+- PPCBUG-184 (fres)
+- PPCBUG-428..431 (vrefp, vrsqrtefp, vexptefp, vlogefp — same shape as fres)
+
+#### 5f — VMX float compares + saturation
+
+- PPCBUG-426, 427 (vnmsubfp/vnmsubfp128 double-rounding)
+- PPCBUG-433 (vctsxs/vcfpsxws128 NaN saturate to INT_MIN)
+
+---
+
+### Phase 6 — Other MEDIUM correctness
+
+- PPCBUG-021 (overflow.rs OE checks at bit 63 — sub-register ops; partly covered by P4)
+- PPCBUG-022 (`mulld_ov` missing INT_MIN × -1)
+- PPCBUG-027 (rlwimix upper-32 ISA-deviation — auto-resolves once P4 lands)
+- PPCBUG-039 (cntlzdx 32-bit-ABI counts upper-zero — only matters if emitted)
+- PPCBUG-063 (trap pc-after-advance)
+- PPCBUG-064 (sc LEV field)
+- PPCBUG-065 (twi 31, r0, IMM typed-trap — relevant to Sylpheed C++ throw work, see `project_xenia_rs_sylpheed_throw_2026_04_28.md`)
+- PPCBUG-068 (mcrfs VX summary recomputation)
+- PPCBUG-078 (mtmsrd L=1 partial MSR-write)
+- PPCBUG-080 (mfvscr zero upper 96 bits)
+- PPCBUG-123 + PPCBUG-124 + PPCBUG-161 + PPCBUG-566 (XER TBC for lswx/stswx — coupled; add `xer_tbc: u8` to PpcContext, wire into xer()/set_xer(); enables lswx and stswx)
+- PPCBUG-125 (lmw RA-in-destination skip)
+- PPCBUG-126 + PPCBUG-162 (lswi/stswi `instr.rb()` → `instr.nb()`)
+- PPCBUG-487 + PPCBUG-495 (vsum* operand naming)
+- PPCBUG-515 (lvebx/lvehx/lvewx vs Canary divergence — document; xenia-rs is more ISA-faithful)
+- PPCBUG-516 (lvsr sh=0 case — add comment + debug_assert)
+- PPCBUG-601 (decode_op6 overlapping windows — document the invariant)
+- PPCBUG-642 (fmt_bcctr extended forms)
+- PPCBUG-643 + PPCBUG-644 (SIMM/D-form decimal vs hex — alignment with Canary disassembly)
+- PPCBUG-367 (vupkhpx/vupklpx channel replication vs zero-extend)
+- PPCBUG-368 (vpkpx pack_pixel_555 channel assignment unverified)
+- PPCBUG-366 (vspltisb/vspltish sign-extension idiom — fragile, not wrong)
+
+---
+
+### Phase 7 — Frozen-snapshot drift (separate sweep)
+
+8 opcodes' frozen snapshots in `ppc-manual/<cat>/<op>.md` differ from live code:
+- PPCBUG-066 (td/tdi/tw/twi)
+- PPCBUG-117 (ldarx)
+- PPCBUG-145 (stwcx)
+- PPCBUG-560 (already-listed: rldicl test helper bit-order)
+- Plus the implicit drift in addicx (PPCBUG-003), andisx (PPCBUG-023), cmp/cmpi (PPCBUG-050), extsbx/extshx (PPCBUG-036/037, PPCBUG-032 in batch 1)
+
+**Recommendation**: regenerate frozen snapshots from current code for the entire ppc-manual after Phases 1-4 land. Add a CI check that compares snapshots vs live code on every PR.
+
+---
+
+### Phase 8 — Test gap closure (broad)
+
+Single PR per group is overkill; recommend bundling test additions with each Phase 1-6 PR (test the bug being fixed). The remaining LOW IDs are pure-test-gap entries — list:
+
+- PPCBUG-045 (shift), 047 (rld), 055 (branch), 067 (trap+sc), 070 (CR logical)
+- PPCBUG-081, 082, 083, 084, 085 (SPR/MSR/TB/FPSCR/VSCR moves), 089 (cache+sync)
+- PPCBUG-091 (lbz), 100 (lha), 109, 110, 111 (lwa/lwbrx/lwarx), 118 (ld), 127 (lmw/lswi/lswx), 129 (lfs/lfd)
+- PPCBUG-132 (stb/sth), 146, 147 (stw/stwcx), 153 (std/stdcx), 163 (stmw/stswi/stswx), 171 (stfs/stfd)
+- PPCBUG-187 (FPU single), 208 (FPU double), 228 (FPU misc convert)
+- PPCBUG-240 (VMX add/sub), 243 (VMX sat helpers)
+- PPCBUG-277, 278, 279 (VMX compare/min/max/avg)
+- PPCBUG-316, 317, 320, 321, 322, 323, 324, 325 (VMX shift/rotate/logical)
+- PPCBUG-370, 371, 372, 373, 374, 375, 376, 377, 378 (VMX permute/pack)
+- PPCBUG-438, 439, 440 (VMX float compare/round/convert)
+- PPCBUG-490, 491, 492, 493, 494 (VMX multiply-sum)
+- PPCBUG-517, 518, 519 (VMX load/store)
+- PPCBUG-567 (decoder accessors)
+- PPCBUG-604 (decoder dispatch tables)
+- PPCBUG-649, 650, 652 (golden fixtures for lwsync/branches/VMX128)
+
+---
+
+## Notes & administrative
+
+### Withdrawn / retracted
+
+- **PPCBUG-220** — `fctiwx` strict-`>` threshold actually correct (`i32::MAX` exactly representable in f64). Retracted by group-31 subagent.
+- **PPCBUG-222** — `fctidx` positive-overflow sentinel `0x7FFF_FFFF_FFFF_FFFF` is the correct ISA value. Retracted.
+- **PPCBUG-226** — FPRF 5-bit codes for fcmpu/fcmpo are correct per PowerISA. Retracted.
+- **PPCBUG-482** — `vmhaddshs` shift `>>15` is correct per spec snapshots. Retracted.
+- **PPCBUG-483** — `vmhraddshs` shift `>>15` is correct per spec snapshots. Retracted.
+
+### Wontfix / informational (not retracted but no fix needed)
+
+- **PPCBUG-038** — extswx ISA-correct, intentional 64-bit sign-extension. Document the asymmetry with extsb/extsh after PPCBUG-034/035 land.
+- **PPCBUG-090, 099, 152** — invalid-form (rD==rA) silently destroys load/store result. Per ISA: undefined behavior. No compiler emits these; matches Canary. Optional `debug_assert!`.
+- **PPCBUG-106, 115, 131, 169, 170, 206, 207, 318, 319, 364, 365, 434, 645, 646, 648, 651, 653** — informational confirmations that the implementation is correct, no change needed.
+- **PPCBUG-069** — test comment OX(so)=0 is wrong but the assert is correct.
+- **PPCBUG-602, 603, 605** — undocumented decoder dispatch quirks; correct but should add comments.
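+- **PPCBUG-647, 654** — disassembler edge cases that only affect extended-mnemonic display: 647 arises only on invalid encodings; 654 affects a valid `mb=0` `rldimi` encoding, which falls back to the base form instead of `insrdi`.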
+
+### Coupling matrix (must-land-together)
+
+| Group | IDs | Reason |
+|---|---|---|
+| divwx | 010, 011 | Quotient zero-extension changes the CR0 sign view |
+| srawx/srawix | 041, 042, 043 | Writeback truncation invalidates the CR0 view |
+| extsbx/extshx | 034+036, 035+037 | Same coupling shape as srawx |
+| sh64 | 040, 560 | Test helper is wrong in the inverse direction |
+| mb_md sweep | 046, 561 | Promote disasm.rs accessor first |
+| VC-form Rc | 275, 276, 420, 421, 562 | All consume the same new accessor |
+| VMX128_R Rc | 422, 562 | Same accessor sweep |
+| vrlimi128 | 315, 563 | Field accessor + caller fix |
+| vsldoi128 | 361, 565 | Field accessor + caller fix |
+| vpermwi128 | 362, 564 | Field accessor + caller fix |
+| vcmp*fp128. | 423, 600 | decode_op6 odd keys + opcode mapping |
+| XER TBC | 123, 124, 161, 566 | Add field, wire xer()/set_xer(), enables lswx/stswx |
+| round_to_i64 | 221, 227 | round_to_i32 delegates |
+| stfs FPSCR | 165, 166, 168 | Single fix shape covers all three |
+
+### Dependency on the addis fix
+
+The addis fix (`project_xenia_rs_addis_signext_root_cause_2026_04_29.md`) is already in place. Phase 4 generalizes that fix systematically; until Phase 4 lands, the writeback-truncation invariant remains incomplete.
+
+### Anticipated impact on the Sylpheed renderer plateau
+
+Strong candidates for direct cause of the plateau:
+- **PPCBUG-107** — broken atomics. Workers wait forever on never-signaled events; classical broken-spinlock symptom.
+- **PPCBUG-053+054** — broken `bdnz` loops; could explain workers parked indefinitely.
+- **PPCBUG-046 (`clrldi r3, r4, 32`)** — pollution propagation in 32-bit ABI; could break any pointer-clean-up sequence.
+
+After applying Phase 1 alone, run `xenia-rs check sylpheed.iso -n 4B --parallel` and check whether `draws > 0`. If yes, the plateau was atomics; if no, proceed to P2/P3.
+
+---
+
+## Progress log
+
+### P1 — Cross-thread atomicity sweep (merged 2026-05-01, HEAD ca5b90b)
+
+**PPCBUGs fixed**: 107, 130, 140, 141, 142, 143, 144, 150, 160, 167, 511, 512, 513, 514, 151, 108. Plus review-fix additions: dcbz, dcbz128, stswi two-line, stswx two-line (merged in review-fix commit c9f194d).
+
+**Gate results**:
+- `cargo test --workspace --release`: 449 passed, 0 failed
+- `-n 100M` lockstep: swaps=2, clean
+- `-n 100M --parallel --reservations-table`: swaps=2, clean
+- **Acid test** `-n 4B --parallel --reservations-table`: swaps=2, draws=**0**, no RtlRaiseException, no panics
+
+**Conclusion**: P1 did NOT unblock the Sylpheed renderer. `draws` remains 0. The renderer plateau is not caused by broken cross-thread atomics alone. Proceeding to P2 (decoder/field-extraction sweep). The strongest remaining candidate per the plan is PPCBUG-046 (`clrldi r3, r4, 32` no-op).
+
+---
+
+### P2 — Decoder/field-extraction structural sweep (merged 2026-05-01, HEAD see `git log master --oneline -1`)
+
+**PPCBUGs fixed**: 040, 046, 275, 276, 315, 360, 361, 362, 363, 369, 420, 421, 422, 423, 560, 561, 562, 563, 564, 565, 600.
+
+**Batches**:
+- Batch 1: PPCBUG-040+560 — sh64() bit-order fix (XS-form SH split) + rldicl test helper encoding
+- Batch 2: PPCBUG-046+561 — mb_md() accessor; all 6 rld* MB fields corrected (clrldi was a no-op)
+- Batch 3: PPCBUG-275+276+420+421+422+423+562+600 — vc_rc_bit()/vx128r_rc_bit() Rc accessors; 13 vcmp interpreter sites; 5 decode_op6 dot-form entries
+- Batch 4: PPCBUG-315+563 — vrlimi128 vx128_4_z/imm field extraction
+- Batch 5: PPCBUG-361+565 — vsldoi128 vx128_5_sh field extraction
+- Batch 6: PPCBUG-362+564 — vpermwi128 vx128_p_perm field extraction
+- Batch 7: PPCBUG-360 — vperm128 vc128_2() accessor (was erroneously vd128())
+- Batch 8: PPCBUG-363+369 — vpkd3d128 post-pack permutation (MakePermuteMask tables from canary)
+
+**Gate results**:
+- `cargo test --workspace --release`: 201 (cpu) + 6 (disasm goldens) + 144 + 76 + 16 + 8 + … passed, 0 failed
+- Independent code reviewer: all 9 check items OK
+- `-n 100M` lockstep smoke: ISO not available in CI environment; last known good at P1 HEAD was swaps=2
+- **Acid test** `-n 4B --parallel --reservations-table`: pending (ISO not in CI environment)
+
+**Conclusion**: All P2 fixes applied and reviewed. Decoder field extraction is now correct for all audited VMX128 and MD/XS-form instructions. Whether P2 unblocks the renderer (`draws > 0`) requires the sylpheed.iso acid test on the user's machine. PPCBUG-046 (clrldi no-op fix) was the highest-probability P2 renderer-unblock candidate. Next: P3 — isolated HIGH bugs (PPCBUG-510, 424/425, 053+054, 640, 641).
+
+---
+
+## Index — every PPCBUG referenced (in numerical order)
+
+This list intentionally includes every ID found in `audit-findings.md` so nothing is dropped. For each entry's full description / file:line / fix snippet / test recommendation, see the corresponding `### PPCBUG-NNN` heading in `audit-findings.md`.
+
+001-022 (batch 1: integer ALU): 001, 002, 003, 004, 005, 006, 007, 008, 009, 010, 011, 012, 013, 014, 015, 016, 017, 018, 019, 020, 021, 022.
+
+023 (batch 2 group 6 logic immediate): 023.
+
+024-027 (batch 2 group 9 word rotate): 024, 025, 026, 027.
+
+028-033 (batch 2 group 7 logic register): 028, 029, 030, 031, 032, 033.
+
+034-039 (batch 2 group 8 sign-extend / count-leading-zeros): 034, 035, 036, 037, 038, 039.
+
+040-045 (batch 2 group 11 shift): 040, 041, 042, 043, 044, 045.
+
+046-047 (batch 2 group 10 doubleword rotate): 046, 047.
+
+048-052 reserved (group 12 compare): 048, 049, 050.
+
+053-055 (batch 3 group 13 branch): 053, 054, 055.
+
+063-067 (batch 3 group 14 trap+sc): 063, 064, 065, 066, 067.
+
+068-070 (batch 3 group 15 CR logical): 068, 069, 070.
+
+078-085 (batch 3 group 16 SPR/MSR/TB/FPSCR/VSCR): 078, 079, 080, 081, 082, 083, 084, 085.
+
+088-089 (batch 3 group 17 cache+sync): 088, 089.
+
+090-091 (batch 4 group 18 load byte): 090, 091.
+
+095-100 (batch 4 group 19 load halfword): 095, 096, 097, 098, 099, 100.
+
+105-111 (batch 4 group 20 load word + reservation): 105, 106, 107, 108, 109, 110, 111.
+
+115-118 (batch 4 group 21 load doubleword): 115, 116, 117, 118.
+
+123-127 (batch 4 group 22 load multiple/string): 123, 124, 125, 126, 127.
+
+128-129 (batch 4 group 23 load float): 128, 129.
+
+130-132 (batch 5 group 24 store byte/halfword): 130, 131, 132.
+
+140-147 (batch 5 group 25 store word + stwcx): 140, 141, 142, 143, 144, 145, 146, 147.
+
+150-153 (batch 5 group 26 store doubleword): 150, 151, 152, 153.
+
+160-163 (batch 5 group 27 store multiple/string): 160, 161, 162, 163.
+
+165-171 (batch 5 group 28 store float): 165, 166, 167, 168, 169, 170, 171.
+
+180-187 (batch 6 group 29 FPU single arithmetic): 180, 181, 182, 183, 184, 185, 186, 187.
+
+200-208 (batch 6 group 30 FPU double arithmetic): 200, 201, 202, 203, 204, 205, 206, 207, 208.
+
+220-231 (batch 6 group 31 FPU sign/move/compare/convert): 220 [retracted], 221, 222 [retracted], 223, 224, 225, 226 [retracted], 227, 228, 229, 230, 231.
+
+240-243 (batch 7 group 32 VMX integer add/sub): 240, 241, 242, 243.
+
+275-279 (batch 7 group 33 VMX integer compare/min/max/avg): 275, 276, 277, 278, 279.
+
+315-325 (batch 7 group 34 VMX integer logical/shift/rotate): 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325.
+
+360-378 (batch 8 group 35 VMX permute/pack): 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378.
+
+420-440 (batch 8 group 36 VMX float arith+compare): 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440.
+
+482-495 (batch 8 group 37 VMX multiply-sum + special): 482 [retracted], 483 [retracted], 487, 490, 491, 492, 493, 494, 495.
+
+510-519 (batch 8 group 38 VMX load/store): 510, 511, 512, 513, 514, 515, 516, 517, 518, 519.
+
+560-567 (Phase C1 decoder field extractors): 560, 561, 562, 563, 564, 565, 566, 567.
+
+600-605 (Phase C2 decoder opcode-lookup): 600, 601, 602, 603, 604, 605.
+
+640-654 (Phase C3 disassembler formatter): 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654.
+
+**Counted IDs**: 253. **Retracted**: 220, 222, 226, 482, 483 (5). **Net actionable**: 248.
+
+**Counted by phase here**: P1 (~17 IDs), P2 (~17 IDs), P3 (~7 IDs), P4 (~30 IDs), P5 (~30 IDs), P6 (~25 IDs), P7 (~5 IDs), P8 (~50 IDs), Notes (~30 wontfix/informational/retracted). Total accounts for all 253 IDs — every ID is either in a fix phase, the wontfix/informational list, or retracted. **Nothing has been dropped.**