xenia-rs/audit-findings.md
MechaCat02 f424132a5b chore(audit): mark P3 PPCBUGs applied; append P3 progress section
P3 phase merged at f3ebaba. Update audit-findings.md status fields and
append the P3 progress section to audit-report-2026-04-29.md, including
the new PPCBUG-700 discovery (VMX128 register accessor canary-compliance).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 11:28:38 +02:00

# PPC Instruction Audit — Findings Tracker
**Started**: 2026-04-29 (single session, audit-only)
**Trigger**: `addis` 32-bit-ABI sign-extension fix surfaced a likely systemic class of bugs.
**Status**: in flight. Per-group reports live in `audit-out/`. This file is the consolidated, stable-ID index.
**Workflow**: audit only this session; fix session(s) reference these IDs.
## Conventions
- Every finding has an ID `PPCBUG-NNN` for cross-referencing.
- **Status**: `open` (audit found it, not yet fixed) | `applied` (fix landed) | `wontfix` (intentional) | `dup-of:NNN` (collapsed into another finding).
- **Severity**:
  - **HIGH** = wrong arithmetic / control flow on plausible Xbox 360 user code.
  - **MEDIUM** = wrong status flag / latent under broken upstream invariants / edge case.
  - **LOW** = test gap / cosmetic / dead-code-only.
- All file:line refs are `xenia-rs/crates/xenia-cpu/src/interpreter.rs` unless otherwise noted.
- Suggested fixes are written as one-line patches where possible; see the per-group report for full context.
## Cross-cutting recommendation
The single recurring root cause is **violating the 32-bit ABI invariant that all GPR writes truncate to 32 bits**. The cleanest fix is to systematically apply `as u32 as u64` at every GPR writeback in every integer ALU op. The existing CA/CR0/OE helpers will then be correct without further changes (because their inputs become guaranteed-clean). The audit reports list each fix individually; the fix session may choose to apply them as one sweep or one-at-a-time.
A defensive secondary recommendation: even after the writeback truncation, instructions whose CA computation does its own internal arithmetic on 64-bit operands (`subfcx`, `subfex`, `addic`, `addicx`, `subficx`) should additionally truncate their compare operands. This guards against any future regression that re-pollutes the GPR file.
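One way to make the sweep hard to regress is to route every 32-bit writeback through a single helper. The following is a minimal sketch under stated assumptions: `Ctx`, `write_gpr32`, and this `addi` are hypothetical stand-ins for illustration, not the actual xenia-rs types or signatures.

```rust
/// Hypothetical, simplified CPU context; the real context has more fields.
pub struct Ctx {
    pub gpr: [u64; 32],
}

impl Ctx {
    /// Write a GPR under the 32-bit ABI invariant: the upper 32 bits are always zero.
    pub fn write_gpr32(&mut self, rd: usize, value: u64) {
        self.gpr[rd] = value as u32 as u64;
    }
}

/// `addi rD, rA, SIMM` with a truncating writeback (rA == 0 reads as literal 0).
pub fn addi(ctx: &mut Ctx, rd: usize, ra: usize, simm: i16) {
    let base = if ra == 0 { 0 } else { ctx.gpr[ra] };
    // The 64-bit sum may carry into the upper half; the helper discards it.
    let sum = base.wrapping_add(simm as i64 as u64);
    ctx.write_gpr32(rd, sum);
}
```

With this shape, `li r3, -1` leaves `0x00000000_FFFFFFFF` in `r3`, and a later `addi r4, r3, 2` wraps to `1` instead of carrying into bit 32.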
---
## Batch 1 — integer ALU (groups 1-5)
Per-group reports: `audit-out/group-01-add-imm.md`, `group-02-add-reg.md`, `group-03-sub-reg.md`, `group-04-multiply.md`, `group-05-divide.md`.
### PPCBUG-001 — addi sign-extension, no truncation
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:114-118
- **Symptom**: `addi rT, r0, -1` (= `li rT, -1`) writes `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`. Identical shape to addis.
- **Fix**:
```rust
ctx.gpr[instr.rd()] = ra_val.wrapping_add(instr.simm16() as i64 as u64) as u32 as u64;
```
- **Test gap**: existing `test_addi` only covers positive simm16. Add a test for `li rT, -1` and verify the upper 32 bits are zero.
### PPCBUG-002 — addic untruncated writeback + 64-bit CA compare
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:133-140
- **Symptom**: (a) GPR writeback not truncated (same shape as addi). (b) CA computed via 64-bit `result < ra` — Canary's `AddDidCarry` explicitly truncates both operands to int32 first.
- **Fix**:
```rust
let ra32 = ra as u32;
let imm = instr.simm16() as i32 as u32;
let result32 = ra32.wrapping_add(imm);
ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
- **Test gap**: zero unit tests for addic.
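To illustrate why the carry compare must be done on 32-bit operands, here is a small sketch that puts the two forms side by side. The function names are hypothetical, and `addic64` is only an assumed reconstruction of the shape described in the symptom, not the live code.

```rust
/// 32-bit form: truncate first, then derive CA from the 32-bit wraparound.
pub fn addic32(ra: u64, simm: i16) -> (u64, u8) {
    let ra32 = ra as u32;
    let imm = simm as i32 as u32;
    let result32 = ra32.wrapping_add(imm);
    let ca = (result32 < ra32) as u8;
    (result32 as u64, ca)
}

/// 64-bit form (assumed shape of the buggy path), kept only for comparison.
pub fn addic64(ra: u64, simm: i16) -> (u64, u8) {
    let result = ra.wrapping_add(simm as i64 as u64);
    (result, (result < ra) as u8)
}
```

For a clean `ra = 0xFFFFFFFF` and `simm = 1`, the 32-bit form wraps to zero with CA set, while the 64-bit form carries into bit 32 and reports no carry at all.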
### PPCBUG-003 — addicx untruncated writeback + 64-bit CA + CR0 regression
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:141-150
- **Symptom**: same as PPCBUG-002 plus a CR0 regression: live code uses `update_cr_signed(0, result as i64)` (64-bit signed). The frozen snapshot in `ppc-manual/alu/addicx.md` shows the previously-correct `result as i32 as i64` form. Live code has drifted.
- **Fix**: PPCBUG-002 fix plus `update_cr_signed(0, result32 as i32 as i64)`.
- **Test gap**: zero unit tests.
- **Note**: confirms the manual's frozen snapshots are useful drift detectors — see if other opcodes have similarly regressed.
### PPCBUG-004 — mulli untruncated 64-bit signed product
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:159-164
- **Symptom**: RA read as full `i64`, product stored as `u64` without truncation. Per ISA in 32-bit ABI, both factors should be i32 and product should fit in 32 bits (overflow silently wraps per ISA).
- **Fix**:
```rust
let ra = ctx.gpr[instr.ra()] as i32 as i64;
let imm = instr.simm16() as i64;
ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64;
```
- **Test gap**: zero unit tests.
### PPCBUG-005 — subficx untruncated writeback + 64-bit CA compare
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:151-158
- **Symptom**: (a) `imm.wrapping_sub(ra)` on 64-bit values writes poisoned upper bits; sign-extended `imm` for negative SIMM has bits 32-63 set. (b) CA `imm >= ra` is 64-bit unsigned compare; wrong relative to Canary's 32-bit form.
- **Fix**:
```rust
let ra32 = ra as u32;
let imm32 = instr.simm16() as i32 as u32;
let result32 = imm32.wrapping_sub(ra32);
ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
- **Test gap**: zero unit tests.
### PPCBUG-006 — negx active GPR poisoning + 64-bit OE overflow check
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:319-330
- **Symptom**: (a) `(!ra).wrapping_add(1)` sets the upper 32 bits to all-ones for every non-zero clean `ra`, because `!ra` flips them (only `ra == 0` carries out and wraps back to zero). Even a clean `r3 = 5` produces `0xFFFFFFFF_FFFFFFFB` instead of `0x00000000_FFFFFFFB`. **This is active, not latent: nearly every neg in 32-bit-ABI code poisons the GPR.** (b) `neg_ov_64` overflow predicate tests `ra == 0x8000_0000_0000_0000` (64-bit INT_MIN) instead of `ra == 0x0000_0000_8000_0000` (32-bit INT_MIN).
- **Fix**:
```rust
let result = (!(ra as u32)).wrapping_add(1);
ctx.gpr[instr.rd()] = result as u64;
if instr.oe() {
    overflow::apply(ctx, (ra as u32) == 0x8000_0000);
}
if instr.rc_bit() { ctx.update_cr_signed(0, result as i32 as i64); }
```
- **Test gap**: existing `nego_sets_ov_only_on_int_min` tests 64-bit INT_MIN — add a 32-bit INT_MIN case.
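The missing 32-bit INT_MIN case can be pinned with a standalone function first. This is a sketch only: `neg32` is a hypothetical free function that mirrors the fix above without the real context or `overflow::apply` plumbing.

```rust
/// 32-bit `neg`: returns (zero-extended result, overflow flag).
pub fn neg32(ra: u64) -> (u64, bool) {
    let ra32 = ra as u32;
    // Two's complement negation on the low word only.
    let result = (!ra32).wrapping_add(1);
    // Overflow occurs only for the 32-bit INT_MIN, which negates to itself.
    (result as u64, ra32 == 0x8000_0000)
}
```

The cases worth asserting are `5` (clean upper half in the result), `0x80000000` (overflow set, result unchanged), and `0` (no overflow).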
### PPCBUG-007 — subfcx CA via 64-bit unsigned compare
- **Severity**: HIGH (defensive — same shape as the compare that broke addis)
- **Status**: open
- **Location**: interpreter.rs:258
- **Symptom**: `if rb >= ra { 1 } else { 0 }` is the exact 64-bit unsigned compare that the addis bug exploited. Wrong CA when either operand has poisoned upper 32 bits. Apply defensively even if all upstream sources are cleaned, because a wrong CA bit is unrecoverable downstream.
- **Fix**:
```rust
let ra32 = ra as u32;
let rb32 = rb as u32;
let result32 = rb32.wrapping_sub(ra32);
ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
- **Test gap**: zero dedicated unit tests for subfcx — the most critical opcode in Group 3 had no coverage. Add 6+ tests including the exact 0x828F3F98 / 0x828F3F68 case from the addis incident.
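A possible starting point for the incident regression test is sketched below. The tracker names only the two low words, so which value sits in `ra` versus `rb`, and the all-ones upper half on `ra`, are assumptions made for illustration; `subfc32` is a hypothetical free function mirroring the fix.

```rust
/// 32-bit `subfc`: returns (zero-extended rb - ra, carry-out).
pub fn subfc32(ra: u64, rb: u64) -> (u64, u8) {
    let ra32 = ra as u32;
    let rb32 = rb as u32;
    // CA is set when the subtraction does not borrow in 32 bits.
    (rb32.wrapping_sub(ra32) as u64, (rb32 >= ra32) as u8)
}
```

With `ra = 0xFFFFFFFF_828F3F68` (assumed poisoned) and `rb = 0x828F3F98`, the 32-bit compare yields CA=1 and a difference of `0x30`, whereas a raw 64-bit `rb >= ra` is false.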
### PPCBUG-008 — subfex CA via 64-bit unsigned compare + `!ra` poisons writeback
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:268-284
- **Symptom**: (a) CA `if rb > ra || (rb == ra && ca != 0)` is 64-bit; same shape as PPCBUG-007. (b) Writeback uses `(!ra).wrapping_add(rb).wrapping_add(ca)` — `!ra` always sets upper 32 bits, guaranteed GPR poison even with clean inputs (same shape as PPCBUG-006).
- **Fix**:
```rust
let ra32 = ra as u32;
let rb32 = rb as u32;
let ca = ctx.xer_ca as u32;
let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca);
ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
### PPCBUG-009 — mullwx untruncated 64-bit signed product
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:331-344
- **Symptom**: 32x32 multiply produces 64-bit signed `i64` product, written to GPR via `as u64` without truncation. When product overflows i32 (which `mullw_ov` correctly detects), upper 32 bits are non-zero and corrupt downstream 64-bit unsigned compares — same class as addis.
- **Fix** (one line; OE handler unchanged):
```rust
ctx.gpr[instr.rd()] = product as u32 as u64;
```
### PPCBUG-010 — divwx quotient sign-extended to 64 bits
- **Severity**: HIGH
- **Status**: open (must be applied in same commit as PPCBUG-011)
- **Location**: interpreter.rs:373
- **Symptom**: `(ra / rb) as i64 as u64` sign-extends a negative i32 quotient. `-10 / 3 = -3` writes `0xFFFFFFFF_FFFFFFFD` instead of `0x00000000_FFFFFFFD`. Canary's `InstrEmit_divwx` uses `f.ZeroExtend(v, INT64_TYPE)` — explicit zero-extension.
- **Fix**: `ctx.gpr[instr.rd()] = (ra / rb) as u32 as u64;`
### PPCBUG-011 — divwx CR0 update breaks after PPCBUG-010 fix
- **Severity**: MEDIUM (coupled to PPCBUG-010 — must land together)
- **Status**: open
- **Location**: interpreter.rs:379
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.rd()] as i64)` accidentally works today because the sign-extended GPR has consistent sign in i64 view. After PPCBUG-010, GPR holds `0x00000000_FFFFFFFD` for `-3` and `as i64` reads positive — CR0.LT will be wrong for negative quotients.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64);`
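The coupling between PPCBUG-010 and PPCBUG-011 can be seen in a few lines. This sketch uses a hypothetical `divw` function that returns the GPR value and the CR0.LT bit; returning `None` for the ISA-undefined cases is a simplification chosen here, not the interpreter's behaviour.

```rust
/// `divw` under the 32-bit ABI: returns (zero-extended quotient, CR0.LT).
/// Yields None for division by zero and i32::MIN / -1, where the ISA result is undefined.
pub fn divw(ra: u64, rb: u64) -> Option<(u64, bool)> {
    let a = ra as u32 as i32;
    let b = rb as u32 as i32;
    let q = a.checked_div(b)?;
    // Zero-extend the quotient into the 64-bit GPR.
    let gpr = q as u32 as u64;
    // CR0 must look at bit 31 of the GPR, not bit 63.
    let lt = (gpr as u32 as i32) < 0;
    Some((gpr, lt))
}
```

For `-10 / 3` the GPR holds `0x00000000_FFFFFFFD`; reading that as `i64` gives a positive number, which is why the CR0 fix has to land in the same commit.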
### PPCBUG-012 — addx writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:167-179
- **Symptom**: 64-bit `wrapping_add` result written to GPR untruncated. Latent: only triggers if upstream operands have poisoned upper 32 bits. With PPCBUG-001 etc. unfixed, that invariant is broken — addx amplifies the poison.
- **Fix**: `ctx.gpr[instr.rd()] = result as u32 as u64;`
### PPCBUG-013 — addcx writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:180-193
- **Fix**: same shape as PPCBUG-012.
### PPCBUG-014 — addex writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:194-209
- **Fix**: same shape as PPCBUG-012.
### PPCBUG-015 — addzex writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:210-224
- **Fix**: same shape as PPCBUG-012.
### PPCBUG-016 — addmex writeback not truncated (latent + edge case)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:225-240
- **Symptom**: same writeback issue plus the `wrapping_sub(1)` produces all-ones upper 32 bits when low 32 bits underflow — guaranteed poison even if inputs are clean (same shape as PPCBUG-006/008).
- **Fix**: truncate operands and result to 32 bits.
### PPCBUG-017 — subfx writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:241-253
- **Fix**: same shape as PPCBUG-012.
### PPCBUG-018 — subfzex writeback not truncated + `!ra` poisons
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:285-302
- **Symptom**: `(!ra).wrapping_add(ca)` flips upper 32 bits — guaranteed poison.
- **Fix**: truncate ra to u32, do arithmetic on u32, write `as u64`.
### PPCBUG-019 — subfmex writeback poisoning + always-true CA edge
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:303-318
- **Symptom**: (a) writeback poisoned via `(!ra)`. (b) CA predicate `(!ra) != 0` is always true when ra has clean upper 32 bits (because `!ra` flips them) — so CA is always 1, even in the documented edge case where 32-bit `ra == 0xFFFFFFFF && ca == 0` should yield CA=0.
- **Fix**: operate on u32, then `xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 }`.
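The CA edge case is easier to check against a widened sum. The sketch below assumes the usual `subfme` definition (`¬RA + CA + 0xFFFFFFFF` on 32-bit operands); `subfme32` is a hypothetical free function, not the interpreter arm.

```rust
/// 32-bit `subfme`: returns (zero-extended result, carry-out).
pub fn subfme32(ra: u64, ca_in: u8) -> (u64, u8) {
    let not_ra = !(ra as u32);
    let ca = ca_in as u32;
    // Adding 0xFFFF_FFFF is the 32-bit encoding of "minus one".
    let result = not_ra.wrapping_add(ca).wrapping_add(0xFFFF_FFFF);
    // The sum carries out unless both !ra and ca are zero.
    let ca_out = (not_ra != 0 || ca != 0) as u8;
    (result as u64, ca_out)
}
```

The only input pair that clears CA is `ra = 0xFFFFFFFF` with `ca = 0`, which is exactly the case the 64-bit predicate can never produce.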
### PPCBUG-020 — CR0 update uses 64-bit signed compare in all sub-register ops
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:250, 264, 281, 299, 315, 327, 341, 379, 396, 410, 419, 428, 445, 462 (every Rc=1 path in groups 2-5)
- **Symptom**: `update_cr_signed(0, result as i64)` views result as 64-bit signed. In 32-bit ABI, bit 31 determines LT/GT, not bit 63. A result like `0x00000000_80000000` is negative in 32-bit but positive in 64-bit — CR0.LT inverted.
- **Fix (catch-all)**: change to `result as u32 as i32 as i64` everywhere. Once PPCBUG-001..-019 truncate writebacks, the upper 32 bits of `result` are zero and this distinction becomes moot — but applying both is cheap and provides defense in depth.
- **Note**: this is one logical fix duplicated across all rc paths; the fix session should grep `update_cr_signed(0, .* as i64)` to find them all.
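The difference between the two views can be sketched with `std::cmp::Ordering` standing in for the CR0 LT/GT/EQ bits. The function names are hypothetical; the real `update_cr_signed` also handles the SO bit, which is omitted here.

```rust
use std::cmp::Ordering;

/// CR0 classification using the 64-bit signed view (the drifted form).
pub fn cr0_64(result: u64) -> Ordering {
    (result as i64).cmp(&0)
}

/// CR0 classification using the 32-bit signed view (bit 31 is the sign bit).
pub fn cr0_32(result: u64) -> Ordering {
    (result as u32 as i32).cmp(&0)
}
```

For `0x00000000_80000000`, `cr0_64` reports `Greater` (GT) while `cr0_32` reports `Less` (LT); for values with bit 31 clear the two agree.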
### PPCBUG-021 — OE overflow checks at bit 63 in all sub-register ops
- **Severity**: LOW
- **Status**: open
- **Locations**: throughout — `add_ov_64`, `sub_ov_64`, `sum_overflow_64`, `mullw_ov`, etc. (defined in `xenia-cpu/src/overflow.rs`)
- **Symptom**: signed-overflow check operates on 64-bit boundary. For 32-bit-ABI ops (`addo`, `subfo`, `subfco`, etc.), should check at bit 31. With PPCBUG-006 a tighter form was given for `negx`. The pattern probably needs systematic review across overflow.rs.
- **Fix**: open a follow-up audit of overflow.rs after batch B completes.
### PPCBUG-022 — mulld_ov missing INT_MIN * -1 edge case
- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-cpu/src/overflow.rs` (`mulld_ov` helper)
- **Symptom**: 64-bit signed multiply overflow check doesn't handle `i64::MIN * -1`.
- **Fix**: add the special case to the helper.
### PPCBUG-023 — andisx CR0 update uses 64-bit signed compare; should use 32-bit
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:475
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` interprets the result as 64-bit signed. The `andisx` result is bounded by `0x0000_0000_FFFF_0000`, which is always non-negative in 64-bit view. In 32-bit ABI, bit 31 is the sign bit — results with bit 31 set (e.g. `andis. rA, rS, 0x8000` with rS=0x80000000 → result=0x80000000) should yield CR0.LT=1, but xenia-rs gives CR0.GT=1. The ppc-manual frozen snapshot for `andisx` shows the correct `as i32 as i64` form; the live code has drifted. Common trigger: `andis. rA, rS, 0x8000` to test the sign bit of a 32-bit word.
- **Fix**:
```rust
ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);
```
- **Test gap**: zero tests for `andisx`. Add at minimum: result with bit 31 set (expect LT=1), result with bits 0-30 set (expect GT=1), result=0 (expect EQ=1).
---
## Batch 2 — logical immediate (group 6)
Per-group report: `audit-out/group-06-logic-imm.md`.
Group 6 summary: only 1 new bug found. The `simm16` sign-extension pattern does not apply (all ops use `uimm16`). `ori`, `oris`, `xori`, `xoris`, and `andix` are ISA-correct; `andisx` has a CR0 interpretation bug (PPCBUG-023). All 6 opcodes have inadequate test coverage (LOW gaps for 5 of them, MEDIUM gap for `andisx` tied to the bug).
---
## Batch 3 — word rotate-and-mask (group 9)
Per-group report: `audit-out/group-09-word-rotate.md`.
Group 9 summary: core arithmetic is clean — `rlw_mask`, rotate logic, and result write are all ISA-correct. The single recurring defect is the Rc=1 CR0 path using `as i64` instead of `as u32 as i32 as i64` (instances of PPCBUG-020 specific to these three opcodes). `rlwimix` zeroes the upper 32 bits of RA instead of preserving them per ISA, but this is safe under 32-bit ABI invariant and classified LOW. Test coverage is poor: 1 partial test for `rlwinmx`, zero for the other two.
### PPCBUG-024 — rlwinmx CR0 update uses 64-bit signed compare; should use 32-bit
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:667
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` — result is a zero-extended u32, so bit 31 set yields +2147483648 in 64-bit signed view but -2147483648 in 32-bit ABI. CR0.LT/GT inverted for results with bit 31 set. `rlwinm.` is the most common dot-form instruction in compiler output (all `slwi.`, `srwi.`, `clrlwi.`, bitfield-test-and-branch idioms).
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: `test_rlwinm` exists but non-Rc only, result has bit 31 clear. Add Rc=1 tests with bit 31 set in result.
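A test for the bit-31 case needs a mask and a rotate, which can be sketched independently of the interpreter. Both functions below are hypothetical; in particular this `rlw_mask` is an illustrative re-derivation in PPC bit numbering (bit 0 is the MSB), not the audited helper of the same name.

```rust
/// Mask with ones from PPC bit `mb` through `me` (0 = MSB, 31 = LSB), wrapping if mb > me.
pub fn rlw_mask(mb: u32, me: u32) -> u32 {
    let left = u32::MAX >> mb;
    let right = u32::MAX << (31 - me);
    if mb <= me { left & right } else { left | right }
}

/// `rlwinm.`: returns (zero-extended result, CR0.LT taken from bit 31).
pub fn rlwinm_dot(rs: u64, sh: u32, mb: u32, me: u32) -> (u64, bool) {
    let result = (rs as u32).rotate_left(sh) & rlw_mask(mb, me);
    (result as u64, (result as i32) < 0)
}
```

`slwi. rA, rS, 31` is `rlwinm. rA, rS, 31, 0, 0`; with `rS = 1` the result is `0x80000000` and CR0.LT should be set, which an `as i64` view of the GPR would miss.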
### PPCBUG-025 — rlwimix CR0 update uses 64-bit signed compare; should use 32-bit
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:679
- **Symptom**: same class as PPCBUG-024. `rlwimi.` is compiler-generated for struct bitfield writes; when the inserted value occupies or sets bit 31 of RA, CR0.LT is wrong.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: zero tests for `rlwimix`. Add basic insert (non-Rc) + Rc=1 with bit-31-set case.
### PPCBUG-026 — rlwnmx CR0 update uses 64-bit signed compare; should use 32-bit
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:690
- **Symptom**: same class as PPCBUG-024. `rlwnm.` is less frequent but used in variable-shift normalisation patterns.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: zero tests for `rlwnmx`.
### PPCBUG-027 — rlwimix zeroes upper 32 bits of RA instead of preserving them (ISA deviation, LOW)
- **Severity**: LOW
- **Status**: open (no fix action required for 32-bit ABI emulation)
- **Location**: interpreter.rs:677-678
- **Symptom**: `let ra = ctx.gpr[instr.ra()] as u32` discards upper 32 bits; result written as `as u64` zero-extends. Per ISA, `(RA) & ¬MASK(MB+32, ME+32)` preserves upper 32 bits of RA. Canary confirms: `f.And(f.LoadGPR(i.M.RA), f.LoadConstantUint64(~m))` with `~m` non-zero in upper half.
- **Impact**: under 32-bit ABI, if the 32-bit GPR invariant holds, upper 32 bits of RA are already zero before `rlwimix`, so both behaviours are identical. The deviation is only observable if an upstream bug (PPCBUG-001..023) has leaked non-zero upper bits into RA — in which case `rlwimix` would silently clean them (beneficial side-effect). No isolated fix needed; resolves automatically when upstream bugs are fixed.
- **Note**: if 64-bit mode support is ever added, this will become a HIGH bug.
---
## Batch 2 — logical register (group 7) [renumbered from collision]
Per-group report: `audit-out/group-07-logic-reg.md` (note: report uses original IDs PPCBUG-023..029 from the subagent's local numbering; tracker uses PPCBUG-028..033 here to avoid collision with groups 6 and 9).
The group 7 subagent also flagged a CR0 regression across all 8 opcodes — that is an extension of PPCBUG-020 (catch-all for CR0 64-bit-signed regressions). Adding andx, andcx, orx, orcx, xorx, norx, nandx, eqvx Rc=1 paths to PPCBUG-020's scope rather than creating a new ID.
### PPCBUG-028 — orcx active GPR poisoning
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:509-513
- **Symptom**: writes `rs | !rb`. Rust's `!` on `u64` flips all 64 bits — the upper 32 bits of `!rb` are unconditionally all-ones, OR'd into the result. With clean inputs `orc r5, r3, r4` writes `0xFFFFFFFF_xxxxxxxx`. Active poisoning, same shape as PPCBUG-006/008.
- **Fix**: operate on u32, write `as u64`:
```rust
let result = (ctx.gpr[instr.rs()] as u32) | !(ctx.gpr[instr.rb()] as u32);
ctx.gpr[instr.ra()] = result as u64;
```
- **Test gap**: zero tests.
### PPCBUG-029 — norx active GPR poisoning (the `not` simplified mnemonic)
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:519-523
- **Symptom**: writes `!(rs | rb)` — outer `!` flips upper 32 bits unconditionally. **`nor rA, rS, rS` is the canonical `not` simplified mnemonic** used pervasively in PPC code; every `not` in 32-bit-ABI Xbox 360 binaries actively poisons the GPR.
- **Fix**: u32 arithmetic, write `as u64`.
### PPCBUG-030 — nandx active GPR poisoning
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:524-528
- **Symptom**: writes `!(rs & rb)`; same shape as norx. `nand rA, rS, rS` is also a bitwise complement of rS, equivalent to `not rA, rS` (i.e. `nor rA, rS, rS`), so the same idiom is affected.
- **Fix**: u32 arithmetic.
### PPCBUG-031 — eqvx active GPR poisoning
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:529-533
- **Symptom**: writes `!(rs ^ rb)` — same shape. The idiom `eqv rA, rS, rS` "set rA to all-ones (i.e. -1 in 32-bit ABI)" produces `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`.
- **Fix**: u32 arithmetic.
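The "u32 arithmetic" fixes for PPCBUG-029 through PPCBUG-031 all share one shape, sketched below as hypothetical free functions that take the raw 64-bit GPR values and return the value to write back.

```rust
/// `nor` on the low words; the complement never touches the upper half.
pub fn nor32(rs: u64, rb: u64) -> u64 {
    let result: u32 = !((rs as u32) | (rb as u32));
    result as u64
}

/// `nand` on the low words.
pub fn nand32(rs: u64, rb: u64) -> u64 {
    let result: u32 = !((rs as u32) & (rb as u32));
    result as u64
}

/// `eqv` (xnor) on the low words.
pub fn eqv32(rs: u64, rb: u64) -> u64 {
    let result: u32 = !((rs as u32) ^ (rb as u32));
    result as u64
}
```

With these, `not r3, r3` on `5` gives `0x00000000_FFFFFFFA` and `eqv rA, rS, rS` gives `0x00000000_FFFFFFFF`, where the 64-bit `!` would have produced all-ones upper halves.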
### PPCBUG-032 — andx / orx / xorx writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:494-498 (andx), 504-508 (orx), 514-518 (xorx)
- **Symptom**: 64-bit bitwise on full GPR values. Latent — clean if both operands are clean; pollutes if either is poisoned upstream.
- **Fix**: `as u32 as u64` truncation at writeback. Once all upstream poison sources are fixed, these become unnecessary; until then, defensive truncation.
### PPCBUG-033 — andcx active poisoning via `!rb` sub-expression
- **Severity**: MEDIUM (the `!rb` always poisons; outer `&` masks it away when rs is clean — fully active when rs is poisoned)
- **Status**: open
- **Location**: interpreter.rs:499-503
- **Symptom**: writes `rs & !rb`. The `!rb` always has all-ones upper bits; if rs has clean upper bits (zero), the result is clean. If rs is poisoned upstream, the poison propagates AND the always-set bits in `!rb` make it look "guaranteed". This is closer to active than latent.
- **Fix**: `(rs as u32) & !(rb as u32)` then `as u64`.
## Batch 2 — sign-extend / count-leading-zeros (group 8) [renumbered]
Per-group report: `audit-out/group-08-extend-clz.md` (report uses local IDs PPCBUG-023..030; tracker uses PPCBUG-034..039).
### PPCBUG-034 — extsbx writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:537
- **Symptom**: `as i8 as i64 as u64` — a byte with high bit set (0x80) writes `0xFFFFFFFF_FFFFFF80` instead of `0x00000000_FFFFFF80`. Active poisoning on every negative byte. `extsb` is emitted by compilers to canonicalize signed-byte arguments — common code path.
- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i8 as i32 as u32 as u64;`
- **Test gap**: zero unit tests.
- **Note**: Canary's JIT does the same sign-extension but is rescued by x86's 32-bit-write zeroing the upper 32 of host registers. Pure interpreter has no such escape.
### PPCBUG-035 — extshx writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:542
- **Symptom**: `as i16 as i64 as u64` — same shape as PPCBUG-034 for halfwords.
- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i16 as i32 as u32 as u64;`
### PPCBUG-036 — extsbx CR0 coupling
- **Severity**: MEDIUM (must land in same commit as PPCBUG-034)
- **Status**: open
- **Location**: interpreter.rs:538
- **Symptom**: `update_cr_signed(0, ra as i64)` — currently latent because the unfixed sign-extended value's i64 sign matches bit 7 of the byte. After PPCBUG-034 lands, the truncated value's i64 view becomes always non-negative — CR0.LT will never fire for negative byte results.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);` — must land with PPCBUG-034.
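The paired fix for PPCBUG-034 and PPCBUG-036 can be sketched as a single hypothetical function that returns both the writeback value and the CR0.LT bit, so a test can pin them together.

```rust
/// `extsb.` under the 32-bit ABI: returns (writeback value, CR0.LT).
pub fn extsb(rs: u64) -> (u64, bool) {
    // Sign-extend the low byte to 32 bits, then zero-extend to 64.
    let gpr = rs as i8 as i32 as u32 as u64;
    // CR0 reads the sign from bit 31 of the truncated value.
    let lt = (gpr as u32 as i32) < 0;
    (gpr, lt)
}
```

A byte of `0x80` should write `0x00000000_FFFFFF80` with LT set; reading that GPR `as i64` alone would report a positive value, which is the coupling the two IDs describe.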
### PPCBUG-037 — extshx CR0 coupling
- **Severity**: MEDIUM (must land with PPCBUG-035)
- **Status**: open
- **Location**: interpreter.rs:543
- **Symptom**: same coupling shape as PPCBUG-036 for halfwords.
### PPCBUG-038 — extswx ISA-correct, document asymmetry
- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- **Location**: interpreter.rs:547
- **Symptom**: `as i32 as i64 as u64` produces full 64-bit sign-extension. This IS the documented purpose of extsw — argument-register canonicalization in 64-bit mode. Behavior is intentional. After PPCBUG-034/035 land, document the asymmetry with extsb/extsh in a comment.
### PPCBUG-039 — cntlzdx counts upper 32 always-zero bits in 32-bit ABI
- **Severity**: LOW
- **Status**: open (probably dead code in Xbox 360 binaries)
- **Location**: interpreter.rs:556-562
- **Symptom**: counts leading zeros in full 64. If a 32-bit-ABI binary emits cntlzd, the result is `32 + cntlzw(low32)` not `cntlzw(low32)`. ISA-correct for 64-bit mode; only matters if the binary actually emits it.
- **Test gap**: zero tests.
#### Clean opcodes from group 8
- `cntlzwx` (interpreter.rs:551-555) — `(rs as u32).leading_zeros()` reads only low 32 bits, result range 0..=32, upper 32 zero. CR0 path benign because result is small. **Test gap only**, LOW.
- `extswx` CR0 path is correct per ISA (PPCBUG-038 wontfix).
## Batch 2 — shift (group 11) [renumbered]
Per-group report: `audit-out/group-11-shift.md` (uses local IDs PPCBUG-050..055; tracker uses PPCBUG-040..045).
### PPCBUG-040 — DECODER BUG: `sh64()` wrong bit order for sradi (HIGH)
- **Severity**: HIGH (this is a decoder-level bug, file:line is in `decoder.rs` not `interpreter.rs`)
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `xenia-rs/crates/xenia-cpu/src/decoder.rs:91-93` (the `sh64()` accessor method on `DecodedInstr`)
- **Symptom**: the XS-form `sradix` (sradi) shift amount is assembled as `SH[4:0] << 1 | SH[5]` instead of the correct `SH[5] << 5 | SH[4:0]`. **Every `sradi rA, rS, N` instruction where N is not 0 or 63 executes with a completely wrong shift count.** Example: `sradi rA, rS, 32` shifts by 1 instead. This is a silent, structural mis-decoding — none of the interpreter changes can paper over it.
- **Cross-reference**: Canary's `(i.XS.SH5 << 5) | i.XS.SH` pattern is the correct ISA encoding.
- **Fix**: in `decoder.rs:sh64()` body, swap the bit order:
```rust
pub fn sh64(&self) -> u32 {
    // SH5 is at bit 30 of the encoded word; SH[4:0] is at bits 16-20.
    let sh_lo = extract_bits(self.raw, 16, 20);
    let sh_hi = extract_bits(self.raw, 30, 30);
    (sh_hi << 5) | sh_lo
}
```
- **Impact**: `sradi` is used by compilers for arithmetic right shifts on 64-bit values. In Xbox 360 32-bit-ABI binaries it should not be common, but it's emitted by some compilers for sign-magnitude conversions and 64-bit fixed-point arithmetic. **This is the kind of silent decoder bug the user explicitly wanted the audit to catch.**
- **Test gap**: no decoder unit test pins `sh64()` for non-trivial SH values. Add fixture cases in `disasm_goldens.rs` for `sradi rA, rS, 1`, `sradi rA, rS, 32`, `sradi rA, rS, 63`.
- **Note**: any other instruction that uses the same XS-form SH split-encoding is suspect. Phase C decoder audit must verify `sradi` and `sradix` are the only consumers of `sh64()`.
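A decoder round-trip over all 64 shift amounts is a cheap way to pin this. The sketch below works on raw instruction words rather than `DecodedInstr`; `encode_sradi`, `sh64`, and `sh64_swapped` are hypothetical free functions, and the encoder assumes the standard XS-form layout (primary opcode 31, extended opcode 413).

```rust
/// Assemble `sradi rA, rS, sh` (XS-form, Rc = 0).
pub fn encode_sradi(rs: u32, ra: u32, sh: u32) -> u32 {
    (31u32 << 26)
        | (rs << 21)
        | (ra << 16)
        | ((sh & 0x1F) << 11) // SH[4:0] at PPC bits 16-20
        | (413u32 << 2)       // extended opcode at PPC bits 21-29
        | ((sh >> 5) << 1)    // SH[5] at PPC bit 30
}

/// Correct reconstruction: SH[5] is the high bit.
pub fn sh64(raw: u32) -> u32 {
    let sh_lo = (raw >> 11) & 0x1F;
    let sh_hi = (raw >> 1) & 1;
    (sh_hi << 5) | sh_lo
}

/// The swapped bit order described in the symptom, for comparison only.
pub fn sh64_swapped(raw: u32) -> u32 {
    (((raw >> 11) & 0x1F) << 1) | ((raw >> 1) & 1)
}
```

The swapped form is a one-bit left rotation of the 6-bit field, so it agrees with the correct value only for 0 and 63, matching the symptom above.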
### PPCBUG-041 — srawx writeback sign-extends to 64 bits
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:583, 588 (two writeback paths for the count<32 and count>=32 branches)
- **Symptom**: `result as i64 as u64` violates the 32-bit-ABI zero-extension convention. A negative shifted value writes `0xFFFFFFFF_xxxxxxxx` instead of `0x00000000_xxxxxxxx`.
- **Fix**: `result as u32 as u64` in both writeback paths.
- **Note**: subagent verified the CA computation is **independently correct** — uses `(rs as u32) << (32 - sh) != 0` which is the canonical ISA shifted-out-bits test on 32-bit operands. **Do not change CA logic.**
### PPCBUG-042 — srawix writeback sign-extends to 64 bits
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:600, 605 (same shape as PPCBUG-041 for srawi)
- **Fix**: `result as u32 as u64`.
### PPCBUG-043 — srawx / srawix CR0 coupling
- **Severity**: MEDIUM (must land with PPCBUG-041 and PPCBUG-042)
- **Status**: open
- **Locations**: interpreter.rs:593, 607
- **Symptom**: currently masked by the sign-extended writeback (sign-extension makes the 64-bit and 32-bit sign agree). After truncating the writeback, `as i64` will misread the sign for negative results.
- **Fix**: `as u32 as i32 as i64` in both Rc=1 paths, applied with PPCBUG-041/042.
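The three coupled fixes (truncated writeback, unchanged CA, 32-bit CR0 view) can be exercised together in one sketch. `srawi` below is a hypothetical free function restricted to shift counts 0..=31; it is not the interpreter arm, and it reuses the shifted-out-bits test quoted in the note on PPCBUG-041.

```rust
/// `srawi.` for sh in 0..=31: returns (writeback value, CA, CR0.LT).
pub fn srawi(rs: u64, sh: u32) -> (u64, u8, bool) {
    let v = rs as u32 as i32;
    let result = v >> sh; // arithmetic shift on the 32-bit value
    // CA is set only when the source is negative and a 1 bit was shifted out.
    let shifted_out = sh != 0 && ((rs as u32) << (32 - sh)) != 0;
    let ca = (v < 0 && shifted_out) as u8;
    // Zero-extend the writeback; take the sign for CR0 from bit 31.
    (result as u32 as u64, ca, result < 0)
}
```

The two CA cases recommended under PPCBUG-045 fall out directly: `0xFFFFFFFE >> 1` gives CA=0, and `0x80000001 >> 1` gives `0xC0000000` with CA=1.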
### PPCBUG-044 — slwx / srwx CR0 misclassifies negative 32-bit results
- **Severity**: LOW (zero-extended results have bit 31 set in low 32, but always positive in i64 view → CR0.LT never fires for slw/srw with bit-31-set results)
- **Status**: open
- **Locations**: interpreter.rs:568, 576
- **Fix**: `as u32 as i32 as i64`.
### PPCBUG-045 — Zero unit tests for any shift opcode
- **Severity**: LOW (test gap only)
- **Status**: open
- **Locations**: interpreter.rs:563-658 (entire shift group: slwx, srwx, srawx, srawix, sldx, srdx, sradx, sradix)
- **Recommendation**: add at least one functional test per opcode. Especially: `srawix r3, r3, 1` with rs=0xFFFFFFFE (CA should be 0), `srawix r3, r3, 1` with rs=0x80000001 (CA should be 1, result=0xC0000000); `sradix r3, r3, 32` (currently wrong per PPCBUG-040).
#### Clean opcodes from group 11
- `slwx` writeback at line 568 (zero-ext 32-bit result via `(rs as u32 << count) as u64`) — clean.
- `srwx` writeback at line 576 — clean.
- `sldx`, `srdx`, `sradx` — 64-bit ops, ISA-correct (probably dead in 32-bit-ABI binaries).
- `sradix` body logic is structurally correct; failure is solely from PPCBUG-040 giving it a wrong shift count.
## Batch 2 — doubleword rotate (group 10) [renumbered]
Per-group report: `audit-out/group-10-dword-rotate.md` (uses local IDs PPCBUG-027/028; tracker uses PPCBUG-046/047).
### PPCBUG-046 — DECODER BUG: wrong bit position for MB[5] in all 6 doubleword-rotate opcodes (HIGH)
- **Severity**: HIGH (decoder-level; impacts the canonical zero-extend-to-32 idiom)
- **Status**: applied (52b05b1, 2026-05-01)
- **Locations**: interpreter.rs — every arm of `rldiclx`, `rldicrx`, `rldicx`, `rldimix`, `rldclx`, `rldcrx` (lines 693-754)
- **Symptom**: each arm computes `let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1)`. The bit at `(instr.raw >> 1) & 1` is **PPC bit 30**, which in MD form is `sh[5]` (the split-off high bit of the 6-bit shift amount), NOT `mb[5]`. The high bit of the 6-bit MB field lives at PPC bit 26 = `(instr.raw >> 5) & 1`.
As written, the code computes `(mb[4:0] << 1) | sh[5]`. Ironically `disasm.rs:1256` (the `mb_md()` helper) has the correct formula. The interpreter was written independently with the wrong bit position, probably a copy-error from `sh64()` where bit 30 really is the split bit.
- **Concrete impact**:
- `clrldi r3, r4, 32` is the canonical "zero-extend low 32 bits" idiom emitted constantly in 32-bit-ABI PPC code. Encoded as `rldicl r3, r4, 0, mb=32`. With mb=32, `mb[5]=1, mb[4:0]=0`. The interpreter decodes mb=0 → mask is all-ones → instruction becomes a no-op. Any downstream 64-bit compare (subfcx CA, cmpld) on that register sees a polluted 64-bit value instead of a clean 32-bit zero-extended one. **This is the same class of bug that caused the addis/BST incident.**
- For `rldcr` (MDS form), PPC bit 30 is the LSB of the XO field, which is always 1 for `rldcr` regardless of Rc. The buggy formula therefore yields `(me[4:0] << 1) | 1`: the low bit of the decoded ME is forced to 1 and the real `me[5]` is ignored on every invocation.
- **Fix** (one line per opcode):
```rust
// Replace in all 6 arms:
let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1);
// With:
let mb = instr.mb() | (((instr.raw >> 5) & 1) << 5);
```
Or, cleaner: expose `mb_md()` (currently in disasm.rs:1256) as a method on `DecodedInstr` in `decoder.rs` and have the interpreter call `instr.mb_md()` — single source of truth for MD-form mb extraction.
- **Test gap**: zero execution tests for any of the 6 opcodes; only disasm-golden string-output tests.
- **Note**: this is the second decoder bug found by the audit (PPCBUG-040 / `sh64()` for `sradi` is the first). Phase C decoder audit must verify whether other MD/MDS/XS form accessors have similar bit-position errors.
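The `clrldi r3, r4, 32` failure can be reproduced on raw instruction words. In this sketch every function is hypothetical: `encode_rldicl` assumes the standard MD-form layout (primary opcode 30, XO 0), `mb_md_buggy` assumes `instr.mb()` returns the 5 bits at PPC bits 21-25, and `mask_from_mb` is a plain stand-in for the rldicl mask rather than the audited `rld_mask_left()`.

```rust
/// Assemble `rldicl rA, rS, sh, mb` (MD-form, XO = 0, Rc = 0).
pub fn encode_rldicl(rs: u32, ra: u32, sh: u32, mb: u32) -> u32 {
    (30u32 << 26)
        | (rs << 21)
        | (ra << 16)
        | ((sh & 0x1F) << 11) // sh[4:0] at PPC bits 16-20
        | ((mb & 0x1F) << 6)  // mb[4:0] at PPC bits 21-25
        | ((mb >> 5) << 5)    // mb[5] at PPC bit 26
        | ((sh >> 5) << 1)    // sh[5] at PPC bit 30
}

/// Correct MD-form MB extraction.
pub fn mb_md(raw: u32) -> u32 {
    ((raw >> 6) & 0x1F) | (((raw >> 5) & 1) << 5)
}

/// The formula quoted in the symptom, for comparison only.
pub fn mb_md_buggy(raw: u32) -> u32 {
    (((raw >> 6) & 0x1F) << 1) | ((raw >> 1) & 1)
}

/// Ones from PPC bit `mb` through bit 63 (mb in 0..=63).
pub fn mask_from_mb(mb: u32) -> u64 {
    u64::MAX >> mb
}
```

For `clrldi r3, r4, 32` the correct extraction gives 32 and clears the upper half, while the quoted formula gives 0 and leaves a polluted register untouched.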
### PPCBUG-047 — Zero execution tests for any doubleword-rotate opcode
- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: interpreter.rs:693-754 (all 6 opcodes)
- **Recommendation**: at minimum, a `clrldi r3, r4, 32` test verifying the result is exactly the low 32 bits of r4. Had such a test existed, it would have caught the MB-reconstruction bug (PPCBUG-046); once that fix lands it serves as the regression guard.
#### What's correct in group 10
- `sh64()` accessor — correctly reconstructs 6-bit shift from MD split encoding (cross-check: `disasm.rs` agrees).
- `rld_mask_left()` / `rld_mask_right()` mask helpers — verified against Canary's XEMASK.
- `rldicx`/`rldimix` mask formulas (`63 - sh` for right edge) — correct.
- `rldimix` read-modify-write merge — correct 64-bit mask-insert.
- CR0 `as i64` — correct here because these ARE genuine 64-bit ops (unlike word rotate).
- `rldcl`/`rldcr` register-shift extraction (`gpr[rb] & 0x3F`) — correct.
- No 32-bit writeback truncation needed: these are intentionally 64-bit; 32-bit-ABI compilers only emit them with masks that yield 32-bit-clean results.
## Batch 3 — branch (group 13)
Per-group report: `audit-out/group-13-branch.md`.
Group 13 summary: the branch implementation is substantively correct. All BO/BI bit masks,
CTR decrement-before-test ordering, AA absolute vs relative dispatch, LK unconditional write
(including not-taken path in `bcx`), LR-read-before-LR-write atomicity in `bclrx`, and
`get_cr_bit()` field indexing are all ISA-correct and match Canary. The only execution bugs
are a latent 64-bit CTR zero-test (PPCBUG-053/054, active under current GPR-pollution environment)
and severely thin test coverage (PPCBUG-055).
### PPCBUG-053 — CTR zero-test uses 64-bit compare; should use 32-bit in `bcx`/`bclrx`
- **Severity**: MEDIUM (effectively HIGH given unfixed PPCBUG-001..031 GPR pollution)
- **Status**: applied (3d8e2ce, 2026-05-02)
- **Locations**: `interpreter.rs:849` (`bcx` `ctr_ok`), `interpreter.rs:879` (`bclrx` `ctr_ok`)
- **Symptom**: `ctx.ctr != 0` compares all 64 bits. In 32-bit ABI the CTR is logically 32-bit.
Canary explicitly truncates to 32 bits: `ctr = f.Truncate(ctr, INT32_TYPE)`. When CTR upper
32 bits are non-zero (due to upstream GPR pollution flowing through `mtspr CTR, rN`), the
64-bit test disagrees with the 32-bit ISA semantic. Most dangerous with `neg; mtctr; bdnz`:
`negx` (PPCBUG-006) always sets upper 32 bits, so the 32-bit CTR counter can reach zero
while the 64-bit CTR is still non-zero → infinite loop.
- **Fix**:
```rust
// Replace in both bcx and bclrx:
let ctr_ok = (bo & 0b00100) != 0
    || (((ctx.ctr as u32) != 0) ^ ((bo & 0b00010) != 0));
```
Or, alternatively, truncate at decrement:
```rust
if bo & 0b00100 == 0 {
    ctx.ctr = ctx.ctr.wrapping_sub(1) as u32 as u64;
}
```
- **Test gap**: zero tests for CTR-decrement branches (bdnz, bdz, bdnzt, bdnzf, bdzt, bdzf).
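
The difference between the two compare widths shows up as soon as the upper half of CTR is non-zero. A small sketch of the BO-driven CTR step, assuming the BO masks quoted in the fix; `ctr_step` and its `compare_32bit` switch are hypothetical and exist only to contrast the two behaviours:

```rust
/// Model of the CTR handling in `bcx`/`bclrx`.
/// Returns the updated CTR and whether the CTR condition passes.
fn ctr_step(ctr: u64, bo: u32, compare_32bit: bool) -> (u64, bool) {
    if (bo & 0b00100) != 0 {
        // This BO bit set: CTR is neither decremented nor tested.
        return (ctr, true);
    }
    let ctr = ctr.wrapping_sub(1);
    let nonzero = if compare_32bit {
        (ctr as u32) != 0
    } else {
        ctr != 0
    };
    // 0b00010 set means "branch if CTR == 0"; clear means "branch if CTR != 0".
    (ctr, nonzero ^ ((bo & 0b00010) != 0))
}
```

For `bdnz` (BO = `0b10000`) with a polluted CTR of `0xFFFF_FFFF_0000_0001`, the 64-bit compare keeps looping while the 32-bit compare exits the loop, which is the boundary case a regression test should pin down.
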
### PPCBUG-054 — `mtspr CTR` writeback not truncated to 32 bits
- **Severity**: MEDIUM
- **Status**: applied (3d8e2ce, 2026-05-02)
- **Location**: `interpreter.rs:1411`
- **Symptom**: `crate::context::spr::CTR => ctx.ctr = val` writes the full 64-bit GPR to CTR.
Acts as a firewall gap: any upstream 64-bit GPR pollution flows directly into CTR, where it
will be tested by PPCBUG-053's 64-bit comparison. Defensive fix prevents CTR from ever
acquiring non-zero upper 32 bits independently of the GPR-pollution fix.
- **Note**: the `bcctrx` branch-target read (`(ctx.ctr as u32) & !3`) already truncates
correctly; the bug is confined to the `ctr != 0` zero-test in `bcx`/`bclrx`.
- **Fix**: `crate::context::spr::CTR => ctx.ctr = val as u32 as u64,`
- **Cross-reference**: Group 16 (SPR/MSR) subagent should verify this write-point.
### PPCBUG-055 — Severely inadequate test coverage for all four branch opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: `interpreter.rs` test module (lines 4455-4491)
- **Current coverage**: `bx` forward (1 test), `bl` LR update (1 test), `bcx` taken beq (1 test via `test_cmp_and_bc`). Zero tests for: `bclrx`, `bcctrx`, any CTR-decrement variant, not-taken path, backward branch, AA=1 absolute, `bcl` LR-write-on-not-taken.
- **Recommended minimum**: blr, bctr, bdnz (taken and not-taken at boundary CTR=1), bclrl old-LR-as-target, bcl LK-write-on-not-taken. See per-group report for concrete encoding patterns.
---
## Batch 3 — trap + system call (group 14)
Per-group report: `audit-out/group-14-trap-sc.md`.
Group 14 summary: the core trap evaluation (`trap.rs`) is correct — TO bit constants, signed/unsigned
comparison dispatch, and word-vs-doubleword width handling are all ISA-conformant. The live interpreter
arm properly evaluates the TO field (replacing the old unconditional-trap stub). Three MEDIUM issues
found: PC ordering on trap return, missing LEV dispatch for `sc`, and the Xbox 360 typed-trap
convention (`twi 31, r0, IMM`) not handled. Two LOW findings for stale manual snapshots and test gaps.
### PPCBUG-063 — `ctx.pc` already at CIA+4 when `StepResult::Trap` returns
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:1543 (`ctx.pc += 4`) before interpreter.rs:1549 (`return StepResult::Trap`)
- **Symptom**: any trap handler that reads `ctx.pc` to find the faulting instruction sees CIA+4 instead
of CIA. The existing `tracing::warn!` compensates with `.wrapping_sub(4)`, confirming the asymmetry.
On real hardware, SRR0 = CIA (trapping instruction address). Current risk LOW (no handler inspects
pc), but HIGH if any SEH/exception-delivery path is added (critical for the C++ throw investigation).
- **Fix**: save CIA before incrementing, restore it when firing the trap:
```rust
let trap_pc = ctx.pc;
ctx.pc += 4;
if fired { ctx.pc = trap_pc; return StepResult::Trap; }
```
Alternatively store CIA in a separate `ctx.srr0`-equivalent field and leave `ctx.pc` at NIA.
- **Note**: `sc` correctly leaves `ctx.pc` at NIA (the return address) — that is a different and
correct design choice. The inconsistency between sc and trap is the bug.
### PPCBUG-064 — `sc` ignores `LEV` field; `sc 2` (HVcall) silently misdispatched
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:915-918
- **Symptom**: `sc 2` (Xbox 360 hypervisor call) returns `StepResult::SystemCall` identically to
`sc 0`. Canary dispatches LEV=0 to `syscall_handler` and LEV=2 to `f.function()` (the HVcall
path). For pure game-title code (LEV=0 only) this is invisible; XDK kernel-mode components and
some HV-aware titles may use `sc 2`.
- **Fix**: decode the 7-bit LEV field (bits 20-26 of SC-form encoding), add a `HypervisorCall`
variant to `StepResult`, and dispatch accordingly.
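
A minimal sketch of the LEV decode, assuming the SC-form layout stated in the fix (LEV at PPC bits 20-26). The `StepResult` enum below is a hypothetical stand-in with only the two relevant variants, and routing every LEV other than 2 to `SystemCall` is an assumption, not confirmed behaviour:

```rust
/// Hypothetical stand-in for the interpreter's result type.
/// `HypervisorCall` is the variant proposed by the fix.
#[derive(Debug, PartialEq)]
enum StepResult {
    SystemCall,
    HypervisorCall,
}

/// Extract the 7-bit LEV field (PPC bits 20-26) from an SC-form word.
fn sc_lev(raw: u32) -> u32 {
    (raw >> 5) & 0x7F
}

/// Route `sc` by LEV. Assumption: anything other than LEV=2 is a plain syscall.
fn dispatch_sc(raw: u32) -> StepResult {
    match sc_lev(raw) {
        2 => StepResult::HypervisorCall,
        _ => StepResult::SystemCall,
    }
}
```

Here `0x44000002` is `sc` with LEV=0 and `0x44000042` is `sc 2`.
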
### PPCBUG-065 — `twi 31, r0, IMM` typed-trap not handled; SIMM type code discarded
- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:1532-1551 (trap arm)
- **Symptom**: `twi 31, r0, IMM` (TO=31=unconditional, RA=r0) is used by the Xbox 360 CRT/kernel
to encode typed C++ exceptions — the 16-bit SIMM carries the exception type discriminator. xenia-rs
fires the trap correctly but discards SIMM. The caller sees a generic `StepResult::Trap` with no
type information, preventing correct C++ SEH dispatch.
- **Canary reference**: `ppc_emit_control.cc:611-616` special-cases `RA==0 && TO==31` and calls
`f.Trap(type)` with the SIMM as the type code.
- **Fix**: add a `trap_type: Option<u16>` payload to `StepResult::Trap`. Detect `twi` with `to()==31`
and `ra()==0` and populate it with `instr.simm16() as u16`.
- **Note**: directly relevant to the Sylpheed `std::runtime_error` throw investigation
(project_xenia_rs_sylpheed_throw_2026_04_28.md) — the typed-trap SIMM carries the CRT exception
class that the kernel uses to route to the correct handler.
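
The detection step can be sketched directly on the raw instruction word. This is an illustration only: `typed_trap_code` is a hypothetical helper, and it works on D-form bit positions rather than the project's `to()`, `ra()` and `simm16()` accessors.

```rust
/// Return the type code for the `twi 31, r0, IMM` convention, if the word matches.
fn typed_trap_code(raw: u32) -> Option<u16> {
    let opcode = raw >> 26; // primary opcode; 3 = twi
    let to = (raw >> 21) & 0x1F; // TO field
    let ra = (raw >> 16) & 0x1F; // RA field
    if opcode == 3 && to == 31 && ra == 0 {
        Some(raw as u16) // low 16 bits: the SIMM field, kept as raw bits
    } else {
        None
    }
}
```

`0x0FE00016` is `twi 31, r0, 22` and yields `Some(22)`; a `twi` with a non-zero RA, or the register-form `tw 31, r0, r0` (`0x7FE00008`), yields `None`.
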
### PPCBUG-066 — Stale frozen snapshots in ppc-manual for td/tdi/tw/twi
- **Severity**: LOW
- **Status**: open
- **Location**: `ppc-manual/branch/td.md`, `tdi.md`, `tw.md`, `twi.md`
- **Symptom**: all four show the old unconditional-trap stub (`// For now, just trace and continue`)
instead of the current TO-field-evaluating implementation.
- **Fix**: regenerate after PPCBUG-063 and PPCBUG-065 are resolved.
### PPCBUG-067 — Test gaps for trap and sc
- **Severity**: LOW
- **Status**: open
- **Location**: interpreter.rs `#[cfg(test)] mod tests`
- **Missing coverage**: `sc` smoke test (fires SystemCall, advances PC); `td` vs `tw` on 64-bit-clean
operands (width discrimination); `tdi`/`td` signed/unsigned LT/GT conditions; `tw 31, r0, r0`
unconditional `trap` encoding; `twi 31, r0, N` typed-trap; negative simm16 in `twi`.
---
## Batch 3 — SPR / MSR / TB / FPSCR / VSCR moves (group 16)
Per-group report: `audit-out/group-16-spr-msr.md`.
Group 16 summary: the core paths are clean — `mfcr`, `mtcrf`, `mfspr`, `mtspr`, `mftb`, `mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`, `mfvscr`, `mtvscr` are all functionally ISA-correct. The `spr()` decoder accessor correctly inverts the PPC XFX half-swap encoding. The one MEDIUM finding is `mtmsrd` silently ignoring the `L=1` partial-MSR-write semantics. Five LOW test-gap findings cover near-total absence of unit tests for this entire group.
### PPCBUG-078 — `mtmsrd` L=1 partial-MSR-write not modelled
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1458-1461`
- **Symptom**: xenia-rs merges `mtmsr` and `mtmsrd` into a single body that unconditionally writes `ctx.msr = ctx.gpr[instr.rs()]`. PowerISA specifies that `mtmsrd` with instruction bit 15 (`L`) = 1 performs a partial update: only `MSR[EE]` (PowerISA MSR bit 48, u64 bit 15) and `MSR[RI]` (PowerISA MSR bit 62, u64 bit 1) are modified; all other MSR bits are preserved. Kernel code using `mtmsrd L=1` to re-enable external interrupts silently corrupts the entire MSR in xenia-rs. Canary acknowledges the same TODO.
- **Fix**:
```rust
PpcOpcode::mtmsrd => {
    let l = (instr.raw >> (31 - 15)) & 1;
    if l == 1 {
        // L=1: only MSR[EE] (u64 bit 15) and MSR[RI] (u64 bit 1) are written.
        let mask: u64 = (1u64 << 15) | (1u64 << 1);
        let rs = ctx.gpr[instr.rs()];
        ctx.msr = (ctx.msr & !mask) | (rs & mask);
    } else {
        ctx.msr = ctx.gpr[instr.rs()];
    }
    ctx.pc += 4;
}
```
- **Test gap**: zero tests for `mtmsr` or `mtmsrd`.
### PPCBUG-079 — `mtspr` silent drop of unknown-SPR writes without value logging
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1430-1433`
- **Symptom**: Unknown SPR writes are silently discarded with only a `tracing::warn!()` that omits the value being written. Reduces debuggability; no correctness impact for known Xbox 360 titles.
- **Fix** (optional): `tracing::warn!("mtspr: unimplemented SPR {} <= 0x{:016x}", spr, val)`.
### PPCBUG-080 — `mfvscr` does not zero the upper 96 bits of VD per ISA
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2198-2201`
- **Symptom**: ISA requires `mfvscr VD` to place VSCR in the rightmost word of VD and zero bytes 0-11. xenia-rs copies the full 128-bit `ctx.vscr` into `ctx.vr[VD]`, leaving stale data in bytes 0-11 if `ctx.vscr` was populated from a non-zeroed vector. Canary explicitly zero-extends.
- **Fix**:
```rust
PpcOpcode::mfvscr => {
    let vscr_word = ctx.vscr.as_u32x4()[3];
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array([0, 0, 0, vscr_word]);
    ctx.pc += 4;
}
```
### PPCBUG-081 — Zero unit tests for `mfcr` / `mtcrf`
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1436-1453`
- **Recommended additions**: full mfcr round-trip; `mtcrf 0xFF`; `mtcrf 0x80` (CR0 only); `mtcrf 0x38` (ABI CR2|CR3|CR4 restore).
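
The expected values for those `mtcrf` cases follow from the FXM expansion. A sketch under the assumption that the CR is viewed as a packed 32-bit word with CR0 in the top nibble (xenia-rs may store the fields separately); `fxm_to_mask` and `mtcrf` are hypothetical names:

```rust
/// Expand the 8-bit FXM field into a 32-bit CR write mask.
/// FXM value 0x80 selects CR0, which occupies the top nibble of the CR.
fn fxm_to_mask(fxm: u8) -> u32 {
    let mut mask = 0u32;
    for field in 0..8u32 {
        if (fxm & (0x80u8 >> field)) != 0 {
            mask |= 0xF000_0000u32 >> (field * 4);
        }
    }
    mask
}

/// Model of `mtcrf`: replace only the selected CR fields with bits from `rs`.
fn mtcrf(cr: u32, rs: u32, fxm: u8) -> u32 {
    let mask = fxm_to_mask(fxm);
    (cr & !mask) | (rs & mask)
}
```

`mtcrf 0x38` therefore touches only CR2, CR3 and CR4 (mask `0x00FF_F000`), which is the property the ABI-restore test should assert.
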
### PPCBUG-082 — Minimal unit tests for `mfspr` / `mtspr`
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1376-1435`
- **Note**: only DEC and TBL_WRITE covered; add LR, CTR, XER, TBL/TBU, VRSAVE.
### PPCBUG-083 — Zero unit tests for `mftb`
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1462-1470`
### PPCBUG-084 — Zero interpreter-level round-trip tests for FPSCR move instructions
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2678-2720`
- **Note**: `fpscr.rs` helper-level tests exist; interpreter dispatch (`mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`) is untested end-to-end.
### PPCBUG-085 — Zero unit tests for `mfvscr` / `mtvscr`
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2198-2205`
IDs PPCBUG-086 and PPCBUG-087 are unallocated — reserved for group 16 follow-up findings.
---
## Batch 3 — cache + sync (group 17)
Per-group report: `audit-out/group-17-cache-sync.md`.
Group 17 summary: the cleanest group audited so far. Both `dcbz` and `dcbz128` have correct EA computation (ra=0 special case, 64-bit→u32 truncation, alignment masks `& !31` / `& !127`, byte counts 32/128). The nine no-op opcodes (dcbf, dcbi, dcbst, dcbt, dcbtst, icbi, sync, eieio, isync) are all listed in one arm and complete. The `dcbz128` Xbox 360 specific opcode (RT=1 bit distinguishes from dcbz) dispatches correctly. **0 HIGH, 0 MEDIUM, 2 LOW** findings.
### PPCBUG-088 — sync disasm ignores L field; `lwsync` (L=1) shows as "sync"
- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-rs/crates/xenia-cpu/src/disasm.rs:364`
- **Symptom**: The `PpcOpcode::sync` disasm arm outputs `"sync"` unconditionally regardless of the L field (PPC bit 10). When L=1 (word `0x7C2004AC`), the instruction should disassemble as `"lwsync"`. The `extended_mnemonics.json` golden already accepts `"sync"` as output for the lwsync case, meaning the test currently passes with the wrong string.
- **Impact**: Disassembly output for `lwsync` (very common in Xbox 360 acquire-barrier idioms) shows as `sync`. No interpreter impact; both L=0 and L=1 are correctly treated as no-op PC advance.
- **Fix**:
```rust
PpcOpcode::sync => {
    // L field at PPC bit 10
    if extract_bits(instr.raw, 10, 10) == 1 {
        base("lwsync", String::new(), 0)
    } else {
        base("sync", String::new(), 0)
    }
}
```
Update `extended_mnemonics.json` golden to add `"ext_mnemonic": "lwsync"` for that entry.
### PPCBUG-089 — Zero interpreter execution tests for group 17
- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-rs/crates/xenia-cpu/src/interpreter.rs` (test module)
- **Symptom**: No `#[test]` covers `dcbz`, `dcbz128`, or any no-op (sync/isync/eieio/dcbf/icbi). A regression in dcbz byte count or alignment would go undetected.
- **Recommended additions**: `dcbz` with misaligned address (verifies 32-byte aligned zero), `dcbz128` with misaligned address (verifies 128-byte aligned zero), both ra=0 and ra!=0 cases, `sync`/`isync`/`dcbf` no-op PC-advance smoke tests.
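
The alignment and EA rules those tests should exercise can be modelled in a few lines. A sketch over a flat byte slice, assuming the masks and byte counts quoted in the group summary; `zero_block` and `x_form_ea` are hypothetical helpers, not the interpreter's memory API:

```rust
/// Zero the aligned block containing `ea`.
/// `block` is 32 for `dcbz` and 128 for `dcbz128`.
fn zero_block(mem: &mut [u8], ea: u32, block: u32) {
    let start = (ea & !(block - 1)) as usize; // align down to the block size
    mem[start..start + block as usize].fill(0);
}

/// EA for the X-form cache ops: (rA|0) + rB, truncated to 32 bits.
fn x_form_ea(gpr: &[u64; 32], ra: usize, rb: usize) -> u32 {
    let base = if ra == 0 { 0 } else { gpr[ra] };
    base.wrapping_add(gpr[rb]) as u32
}
```

A misaligned address such as `0x4D` must zero exactly `0x40..0x60` for `dcbz`, leaving the bytes on either side untouched.
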
---
## Batch 3 — CR logical + CR moves (group 15)
Per-group report: `audit-out/group-15-cr-logical.md`.
Group 15 summary: **cleanest group audited to date**. All 8 CR logical ops (`crand`, `crandc`,
`creqv`, `crnand`, `crnor`, `cror`, `crorc`, `crxor`), `mcrf`, and `mcrxr` are ISA-correct.
The `cr_logical` helper's use of `fn(bool, bool) -> bool` prevents the `!u64` bit-pollution class
(PPCBUG-028..031 in group 7). CR bit indexing in `get_cr_bit`/`set_cr_bit` is correct (bit/4 =
field, bit%4 = within-field sub-index matching PPC MSB-0 numbering, with sub `{0=LT, 1=GT, 2=EQ,
3=SO}`). `mcrxr` correctly maps XER{SO,OV,CA} to CR{LT,GT,EQ} with SO=false and unconditionally
clears the XER bits. `mcrfs` nibble extraction, field shift formula (`28 - crfs*4`), and
CLEARABLE_MASK (all 14 ISA-clearable exception bits, no FEX/VX) are all correct. One MEDIUM ISA
violation: `mcrfs` omits VX summary recomputation. Two LOW findings: a misleading test comment and
zero coverage for all 8 CR logical ops + `mcrf`.
### PPCBUG-068 — `mcrfs` does not recompute VX summary bit after clearing VX* exception bits
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:4250` (`ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK)`)
- **Symptom**: When `mcrfs` clears VX* exception bits (VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ,
VXVC, VXSOFT, VXSQRT, VXCVI) from any source field, the VX summary bit (FPSCR[2], `fpscr::VX
= 1<<29`) is left stale. If those VX* bits were the only contributors to VX, it should become
0 but remains 1. A subsequent `mcrfs cr0, 0` will then report VX=1 in CR0.EQ, misleading the
caller into thinking an invalid-operation exception is still active.
- **Fix**:
```rust
// After ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK); add:
if (ctx.fpscr & fpscr::VX_ALL) != 0 {
    ctx.fpscr |= fpscr::VX;
} else {
    ctx.fpscr &= !fpscr::VX;
}
// FEX recomputation omitted: xenia doesn't model enabled-exception dispatch.
```
- **Test gap**: existing test only covers crfS=0 (FX+OX) — no VX* bits involved. Add a test
that sets only VXSNAN, runs `mcrfs cr0, 1`, then verifies VX is now 0.
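
The recomputation and the suggested test can be sketched on a bare 32-bit FPSCR value. The constants below are written out from the PowerISA bit numbers (FPSCR bit n is `1 << (31 - n)`) and are assumptions about what `fpscr::VX_ALL` and `CLEARABLE_MASK` contain; `mcrfs_clear` models only the clearing side of `mcrfs`, not the copy into the CR field.

```rust
const FX: u32 = 1 << 31; // FPSCR[0]
const VX: u32 = 1 << 29; // FPSCR[2], invalid-operation summary
const OX: u32 = 1 << 28; // FPSCR[3]
const UX: u32 = 1 << 27;
const ZX: u32 = 1 << 26;
const XX: u32 = 1 << 25;
const VXSNAN: u32 = 1 << 24; // FPSCR[7]
const VXISI: u32 = 1 << 23;
const VXIDI: u32 = 1 << 22;
const VXZDZ: u32 = 1 << 21;
const VXIMZ: u32 = 1 << 20;
const VXVC: u32 = 1 << 19; // FPSCR[12]
const VXSOFT: u32 = 1 << 10; // FPSCR[21]
const VXSQRT: u32 = 1 << 9;
const VXCVI: u32 = 1 << 8; // FPSCR[23]

const VX_ALL: u32 =
    VXSNAN | VXISI | VXIDI | VXZDZ | VXIMZ | VXVC | VXSOFT | VXSQRT | VXCVI;
const CLEARABLE: u32 = FX | OX | UX | ZX | XX | VX_ALL;

/// Set VX if any VX* bit is still set, otherwise clear it.
fn recompute_vx(fpscr: u32) -> u32 {
    if (fpscr & VX_ALL) != 0 {
        fpscr | VX
    } else {
        fpscr & !VX
    }
}

/// Clear the clearable bits of FPSCR field `crfs`, then refresh the VX summary.
fn mcrfs_clear(fpscr: u32, crfs: u32) -> u32 {
    let nibble_mask = 0xFu32 << (28 - crfs * 4);
    recompute_vx(fpscr & !(nibble_mask & CLEARABLE))
}
```

Clearing field 1 when VXSNAN is the only contributor drops VX as well; if VXISI in field 2 is still set, VX stays.
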
### PPCBUG-069 — `mcrfs` test comment claims OX(so)=0 but OX is set in the test
- **Severity**: LOW (cosmetic; the assert is correct, only the comment is wrong)
- **Status**: open
- **Location**: `interpreter.rs:5402`
- **Symptom**: Comment reads `"FX(lt)=1 and OX(so)=0"`. FPSCR was set to `(1<<31)|(1<<28)`,
which sets both FX and OX. The nibble is `0b1001`, so `so=true`. The assert `cr[2].as_u8()
== 0b1001` is correct; only the comment is wrong.
- **Fix**: `// FX(lt)=1, FEX(gt)=0, VX(eq)=0, OX(so)=1 → 0b1001 = 9`
### PPCBUG-070 — Zero execution tests for all 8 CR logical ops and `mcrf`
- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: `interpreter.rs:1473-1484`
- **Missing minimum**: `crclr` idiom (`crxor BT,BT,BT`, BT=1 → 0), `crset` idiom
(`creqv BT,BT,BT`, BT=0 → 1), `crmove` idiom (`cror BT,BA,BA`), `crnot` idiom
(`crnor BT,BA,BA`, BA=1 → 0), cross-field `crand`/`crandc`, and a full `mcrf
cr0, cr3` field-copy + source-field-intact test.
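
The idiom tests listed above reduce to a handful of assertions once the bit helpers are modelled. A sketch assuming a packed 32-bit CR in PPC MSB-0 numbering (the real `get_cr_bit`/`set_cr_bit` operate on xenia-rs's own CR representation, so the signatures here are illustrative only):

```rust
/// Read CR bit `bit` in PPC MSB-0 numbering: bit 0 is CR0.LT, bit 31 is CR7.SO.
fn get_cr_bit(cr: u32, bit: u32) -> bool {
    ((cr >> (31 - bit)) & 1) != 0
}

/// Return `cr` with bit `bit` set to `value`.
fn set_cr_bit(cr: u32, bit: u32, value: bool) -> u32 {
    let mask = 1u32 << (31 - bit);
    if value { cr | mask } else { cr & !mask }
}

/// Generic CR logical op: BT <- op(BA, BB), computed on bools so that
/// no stray integer bits can leak into the result.
fn cr_logical(cr: u32, bt: u32, ba: u32, bb: u32, op: fn(bool, bool) -> bool) -> u32 {
    set_cr_bit(cr, bt, op(get_cr_bit(cr, ba), get_cr_bit(cr, bb)))
}
```

`crclr 6` is then `cr_logical(cr, 6, 6, 6, |a, b| a ^ b)` and `crset 6` is the same call with `|a, b| !(a ^ b)`.
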
---
## Pre-pass hints REFUTED by audit
These were flagged by the orchestrator's regex scan but the subagents found them to be safe:
- **`divwux` writeback** (interpreter.rs:390) — both operands cast to `u32` before division, `as u64` zero-extends correctly. **Clean.**
- **`mulhwx` intermediate cast** (interpreter.rs:349) — `((result >> 32) as i32 as i64 as u64) & 0xFFFF_FFFF` is redundant but the trailing mask saves correctness. Cosmetic only.
- **`mulhwux` writeback** (interpreter.rs:359) — `(result >> 32) & 0xFFFF_FFFF` clean unsigned. Clean.
- **CR0 stale pre-pass claim**: the pre-pass document quoted `result as i32 as i64`, but the live code uses `result as i64`. The pre-pass was wrong to imply that an `i32` form already existed; PPCBUG-020 is the real finding.
---
## Batch 4 — load float (group 23)
Per-group report: `audit-out/group-23-load-float.md`.
Group 23 summary: the double-precision load family (`lfd`, `lfdu`, `lfdux`, `lfdx`) is fully
ISA-correct — EA computation, endianness, update-form writeback, and bit-pattern fidelity are
all clean. The single-precision family (`lfs`, `lfsu`, `lfsux`, `lfsx`) has one HIGH bug:
on x86, Rust's `as f64` float cast compiles to `CVTSS2SD`, which sets the IEEE quiet bit of any
NaN input, silently converting f32 SNaN loads to f64 QNaN. The ISA requires the SNaN
to pass through unchanged. FPSCR.NI does not apply to loads (correct by omission). One LOW
test-gap finding. **2 IDs used (PPCBUG-128, PPCBUG-129). 8 IDs unallocated (PPCBUG-130..137).**
### PPCBUG-128 — lfs/lfsu/lfsx/lfsux silently quieten SNaN via `as f64` Rust float cast
- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1064 (lfs), 1070 (lfsx), 1087 (lfsu), 1093 (lfsux)
- **Symptom**: All four single-precision load arms use `mem.read_f32(ea) as f64` where
`read_f32` = `f32::from_bits(read_u32(ea))`. The `as f64` Rust float cast compiles to x86
`CVTSS2SD`, which unconditionally sets bit 51 of the f64 mantissa (the IEEE quiet/signalling
discriminator bit) for any NaN input. An f32 SNaN (e.g. `0x7F800001`) is loaded and written
to the FPR as the f64 QNaN `0x7FF8000020000000` instead of the SNaN `0x7FF0000020000000`.
**ISA requirement**: "A signalling NaN passes through unchanged into the FPR — it will signal
at the next FP arithmetic instruction." (lfs.md Special Cases). The FPR must hold the SNaN;
VXSNAN fires at the consuming arithmetic op, not at the load.
**Impact**: (a) Game code storing f32 SNaN sentinels (physics engines mark unset float slots
with SNaN) and then loading+inspecting them: `fpscr::is_snan(ctx.fpr[rd])` returns false
after the load, breaking sentinel detection. (b) Arithmetic ops consuming the loaded value
see a QNaN rather than SNaN, so VXSNAN is never set; games relying on VXSNAN to detect
uninitialized-read bugs get false negatives.
- **Canary parity**: Canary's JIT also uses CVTSS2SD via `f.Convert()`. Both emulators share
this deviation. The bug is a structural consequence of using semantic float widening rather
than a bit-pattern-preserving widening routine.
- **Fix**: replace the float cast with a bit-manipulation widening that preserves the SNaN bit:
```rust
fn widen_f32_bits_to_f64(raw32: u32) -> u64 {
    let sign = ((raw32 >> 31) as u64) << 63;
    let exp32 = ((raw32 >> 23) & 0xFF) as i32;
    let mant32 = (raw32 & 0x007F_FFFF) as u64;
    if exp32 == 0xFF {
        // NaN or Infinity: propagate mantissa left-shifted by 29 bits.
        // SNaN (bit22=0) stays SNaN (bit51=0); QNaN (bit22=1) stays QNaN (bit51=1).
        sign | (0x7FFu64 << 52) | (mant32 << 29)
    } else if exp32 == 0 {
        // ±Zero or subnormal f32.
        if mant32 == 0 { return sign; } // ±zero
        // Subnormal: value = mant32 * 2^-149. `shift` is how far the leading 1
        // sits below bit 22, so the unbiased exponent is -127 - shift.
        let shift = mant32.leading_zeros() - (64 - 23);
        let exp64 = (1023u64 - 127).wrapping_sub(shift as u64);
        // Shift the leading 1 out to bit 52 (the implicit bit) and mask it off.
        let mant64 = (mant32 << (shift + 1 + 29)) & 0x000F_FFFF_FFFF_FFFF;
        sign | (exp64 << 52) | mant64
    } else {
        // Normal f32 → normal f64: rebias the exponent. Add before subtracting
        // so the u64 arithmetic cannot underflow when exp32 < 127.
        let exp64 = (exp32 as u64) + 1023 - 127;
        sign | (exp64 << 52) | (mant32 << 29)
    }
}
// In each lfs* arm:
ctx.fpr[instr.rd()] = f64::from_bits(widen_f32_bits_to_f64(mem.read_u32(ea)));
```
This function also correctly handles subnormal f32 → normal f64 widening (which the `as f64`
cast already gets right numerically, but now goes through a consistent code path).
- **Test gap**: add a test loading an f32 SNaN (`0x7F800001`) via `lfs` and asserting
`fpscr::is_snan(ctx.fpr[rd])` is `true` and bit 51 of `ctx.fpr[rd].to_bits()` is 0.
### PPCBUG-129 — Zero interpreter execution tests for all 8 float-load opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: interpreter.rs test module; `tests/disasm_goldens.rs:249-250` (disasm-only)
- **Symptom**: No `#[test]`-decorated function exercises any float-load interpreter arm.
A regression in EA computation, endianness, f32→f64 widening, or update-form writeback
would go undetected. The SNaN bug (PPCBUG-128) was undetected partly due to this gap.
- **Recommended minimum**:
1. `lfs` normal: `0x3F800000` (1.0f32) → assert `fpr[rd] == 1.0f64` exact.
2. `lfs` negative displacement: base minus 4.
3. `lfs` ra=0 path (absolute addressing).
4. `lfd` normal: store PI bits, assert exact bit equality via `.to_bits()`.
5. `lfd` SNaN: store `0x7FF0_0000_0000_0001u64`, assert exact bit equality after load.
6. `lfsu` / `lfsux` / `lfdu` / `lfdux`: verify loaded FPR value AND rA update address.
7. After PPCBUG-128 fix: `lfs` SNaN round-trip test.
IDs PPCBUG-130 through PPCBUG-137 are unallocated — no further bugs found in group 23.
---
## Files modified by the audit
- `xenia-rs/audit-prepass-findings.md` — Phase A pre-pass red flags (orchestrator regex output).
- `xenia-rs/audit-out/group-01-add-imm.md` — Group 1 report (Sonnet subagent).
- `xenia-rs/audit-out/group-02-add-reg.md` — Group 2 report.
- `xenia-rs/audit-out/group-03-sub-reg.md` — Group 3 report.
- `xenia-rs/audit-out/group-04-multiply.md` — Group 4 report.
- `xenia-rs/audit-out/group-05-divide.md` — Group 5 report.
- `xenia-rs/audit-out/group-06-logic-imm.md` — Group 6 report.
- `xenia-rs/audit-out/group-09-word-rotate.md` — Group 9 report.
- `xenia-rs/audit-out/group-13-branch.md` — Group 13 report.
- `xenia-rs/audit-out/group-14-trap-sc.md` — Group 14 report.
- `xenia-rs/audit-out/group-15-cr-logical.md` — Group 15 report.
- `xenia-rs/audit-out/group-16-spr-msr.md` — Group 16 report.
- `xenia-rs/audit-out/group-17-cache-sync.md` — Group 17 report.
- `xenia-rs/audit-out/group-18-load-byte.md` — Group 18 report.
- `xenia-rs/audit-out/group-19-load-halfword.md` — Group 19 report.
- `xenia-rs/audit-out/group-21-load-doubleword.md` — Group 21 report.
- `xenia-rs/audit-out/group-22-load-mlsr.md` — Group 22 report.
- `xenia-rs/audit-out/group-23-load-float.md` — Group 23 report.
- `xenia-rs/audit-out/group-24-store-byte-half.md` — Group 24 report.
- `xenia-rs/audit-out/group-26-store-doubleword.md` — Group 26 report.
- `xenia-rs/audit-findings.md` — this consolidated tracker.
**No source code under `xenia-rs/crates/` has been modified.**
---
## Batch 4 — load byte (group 18)
Per-group report: `audit-out/group-18-load-byte.md`.
Group 18 summary: **cleanest group audited to date — zero HIGH or MEDIUM bugs.** All four opcodes
(`lbz`, `lbzu`, `lbzx`, `lbzux`) are ISA-correct: EA computation (rA=0 special case, D-field
sign-extension, 32-bit EA truncation), zero-extension of the byte result to 64 bits, and
update-form writeback all match the ISA spec and Canary cross-reference. Two LOW findings only.
### PPCBUG-090 — lbzu/lbzux: rD==rA "invalid form" silently misloads rD
- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
- **Status**: open
- **Location**: interpreter.rs:951-956 (lbzu), 963-968 (lbzux)
- **Symptom**: When `rD == rA` (invalid form, UISA undefined), the byte load into `gpr[rD]` at
line 953/965 is immediately overwritten by the EA writeback at line 954/966. Net result:
`gpr[rD]` holds the EA, not the loaded byte. Canary has the same behaviour. No practical impact
under normal compiler output.
- **Recommendation**: add `debug_assert!(instr.rd() != instr.ra())` in debug builds.
### PPCBUG-091 — Zero interpreter execution tests for all four lbz* opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module; disasm_goldens.rs:247 (disasm-only, no execution)
- **Symptom**: No `#[test]` exercises lines 945-968. A regression in EA computation,
zero-extension, or the update writeback would go undetected.
- **Recommended minimum**: `lbz` with ra=0 + negative displacement; `lbzu` normal case (verify
both byte result and rA update); `lbzx` with ra=0; `lbzux` normal case. Each test should
assert `gpr[rD] <= 0xFF` to catch any future accidental sign-extension.
IDs PPCBUG-092, PPCBUG-093, PPCBUG-094 are unallocated — no further bugs found in group 18.
---
## Batch 4 — load halfword (group 19)
Per-group report: `audit-out/group-19-load-halfword.md`.
Group 19 summary: **4 HIGH bugs confirmed — all pre-pass flags validated.** The four `lha*` opcodes
(`lha`, `lhax`, `lhau`, `lhaux`) all use `as i16 as i64 as u64`, sign-extending a negative halfword
to 64 bits in violation of the 32-bit ABI. Every negative halfword load (common for `int16_t` PCM
samples, packed vertex deltas, `short[]` arrays) actively poisons the upper 32 bits of the
destination GPR — identical shape to the `addis` bug. The four `lhz*` opcodes and `lhbrx` are all
clean (`as u64` zero-extension; `swap_bytes() as u64` byte-reversal; correct endian handling; correct
EA computation and update writebacks). Two LOW findings: rD==rA invalid-form in update variants,
and zero unit tests for all nine opcodes.
### PPCBUG-095 — `lha`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:990
- **Symptom**: `mem.read_u16(ea) as i16 as i64 as u64` — memory `0x8000` writes
`0xFFFFFFFF_FFFF8000` instead of `0x00000000_FFFF8000`. Active GPR poisoning for every
negative halfword. Common trigger: `int16_t` struct fields, PCM samples, packed vertex deltas.
- **Fix**:
```rust
ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64;
```
- **Test gap**: zero unit tests. Add: memory `0x8000` → `gpr[rD] == 0x00000000_FFFF8000`;
memory `0x7FFF` → `gpr[rD] == 0x00000000_00007FFF`.
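
The two cast chains can be compared in isolation. A minimal sketch with hypothetical helper names; only the cast expressions themselves come from the finding:

```rust
/// Current writeback: sign-extends the halfword all the way to 64 bits.
fn lha_writeback_buggy(half: u16) -> u64 {
    half as i16 as i64 as u64
}

/// Proposed writeback: sign-extend to 32 bits, then zero-extend to 64 bits.
fn lha_writeback_fixed(half: u16) -> u64 {
    half as i16 as i32 as u32 as u64
}
```

For a negative halfword the buggy form exceeds `0xFFFF_FFFF` as an unsigned 64-bit value, which is what corrupts downstream 64-bit unsigned compares; positive halfwords are identical under both forms.
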
### PPCBUG-096 — `lhax`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:996
- **Symptom**: identical to PPCBUG-095. Indexed form emitted for array access with GPR index.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests.
### PPCBUG-097 — `lhau`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:1007
- **Symptom**: identical to PPCBUG-095. Update form emitted for auto-incrementing `short[]` loops;
poison accumulates across all iterations.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests. Add: verify both `gpr[rD]` (upper-32 = 0) and `gpr[rA]` (EA update).
### PPCBUG-098 — `lhaux`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:1013
- **Symptom**: identical to PPCBUG-095, update+indexed form.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests.
- **Note**: PPCBUG-095..098 are the same one-line fix at four sites. Fix session sweep:
`rg -n 'as i16 as i64 as u64' interpreter.rs` finds exactly these four lines.
### PPCBUG-099 — `lhau`/`lhaux`: rD==rA invalid-form silently destroys load result
- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
- **Status**: open
- **Location**: interpreter.rs:1005-1016
- **Symptom**: same as PPCBUG-090 (`lbzu`/`lbzux`) — EA writeback overwrites `gpr[rD]` when
`rD == rA`. Net: `gpr[rD]` holds EA, not the loaded value.
- **Recommendation**: `debug_assert!(instr.rd() != instr.ra())` in both arms.
### PPCBUG-100 — Zero execution tests for all nine halfword-load opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Symptom**: No `#[test]` exercises any of the 9 opcodes. The HIGH sign-extension bug would
have been caught by any test that checks `gpr[rD] <= 0x0000_0000_FFFF_FFFF`.
- **Recommended minimum**: `lha` with negative halfword (assert upper 32 zero), `lhz` same,
`lhau` verify both rD and rA, `lhzux` verify both rD and rA, `lhbrx` verify byte-swap.
IDs PPCBUG-101, PPCBUG-102, PPCBUG-103, PPCBUG-104 are unallocated — no further bugs found in group 19.
---
## Batch 4 — load word (group 20)
Per-group report: `audit-out/group-20-load-word.md`.
Group 20 summary: **1 HIGH bug (reservation invalidation never called), 1 MEDIUM (cross-thread
reservation isolation), 1 MEDIUM (lwa 64-bit sign-extension hazard), 3 LOW test gaps.** The
zero-extending family (`lwz`/`lwzu`/`lwzx`/`lwzux`) is entirely correct — `mem.read_u32(ea) as u64`
cleanly zero-extends; EA computation, update writebacks, and RA0 handling all match ISA and Canary.
`lwbrx` is correct: the double-swap (`from_be_bytes` then `swap_bytes()`) correctly produces a
little-endian word read, zero-extended. The sign-extending family (`lwa`/`lwax`/`lwaux`) is
ISA-correct for 64-bit mode but a 32-bit-ABI hazard — classified MEDIUM because `lwa` is a
64-bit-mode instruction unlikely to appear in Xbox 360 32-bit-ABI binaries. The HIGH finding is
that `ReservationTable::invalidate_for_write` is defined and unit-tested but **never called** from
any store instruction, breaking multi-threaded `lwarx`/`stwcx.` atomicity under `--parallel`.
### PPCBUG-105 — lwa / lwax / lwaux sign-extend to 64 bits; 32-bit-ABI hazard
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:1032 (lwa), 1038 (lwax), 1043 (lwaux)
- **Symptom**: `mem.read_u32(ea) as i32 as i64 as u64` — a word with high bit set (e.g. `0x8000_0000`)
writes `0xFFFF_FFFF_8000_0000` to rD. ISA-correct for 64-bit-mode `lwa`. In 32-bit ABI, the poisoned
upper 32 bits produce wrong CA / CR results in downstream 64-bit unsigned compares — same shape as
the `addis` bug.
- **Likelihood**: LOW on real Xbox 360 32-bit-ABI binaries (compilers use `lwz` for word loads; `lwa`
is a 64-bit-mode instruction). Risk elevated if the binary contains 64-bit-mode kernel code.
- **Note**: Canary also uses `SignExtend(..., INT64_TYPE)` — both are ISA-correct. Pre-pass flagged
HIGH; audit downgrades to MEDIUM because `lwa` is unlikely in 32-bit-ABI Xbox 360 code.
### PPCBUG-106 — lwa no-update-form undocumented (LOW / informational)
- **Severity**: LOW
- **Status**: open
- **Location**: interpreter.rs:1029-1034
- **Symptom**: `lwa` arm has no RA writeback. Correct per ISA (no `lwau` in PowerISA). Undocumented.
- **Fix**: add comment `// No lwau in PowerISA; lwa is DS-form non-update only.`
### PPCBUG-107 — `invalidate_for_write` never called from stores; lwarx/stwcx. atomicity broken under `--parallel` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `reservation.rs:234` (definition, never called from interpreter); `interpreter.rs:1182-1278` (all store arms, none call it)
- **Symptom**: `ReservationTable::invalidate_for_write(addr)` is defined and correctly unit-tested but
no interpreter store arm calls it. Under M3 `--parallel` with the table enabled, a plain `stw` by
thread B to a cache line reserved by thread A does NOT clear thread A's table slot. Thread A's
subsequent `stwcx.` calls `t.try_commit()`, which succeeds — spurious success, violating
store-conditional atomicity. All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic
counters) built on `lwarx`/`stwcx.` are broken in multi-threaded mode.
- **Concrete scenario**: thread A: `lwarx r3, 0, r4` (reserves line). Thread B: `stw r5, 0(r4)`
(same address; should invalidate). Thread A: `stwcx. r6, 0, r4` → should fail (CR0.EQ=0) but
succeeds (CR0.EQ=1). Thread A's store silently overwrites thread B's store.
- **Fix**: in every store arm, before `mem.write_*`, add:
```rust
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
`has_active_reservers()` is a single `Relaxed` atomic load — negligible cost for non-atomic code
(common case returns false immediately). Alternative: inject the table into the memory layer so
`write_u32`/`write_u64` call it automatically.
- **Test gap**: add interpreter-level test: `lwarx` reserve a line, intervening `stw` to the same
line, `stwcx.` must fail (CR0.EQ=0).
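The concrete scenario above can be modeled with a toy table. This is only a sketch: `ToyReservationTable`, its method signatures, and the 128-byte `LINE_MASK` are assumptions made for illustration and do not reflect the real `reservation.rs` API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed reservation granule for this sketch (128-byte line).
const LINE_MASK: u32 = 127;
/// Slot value meaning "no reservation held".
const NO_RESERVATION: u64 = u64::MAX;

/// Toy stand-in for the real table: one slot per guest hardware thread.
struct ToyReservationTable {
    slots: Vec<AtomicU64>,
}

impl ToyReservationTable {
    fn new(threads: usize) -> Self {
        Self { slots: (0..threads).map(|_| AtomicU64::new(NO_RESERVATION)).collect() }
    }

    /// `lwarx`: record the reserved line for `thread`.
    fn reserve(&self, thread: usize, ea: u32) {
        self.slots[thread].store((ea & !LINE_MASK) as u64, Ordering::SeqCst);
    }

    /// Plain store: clear every slot that currently reserves the written line.
    fn invalidate_for_write(&self, ea: u32) {
        let line = (ea & !LINE_MASK) as u64;
        for slot in &self.slots {
            // Only slots holding exactly `line` are reset; others are untouched.
            let _ = slot.compare_exchange(line, NO_RESERVATION, Ordering::SeqCst, Ordering::SeqCst);
        }
    }

    /// `stwcx.`: succeed only if `thread` still holds the line; always clear the slot.
    fn try_commit(&self, thread: usize, ea: u32) -> bool {
        let line = (ea & !LINE_MASK) as u64;
        self.slots[thread].swap(NO_RESERVATION, Ordering::SeqCst) == line
    }
}
```

In this model, a reserve by thread 0 followed by `invalidate_for_write` on any address in the same line makes the subsequent `try_commit` fail, while a write to a different line leaves the reservation intact.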
### PPCBUG-108 — Legacy per-ctx reservation path: cross-thread invalidation impossible (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: interpreter.rs:1148-1153 (stwcx legacy path)
- **Symptom**: When table is None/disabled, reservation state lives in per-thread `PpcContext` fields.
A store by thread B cannot clear `ctx_A.has_reservation`. Safe in strict lockstep (one host thread).
Broken under real parallelism with the table inadvertently disabled.
- **Fix**: add a `debug_assert!` in `lwarx`/`stwcx.` that table is enabled when multiple host threads
are active. The M3 scheduler should always enable the table before spawning a second host thread.
### PPCBUG-109 — Zero unit tests for lwa / lwax / lwaux
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Recommended minimum**:
- `lwa` with `0x8000_0000` → `gpr[rD] == 0xFFFF_FFFF_8000_0000`.
- `lwa` with `0x7FFF_FFFF` → `gpr[rD] == 0x0000_0000_7FFF_FFFF`.
- `lwax` with ra=0.
- `lwaux`: verify loaded value and rA update.
### PPCBUG-110 — Zero unit tests for lwbrx
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Recommended minimum**: memory `[0x11, 0x22, 0x33, 0x44]` at EA → `gpr[rD] == 0x4433_2211`; ra=0;
assert `gpr[rD] <= 0xFFFF_FFFF`.
### PPCBUG-111 — lwarx / stwcx test suite missing key cases
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:5167-5207 (two tests exist)
- **Missing**: `lwarx` ra=0; `stwcx.` without prior `lwarx` → CR0.EQ=0; second `lwarx` displaces
first; post-PPCBUG-107-fix store-invalidation test; `lwarx` zero-extension assertion.
IDs PPCBUG-112, PPCBUG-113, PPCBUG-114 are unallocated — reserved for group 20 follow-up.
---
## Batch 4 — load doubleword (group 21)
Per-group report: `audit-out/group-21-load-doubleword.md`.
Group 21 summary: **cleanest load group audited — zero HIGH bugs.** All six instructions (`ld`,
`ldu`, `ldux`, `ldx`, `ldbrx`, `ldarx`) are ISA-correct: 64-bit load, big-endian byte order,
EA computation (RA=0, DS-form, u32 truncation), update-form writebacks, and reservation tracking
all pass scrutiny against Canary and the ISA spec. `ldbrx`'s double-swap pattern was investigated
and confirmed correct (PPCBUG-115 informational). One MEDIUM documentation finding, two LOW findings.
### PPCBUG-115 — `ldbrx` byte-swap confirmed correct (informational)
- **Severity**: LOW (confirmed clean, informational only)
- **Status**: wontfix
- **Location**: `interpreter.rs:4157-4159`
- **Analysis**: `mem.read_u64` uses `u64::from_be_bytes` internally (confirmed in `heap.rs:404`
and interpreter's `TestMem`), so it returns the BE-decoded value. Calling `.swap_bytes()`
re-reverses to give the LE interpretation, which is exactly what `ldbrx` specifies. Canary
achieves the same result by skipping `ByteSwap` at the HIR level. Both approaches are correct.
See per-group report for full byte-level worked example.
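A short byte-level check of the double-swap argument, written against plain standard-library calls. The helper names are hypothetical; only `u64::from_be_bytes` and `swap_bytes` are taken from the analysis above.

```rust
/// Big-endian 64-bit decode, as `mem.read_u64` is described to behave.
fn read_u64_be(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes)
}

/// `ldbrx`-style value: big-endian decode followed by a byte swap.
fn ldbrx_value(bytes: [u8; 8]) -> u64 {
    read_u64_be(bytes).swap_bytes()
}
```

For memory bytes `11 22 33 44 55 66 77 88`, the plain read gives `0x1122_3344_5566_7788` and the swapped read gives `0x8877_6655_4433_2211`, which equals `u64::from_le_bytes` of the same bytes.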
### PPCBUG-116 — `ld`/`ldx`/`ldu`/`ldux` as 32-bit-ABI poison sources (documentation)
- **Severity**: MEDIUM (awareness/documentation; no change to load instructions themselves)
- **Status**: open
- **Location**: `interpreter.rs:1017-1058`
- **Symptom**: These instructions correctly write full 64-bit values to the destination GPR.
Xbox 360 32-bit-ABI binaries legitimately emit them for TOC loads, vtable loads, and kernel
structure accesses — all of which may have non-zero upper 32 bits. Until PPCBUG-001..089
arithmetic truncation fixes land, such values can flow into 64-bit compares and corrupt CA
bits and CR fields — the inverse of the `addis` bug (pollution from memory side vs. sign-ext).
- **Key guard already in place**: PPCBUG-007's `subfcx` CA fix truncates operands to u32 before
the compare, correctly handling `ld`-originated 64-bit values. This is the most critical
downstream consumer and the fix is already specified.
### PPCBUG-117 — Stale frozen snapshot in `ppc-manual/memory/ldarx.md`
- **Severity**: LOW
- **Status**: open
- **Location**: `ppc-manual/memory/ldarx.md` (frozen snapshot section)
- **Symptom**: Snapshot uses old field name `ctx.reserved_addr`; live code uses
`ctx.reserved_line = ea & !RESERVATION_MASK` (M3 refactor). Cosmetic only.
- **Fix**: Regenerate snapshot after M3 field names settle.
### PPCBUG-118 — Zero functional tests for `ld`, `ldx`, `ldu`, `ldux`, `ldbrx`
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: `test_ldarx_stdcx_pair` covers `ldarx`/`stdcx` only. Five doubleword load
variants are untested. Recommended minimum: `ld` with positive DS, negative DS, and RA=0;
`ldx` basic; `ldu` with RA writeback check; `ldux` with RA writeback check; `ldbrx` with
asymmetric data to distinguish output from plain `ldx`.
IDs PPCBUG-119 through PPCBUG-122 are unallocated — reserved for group 21 follow-up.
---
## Batch 4 — load multiple/string (group 22)
Per-group report: `audit-out/group-22-load-mlsr.md`.
Group 22 summary: one structural HIGH bug (`lswx` is always a no-op due to missing XER TBC field),
one MEDIUM coupling bug (the write path discards TBC on `mtspr XER`), one MEDIUM ISA-form deviation
(`lmw` does not skip the register write to RA when RA is in the destination range, unlike Canary), and two LOW findings. The `lswi` body itself
is correct; `lmw` core logic (loop bound, zero-extension, byte-packing, register wraparound) is clean.
Zero unit tests across all three opcodes.
### PPCBUG-123 — `lswx` XER TBC field not modeled; always loads 0 bytes
- **Severity**: HIGH
- **Status**: open
- **Location**: `context.rs:235-237` (`xer()` method) + `interpreter.rs:4172`
- **Symptom**: `ctx.xer()` assembles only SO[31], OV[30], CA[29]; bits 0-28 are always zero.
`lswx` reads `ctx.xer() & 0x7F` expecting the XER TBC byte-count field at bits 0-6, but always
gets 0. The `while bytes_left > 0` loop never executes; **`lswx` is permanently a no-op** —
no bytes are loaded, no destination registers are written. The companion `stswx` at
`interpreter.rs:4191` has the identical pattern and is equally broken.
- **Root cause**: `PpcContext` has no `xer_tbc` field. Neither `xer()` nor `set_xer()` models
XER[25:31]. Any `mtspr XER, rN` that sets a non-zero byte count silently discards it (PPCBUG-124).
- **Cross-reference**: Canary marks `lswx` as `XEINSTRNOTIMPLEMENTED()` — xenia-rs implemented the
body but left the XER infrastructure incomplete.
- **Fix**:
1. Add `pub xer_tbc: u8` to `PpcContext`.
2. In `xer()`: `| (self.xer_tbc as u32)` for bits 0-6.
3. In `set_xer()`: `self.xer_tbc = (val & 0x7F) as u8`.
The `lswx` body is then correct as-is.
- **Test gap**: zero unit tests. After fix: `mtspr XER, r3` (r3=4) then `lswx r5, 0, r4` should
write exactly 4 bytes into r5 (high byte = first byte at EA).
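The three-step fix can be sketched on a toy struct. `ToyXer` is a hypothetical stand-in that carries only the fields named above; the real `PpcContext` layout is not shown in this file, so treat the field types as assumptions.

```rust
/// Toy subset of a PPC context: only the XER pieces discussed above.
#[derive(Default)]
struct ToyXer {
    so: bool,
    ov: bool,
    ca: bool,
    /// Proposed field: 7-bit transfer byte count, XER bits 0-6 (LSB-0 numbering).
    xer_tbc: u8,
}

impl ToyXer {
    /// Assemble the 32-bit XER image, now including the TBC field.
    fn xer(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | (self.xer_tbc as u32)
    }

    /// Split a 32-bit XER image into the modeled fields, keeping TBC.
    fn set_xer(&mut self, val: u32) {
        self.so = ((val >> 31) & 1) != 0;
        self.ov = ((val >> 30) & 1) != 0;
        self.ca = ((val >> 29) & 1) != 0;
        self.xer_tbc = (val & 0x7F) as u8;
    }
}
```

After `set_xer(4)`, the expression `xer() & 0x7F` that `lswx` and `stswx` evaluate yields 4 instead of 0, so their byte loops run.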
### PPCBUG-124 — `set_xer()` discards TBC on `mtspr XER` (structural coupling to PPCBUG-123)
- **Severity**: MEDIUM (must land with PPCBUG-123)
- **Status**: open
- **Location**: `context.rs:239-244`
- **Symptom**: `set_xer()` writes only SO/OV/CA from the 32-bit value, silently discarding bits 0-28
(including the 7-bit TBC field). Any guest `mtspr XER, rN` with a non-zero byte count loses that
count; subsequent `lswx`/`stswx` see TBC=0. Fix is the same three-line change as PPCBUG-123.
### PPCBUG-125 — `lmw` missing RA-in-destination-range skip
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1515`
- **Symptom**: PowerISA declares `lmw rT, D(rA)` an invalid form when `rA` is in `[rT..31]`. Canary skips
the register write to `rA` in that case (`if (i.D.RT + j == i.D.RA) continue`). xenia-rs pre-computes EA
before the loop (so EA values remain correct), but overwrites `rA` with the loaded word instead of
preserving it, so the result differs from Canary for this invalid encoding. Any program that relies on `rA`
surviving a nominally invalid `lmw` will see the wrong value.
- **Fix**:
```rust
for r in instr.rd()..32 {
    // Invalid form (rA in range): skip the write to rA but still advance EA, as Canary does.
    if r == instr.ra() { ea = ea.wrapping_add(4); continue; }
    ctx.gpr[r] = mem.read_u32(ea as u32) as u64;
    ea = ea.wrapping_add(4);
}
```
- **Test gap**: zero tests. Add: `lmw r28, 0(r28)` (RA=RT=28) — after fix, gpr[28] unchanged.
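The suggested `lmw r28, 0(r28)` test can be prototyped outside the interpreter. The sketch below uses a flat byte slice and a bare register array as stand-ins; `lmw_skip_ra` is a hypothetical helper, not the interpreter arm.

```rust
/// Toy `lmw rT, D(rA)`: load consecutive big-endian words into gpr[rt..=31],
/// skipping the write to `ra` when it falls inside the destination range.
fn lmw_skip_ra(gpr: &mut [u64; 32], mem: &[u8], rt: usize, ra: usize, mut ea: usize) {
    for r in rt..32 {
        if r != ra {
            let word = u32::from_be_bytes([mem[ea], mem[ea + 1], mem[ea + 2], mem[ea + 3]]);
            // Zero-extend the loaded word into the 64-bit register.
            gpr[r] = word as u64;
        }
        // EA advances for every register slot, skipped or not.
        ea += 4;
    }
}
```

With `gpr[28]` holding the base address, the call leaves `gpr[28]` unchanged while `gpr[29]` through `gpr[31]` receive the second through fourth words.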
### PPCBUG-126 — `lswi` uses `instr.rb()` instead of `instr.nb()` for the NB field
- **Severity**: LOW (maintenance hazard, not a correctness bug)
- **Status**: open
- **Location**: `interpreter.rs:1340`
- **Symptom**: `instr.rb()` and `instr.nb()` both extract bits 16-20 and return identical values.
Using `rb()` misrepresents the operand as a register reference rather than a 5-bit immediate count.
The companion `stswi` at line 1359 has the same pattern. A future `rb()` type-system refactor
could break `lswi`/`stswi` silently.
- **Fix**: `instr.nb()` at both sites.
### PPCBUG-127 — Zero execution tests for lmw, lswi, lswx
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: No `#[test]` exists for any of the three opcodes. A regression in loop bounds,
byte-packing, EA computation, or the NB=0 special case would go undetected.
- **Recommended minimum**: `lmw r30, 0(r1)` (2-word load); `lswi r3, r4, 8` (2-word byte pack);
`lswi r31, r4, 8` (register wraparound → r31 and r0); `lswi r3, r4, 0` (NB=0→32 special case);
post-PPCBUG-123 fix: `lswx` with XER TBC=4 (1-word load), TBC=0 (no-op), TBC=5 (partial word).
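The `lswi` cases in the recommended minimum (byte packing, register wraparound, NB=0 meaning 32, partial final word) can be checked against a small reference model. This is a sketch over a flat byte slice; `lswi_model` and its parameters are hypothetical names.

```rust
/// Toy `lswi`: load `nb_field` bytes (0 encodes 32) starting at `ea`, packing
/// four bytes per register, most-significant byte first, wrapping r31 -> r0.
fn lswi_model(gpr: &mut [u64; 32], mem: &[u8], rt: usize, ea: usize, nb_field: u32) {
    let nb = if nb_field == 0 { 32 } else { nb_field as usize };
    let mut reg = rt;
    for chunk in mem[ea..ea + nb].chunks(4) {
        // A partial final word is left-justified; the remaining bytes stay zero.
        let mut word = [0u8; 4];
        word[..chunk.len()].copy_from_slice(chunk);
        gpr[reg] = u32::from_be_bytes(word) as u64;
        reg = (reg + 1) % 32;
    }
}
```

Feeding it the bytes `1..=32` makes the expected register contents easy to read off: each register holds four consecutive byte values, and a 5-byte load leaves the second register as `0x0500_0000`.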
---
## Batch 5 — store byte/halfword (group 24)
Per-group report: `audit-out/group-24-store-byte-half.md`.
Group 24 summary: **3 findings: 1 HIGH (cross-cutting reservation invalidation), 1 LOW/informational
(update-form zero-extension correct but undocumented), 1 LOW (zero test coverage).** EA computation,
value truncation (`as u8`, `as u16`), RA=0 special cases, update-form writeback zero-extension,
big-endian `mem.write_u16` path, and `sthbrx` byte-reverse logic are all ISA-correct. The single
HIGH finding is the systemic absence of `invalidate_for_write` calls — same class as PPCBUG-107,
now documented for all 9 byte/halfword store opcodes.
### PPCBUG-130 — All 9 store-byte/halfword opcodes missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1207` (stb), `1213` (stbu), `1219` (stbx), `1225` (stbux),
`1231` (sth), `1237` (sthu), `1243` (sthx), `1249` (sthux), `1337` (sthbrx)
- **Class**: same root cause as PPCBUG-107 (stw/stdcx family — `invalidate_for_write` never called
from any store arm).
- **Symptom**: Under `--parallel`, a `stb`, `sth`, or `sthbrx` (or any variant in this group) to a
cache line reserved by another thread via `lwarx`/`ldarx` does NOT clear the table slot.
The reserving thread's subsequent `stwcx.`/`stdcx.` spuriously succeeds even though an
intervening sub-word store has modified the line — violating store-conditional atomicity. Affects
any lock-free protocol that uses byte or halfword stores adjacent to or inside a `lwarx`/`stwcx.`
loop (e.g. byte-level spinlocks, tagged-pointer updates, audio ring-buffer flags).
- **Fix** (per PPCBUG-107 pattern): before each `mem.write_u8`/`mem.write_u16`, add:
```rust
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
- **Note**: PPCBUG-107 is the canonical parent finding. PPCBUG-130 documents that the byte/halfword
group must be included in the same fix sweep.
### PPCBUG-131 — Update-form rA zero-extension correct but undocumented (LOW / informational)
- **Severity**: LOW (informational — behavior is correct)
- **Status**: open (documentation gap)
- **Locations**: `interpreter.rs:1216` (stbu), `1228` (stbux), `1240` (sthu), `1252` (sthux)
- **Symptom**: Each update-form arm writes `ctx.gpr[instr.ra()] = ea as u64` where `ea: u32`.
This zero-extends to 64 bits — correct in the 32-bit ABI (addresses are 32-bit; upper half must
be zero). No bug, but there is no comment explaining the deliberate zero-extension. A maintainer
who computes EA as `u64` throughout and drops the `as u32` intermediate would silently
sign-extend negative displacements into rA, mirroring the `addis` bug shape.
- **Fix**: add comment `// EA is u32; zero-extend into rA (32-bit ABI: upper 32 bits must be 0).`
at each update-form writeback line.
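The hazard described in the Symptom bullet can be made concrete with two variants of the writeback. This is a standalone sketch with hypothetical function names; it assumes the displacement is a sign-extended 16-bit immediate, as in the D-form update stores.

```rust
/// EA computed in 32 bits: sign-extend the displacement, add with wraparound.
fn ea_u32(ra_value: u64, simm16: i16) -> u32 {
    (ra_value as u32).wrapping_add(simm16 as i32 as u32)
}

/// Writeback as the audit describes it: zero-extend the 32-bit EA into rA.
fn ra_writeback(ra_value: u64, simm16: i16) -> u64 {
    ea_u32(ra_value, simm16) as u64
}

/// The hazard: doing the arithmetic in 64 bits lets a carry or borrow out of
/// bit 31 leak into the upper half of rA.
fn ra_writeback_64bit_math(ra_value: u64, simm16: i16) -> u64 {
    ra_value.wrapping_add(simm16 as i64 as u64)
}
```

The two variants agree whenever the 32-bit add does not wrap; they diverge for inputs such as `rA = 0x10` with displacement `-0x20`, where the 64-bit form produces `0xFFFF_FFFF_FFFF_FFF0`.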
### PPCBUG-132 — Zero unit tests for all 9 store-byte/halfword opcodes (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: No `test_stb*` or `test_sth*` functions exist. Any regression in EA computation,
value truncation, update-form writeback order, or `sthbrx` byte-swap logic would be invisible.
- **Recommended minimum**: `stb` basic + ra=0; `stbu`/`stbux` with rA writeback check; `stbx`
ra=0; `sth` big-endian byte check (`0xDEAD` → `[0xDE, 0xAD]`); `sthu`/`sthux` writeback;
`sthbrx` byte-reversed check (`0xDEAD` → `[0xAD, 0xDE]`); post-PPCBUG-130 fix: `lwarx` + `stb`
to same line + `stwcx.` → CR0.EQ=0.
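The two byte-order checks in the recommended minimum reduce to the following reference model. It is a sketch over a plain byte slice; `sth_model` and `sthbrx_model` are hypothetical names, not interpreter functions.

```rust
/// Toy `sth`: store the low halfword of `rs` in big-endian order.
fn sth_model(mem: &mut [u8], ea: usize, rs: u64) {
    mem[ea..ea + 2].copy_from_slice(&(rs as u16).to_be_bytes());
}

/// Toy `sthbrx`: store the low halfword of `rs` with its bytes reversed.
fn sthbrx_model(mem: &mut [u8], ea: usize, rs: u64) {
    mem[ea..ea + 2].copy_from_slice(&(rs as u16).swap_bytes().to_be_bytes());
}
```

Using a source register with non-zero upper bits (for example `0x1234_DEAD`) also exercises the `as u16` truncation in the same test.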
IDs PPCBUG-133 through PPCBUG-139 are unallocated — reserved for group 24 follow-up.
---
## Batch 5 — store word (group 25)
Per-group report: `audit-out/group-25-store-word.md`.
Group 25 summary: **8 findings: 5 HIGH (reservation invalidation per opcode), 0 MEDIUM, 3 LOW.**
Core arithmetic and semantics are entirely clean for all 6 opcodes. EA computation (RA=0 guards,
simm16 sign-extend, u32 truncation), value truncation (`as u32`), update-form writebacks
(`ea as u64` zero-extension), big-endian `mem.write_u32`, `stwbrx` byte-reversal, and `stwcx`
conditional-store logic (cache-line reservation check, CAS, CR0 update, reservation always
cleared) all match the ISA and Canary exactly. The `stwcx` manual snapshot is stale (uses old
`reserved_addr` field name; live code correctly uses `reserved_line` at cache-line granularity —
actually MORE correct than the snapshot). Dominant finding is the same systemic miss as PPCBUG-107
and PPCBUG-130: `invalidate_for_write` is never called from any plain store arm.
### PPCBUG-140 — stw: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1183-1188`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Under `--parallel` with the ReservationTable enabled, a plain `stw` by thread B
to a cache line reserved by thread A does not clear thread A's table slot. Thread A's
subsequent `stwcx.` spuriously succeeds (CR0.EQ=1) even though thread B has written the line.
All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic counters) built on
`lwarx`/`stwcx.` are broken in multi-threaded mode. `stw` is the most common store instruction —
every stack write, pointer store, and integer field write is affected.
- **Fix**: Before `mem.write_u32(ea, ...)`:
```rust
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
`has_active_reservers()` is a single `Relaxed` atomic load, so the cost is negligible in the common non-atomic case.
### PPCBUG-141 — stwu: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1189-1194`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. `stwu r1, -N(r1)` is the canonical function-prologue
stack-allocation idiom emitted by every compiled function. A thread holding a reservation on
the stack region would see spurious `stwcx.` success after any prologue store.
- **Fix**: Same pattern as PPCBUG-140, inserted before `mem.write_u32`.
### PPCBUG-142 — stwx: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1195-1200`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. `stwx` is the indexed store used for array writes and
indirect dereferences — common in loops that may run concurrently with reservation holders.
- **Fix**: Same pattern as PPCBUG-140.
### PPCBUG-143 — stwux: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1201-1206`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. Less common than stw/stwu but still a plain store
that must participate in reservation invalidation.
- **Fix**: Same pattern as PPCBUG-140.
### PPCBUG-144 — stwbrx: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1568-1573`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. Byte-reversed stores (used for LE-payload GPU command
buffers, file format fields) are still plain stores with respect to the reservation protocol.
- **Fix**: Same pattern as PPCBUG-140. `ea` is already a `u32` at this point (line 1570).
### PPCBUG-145 — stwcx: stale manual snapshot uses `reserved_addr` (LOW)
- **Severity**: LOW (documentation only; live code is correct)
- **Status**: open
- **Location**: `ppc-manual/memory/stwcx.md` (frozen snapshot section)
- **Symptom**: The frozen snapshot shows `ctx.reserved_addr == ea` (exact-word comparison).
The live code at `interpreter.rs:1137-1153` uses `ctx.reserved_line == line` where
`line = ea & !RESERVATION_MASK` (cache-line comparison). The live code is MORE correct per
ISA (PowerISA 2.07B defines reservation at cache-line granularity). Snapshot reflects an
earlier implementation before M3 introduced `RESERVATION_MASK` and `reserved_line`.
Tests confirm live behavior is correct (`stwcx_succeeds_within_same_cache_line`).
- **Fix**: Regenerate the `stwcx.md` snapshot to show current field names and add a note on
the ISA cache-line granule.
### PPCBUG-146 — Zero unit tests for stwu / stwx / stwux / stwbrx (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: Four of the six group-25 opcodes have zero dedicated unit tests.
- **Recommended minimum**:
- `stwu r3, -8(r1)`: verify memory at `r1-8` and `gpr[1]` updated to `old_r1 - 8`.
- `stwx ra=0`: store at `gpr[rb]`, verify memory and no RA writeback.
- `stwux`: indexed update — verify store and RA writeback.
- `stwbrx 0x11223344`: bytes at EA should be `[0x44, 0x33, 0x22, 0x11]`.
### PPCBUG-147 — stwcx test suite missing key cases (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs:5167-5208` (two existing tests)
- **Missing**:
- `stwcx.` without prior `lwarx` → CR0.EQ=0, memory not written.
- Post-PPCBUG-140-fix: `lwarx` then `stw` to same line then `stwcx.` → CR0.EQ=0.
- RA=0 form: `stwcx. rS, 0, rB`.
- Explicit memory check on failure path (assert memory unchanged).
IDs PPCBUG-148 and PPCBUG-149 are unallocated — reserved for group 25 follow-up.
---
## Batch 5 (continued) — store multiple/string (group 27)
Per-group report: `audit-out/group-27-store-mlsr.md`.
Group 27 summary: **4 findings: 2 HIGH, 1 MEDIUM, 1 LOW.** `stswx` is a permanent no-op (identical
root cause as PPCBUG-123 for `lswx` — XER TBC field not modeled; fixed as side effect of
PPCBUG-123/124). `stmw`, `stswi`, and `stswx` all omit `invalidate_for_write`, aggravated vs.
single-word stores because a single `stmw` can dirty multiple cache lines. `stswi` uses `instr.rb()`
instead of `instr.nb()` (maintenance hazard, same shape as PPCBUG-126 for `lswi`). Zero unit tests
across all three opcodes.
### PPCBUG-160 — stmw, stswi, stswx missing `invalidate_for_write`; multi-line atomicity exposure (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1521` (stmw), `interpreter.rs:1357` (stswi), `interpreter.rs:4189` (stswx)
- **Extends**: PPCBUG-107. The prior stated range `1182-1278` does not cover these three arms.
Multi-word instructions (stmw up to 128 bytes = 2 lines; stswx up to 127 bytes = ~2 lines) make
the probability of missing a reservation invalidation much higher than single-word stores.
- **Symptom**: thread B's `stmw` saves 18+ non-volatile registers across two cache lines. Thread A's
`lwarx` reservation on the second line is not cleared. Thread A's `stwcx.` spuriously succeeds.
Because `stmw` is the ABI-standard non-volatile register save, this is triggered constantly in
function prologues — any lock-free primitive inside a prologue/epilogue window is at risk.
- **Fix** (same pattern as PPCBUG-107): before each `mem.write_u32`/`mem.write_u8` call, add the
`invalidate_for_write` guard. See group-27 report for per-opcode code snippets.
- **Test gap**: `lwarx` reserve a line, `stmw` across that line, `stwcx.` must return CR0.EQ=0.
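Because a multi-word store can straddle a line boundary, the guard has to cover every line the store touches, not only the line containing the first byte. One way to enumerate them is sketched below; `lines_touched` is a hypothetical helper, and the line size is a parameter because the granule used by the real table is not stated in this file.

```rust
/// Return the base address of every reservation granule overlapped by a store
/// of `len` bytes at `ea`. `line_size` must be a power of two.
fn lines_touched(ea: u32, len: u32, line_size: u32) -> Vec<u32> {
    if len == 0 {
        return Vec::new();
    }
    let mask = !(line_size - 1);
    let first = ea & mask;
    // Address arithmetic wraps, matching 32-bit guest addressing.
    let last = ea.wrapping_add(len - 1) & mask;
    let mut lines = Vec::new();
    let mut line = first;
    loop {
        lines.push(line);
        if line == last {
            break;
        }
        line = line.wrapping_add(line_size);
    }
    lines
}
```

For example, an 18-register `stmw` (72 bytes) starting at `0x1070` with an assumed 128-byte granule touches lines `0x1000` and `0x1080`, so both would need an `invalidate_for_write` call.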
### PPCBUG-161 — `stswx` is a permanent no-op: XER TBC not modeled (HIGH)
- **Severity**: HIGH
- **Status**: open
- **Location**: `interpreter.rs:4189` (`stswx` arm) + `context.rs:235-243` (`xer()`/`set_xer()`)
- **Companion**: PPCBUG-123 (lswx), PPCBUG-124 (mtspr XER). This finding covers the store side.
- **Symptom**: `ctx.xer() & 0x7F` always returns 0 (no `xer_tbc` field). `stswx` unconditionally
stores zero bytes. The byte-loop body is otherwise correct and requires no further changes.
- **Fix**: same three-line fix as PPCBUG-123 (add `xer_tbc: u8` to `PpcContext`; update `xer()`
and `set_xer()`). The `stswx` body is correct once TBC is live.
- **Test gap**: `mtspr XER` (TBC=5) + `stswx r3, 0, r4` → 5 bytes written big-endian.
### PPCBUG-162 — `stswi` uses `instr.rb()` instead of `instr.nb()` for NB field (MEDIUM)
- **Severity**: MEDIUM (maintenance hazard; not a runtime correctness bug today)
- **Status**: open
- **Location**: `interpreter.rs:1359`
- **Companion**: PPCBUG-126 (`lswi` identical pattern at line 1340).
- **Symptom**: `instr.rb()` and `instr.nb()` extract the same bits 16-20, so values are equal now.
If `rb()` is ever given a newtype wrapper (e.g. `RegIdx`) to enforce register semantics, the cast
`instr.rb() as u32` will either fail or yield wrong semantics — silently treating a register index
as a byte count.
- **Fix**: `let nb = if instr.nb() == 0 { 32 } else { instr.nb() };`
### PPCBUG-163 — Zero unit tests for stmw, stswi, stswx (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: No `#[test]` exists for any of the three opcodes. Regressions in loop bounds, byte
order, EA computation, NB=0 handling, or register wraparound are invisible.
- **Recommended minimum**: `stmw` 2-word and 32-word cases; `stswi` 4-byte, NB=0→32, register wraparound, and
partial-word cases; `stswx` (post-PPCBUG-123 fix) TBC=4, TBC=0, TBC=5. See group-27 report for full list.
ID PPCBUG-164 is unallocated — reserved for group 27 follow-up.
---
## Batch 5 (continued) — store doubleword (group 26)
Per-group report: `audit-out/group-26-store-doubleword.md`.
Group 26 summary: **0 HIGH, 2 MEDIUM, 2 LOW.** The core semantics of all six opcodes are
ISA-correct: `ds()` decoder extracts the DS-form displacement correctly; `mem.write_u64` handles
big-endian byte ordering; update-form writebacks are zero-extended and in the right order; `stdcx.`
CR0 encoding, reservation check, and table-path interaction all match the ISA. `stdbrx` correctly
applies `swap_bytes()`. No 32-bit writeback truncation issues (these are store ops, not ALU ops).
Two MEDIUM findings: (1) PPCBUG-150 extends PPCBUG-107 to the doubleword stores (same gap —
`invalidate_for_write` never called); (2) PPCBUG-151 identifies that `stwcx.` and `stdcx.` share
the same reservation slot without a width discriminator, allowing a `lwarx`+`stdcx.` or
`ldarx`+`stwcx.` cross-pair to succeed when it should fail. Four IDs used (PPCBUG-150..153).
### PPCBUG-150 — `std`/`stdu`/`stdx`/`stdux`/`stdbrx` do not call `invalidate_for_write` (scope extension of PPCBUG-107)
- **Severity**: MEDIUM (same bug class as PPCBUG-107, which is rated HIGH)
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**:
- `interpreter.rs:1258` (`std`)
- `interpreter.rs:1264` (`stdx`)
- `interpreter.rs:1269` (`stdu`)
- `interpreter.rs:1275` (`stdux`)
- `interpreter.rs:4163` (`stdbrx`)
- **Symptom**: When `--parallel` is active and the `ReservationTable` is enabled, any of these
five stores to an address another HW thread has reserved via `ldarx` will NOT invalidate that
thread's reservation. The `ldarx`-holding thread's `stdcx.` can subsequently succeed even though
the memory was overwritten — a classic LL/SC ABA gap. Fix session for PPCBUG-107 must include
these five sites.
- **Fix**: in each arm, after `mem.write_u64(ea, ...)`, add:
```rust
if let Some(t) = &ctx.reservation_table {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
### PPCBUG-151 — `stdcx.`/`stwcx.` reservation width not discriminated: cross-width pair silently succeeds
- **Severity**: MEDIUM
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:4119-4155` (`stdcx`) vs `interpreter.rs:1134-1180` (`stwcx`)
- **Symptom**: Both `stwcx.` and `stdcx.` match reservations using only `(has_reservation,
reserved_line)`. A `lwarx` reservation can be spuriously committed by `stdcx.`, or a `ldarx`
reservation by `stwcx.`, as long as the cache line matches. The ISA requires pairing — `lwarx`
must be committed by `stwcx.`, and `ldarx` by `stdcx.`. Cross-width commit reads the wrong width
from memory and writes back the wrong width, with no failure indication (CR0.EQ=1).
- **Fix**: add a `reservation_width: u8` field (4 or 8) to `PpcContext`. `stwcx.` requires
`reservation_width==4`; `stdcx.` requires `reservation_width==8`. In the table path, pack the
1-bit width flag into one of the spare bits of the 64-bit slot (bits 39-32 are always zero for
line addresses in the 32-bit guest address space).
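The per-context half of this fix can be sketched on a toy struct. `ToyCtx`, its methods, and the 128-byte `LINE_MASK` are assumptions made for illustration; only the field names `has_reservation`, `reserved_line`, and `reservation_width` come from the finding above.

```rust
/// Assumed reservation granule for this sketch (128-byte line).
const LINE_MASK: u32 = 127;

/// Toy per-thread reservation state with the proposed width discriminator.
#[derive(Default)]
struct ToyCtx {
    has_reservation: bool,
    reserved_line: u32,
    /// 4 after `lwarx`, 8 after `ldarx`.
    reservation_width: u8,
}

impl ToyCtx {
    /// `lwarx` (width 4) or `ldarx` (width 8).
    fn load_reserve(&mut self, ea: u32, width: u8) {
        self.has_reservation = true;
        self.reserved_line = ea & !LINE_MASK;
        self.reservation_width = width;
    }

    /// `stwcx.` (width 4) or `stdcx.` (width 8). Returns the CR0.EQ result;
    /// the reservation is cleared whether or not the store succeeds.
    fn store_conditional(&mut self, ea: u32, width: u8) -> bool {
        let ok = self.has_reservation
            && self.reserved_line == (ea & !LINE_MASK)
            && self.reservation_width == width;
        self.has_reservation = false;
        ok
    }
}
```

In this model a `lwarx` followed by `stdcx.` on the same line fails, while the matched pairs still succeed.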
### PPCBUG-152 — `stdu`/`stdux` no invalid-form guard for RS==RA (LOW)
- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this)
- **Status**: open
- **Location**: `interpreter.rs:1267-1278`
- **Symptom**: When `RA==RS`, the store writes the original RS value, then RA (==RS) is
overwritten with EA, destroying the source. ISA marks this invalid-form. Consistent with
policy of other update-form stores in groups 18-22.
- **Fix**: `debug_assert!(instr.ra() != 0 && instr.ra() != instr.rs())` in debug builds.
### PPCBUG-153 — Zero unit tests for std/stdu/stdx/stdux/stdbrx; stdcx. happy-path only (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module (only `test_ldarx_stdcx_pair` at line 4629)
- **Missing coverage**: `std` with negative DS; `std` with RA=0; `stdu` update writeback; `stdx`
with RA=0; `stdux` indexed update; `stdbrx` byte-reversed output; `stdcx.` failure path (no
prior reservation or EA mismatch); `stdcx.` `has_reservation` cleared on failure.
- **Recommended minimum**: 6 tests — see per-group report for encodings.
IDs PPCBUG-154 through PPCBUG-159 are unallocated — reserved for group 26 follow-up.
---
## Batch 5 (continued) — store float (group 28)
Per-group report: `audit-out/group-28-store-float.md`.
Group 28 summary: **7 findings: 3 HIGH, 1 MEDIUM, 3 LOW.** EA computation, endianness, update-form
writebacks, and `stfiwx` integer-word extraction are all correct. Critical bugs: (1) `stfs*` never
raises FPSCR exception bits (VXSNAN, XX, OX, UX) required by PowerISA for double→single narrowing;
(2) `stfs*` ignores FPSCR.RN rounding mode, always using round-to-nearest-even; (3) all 9 FP store
arms omit `invalidate_for_write` (same class as PPCBUG-107). The `stfd*` family and `stfiwx` are
clean (bit-pattern stores with no conversion). Zero unit tests across all 9 opcodes.
**7 IDs used (PPCBUG-165..171). 3 IDs unallocated (PPCBUG-172..174).**
### PPCBUG-165 — stfs* does not raise FPSCR exception bits (VXSNAN, XX, OX, UX)
- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux)
- **Symptom**: PowerISA requires that `stfs` double→single narrowing raises FPSCR[VXSNAN] for SNaN
input, FPSCR[OX] on overflow to ±∞, FPSCR[UX] on underflow to ±0/denormal, and FPSCR[XX] when the
result is inexact. None of these bits are ever set. The narrowing is done via `ctx.fpr[instr.rs()] as f32`
(x86 `CVTSD2SS`); no FPSCR inspection or update follows. Games that poll FPSCR[OX] to detect
overflow (physics engines clamping large velocities), or FPSCR[VXSNAN] after sentinel SNaN writes,
get false negatives.
- **Canary parity**: Canary also omits these FPSCR updates for `stfs*`. Both share the deviation.
- **Fix**: after the narrowing, check `fpscr::is_snan(src)` → set `VXSNAN`; compare the source against the
f64 round-trip of the narrowed value to detect inexact (`XX`); test `src.is_finite()` together with an
infinite f32 result to detect overflow (`OX`). See group-28 report for illustrative code sketch.
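The checks listed in the Fix bullet could look roughly like the following. This is a hedged sketch, not the group-28 report's code: `NarrowFlags`, `narrow_with_flags`, and the local `is_snan` are hypothetical, and the underflow rule (tiny result that also lost precision) is an assumption, since the bullet does not spell one out.

```rust
/// Flags this sketch would raise; names mirror the FPSCR bits in the finding.
#[derive(Debug, Default, PartialEq)]
struct NarrowFlags {
    vxsnan: bool,
    ox: bool,
    ux: bool,
    xx: bool,
}

/// An f64 NaN is signalling when its quiet bit (fraction bit 51) is clear.
fn is_snan(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & (1u64 << 51)) == 0
}

/// Narrow f64 -> f32 and report which conditions occurred.
fn narrow_with_flags(src: f64) -> (f32, NarrowFlags) {
    let narrowed = src as f32;
    let mut flags = NarrowFlags::default();
    if src.is_nan() {
        flags.vxsnan = is_snan(src);
        return (narrowed, flags);
    }
    // Inexact: the f32 result does not round-trip to the source value.
    flags.xx = (narrowed as f64) != src;
    // Overflow: a finite source became infinite.
    flags.ox = src.is_finite() && narrowed.is_infinite();
    // Underflow (assumed rule): non-zero source, zero/subnormal inexact result.
    flags.ux = src != 0.0 && narrowed.abs() < f32::MIN_POSITIVE && flags.xx;
    (narrowed, flags)
}
```

A real fix would map these booleans onto the interpreter's FPSCR update helpers, which are not shown in this file.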
### PPCBUG-166 — stfs* ignores FPSCR.RN; always uses round-to-nearest-even
- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
- **Symptom**: `ctx.fpr[instr.rs()] as f32` always rounds to nearest-even (Rust's `as` cast semantics, executed under the host's default MXCSR mode), never consulting
`ctx.fpscr & fpscr::RN_MASK`. Any game that configures FPSCR.RN to truncate/ceil/floor and then
stores via `stfs` gets the wrong f32 in memory (wrong by at most 1 ULP). The stfs.md spec
explicitly acknowledges this gap.
- **Canary parity**: Canary also ignores FPSCR.RN for stfs. Both share the deviation.
- **Fix**: read `ctx.fpscr & fpscr::RN_MASK` and set host MXCSR before narrowing, then restore.
Minimum viable: `debug_assert_eq!(ctx.fpscr & fpscr::RN_MASK, 0)` for debug-build visibility.
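As an alternative to toggling the host MXCSR, a directed rounding mode can be derived in software from the default round-to-nearest cast. The sketch below covers only round-toward-zero and is a swapped-in technique, not the fix proposed above; `narrow_toward_zero` is a hypothetical name.

```rust
/// Narrow f64 -> f32 with round-toward-zero, built on the default
/// round-to-nearest cast plus a one-ULP correction.
fn narrow_toward_zero(src: f64) -> f32 {
    let nearest = src as f32;
    if nearest.is_nan() {
        return nearest;
    }
    // If round-to-nearest moved away from zero, step back one f32 toward zero.
    if (nearest as f64).abs() > src.abs() {
        // Decrementing the bit pattern lowers the magnitude by one ULP for
        // either sign; for an infinite `nearest` it yields the largest finite f32.
        f32::from_bits(nearest.to_bits() - 1)
    } else {
        nearest
    }
}
```

The ceil and floor modes would follow the same pattern with a sign-dependent correction. Exactly representable inputs pass through unchanged, and a finite source that overflows the f32 range is clamped to `f32::MAX` with the source's sign.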
### PPCBUG-167 — All 9 FP store arms missing `invalidate_for_write` (PPCBUG-107 class)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux),
1308 (stfd), 1313 (stfdu), 1320 (stfdx), 1325 (stfdux), 1333 (stfiwx)
- **Symptom**: Same class as PPCBUG-107. Under M3 `--parallel`, a FP store by thread B to a
cache line reserved by thread A via `lwarx` does not clear thread A's reservation table slot.
Thread A's subsequent `stwcx.` spuriously succeeds. Rendering workers using FP stores to shared
transform/particle buffers co-located with spinlock sites are at risk.
- **Fix**: before each `mem.write_f32`/`write_f64`/`write_u32` in every FP store arm:
```rust
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
Recommend a single sweep of all store groups (PPCBUG-107, 130, 160, 167) to avoid further drift.
### PPCBUG-168 — stfs* SNaN narrowing: `as f32` quietens SNaN without raising FPSCR.VXSNAN
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
- **Symptom**: When FRS holds an f64 SNaN (bit 51 = 0), `CVTSD2SS` sets the f32 quiet bit (bit 22),
producing a QNaN in memory, without raising FPSCR[VXSNAN]. The stored memory bytes are correct per
IEEE-754 (narrowing an SNaN produces a QNaN). The bug is the missing FPSCR signal, a subset of
PPCBUG-165. **Contrast with PPCBUG-128** (lfs stores wrong FPR bits — HIGH severity; here memory
bytes are right, only the flag is missing).
- **Note**: fixed as a side effect of the PPCBUG-165 fix. No independent code change needed.
### PPCBUG-169 — stfd* bit-pattern store: confirmed correct (informational)
- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Locations**: interpreter.rs:1305, 1311, 1317, 1323
- **Analysis**: `write_f64(ea, fpr)` → `write_u64(ea, fpr.to_bits())` → `val.to_be_bytes()`. Pure
bit-pattern, correct big-endian. SNaN preserved. EA computation and update-form writebacks all
correct. Canary parity confirmed. No bugs.
### PPCBUG-170 — stfiwx: confirmed correct (informational)
- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Location**: interpreter.rs:1329-1335
- **Analysis**: `write_u32(ea, fpr.to_bits() as u32)` correctly extracts the low 32 bits of the
64-bit FPR as a raw bit pattern (the integer word produced by `fctiw`/`fctiwz`) and stores
big-endian. RA=0 handled correctly. No FPSCR effects required. Canary parity confirmed. No bugs.
### PPCBUG-171 — Zero unit tests for all 9 store-float opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Symptom**: No `#[test]` covers any of the 9 FP store arms. Regressions in EA computation,
endianness, update-form writeback order, or double→single narrowing are invisible.
- **Recommended minimum** (10 tests): `stfd` normal + SNaN bit-exact; `stfdu` update writeback;
`stfs` round-trip (1.0); `stfs` overflow (→ ±∞); `stfsx` ra=0; `stfsux` update; `stfiwx` integer
word extract; post-PPCBUG-165 fix: SNaN → FPSCR.VXSNAN set; post-PPCBUG-166 fix: RN=truncate.
IDs PPCBUG-172 through PPCBUG-174 are unallocated — reserved for group 28 follow-up.
---
## Batch 6 — FPU single-precision (group 29)
Per-group report: `audit-out/group-29-fpu-single.md`.
**Context**: The live implementation is substantially more capable than the frozen ppc-manual
snapshots indicated. `to_single()` correctly dispatches on FPSCR.RN; `check_invalid_*` helpers
correctly set VXSNAN, VXISI, VXIMZ, VXZDZ, VXIDI, ZX; `update_after_op` sets OX, UX, and
FPRF. The remaining bugs are: (1) XX/FI/FR (inexact) never set anywhere; (2) fmadd/fmsub
`*sx` variants missing the VXISI check for the add-phase infinity collision (among the `*x` double
siblings only `fmaddx` performs it; see PPCBUG-203); (3) fnmadd/fnmsub NaN sign bit incorrectly flipped by Rust `-`;
(4) fresx produces a full IEEE 1/b instead of the ~12-bit hardware estimate; (5) FPSCR.NI
flush-to-zero not modelled; (6) SNaN→QNaN propagation relies on host SSE behavior rather than
the ISA-canonical derivation.
**8 IDs used (PPCBUG-180..187). 12 IDs unallocated (PPCBUG-188..199).**
### PPCBUG-180 — XX / FI / FR bits never set across all FPU *sx opcodes (and double siblings)
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: `fpscr.rs:184-194` (`update_after_op`); affects interpreter.rs:2252-2494
- **Symptom**: `FPSCR[XX]` (inexact) should be set whenever the mathematical result of an
FP operation cannot be represented exactly in the destination format (single or double) and
a rounding step occurs. `FPSCR[FI]` (fraction inexact) reports the same condition for the
current operation only, and `FPSCR[FR]` (fraction rounded) records whether rounding increased
the magnitude. `update_after_op` sets `OX` (overflow to ±∞) and `UX` (subnormal
result) but has no inexact-detection logic. Since most `*sx` operations on arbitrary inputs
require rounding to single precision, XX is almost always wrong (false zero). Games using
FPSCR polling to check exactness receive false "exact" results.
- **Canary parity**: Canary's `UpdateFPSCR` also does not set XX/FI/FR. Both share this gap.
- **Fix**: In `update_after_op` (or a post-`to_single` helper), compare the pre-round f64
result with the post-round f64 result. If they differ, set `XX` (sticky) and `FI`, and set `FR`
only when the rounded magnitude is larger than the pre-round magnitude. Otherwise clear `FI` and `FR`.
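The comparison described in the fix can be sketched as a pure function. This is a minimal illustration under the assumption that the pre-round f64 and the narrowed f32 are both available. `RoundInfo` and `classify_narrowing` are hypothetical names, and the treatment of FR on overflow is simplified.

```rust
/// Per-operation rounding outcome for an f64 -> f32 narrowing (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct RoundInfo {
    /// Result differs from the exact input; the caller ORs this into sticky XX.
    pub fi: bool,
    /// Rounding increased the magnitude of the value.
    pub fr: bool,
}

/// Compare the value before narrowing with the value after narrowing.
pub fn classify_narrowing(pre: f64, post: f32) -> RoundInfo {
    if pre.is_nan() {
        // NaN handling belongs to the invalid-operation path, not to inexact.
        return RoundInfo { fi: false, fr: false };
    }
    let widened = f64::from(post); // f32 -> f64 is always exact
    let fi = widened != pre;
    let fr = fi && widened.abs() > pre.abs();
    RoundInfo { fi, fr }
}
```

A caller would set `XX |= fi` and overwrite `FI`/`FR` on every operation, since only `XX` is sticky.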
### PPCBUG-181 — fmaddsx / fnmaddsx missing VXISI check for add-phase ±∞ collision
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:2339-2348 (fmaddsx), 2383-2392 (fnmaddsx)
- **Symptom**: When `FRA × FRC = +∞` and `FRB = -∞` (or vice versa), PowerISA §4.3.4
requires `FPSCR[VXISI]` to be set and the result to be a QNaN. The double-precision sibling
`fmaddx` (line 2327) correctly calls `fpscr::check_invalid_add(ctx, a * c, b, false)` after
the multiply-check. `fmaddsx` omits this call entirely — only `check_invalid_mul` runs.
Games using fused-madd in dot-product accumulators that might overflow to ±∞ (e.g. lighting
accumulators with very large normals) lose the VXISI signal.
- **Fix**:
```rust
// inside fmaddsx arm, after check_invalid_mul:
fpscr::check_invalid_add(ctx, a * c, b, false);
```
Same for fnmaddsx (same operand pair, same `false` sense for the add).
### PPCBUG-182 — fmsubsx / fnmsubsx missing VXISI check for subtract-phase ±∞ collision
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:2361-2370 (fmsubsx), 2405-2414 (fnmsubsx)
- **Symptom**: When `FRA × FRC = ±∞` and `FRB = ±∞` with the same sign, `(±∞) - (±∞)`
should fire `FPSCR[VXISI]`. Neither `fmsubsx` nor `fnmsubsx` calls `check_invalid_add`.
- **Fix**:
```rust
// inside fmsubsx arm, after check_invalid_mul:
fpscr::check_invalid_add(ctx, a * c, -b, false);
```
Same for fnmsubsx. The negated `b` turns the subtract into the add-form so that
`check_invalid_add(..., false)` uses the correct infinity-sign comparison.
### PPCBUG-183 — fnmaddsx / fnmsubsx NaN sign bit incorrectly flipped by Rust unary `-`
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:2388 (fnmaddsx), 2410 (fnmsubsx)
- **Symptom**: `to_single(ctx, -(a.mul_add(c, b)))` — Rust's unary `-f64` always flips the
IEEE sign bit, including when the value is NaN. PowerISA §4.3.2 specifies that the final
negation in `fnmadd`/`fnmsub` is NOT applied to a QNaN result: if the fused computation
yields a NaN (due to SNaN input, VXIMZ, or VXISI), the negation is skipped and the NaN is
propagated with its canonical sign unchanged. xenia-rs flips the sign bit of any NaN result,
producing a QNaN with the wrong sign. Observable by storing via `stfd` and inspecting bits.
Games using sign-bit NaN tagging (e.g. `0xFFC00000` vs `0x7FC00000` as distinct sentinels)
are affected.
- **Fix**:
```rust
// fnmaddsx arm:
let inner = a.mul_add(c, b);
let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
// fnmsubsx arm:
let inner = a.mul_add(c, -b);
let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
```
### PPCBUG-184 — fresx produces full-precision IEEE 1/b instead of ~12-bit hardware estimate
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:2481-2494
- **Symptom**: `fres` on Xenon hardware produces a reciprocal approximation via a 256-entry
LUT with linear interpolation, accurate to roughly 1/4096 relative error (~12 mantissa
bits). xenia-rs computes `to_single(1.0 / b)` — the fully IEEE-754 correctly-rounded
single-precision reciprocal. The result is up to ~4096× more accurate than hardware.
Newton-Raphson refinement code `x = fres(d); x = x*(2 - d*x)` is not broken by this (NR
converges even from an accurate seed), but code that checks the seed's error magnitude for
convergence termination, or that relies on `fres(d)*d ≠ 1.0` to decide whether to refine,
may take the wrong branch. Also, `fres(d)*d` on xenia is much closer to 1.0 than on hardware,
so a "was the estimate good enough?" check based on the residual will give wrong answers.
- **Canary parity**: Canary uses `f.Recip(f.Convert(frB, FLOAT32_TYPE))` — approximates by
first converting to f32 (quantizing the input), then applying the host reciprocal. Still
produces a fully-accurate IEEE single reciprocal rather than the 12-bit table estimate.
Both emulators share the deviation. Canary's conversion-first approach is slightly closer to
hardware (the input is quantized before the reciprocal), so if a future fix is desired,
Canary's approach is the better reference.
- **Fix (minimal viable)**: Pre-convert input to f32 to match Canary's quantization:
`let b32 = b as f32; to_single(ctx, 1.0_f64 / b32 as f64)`. This matches Canary but still
does not emulate the 12-bit LUT. Full fix requires an `fres` LUT matching Xenon's hardware
table (documented in Xbox 360 SDK / GamePPCLisa docs).
### PPCBUG-185 — FPSCR.NI flush-to-zero not modelled; subnormal results propagate through *sx
- **Severity**: MEDIUM
- **Status**: open
- **Location**: All *sx arms in interpreter.rs; `fpscr.rs` does not define an `NI` constant
- **Symptom**: Xenon firmware sets `FPSCR.NI = 1` at boot. With NI=1, the Xenon FPU flushes
subnormal inputs and results to the appropriate signed zero before and after every
floating-point operation. xenia-rs inherits the host x86 IEEE-754 default (NI=0), which
propagates subnormals. Subnormal differences: (a) subnormal FPR inputs are used as-is by
xenia vs. treated as ±0 by hardware; (b) subnormal results are stored by xenia vs. flushed
to ±0 by hardware. `update_after_op` sets `UX` when the result is subnormal, but does NOT
flush it. Games with NI-dependent behavior — most Xbox 360 titles compiled with default
Xenon ABI settings — may see different float results in subnormal-touching paths.
- **Canary parity**: Canary also inherits host IEEE NI=0 semantics. Both share this gap.
- **Fix**: After `to_single` (or the double-precision result), check `ctx.fpscr & fpscr::NI_BIT`
(a constant that still has to be added) and, if set, flush subnormals: `if result.is_subnormal() { result =
result.signum() * 0.0 }`. Apply to inputs as well for strict correctness.
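A minimal sketch of the flush step, assuming the NI check has already been made by the caller. The function names are hypothetical. Note that a single-precision result held in an f64 register is subnormal in the f32 sense while still being a normal f64, so the single-precision path needs its own test.

```rust
/// Flush a double-precision value to a same-signed zero if it is subnormal.
pub fn flush_double(x: f64) -> f64 {
    if x.is_subnormal() { 0.0_f64.copysign(x) } else { x }
}

/// Flush a single-precision result that is stored widened in an f64.
/// The value is subnormal "as a single" when its f32 form is subnormal.
pub fn flush_single_in_f64(x: f64) -> f64 {
    if (x as f32).is_subnormal() { 0.0_f64.copysign(x) } else { x }
}
```

`copysign` is used here instead of `signum() * 0.0`; both preserve the sign of the flushed value.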
### PPCBUG-186 — SNaN → QNaN propagation relies on host SSE; not ISA-canonical for all *sx
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:2252-2414 (all arithmetic *sx arms without explicit SNaN guard)
- **Symptom**: When an SNaN input reaches `faddsx`/`fsubsx`/`fmulsx`/`fdivsx`, the code calls
`check_invalid_add/mul/div` (correctly sets VXSNAN) but then performs the operation on the
raw SNaN value: `a + b`, `a * c`, etc. On x86-64 SSE2, the hardware `ADDSD`/`MULSD` ops
produce a QNaN from the first SNaN operand (bit 51 set, other mantissa bits preserved). This
matches ISA §4.3.2.2 for the common case. However, for `mul_add` (VFMADD231SD on FMA3 hosts), the
SNaN propagation priority may differ: the ISA specifies FRA takes priority over FRB, but
hardware FMA may use a different priority for the three-operand form. The `fsqrtsx` and
`fresx` arms handle SNaN explicitly (via `is_snan` check) but do not synthesize the correct
QNaN result — they rely on `b.sqrt()` / `1.0/b` to produce a NaN, which the host does.
This is a latent risk; active wrong-result cases require bit-level NaN inspection.
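One way to remove the host dependency is to select and quieten the NaN explicitly before the host operation runs. The sketch below assumes the operand priority FRA, then FRB, then FRC. The text above only states that FRA precedes FRB, so the position of FRC is an assumption, and all function names are hypothetical.

```rust
/// Quiet bit of an IEEE-754 double (most significant fraction bit).
const F64_QUIET_BIT: u64 = 1 << 51;

/// True for a signalling NaN: NaN with the quiet bit clear.
pub fn is_snan(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & F64_QUIET_BIT) == 0
}

/// Quieten a NaN by setting the quiet bit; sign and payload are preserved.
pub fn quiet(x: f64) -> f64 {
    f64::from_bits(x.to_bits() | F64_QUIET_BIT)
}

/// Return the NaN result for a three-operand op, or None if no operand is NaN.
/// Priority order FRA, FRB, FRC is assumed here.
pub fn propagate_nan(fra: f64, frb: f64, frc: f64) -> Option<f64> {
    for v in [fra, frb, frc] {
        if v.is_nan() {
            return Some(quiet(v));
        }
    }
    None
}
```

An arm would call `propagate_nan` first and only fall through to `mul_add` when it returns `None`.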
### PPCBUG-187 — Zero interpreter execution tests for all 10 group-29 opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module (no `#[test]` covers any *sx or fresx)
- **Symptom**: Regressions in rounding, FPSCR side effects, or operand-field decoding are
invisible to CI. The existing fpscr unit tests cover helper functions in isolation; no test
exercises the full `step()` path for any single-precision FPU opcode.
- **Recommended minimum** (12 tests — see group-29 report for encodings):
`fadds` exact; `fadds` VXISI; `fsubs` VXISI; `fmuls` 0×∞; `fdivs` ZX;
`fmadds` VXISI regression (PPCBUG-181); `fmsubs` VXISI regression (PPCBUG-182);
`fnmadds` NaN-sign (PPCBUG-183); `fnmsubs` NaN-sign (PPCBUG-183);
`fsqrts` negative input VXSQRT; `fsqrts` round-trip; `fres` basic reciprocal.
IDs PPCBUG-188 through PPCBUG-199 are unallocated — reserved for group 29 follow-up.
---
## Batch 6 (continued) — FPU arithmetic double (group 30)
Per-group report: `audit-out/group-30-fpu-double.md`.
Group 30 summary: **9 findings (PPCBUG-200..208). 2 MEDIUM cross-cutting, 3 MEDIUM opcode-specific, 4 LOW.** Result arithmetic is correct for all 10 opcodes. FPSCR infrastructure is partially wired: VXSNAN, OX, UX, ZX, VXISI (add/sub), VXIMZ, VXZDZ, VXIDI, VXSQRT all set correctly for the opcodes that need them. Critical gaps: (1) XX/FR/FI bits never set by any opcode — same gap as PPCBUG-180 but now confirmed on the double-precision path; (2) FPSCR.RN not honored for double arithmetic — single-precision has `round_to_single` but double has no equivalent; (3) fmsubx/fnmaddx/fnmsubx omit the VXISI check for ∞-collision in the add step; (4) fnmaddx/fnmsubx flip NaN sign bit via Rust `-` operator but ISA requires NaN sign preserved. frsqrtex uses full-precision 1/sqrt(b) instead of the hardware estimate — acceptable. All FMA forms use `f64::mul_add` for correct single-rounding semantics.
**9 IDs used (PPCBUG-200..208). 11 IDs unallocated (PPCBUG-209..219).**
### PPCBUG-200 — All group-30 opcodes: XX, FR, FI bits never set
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `fpscr.rs:184-194` (`update_after_op`); `interpreter.rs:2248,2268,2289,2310,2335,2357,2379,2401,2463,2510`
- **Symptom**: Same gap as PPCBUG-180 but confirmed for the double-precision path. `update_after_op` only tracks OX (overflow to infinity) and UX (subnormal). FPSCR[XX] (inexact sticky), FPSCR[FR] (round direction), and FPSCR[FI] (inexact for current op) are never updated by any group-30 opcode. Every double-precision arithmetic operation that rounds a non-representable result silently omits these bits.
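- **Fix**: Same goal as PPCBUG-180, but no `to_single` step is involved for double, so the pre-round/post-round comparison is not available. Either read the MXCSR precision flag after each f64 operation and map it to XX/FI, or detect inexactness with a post-op bit-level comparison of inputs vs. result. FR additionally needs the rounding direction, which the MXCSR flag alone does not report.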
- **Test gap**: Zero tests verify XX set after any inexact double-precision operation.
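As an alternative to reading MXCSR, inexactness of a double add or multiply can be detected in software with error-free transformations: Knuth's TwoSum for addition and an FMA residual for multiplication. This is a sketch under round-to-nearest, and it deliberately ignores overflow and deep-underflow corner cases. The function names are hypothetical.

```rust
/// True if `a + b` is not exactly representable in f64 (TwoSum residual != 0).
pub fn add_is_inexact(a: f64, b: f64) -> bool {
    let s = a + b;
    if !s.is_finite() {
        return false; // overflow / NaN are reported through other FPSCR bits
    }
    let b_virtual = s - a;
    let err = (a - (s - b_virtual)) + (b - b_virtual);
    err != 0.0
}

/// True if `a * b` is not exactly representable in f64.
/// `mul_add` computes a*b - p with a single rounding, which recovers the residual.
pub fn mul_is_inexact(a: f64, b: f64) -> bool {
    let p = a * b;
    if !p.is_finite() {
        return false;
    }
    a.mul_add(b, -p) != 0.0
}
```

The sign of the residual also indicates the rounding direction, which is what FR needs.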
### PPCBUG-201 — All group-30 opcodes: FPSCR.RN not honored for double arithmetic
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2242-2512` (all 10 arms)
- **Symptom**: Host f64 operators always use nearest-even (host MXCSR default). `fpscr.rs` has a complete `rounding_mode(ctx)` helper and directed rounding helpers for single-precision (`round_to_single`), but no equivalent for double arithmetic. Guest `mtfsfi` RN changes have no effect on faddx/fsubx/fdivx/fsqrtx etc.
- **Fix**: Wrap each double-precision arithmetic arm with an MXCSR round-mode set/restore when `ctx.fpscr & fpscr::RN_MASK != 0`. Fast path (RN=0) stays zero-cost.
- **Test gap**: No test changes RN and verifies directed rounding on any double arithmetic opcode.
### PPCBUG-202 — fmaddx: non-FMA `a * c` used in check_invalid_add can spuriously raise/miss VXISI
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2332`
- **Symptom**: `check_invalid_add(ctx, a * c, b, false)` uses a separate two-rounding multiply to approximate the FMA intermediate product. When the true FMA intermediate is finite but the standalone product overflows to ±∞, VXISI fires spuriously. When the true intermediate is ±∞ but the standalone product is finite (extreme cancellation), VXISI is missed.
- **Fix**: Derive VXISI from input-value properties directly: if `(a.is_infinite() || c.is_infinite())` (product is mathematically infinite) and `b.is_infinite()` with opposing sign → VXISI.
- **Test gap**: No test covers the large-value cancellation case in fmaddx.
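The input-derived check can be sketched as a standalone predicate. This is an illustration, not the existing `check_invalid_add` API: `fma_vxisi` is a hypothetical name, and the 0×∞ case is excluded on the assumption that it is reported as VXIMZ by the multiply check.

```rust
/// VXISI condition for a fused `a * c + b`, derived from the operands only.
/// For the subtract forms, pass `-b`.
pub fn fma_vxisi(a: f64, c: f64, b: f64) -> bool {
    if a.is_nan() || c.is_nan() || b.is_nan() {
        return false; // NaN operands take the VXSNAN / propagation path
    }
    let product_infinite = a.is_infinite() || c.is_infinite();
    let inf_times_zero = product_infinite && (a == 0.0 || c == 0.0);
    if !product_infinite || inf_times_zero || !b.is_infinite() {
        return false;
    }
    // Sign of the mathematically infinite product vs. sign of the addend.
    let product_negative = a.is_sign_negative() != c.is_sign_negative();
    product_negative != b.is_sign_negative()
}
```

Because the product is never computed, finite operands whose standalone product would overflow cannot trigger the flag.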
### PPCBUG-203 — fmsubx, fnmaddx, fnmsubx: VXISI never raised for ∞-collision in add/sub step
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: `interpreter.rs:2354` (fmsubx), `2376` (fnmaddx), `2398` (fnmsubx)
- **Symptom**: Same pattern as PPCBUG-181/182 for the double-precision variants. These three arms call only `check_invalid_mul` and omit `check_invalid_add`. Per ISA, all four FMA variants must raise VXISI when the add step is `(±∞) + (∓∞)`. Example for fmsub: `A×C = +∞`, `B = +∞` → `(+∞) - (+∞)` → VXISI. Currently the result NaN propagates silently with no FPSCR update. The fnmsub pattern is the canonical Newton-Raphson step, the most common FPU path in Xbox 360 graphics code.
- **Fix**: Add `fpscr::check_invalid_add(ctx, a * c, b, true)` for `fmsubx`/`fnmsubx` and `fpscr::check_invalid_add(ctx, a * c, b, false)` for `fnmaddx` (apply PPCBUG-202 sign-fix simultaneously).
- **Test gap**: Zero tests for VXISI on any of the three opcodes.
### PPCBUG-204 — fmaddx check_invalid_add sub-issue (sign logic reliant on imprecise product)
- **Severity**: LOW (sub-issue of PPCBUG-202)
- **Status**: open
- **Location**: `interpreter.rs:2332`
- **Symptom**: VXISI logic is internally consistent with the passed `a * c` value, but that value can have the wrong sign in extreme overflow/underflow cases. Resolve as part of PPCBUG-202.
### PPCBUG-205 — fnmaddx / fnmsubx: Rust `-` flips NaN sign bit; ISA requires NaN sign preserved
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: `interpreter.rs:2377` (fnmaddx), `interpreter.rs:2399` (fnmsubx)
- **Symptom**: Same pattern as PPCBUG-183 for the double-precision variants. Rust's unary `-` applied to a NaN result flips the IEEE-754 sign bit. PowerISA Book I §4.3.4 states the negation is not applied to NaN results. Title code using NaN sentinels (audio middleware, debug fills) receives sign-flipped NaN payloads.
- **Fix**:
```rust
let fma = a.mul_add(c, b); // fnmaddx
let result = if fma.is_nan() { fma } else { -fma };
// and analogously for fnmsubx
```
- **Test gap**: No test exercises fnmaddx/fnmsubx with NaN-producing inputs to check sign of result NaN.
### PPCBUG-206 — frsqrtex edge cases correct; no code change needed (informational)
- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Location**: `interpreter.rs:2496-2512`
- **Analysis**: ZX fires for ±0. VXSQRT guard correctly excludes -0.0. frsqrte(+∞)=+0 correct. Full-precision is acceptable over-precision.
- **Fix**: Add comment: `// Full-precision: hardware gives ~12-14 bit estimate. NR converges identically.`
- **Test gap**: Zero frsqrtex unit tests — add 4 (±0 inputs, negative input+VXSQRT, SNaN, +∞).
### PPCBUG-207 — FMA opcode OX logic correct, OX edge cases untested (informational)
- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Location**: `interpreter.rs:2335,2357,2379,2401`
- **Analysis**: `inputs_were_finite` correctly suppresses OX when an input is already infinite. OX fires when all inputs are finite but the FMA result overflows — ISA-correct.
- **Test gap**: Zero tests for OX scenario in any FMA opcode.
### PPCBUG-208 — Zero tests for fsubx, fdivx, fmsubx, fnmaddx, fnmsubx, fsqrtx, frsqrtex
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: 7 of 10 group-30 opcodes have zero tests. `faddx` has 1 happy-path test; `fmulx` has 1; `fmaddx` has 1. None have FPSCR/Rc=1/edge-case coverage.
- **Recommended minimum** (12 tests): `fsubx` normal; `fsubx` VXISI; `fdivx` normal; `fdivx` ZX; `fdivx` VXZDZ; `fmsubx` normal; `fnmaddx` normal; `fnmsubx` normal; `fnmaddx` NaN-sign regression (PPCBUG-205); `fsqrtx` normal; `fsqrtx` negative+VXSQRT; `frsqrtex` positive.
IDs PPCBUG-209 through PPCBUG-219 are unallocated — reserved for group 30 follow-up.
---
## Pending batches
- Batch 2: groups 6-11 — logical immediate, logical register, sign-extend/CLZ, word rotate, doubleword rotate, shift.
- Batch 3: groups 12-17 — compare, branch, trap+sc, CR logical, SPR/MSR, cache+sync.
- Batch 4: groups 18-23 — loads (byte, halfword, word, doubleword, multiple/string, float).
- Batch 5 (partial): groups 24, 26, 27, 28 done; group 25 (store word) pending.
- Batch 6 (partial): groups 29, 30 done; group 31 (FPU convert/compare) pending.
- Batch 7: groups 32-34 — VMX integer (add/sub, compare/min/max, logical/shift).
- Batch 8: groups 35-38 — VMX permute/pack, VMX float, VMX multiply-sum, VMX load/store.
- Phase C: decoder field extractors, decoder opcode-lookup, disassembler formatter parity.
- Phase D: this file gets re-sorted by severity and finalized.
---
## Batch 6 (continued) — FPU sign/move/compare/convert/round (group 31)
Per-group report: `audit-out/group-31-fpu-misc.md`.
Group 31 summary: **9 findings (PPCBUG-221..231; IDs 220/222/226 retracted after analysis).
1 HIGH, 3 MEDIUM, 5 LOW.** The sign-bit manipulation family (`fabsx`, `fnegx`, `fnabsx`, `fmrx`)
and `fselx` are all ISA-correct — Rust arithmetic maps to bit-level operations that preserve SNaN
payloads. `fcmpu` is correct (FPRF and VXSNAN set; no spurious VXVC). The conversion group is
mostly correct for result values and overflow sentinels; the main gaps are FPSCR inexact/FR/FI
tracking (shared with groups 29/30) and one subtle NearestEven tie-breaking defect in
`round_to_i64` that affects `fctidx`. `fcmpo` silently omits VXSNAN/VXVC despite having a
comment acknowledging the gap.
**9 IDs used (PPCBUG-221, 223, 224, 225, 227, 228, 229, 230, 231). IDs 220/222/226 retracted.
IDs PPCBUG-232..239 unallocated.**
### PPCBUG-221 — `fctidx` / `round_to_i64` NearestEven tie-breaking uses f64::EPSILON; broken for |v| > 2^52
- **Severity**: HIGH
- **Status**: open
- **Location**: `fpscr.rs:220-238` (`round_to_i64`, `NearestEven` case)
- **Symptom**: The tie-breaking code computes `diff = (v - v.trunc()).abs()` and tests
`(diff - 0.5).abs() < f64::EPSILON` to detect a half-integer. Above `|v| = 2^52`,
`v.trunc() == v` for all representable f64 values (all are exact integers), so `diff == 0.0`
and the tie-breaking branch is never taken — the code falls through to `v.round() as i64`,
which is round-half-away-from-zero instead of round-half-to-even. Every fctid call on a
large odd half-integer (e.g. `(2^52 + 1).5`) produces the wrong integer. In practice these
exact 0.5 cases are rare for large values but can appear in audio sample-count arithmetic
and physics fixed-point pipelines.
- **Fix**: replace the NearestEven arm with a fractional-part-only tie check that is exact for
|v| <= 2^52 and degenerates correctly to truncation above 2^52:
```rust
RoundingMode::NearestEven => {
    let t = v.trunc();
    let frac = v - t; // exact for |v| <= 2^52; ==0 above (already integer)
    let fa = frac.abs();
    if fa > 0.5 { t as i64 + if v >= 0.0 { 1 } else { -1 } }
    else if fa < 0.5 { t as i64 }
    else {
        // Exact 0.5 tie: round to even.
        let fi = t as i64;
        if fi & 1 == 0 { fi } else { fi + if v >= 0.0 { 1 } else { -1 } }
    }
}
```
- **Test gap**: add `round_to_i64` tests in `fpscr.rs:tests`: 0.5→0, 1.5→2, 2.5→2, 3.5→4,
-0.5→0, -1.5→-2. Existing tests cover 2.5→2 and 3.5→4 (currently accidentally correct).
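For unit testing, the same rounding rule can be exercised as a free function. The sketch below is a standalone variant of the arm above, not the code in `fpscr.rs`. `round_half_even_i64` is a hypothetical name, and NaN and out-of-range inputs are assumed to be handled by the caller before this point.

```rust
/// Round to the nearest integer, ties to even, saturating at the i64 range.
pub fn round_half_even_i64(v: f64) -> i64 {
    let t = v.trunc();
    let frac = (v - t).abs(); // exact below 2^52; 0.0 at or above it
    let toward_zero = t as i64; // saturating cast; NaN maps to 0
    let step = if v >= 0.0 { 1 } else { -1 };
    // Round away from zero above the midpoint, or on a tie when the
    // truncated value is odd.
    let round_away = frac > 0.5 || (frac == 0.5 && (toward_zero & 1) != 0);
    if round_away {
        toward_zero.saturating_add(step)
    } else {
        toward_zero
    }
}
```

Recent Rust toolchains also provide `f64::round_ties_even`, which could serve as an oracle in tests if the project's minimum supported version allows it.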
### PPCBUG-223 — `fcmpo` omits FPSCR[VXSNAN] and FPSCR[VXVC] on NaN operands
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2645-2675`
- **Symptom**: `fcmpo` mirrors `fcmpu` in setting FPRF and the CR field correctly, but it
calls no `fpscr::set_exception`. PowerISA requires: QNaN → `FPSCR[VXVC, VX, FX]`;
SNaN → additionally `FPSCR[VXSNAN]`. `fcmpu` correctly sets VXSNAN for SNaN; `fcmpo` does
not. A comment in the source acknowledges "not modeled yet."
- **Impact**: Guest code that reads FPSCR after a NaN compare (e.g. via `mcrfs` or `mffsx`)
will see FX=0 and VXVC=0 instead of 1. `fcmpo` has no record form, so FPSCR is the only place
the signal can be observed. Xbox 360 CRT comparison primitives
(`islessgreater`, ordered relational operators) use `fcmpo`.
- **Fix**:
```rust
if fra.is_nan() || frb.is_nan() {
    ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: false, so: true };
    if fpscr::is_snan(fra) || fpscr::is_snan(frb) {
        fpscr::set_exception(ctx, fpscr::VXSNAN | fpscr::VXVC);
    } else {
        fpscr::set_exception(ctx, fpscr::VXVC);
    }
}
```
### PPCBUG-224 — `fcfidx` does not set FPSCR[XX/FX] for inexact i64→f64 conversion
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2528-2536`
- **Symptom**: Only FPRF is updated. Per ISA, `fcfid` sets `FPSCR[XX, FX]` (and FR/FI) when
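the i64 value has more than 53 significant bits and precision is lost. Only values with
`|v| > 2^53` can be inexact (e.g. `2^53 + 1`); larger values with at most 53 significant bits,
such as `2^54`, still convert exactly. Common trigger: large frame/sample counters, address values.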
- **Fix**: after the conversion, compare `(result as i64) != (bits as i64)` and call
`fpscr::set_exception(ctx, fpscr::XX)` if inexact.
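A round-trip through `i64` saturates near the top of the range (`i64::MAX` converts to `2^63`, which casts back to `i64::MAX`), so the comparison is safer in a wider integer type. The sketch below shows that variant. `fcfid_is_inexact` is a hypothetical helper name.

```rust
/// True if converting `v` to f64 loses precision.
/// The comparison is done in i128 so that 2^63 does not saturate back to i64::MAX.
pub fn fcfid_is_inexact(v: i64) -> bool {
    let converted = v as f64; // rounds to nearest, ties to even
    (converted as i128) != i128::from(v)
}
```

The caller would then raise `XX` (and set FI/FR) when this returns `true`.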
### PPCBUG-225 — `frspx` does not set FPSCR[XX/FX/FR/FI] on inexact rounding
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2516-2527`
- **Symptom**: `update_after_op` sets OX/UX only. The ISA requires FR/FI/XX/FX on any f64→f32
rounding that is not exact. `frsp` is the canonical double→single-precision narrowing idiom
in compiler output — virtually every call is inexact.
- **Fix**: after `to_single`, compare result vs b; if different and both finite, call
`fpscr::set_exception(ctx, fpscr::XX | fpscr::FI | ...)` with FR set if magnitude increased.
### PPCBUG-227 — `fctiwx` rounding: `round_to_i32` inherits NearestEven defect via `round_to_i64`
- **Severity**: LOW
- **Status**: open
- **Location**: `fpscr.rs:241-243`
- **Symptom**: `round_to_i32` calls `round_to_i64` then clamps. The PPCBUG-221 defect in
`round_to_i64` does not manifest for i32-range values (the epsilon check accidentally works
at this scale), but the structural fragility is inherited. Fixing PPCBUG-221 cures this.
- **Recommendation**: add unit tests `round_to_i32(0.5)==0`, `round_to_i32(1.5)==2`,
`round_to_i32(2.5)==2` to verify correct round-to-even behavior.
### PPCBUG-228 — Zero interpreter execution tests for fabsx/fnegx/fnabsx/fmrx/fselx/fcmpo/fcfidx/fctidx/fctidzx/frspx
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: 10 of the 13 group-31 opcodes have zero dedicated tests. `test_fcmpu` covers
only the ordered comparison `5.0 > 3.0`. `test_fctiwzx` covers one positive truncation.
`test_fadd`/`test_fmul` are group-30 tests, not group-31.
- **Recommended minimum**: SNaN-preservation test for fabsx/fnegx/fnabsx; fselx with NaN/0/1;
fcmpo QNaN→VXVC (after PPCBUG-223 fix); fcfidx exact and inexact; fctidx tie cases; frspx
inexact → XX set (after PPCBUG-225 fix); fctiwx nearest-even tie; fctiwzx NaN sentinel.
### PPCBUG-229 — `fctidx` / `fctidzx` do not set FPSCR[XX/FX] for inexact inputs
- **Severity**: LOW
- **Status**: open
- **Locations**: `interpreter.rs:2537-2574`
- **Symptom**: Per ISA, float-to-integer conversions set `FPSCR[XX, FX]` when the source
value is not an integer (the fractional part is discarded). Neither opcode sets XX.
Shared root cause with PPCBUG-224/225.
### PPCBUG-230 — `fctiwx` / `fctiwzx` do not set FPSCR[XX/FX] for inexact inputs
- **Severity**: LOW
- **Status**: open
- **Locations**: `interpreter.rs:2575-2612`
- **Symptom**: Same omission as PPCBUG-229 for the word-width conversion pair.
### PPCBUG-231 — `frspx` SNaN input result written as QNaN (host platform dependency)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2519-2524`
- **Symptom**: Rust's `as f32` (CVTSD2SS) can set the quiet bit on SNaN input, producing a
QNaN in the FPR. Per ISA, `frsp` on SNaN should quieten it — so the QNaN result is
correct in kind. The risk is that the exact QNaN bit-pattern may differ from PPC's
canonical quietening (which ORs bit 22 into the f32 mantissa). Game code inspecting the
NaN payload after frsp may see a different payload. Same structural root cause as
PPCBUG-128 (`lfs` SNaN quietening), but lower severity because frsp IS arithmetic.
IDs PPCBUG-232 through PPCBUG-239 are unallocated — no further bugs found in group 31.
---
## Batch 7 — VMX integer add/sub (group 32)
Per-group report: `audit-out/group-32-vmx-int-addsub.md`.
**Scope**: `vaddubm`, `vaddubs`, `vadduhm`, `vadduhs`, `vadduwm`, `vadduws`, `vaddsbs`, `vaddshs`,
`vaddsws`, `vaddcuw`, `vsububm`, `vsububs`, `vsubuhm`, `vsubuhs`, `vsubuwm`, `vsubuws`, `vsubsbs`,
`vsubshs`, `vsubsws`, `vsubcuw`.
**Overall verdict**: All 20 opcodes are arithmetically correct. No HIGH-severity bugs found.
Lane indexing (big-endian, PPC element 0 = `Vec128::bytes[0]`), saturation arithmetic, VSCR.SAT
sticky-set, and vaddcuw/vsubcuw carry/borrow semantics are all implemented correctly.
4 LOW-severity findings (2 test gaps, 1 code organization, 1 API hazard).
### PPCBUG-240 — 18 of 20 group-32 opcodes have zero interpreter-level tests
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: Only `test_vaddubs_saturates_and_sets_vscr_sat` covers any group-32 opcode.
`vaddubm`, `vsububm`, `vadduhm`, `vsubuhm`, `vadduwm`, `vsubuwm`, `vaddsbs`, `vsubsbs`,
`vadduhs`, `vsubuhs`, `vaddshs`, `vsubshs`, `vadduws`, `vsubuws`, `vaddsws`, `vsubsws`,
`vaddcuw`, `vsubcuw` — all 18 have no tests. No high risk today but no regression guard.
- **Recommended minimum**: wrap-around test (byte, halfword, word); sat-at-max and sat-at-min tests;
VSCR.SAT sticky-set across two successive saturating instructions; vaddcuw carry lane; vsubcuw
no-borrow lane.
### PPCBUG-241 — `vadduwm` / `vsubuwm` stranded in a separate section from the rest of group-32
- **Severity**: LOW (maintenance hazard)
- **Status**: open
- **Location**: `interpreter.rs:2090-2104` (stranded) vs. `interpreter.rs:2784` (§4a group-32 section)
- **Symptom**: The two word-modulo opcodes are matched 700 lines above the rest of the group, with
only a comment at line 2819 as a cross-reference. A future sweep of §4a for group-32 changes
would miss them.
- **Fix**: Move both arms into §4a and remove the comment at line 2819.
### PPCBUG-242 — `set_vscr_sat(false)` can non-stickily clear SAT from arithmetic handlers
- **Severity**: LOW (API hazard)
- **Status**: open
- **Location**: `context.rs:252-259`
- **Symptom**: `set_vscr_sat(bool)` accepts `false`, which would clear the sticky SAT bit. All
current arithmetic callers pass `true` only (inside `if sat { ... }` guards), so no mis-clear
occurs today. But the API is misleading — a future saturating handler that writes
`set_vscr_sat(lane_sat)` with `lane_sat = false` would silently clear a previously-set bit.
- **Fix**: Rename to `sticky_set_vscr_sat()` (no bool argument, always ORs). Retain
`force_vscr_sat(bool)` for `mtvscr`.
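The proposed split can be illustrated with a small stand-in type. This is a sketch, not the layout in `context.rs`: the `Vscr` wrapper is hypothetical, and the SAT mask assumes SAT is the least-significant bit of the 32-bit VSCR.

```rust
/// Assumed mask: SAT as the least-significant bit of the 32-bit VSCR.
pub const VSCR_SAT: u32 = 1;

/// Hypothetical stand-in for the VSCR field of the CPU context.
#[derive(Default, Debug, Clone, Copy)]
pub struct Vscr(pub u32);

impl Vscr {
    /// For saturating arithmetic handlers: can only set the bit, never clear it.
    pub fn sticky_set_sat(&mut self) {
        self.0 |= VSCR_SAT;
    }

    /// For `mtvscr` only: writes the bit unconditionally.
    pub fn force_sat(&mut self, on: bool) {
        if on {
            self.0 |= VSCR_SAT;
        } else {
            self.0 &= !VSCR_SAT;
        }
    }

    pub fn sat(&self) -> bool {
        (self.0 & VSCR_SAT) != 0
    }
}
```

With no bool parameter on the sticky setter, a handler cannot pass a per-lane `false` and clear an earlier saturation by accident.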
### PPCBUG-243 — `vmx.rs` saturation helpers: u16/i16/u32/i32 variants have zero unit tests
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `crates/xenia-cpu/src/vmx.rs:705-799`
- **Symptom**: `vmx.rs` tests cover 5 cases of `sat_add/sub_i8/u8`. The 8 helpers for wider
types (`sat_add_u16`, `sat_sub_u16`, `sat_add_i16`, `sat_sub_i16`, `sat_add_u32`, `sat_sub_u32`,
`sat_add_i32`, `sat_sub_i32`) are mathematically correct but unguarded by any test. Recommended
additions listed in the per-group report.
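Boundary tests for the wider helpers can be checked against a widening reference. The sketch below shows two such references. The `(value, saturated)` return shape is an assumption and may not match the real signatures in `vmx.rs`.

```rust
/// Reference signed 16-bit saturating add: compute in i32, clamp, report saturation.
pub fn sat_add_i16(a: i16, b: i16) -> (i16, bool) {
    let wide = i32::from(a) + i32::from(b);
    let clamped = wide.clamp(i32::from(i16::MIN), i32::from(i16::MAX)) as i16;
    (clamped, i32::from(clamped) != wide)
}

/// Reference unsigned 32-bit saturating subtract: clamps at zero on borrow.
pub fn sat_sub_u32(a: u32, b: u32) -> (u32, bool) {
    match a.checked_sub(b) {
        Some(v) => (v, false),
        None => (0, true),
    }
}
```

The interesting inputs are the type extremes on both sides, plus one non-saturating case to confirm the flag stays clear.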
IDs PPCBUG-244 through PPCBUG-274 are unallocated — no further bugs found in group 32.
---
## Batch 7 — VMX integer compare / min / max / avg (group 33)
Per-group report: `audit-out/group-33-vmx-int-compare.md`.
### PPCBUG-275 — All VC-form vector compare dot forms: `rc_bit()` reads wrong bit; CR6 never updated
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpequb.`, `vcmpequh.`, `vcmpgtsb.`, `vcmpgtsh.`, `vcmpgtub.`, `vcmpgtuh.`
- **Location**: `decoder.rs:75` + `interpreter.rs:3318`, `3331`, `3344`, `3357`, `3370`, `3383`
- **Symptom**: `rc_bit()` is implemented as `self.raw & 1 != 0` (reads LSB = bit 0 of the word).
For VC-form instructions the Rc flag is at **PPC bit 21 = LSB bit 10**, not bit 0. Bit 0 is
the LSB of the 10-bit XO field. All integer compare XO values are even (XO=6, 70, 518, 774, 582, 838),
so their bit 0 is always 0. The CR6 update block is **unconditionally dead** regardless of
whether the programmer wrote the dot form. `vcmpequb. vMask, vData, vNeedle` + `bc 12,24`
(branch on CR6.LT = all-true; CR6 occupies CR bits 24..27) is the canonical AltiVec memchr idiom; it will always fall through.
- **Fix**:
```rust
// decoder.rs — add:
/// Rc bit for VC-form vector compare instructions (PPC bit 21 = LSB bit 10).
#[inline] pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }
```
Replace `instr.rc_bit()` with `instr.vc_rc_bit()` at interpreter.rs:3318, 3331, 3344, 3357,
3370, 3383.
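The bit positions can be checked with a small standalone sketch. `encode_vc` is a hypothetical helper written for this example (primary opcode 4, standard VC-form field order); the two extractors mirror the expressions quoted in this finding.

```rust
/// Encode a VC-form instruction word: primary opcode 4, then VD, VA, VB,
/// the Rc flag, and the 10-bit extended opcode XO (hypothetical helper).
fn encode_vc(vd: u32, va: u32, vb: u32, rc: bool, xo: u32) -> u32 {
    (4 << 26) | (vd << 21) | (va << 16) | (vb << 11) | (u32::from(rc) << 10) | (xo & 0x3FF)
}

/// Generic Rc extractor as described above: reads LSB bit 0.
fn rc_bit(raw: u32) -> bool {
    raw & 1 != 0
}

/// VC-form Rc extractor: PPC bit 21 = LSB bit 10.
fn vc_rc_bit(raw: u32) -> bool {
    (raw >> 10) & 1 != 0
}
```

For `vcmpequb.` (XO=6) the dot and non-dot encodings differ only in bit 10, so `rc_bit` reports `false` for both while `vc_rc_bit` tells them apart.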
### PPCBUG-276 — `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`: same VC-form Rc bug
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`
- **Location**: `interpreter.rs:2237`, `3396`, `3406`
- **Symptom**: Same root cause as PPCBUG-275. XO for vcmpequw=134, vcmpgtuw=646, vcmpgtsw=902;
all even, so bit 0 is always 0. Word-compare dot forms never update CR6. `vcmpequw128` uses the
VX128_R encoding, where `rc_bit()` likely also reads the wrong bit (see PPCBUG-422).
- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:2237, 3396, 3406. Separately verify the
VX128_R Rc bit position for `vcmpequw128` (may require its own extractor).
### PPCBUG-277 — Zero tests for all `vcmp*` dot forms and CR6 correctness
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: No test exercises any of the 10 integer vector compare opcodes. Critical missing:
`vcmpequb.` all-true → CR6.LT=1; `vcmpequb.` all-false → CR6.EQ=1; `vcmpgtsb` signed
boundary (0x80 vs 0x7F must yield false, not true); `vcmpgtsh` at 0x8000 vs 0x7FFF.
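A sketch of the missing cases, written against standalone lane functions rather than the real interpreter (all names here are assumptions). CR6 is modelled as a 4-bit nibble in LT, GT, EQ, SO order.

```rust
/// CR field bits, as a nibble in LT, GT, EQ, SO order.
const CR_LT: u8 = 0b1000; // set when every lane compared true
const CR_EQ: u8 = 0b0010; // set when every lane compared false

/// Per-lane byte equality, producing 0xFF or 0x00 per lane.
fn vcmpequb(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    std::array::from_fn(|i| if a[i] == b[i] { 0xFF } else { 0x00 })
}

/// Per-lane signed byte greater-than, producing 0xFF or 0x00 per lane.
fn vcmpgtsb(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    std::array::from_fn(|i| if (a[i] as i8) > (b[i] as i8) { 0xFF } else { 0x00 })
}

/// CR6 nibble a dot-form compare would write for the given result mask.
fn cr6_from_mask(mask: &[u8; 16]) -> u8 {
    let all_true = mask.iter().all(|&m| m == 0xFF);
    let all_false = mask.iter().all(|&m| m == 0x00);
    (if all_true { CR_LT } else { 0 }) | (if all_false { CR_EQ } else { 0 })
}
```

A mixed result (some lanes true, some false) leaves both bits clear, which is the third case worth asserting alongside all-true and all-false.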
### PPCBUG-278 — Zero tests for all 12 `vmax*` / `vmin*` opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: None of vmaxub/uh/uw/sb/sh/sw, vminub/uh/uw/sb/sh/sw are tested. Critical missing:
`vmaxsb(0x80, 0x7F)` = 0x7F (signed max of -128 and +127); `vminsb(0x80, 0x7F)` = 0x80.
Without these, signed vs unsigned confusion in min/max would not be caught.
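The signed/unsigned distinction can be pinned down with a sketch like the following. These are standalone lane functions written for illustration, not the interpreter's handlers.

```rust
/// Signed byte max: lanes are reinterpreted as i8 before comparing.
fn vmaxsb(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    std::array::from_fn(|i| (a[i] as i8).max(b[i] as i8) as u8)
}

/// Signed byte min.
fn vminsb(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    std::array::from_fn(|i| (a[i] as i8).min(b[i] as i8) as u8)
}

/// Unsigned byte max, for contrast: 0x80 is 128 here, not -128.
fn vmaxub(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    std::array::from_fn(|i| a[i].max(b[i]))
}
```

The same input pair (0x80, 0x7F) gives opposite answers for `vmaxsb` and `vmaxub`, so a single pair of assertions catches a handler that uses the wrong signedness.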
### PPCBUG-279 — Zero tests for all 6 `vavg*` opcodes; no signed-boundary or rounding coverage
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module; `vmx.rs` test module
- **Symptom**: `avg_u8` through `avg_i32` helpers have no unit tests. Key rounding case:
`avg_u8(0, 1)` must be 1 (round up), not 0 (truncation). `avg_i32(i32::MIN, i32::MIN)` must
be `i32::MIN` without overflow.
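A sketch of the rounding and overflow cases named above. The helper names match this finding, but the bodies are stand-ins: each widens to the next integer type and computes `(a + b + 1) >> 1`, which is an assumption about how the real helpers are written.

```rust
/// Rounded unsigned byte average, computed in u16 so the sum cannot wrap.
fn avg_u8(a: u8, b: u8) -> u8 {
    ((a as u16 + b as u16 + 1) >> 1) as u8
}

/// Rounded signed word average, computed in i64 so the sum cannot overflow.
fn avg_i32(a: i32, b: i32) -> i32 {
    ((a as i64 + b as i64 + 1) >> 1) as i32
}
```

Both extremes of each type are worth asserting, since a same-width implementation overflows exactly there.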
IDs PPCBUG-280 through PPCBUG-314 are unallocated — no further bugs found in group 33.
---
## Batch 6 — VMX integer logical / shift / rotate / select (group 34)
Per-group report: `audit-out/group-34-vmx-logic-shift.md`.
Group 34 summary: the bitwise logical ops (vand/vandc/vor/vxor/vnor and their 128 variants)
are all ISA-correct — Vec128 is `[u8; 16]` with no padding bits, so `!(u32)` flips exactly
32 bits per lane with no upper-bit pollution (the PPCBUG-029/030/031 class does not apply to
VMX register files). The per-lane shifts (vslb/vsrb/vsrab, vslh/vsrh/vsrah, vslw/vsrw/vsraw
and their 128 variants) all correctly mask the shift count to the lane width before shifting;
vsraw uses i32 arithmetic right shift which is correctly defined in Rust for shift-by-31.
The per-lane rotates (vrlb/vrlh/vrlw and 128 variants) are correct. The whole-register bit
shifts (vsl/vsr) and whole-register byte shifts (vslo/vsro and 128 variants) correctly
extract the shift count from VB.b[15] with the proper bit masks. vsel and vsel128 are correct
including the read-before-write ordering on vsel128's vc=vd aliasing.
**One HIGH bug found**: vrlimi128 extracts both the rotate-amount (z) field and the
blend-mask (IMM) field from the wrong bit positions of the instruction word.
**0 MEDIUM bugs with code change needed. 1 HIGH. 10 LOW (test gaps and informational).**
### PPCBUG-315 — vrlimi128 z and IMM fields extracted from wrong bit positions
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: interpreter.rs:3551-3552
- **Symptom**: `shift = ((instr.raw >> 16) & 0x3)` reads integer bits 16-17 (the low 2 bits
of the 5-bit IMM blend-mask field) instead of the 2-bit `z` (rotate) field at integer
bits 6-7. `mask = (instr.raw >> 2) & 0xF` reads integer bits 2-5 (VD128h extension bits
and a reserved field) instead of the low 4 bits of IMM at integer bits 16-19.
**Every `vrlimi128` executes with a wrong rotate amount and a wrong per-word select mask.**
The only benign case is the degenerate encoding where `z == IMM[1:0]` and the garbage mask
happens to equal the intended mask — unlikely in real code.
- **VX128_4 field layout** (LSB-0 integer bit numbering after PPC big-endian byte-swap to host):
- `VD128l : 5` at integer bits 21-25 (PPC bits 6-10)
- `IMM : 5` at integer bits 16-20 (PPC bits 11-15); blend mask, 4 bits used
- `VB128l : 5` at integer bits 11-15 (PPC bits 16-20)
- `z : 2` at integer bits 6-7 (PPC bits 24-25); rotate amount 0..3
- `VD128h : 2` at integer bits 2-3 (PPC bits 28-29)
- **Fix**:
```rust
let shift = ((instr.raw >> 6) & 0x3) as usize; // z field: integer bits 6-7
let mask = (instr.raw >> 16) & 0xF; // IMM low 4 bits: integer bits 16-19
```
- **Canary reference**: `ppc_decode_data.h:585-608` `FormatVX128_4`; `ppc_emit_altivec.cc:1318,1324`.
- **Note**: the rotate logic (`b[(shift + i) % 4]`) and mask-select logic (`(mask >> (3-i)) & 1`)
in the interpreter body are ISA-correct — only the field extraction is wrong.
- **Test gap**: no interpreter execution test for vrlimi128 (PPCBUG-325).
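The effect of the two extractions can be reproduced in isolation. This sketch assumes the VX128_4 layout listed in this finding; `encode_z_imm` is a hypothetical helper that fills only the `z` and `IMM` fields and leaves every other field zero.

```rust
/// Build a word with only the VX128_4 `z` (bits 6-7) and `IMM` (bits 16-20)
/// fields populated (hypothetical helper for this example).
fn encode_z_imm(z: u32, imm: u32) -> u32 {
    ((imm & 0x1F) << 16) | ((z & 0x3) << 6)
}

/// Extraction as originally written: (shift, mask) from the wrong bits.
fn old_fields(raw: u32) -> (usize, u32) {
    (((raw >> 16) & 0x3) as usize, (raw >> 2) & 0xF)
}

/// Extraction per the fix: z from bits 6-7, mask from IMM bits 16-19.
fn fixed_fields(raw: u32) -> (usize, u32) {
    (((raw >> 6) & 0x3) as usize, (raw >> 16) & 0xF)
}
```

With `z = 2` and `IMM = 0b01101`, the old code yields a rotate of 1 (IMM's low two bits) and a mask of 0, while the fixed code yields rotate 2 and mask `0b1101`.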
### PPCBUG-316 — Zero interpreter execution tests for vslb/vsrb/vsrab (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3440-3463
### PPCBUG-317 — Zero interpreter execution tests for vslh/vsrh/vsrah (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3472-3503
### PPCBUG-318 — vslo/vsro byte-shift count max is 15 (correct; informational)
- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- `N` is a 4-bit field; max shift is 15 bytes = 120 bits (not 128). VD retains
the 8 LSBs of VA in position [127:120] at N=15. ISA-correct.
### PPCBUG-319 — vsel128 vc=vd read-before-write ordering (correct; informational)
- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- `c = ctx.vr[vc]` is read before `ctx.vr[vd] = result`. Correctly sequenced.
### PPCBUG-320 — Zero interpreter execution tests for vslw/vsrw/vsraw/vrlw (+128 variants)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:2108-2155
### PPCBUG-321 — Zero interpreter execution tests for vsl/vsr
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3508-3521
### PPCBUG-322 — Zero interpreter execution tests for vslo/vsro (+128 variants)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3523-3541
### PPCBUG-323 — Zero interpreter execution tests for vand/vandc/vor/vxor/vnor (+128 variants)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:1900-1944
### PPCBUG-324 — Zero interpreter execution tests for vsel/vsel128
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:1945-1967
### PPCBUG-325 — Zero interpreter execution tests for vrlb/vrlh/vrlw/vrlimi128 (+128 variants)
- **Severity**: LOW (test gap; fix PPCBUG-315 before writing vrlimi128 tests)
- **Status**: open
- **Location**: interpreter.rs:3464-3503, 2144-2155, 3550-3565
IDs PPCBUG-326 through PPCBUG-354 are unallocated — no further bugs found in group 34.
---
## Batch 8 — VMX permute / merge / splat / pack / unpack (group 35)
Per-group report: `audit-out/group-35-vmx-permute.md`.
**Summary**: 5 HIGH, 3 MEDIUM, 9 LOW. Four VX128_* field-extraction bugs; one missing post-pack permutation logic; VSCR.SAT and pack saturation semantics are all correct. Zero interpreter tests for any group-35 opcode.
### PPCBUG-360 — vperm128: VC register read from wrong field (vd128() instead of VX128_2 VC bits 23-25)
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:1979`
- **Symptom**: `vperm128` uses the VX128_2 instruction form. The permute-control register VC is a 3-bit field at PPC bits 23-25 (LSB integer bits 6-8). The code does `vc = instr.vd128()`, which reads PPC bits 6-10 + 28-29 (VD128l + VD128h), a completely different set of bits. Every `vperm128` therefore permutes with a control vector read from the wrong register, producing garbage output. `vperm128` is one of the most-used VMX128 ops in Xbox 360 graphics code (texture/vertex data layout).
- **Fix**:
```rust
// decoder.rs — add accessor:
#[inline] pub fn vc128_2(&self) -> usize { ((self.raw >> 6) & 0x7) as usize }
// interpreter.rs:1979 — replace:
vc = instr.vc128_2(); // VX128_2 VC field at PPC bits 23-25
```
- **ISA ref**: `ppc-manual/vmx/vperm.md`, VX128_2 encoding; `ppc_decode_data.h:541-561`; `ppc_emit_altivec.cc:1203-1204` (`VX128_2_VC`).
### PPCBUG-361 — vsldoi128: SH field MSB reads bit 4 (reserved) instead of bit 9
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:2012`
- **Symptom**: VX128_5 SH is a 4-bit field at LSB integer bits 6-9. Code does `((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)`. This reads bit 4 (a reserved field, always 0 in valid encodings) as the MSB of SH instead of bit 9. Shifts of 8-15 bytes silently resolve as shifts of 0-7 bytes. `vsldoi128` with `SH >= 8` (common in vector rotation patterns) always produces the wrong result.
- **Fix**:
```rust
let sh = ((instr.raw >> 6) & 0xF) as usize; // SH field: integer bits 6-9
```
- **ISA ref**: `ppc-manual/vmx/vsldoi.md`, VX128_5 encoding; `ppc_decode_data.h:609-634`; canary `VX128_5_SH`.
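A quick way to see the truncation is to run both extractions over every SH value. The two functions below copy the expressions quoted in this finding; placing SH at integer bits 6-9 with all other fields zero is the only assumption.

```rust
/// SH extraction as originally written: low 3 bits from 6-8, MSB from bit 4.
fn old_sh(raw: u32) -> usize {
    (((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)) as usize
}

/// SH extraction per the fix: the full 4-bit field at integer bits 6-9.
fn fixed_sh(raw: u32) -> usize {
    ((raw >> 6) & 0xF) as usize
}
```

For any word with bit 4 clear, `old_sh` returns `SH & 7`, which is why shifts of 8-15 bytes collapse onto 0-7.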
### PPCBUG-362 — vpermwi128: PERMh (high 3 bits of 8-bit PERM immediate) read from VD128l bits instead of bits 6-8
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:4089`
- **Symptom**: VX128_P PERM = `PERMl[4:0] | (PERMh[2:0] << 5)` where PERMl is at integer bits 16-20 and PERMh is at integer bits 6-8. Code does `(raw >> 16) & 0xFF` which reads bits 16-23. Bits 21-23 are VD128l[2:0], not PERMh. The top 3 bits of the 8-bit PERM immediate are wrong; output word lane selections for lanes 0 and 1 are controlled by garbage bits. Same pattern as PPCBUG-315.
- **Fix**:
```rust
let imm = ((instr.raw >> 16) & 0x1F) | (((instr.raw >> 6) & 0x7) << 5); // VX128_P PERM
```
- **ISA ref**: `ppc_decode_data.h:664-686`; `ppc_emit_altivec.cc:1214`.
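The split immediate can be exercised with a small round-trip sketch. `encode_perm` is a hypothetical helper that fills only the VD128l, PERMl and PERMh fields at the positions given in this finding; the two decoders copy the old and fixed expressions.

```rust
/// Build a VX128_P word with VD128l (bits 21-25), PERMl (bits 16-20) and
/// PERMh (bits 6-8) populated (hypothetical helper for this example).
fn encode_perm(vd_low: u32, perm: u32) -> u32 {
    ((vd_low & 0x1F) << 21) | ((perm & 0x1F) << 16) | (((perm >> 5) & 0x7) << 6)
}

/// PERM as originally decoded: eight contiguous bits starting at bit 16.
fn old_perm(raw: u32) -> u32 {
    (raw >> 16) & 0xFF
}

/// PERM per the fix: PERMl from bits 16-20, PERMh from bits 6-8.
fn fixed_perm(raw: u32) -> u32 {
    ((raw >> 16) & 0x1F) | (((raw >> 6) & 0x7) << 5)
}
```

With a destination register whose low bits are `0b101` and `PERM = 0xE4`, the old decode returns `0xA4`: the low five bits are right, and the top three are the neighbouring register-number bits.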
### PPCBUG-363 — vpkd3d128: post-pack permutation (pack + z fields) entirely absent; output always placed in wrong lane when pack != 0
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3783-3808`
- **Symptom**: Canary's `vpkd3d128` does three things: (1) pack VB by type, (2) permute the result with the existing VD register using a control determined by `pack` (IMM[1:0]) and `shift` (z field at integer bits 6-7), (3) store. Xenia-rs does only (1) and (3), skipping the entire lane-placement permutation. When `pack != 0`, the packed value must be merged into a specific 32-bit or 64-bit slot of VD — this merge never happens. `pack=0` is the only safe case. Most D3D vertex pack sequences use `pack=1` (32-bit slot) with varying `shift`.
- **Fix**: Extract `pack = uimm & 3` and `shift = (instr.raw >> 6) & 3` (z field), read existing `ctx.vr[vd]`, apply the permutation table from `ppc_emit_altivec.cc:2125-2188`, write back.
- **ISA ref**: `ppc_emit_altivec.cc:2088-2191`.
### PPCBUG-364 — vsldoi (non-128): correct; PPCBUG-365 — vsplt*: correct; informational
- **Severity**: LOW (wontfix)
- **Status**: wontfix
- `vsldoi` correctly extracts SH as `(raw >> 6) & 0xF`. `vspltb/vsplth/vspltw` correctly read UIMM from the VA position (integer bits 16-20, masked to lane width). No bugs.
### PPCBUG-366 — vspltisb / vspltish: sign-extension idiom is correct but non-obvious; future regression risk
- **Severity**: MEDIUM
- **Status**: open (clarity fix recommended)
- **Location**: `interpreter.rs:2059-2060`, `2064-2066`
- **Symptom**: `simm | !0x1F` where `simm` is typed `i8`/`i16` is functionally correct (Rust narrows `!0x1F` to the target type), but the pattern is fragile under refactoring. Recommend:
```rust
let simm = (((instr.raw >> 16) & 0x1F) as i32).wrapping_shl(27).wrapping_shr(27) as i8;
```
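A sketch showing that the shift-based form and an OR-based form agree for all 32 field values. Both functions are illustrations written for this example; `simm5_or` is only an approximation of the idiom described above, not the interpreter's exact code.

```rust
/// Sign-extend the 5-bit SIMM field (integer bits 16-20) by moving it to the
/// top of an i32 and arithmetic-shifting it back down.
fn simm5_shift(raw: u32) -> i8 {
    let field = ((raw >> 16) & 0x1F) as i32;
    ((field << 27) >> 27) as i8
}

/// OR-based form: set the upper three bits when the field's sign bit is set.
fn simm5_or(raw: u32) -> i8 {
    let field = ((raw >> 16) & 0x1F) as i8;
    if field & 0x10 != 0 { field | !0x1F } else { field }
}
```

The shift form has no branch and no type-dependent constant, which is the readability argument made in this finding.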
### PPCBUG-367 — vupkhpx / vupklpx: channel replication vs zero-extend divergence; canary is unimplemented
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `vmx.rs:318-330`
- **Symptom**: `unpack_pixel_555` replicates 5-bit RGB channels (`r << 3 | r >> 2`) to fill 8 bits. ISA specifies zero-extension into bits 7:3, leaving bits 2:0 as zero. The replicate approach produces slightly different values (and slightly higher values), diverging from hardware.
- **Fix**: `let r8 = r << 3;` (drop the `| r >> 2` replication term).
### PPCBUG-368 — vpkpx: pack_pixel_555 channel assignment unverified against hardware; canary comparison inconclusive
- **Severity**: MEDIUM
- **Status**: open (needs hardware trace or more detailed canary analysis)
- **Location**: `vmx.rs:310-316`
- **Symptom**: The xenia-rs layout comment says R=bits 8-15, G=16-23, B=24-31. Canary's `vkpkx_in_low` uses different shift amounts (`>> 9` for R, `>> 6` for G, `>> 3` for B), suggesting either a different input layout assumption or the channels are swapped. Without a hardware reference, cannot determine which is authoritative.
### PPCBUG-369 — vpkd3d128 z-field not extracted (sub-issue of PPCBUG-363)
- **Severity**: LOW (tracked under PPCBUG-363)
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3785`
- The `z` field (VX128_4, integer bits 6-7) is never extracted. Correct extraction: `(instr.raw >> 6) & 0x3`.
### PPCBUG-370 — Zero interpreter tests for vperm / vperm128 (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1970-1995`
### PPCBUG-371 — Zero interpreter tests for vsldoi / vsldoi128 (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1997-2020`
### PPCBUG-372 — Zero interpreter tests for vpermwi128 (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:4087-4099`
### PPCBUG-373 — Zero interpreter tests for vmrghb / vmrglb / vmrghh / vmrglh (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3570-3600`
### PPCBUG-374 — Zero interpreter tests for vspltb / vsplth / vspltw / vspltw128 (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2022-2048`
### PPCBUG-375 — Zero interpreter tests for vspltisb / vspltish / vspltisw / vspltisw128 (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2050-2068`
### PPCBUG-376 — Zero interpreter tests for all vpk* (16 ops) + VSCR.SAT coverage (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3607-3718`
### PPCBUG-377 — Zero interpreter tests for vupkhsb / vupklsb / vupkhsh / vupklsh (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3722-3754`
### PPCBUG-378 — Zero interpreter tests for vpkd3d128 / vupkd3d128 (test gap; blocked on PPCBUG-363)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3783-3835`
IDs PPCBUG-379 through PPCBUG-419 are unallocated — no further bugs found in group 35.
---
## Batch 9 — VMX float arithmetic / compare / convert / estimate (group 36)
Per-group report: `audit-out/group-36-vmx-float.md`.
Group 36 summary: **21 findings (PPCBUG-420..440). 6 HIGH, 11 MEDIUM, 4 LOW.** The most
critical bugs are: (1) four VMX float compare VC-form opcodes use `rc_bit()` (bit 0) instead
of the correct VC-form Rc bit (bit 10), so CR6 is never updated, same root cause as PPCBUG-275;
(2) vmaddfp128 and vmaddcfp128 have their multiplicand and accumulator operands swapped, so
every matrix multiply / Newton-Raphson step using these opcodes produces the wrong result;
(3) VX128_R dot-form compares (vcmpeqfp128. etc.) decode as Invalid due to missing key4
entries in decode_op6.
**21 IDs used (PPCBUG-420..440). 39 IDs unallocated (PPCBUG-441..479).**
### PPCBUG-420 — vcmpeqfp / vcmpgefp / vcmpgtfp: `rc_bit()` reads wrong bit; CR6 never updated
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpeqfp.`, `vcmpgefp.`, `vcmpgtfp.`
- **Location**: `interpreter.rs:1875`, `1885`, `1895`
- **Symptom**: `rc_bit()` = `self.raw & 1` reads LSB bit 0. For VC-form the Rc flag is at
PPC bit 21 = LSB bit 10. All XO values (vcmpeqfp=198, vcmpgefp=454, vcmpgtfp=710) have
bit 0 = 0, so CR6 is never updated for any float compare dot form. `vcmpeqfp.` + `bc 12,24`
(branch all-equal) always falls through.
- **Cross-reference**: PPCBUG-275 (identical root cause for integer vcmp). Canary reads
`i.VXR.Rc` (ppc_emit_altivec.cc:625, 633, 641).
- **Fix**: Add `pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }` to
`decoder.rs` and replace `instr.rc_bit()` at interpreter.rs:1875, 1885, 1895.
### PPCBUG-421 — vcmpbfp: `rc_bit()` reads wrong bit (VC-form); Rc gate permanently dead
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3428`
- **Symptom**: Same root cause as PPCBUG-420. XO=966, bit 0 = 0; CR6 update never fires
for `vcmpbfp.`. The CR6 value logic (`eq = !any_out`) is correct; only the gate is wrong.
- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:3428.
### PPCBUG-422 — vcmpeqfp128 / vcmpgefp128 / vcmpgtfp128 / vcmpbfp128: `rc_bit()` reads wrong bit (VX128_R-form)
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:1875`, `1885`, `1895`, `3428` (shared arms with non-128 forms)
- **Symptom**: For VX128_R-form, Rc is at PPC bit 27 = LSB bit 4 (confirmed from canary's
`VX128_R` bitfield: `uint32_t Rc : 1` at bit 4 from LSB). `rc_bit()` reads bit 0. Fix
PPCBUG-423 first (dot forms decode as Invalid before this even matters).
- **Fix**: Add `pub fn vx128r_rc_bit(&self) -> bool { (self.raw >> 4) & 1 != 0 }` and use
it in the VX128_R compare arms.
### PPCBUG-423 — vcmpeqfp128. / vcmpgefp128. / vcmpgtfp128. / vcmpbfp128.: dot forms decode as `Invalid`
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs:640-648` (decode_op6 VMX128 compare key4 table)
- **Symptom**: decode_op6 extracts `key4 = (bits22-24 << 3) | bit27`. When Rc=1, PPC bit 27
is set, making key4 = non-dot value + 1. Dot-form key4 values (1, 9, 17, 25, 33) are all
absent from the match table. Decoder returns `PpcOpcode::Invalid`. Any game shader using a
VMX128-form float compare dot form traps with unimplemented opcode.
- **Fix**: Add dot-form entries to the key4 match table mapping to the same opcodes (the
interpreter arm uses `instr.vx128r_rc_bit()` to conditionally update CR6):
```rust
0b000001 => return PpcOpcode::vcmpeqfp128,
0b001001 => return PpcOpcode::vcmpgefp128,
0b010001 => return PpcOpcode::vcmpgtfp128,
0b011001 => return PpcOpcode::vcmpbfp128,
0b100001 => return PpcOpcode::vcmpequw128,
```
### PPCBUG-424 — vmaddfp128: operand swap — computes VA×VB+VD instead of VA×VD+VB
- **Severity**: HIGH
- **Status**: applied (52ece4b, 2026-05-02)
- **Location**: `interpreter.rs:1771` (`r[i] = ai.mul_add(bi, di)`)
- **Symptom**: Canary (ppc_emit_altivec.cc:806-809) documents `(VD) <- (VA × VD) + VB` and
routes as `MulAdd(VA, VD, VB)`. Xenia-rs reads VA, VB, VD then computes
`ai.mul_add(bi, di)` = `VA × VB + VD` — VB and VD roles swapped. Every shader using
vmaddfp128 for matrix multiply or Newton-Raphson accumulation accumulates the wrong value.
The existing denorm-flush test aliases vA=vD=v2, making the swap invisible.
- **Fix**: `r[i] = ai.mul_add(di, bi);`
### PPCBUG-425 — vmaddcfp128: operand swap — computes VD×VB+VA instead of VA×VD+VB
- **Severity**: HIGH
- **Status**: applied (52ece4b, 2026-05-02)
- **Location**: `interpreter.rs:4065` (`r[i] = di.mul_add(bi, ai)`)
- **Symptom**: Canary (ppc_emit_altivec.cc:819) documents `(VD) <- (VA × VD) + VB`.
Xenia-rs computes `VD × VB + VA`. Both the first multiplicand and the addend are wrong.
- **Fix**: `r[i] = ai.mul_add(di, bi);`
- **Test gap**: zero tests for `vmaddcfp128`. Add test with distinct VA, VB, VD registers.
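A sketch of why distinct operand values matter for such a test. The three functions are per-lane stand-ins named for this example; `expected` follows the `(VA × VD) + VB` order this finding cites from canary, and the other two are the orderings quoted in PPCBUG-424 and PPCBUG-425.

```rust
/// Operand order this finding cites as correct: (VA * VD) + VB.
fn expected(va: f32, vb: f32, vd: f32) -> f32 {
    va.mul_add(vd, vb)
}

/// Ordering quoted in PPCBUG-424: (VA * VB) + VD.
fn swapped_424(va: f32, vb: f32, vd: f32) -> f32 {
    va.mul_add(vb, vd)
}

/// Ordering quoted in PPCBUG-425: (VD * VB) + VA.
fn swapped_425(va: f32, vb: f32, vd: f32) -> f32 {
    vd.mul_add(vb, va)
}
```

With `va = 2`, `vb = 3`, `vd = 5` the three orderings give 13, 11 and 17, so any swap is visible; when all three inputs are equal they collapse to the same value and the test proves nothing.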
### PPCBUG-426 — vnmsubfp: two rounding steps instead of fused FMA; NaN sign may be flipped
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1786` (`r[i] = bi - ai * ci`)
- **Symptom**: `vmaddfp` uses single-rounded `ai.mul_add(ci, bi)`, but `vnmsubfp` uses
`bi - ai * ci` (two operations, two rounding steps). ISA specifies a single fused operation.
Canary acknowledges the same limitation (ppc_emit_altivec.cc:1136). Additionally, the
implicit negation in subtraction may flip the sign bit of a NaN result (see PPCBUG-183).
- **Fix**: `r[i] = -ai.mul_add(ci, -bi);` — single FMA rounding: `-(ai*ci + (-bi))` = `bi - ai*ci`.
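The two-rounding difference is observable with ordinary inputs. In this sketch (function names are illustrative), `a = 1 + 2^-12` squares to `1 + 2^-11 + 2^-24` exactly; the last term is lost when the product is rounded to `f32` on its own, but survives inside a fused operation.

```rust
/// Two-step form: the product is rounded to f32 before the subtraction.
fn nmsub_unfused(a: f32, c: f32, b: f32) -> f32 {
    b - a * c
}

/// Fused form from the fix: one rounding, applied to the exact a*c - b.
fn nmsub_fused(a: f32, c: f32, b: f32) -> f32 {
    -a.mul_add(c, -b)
}
```

With `a = c = 1 + 2^-12` and `b = 1 + 2^-11`, the unfused form returns `0.0` and the fused form returns `-2^-24`. This assumes `f32` arithmetic is evaluated at `f32` precision, as on x86-64 and AArch64.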
### PPCBUG-427 — vnmsubfp128: same two-rounding form as vnmsubfp
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1803` (`r[i] = di - ai * bi`)
- **Symptom**: Same class as PPCBUG-426 for the VMX128 form.
- **Fix**: `r[i] = -ai.mul_add(bi, -di);`
### PPCBUG-428 — vrefp / vrefp128: full-precision 1/x instead of ~12-bit hardware estimate
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1853` (`r[i] = 1.0 / b[i]`)
- **Symptom**: Same class as PPCBUG-184 (fresx). Xenon vrefp provides ~12-bit accuracy;
xenia-rs computes full IEEE-754 division. Canary also uses full precision in practice.
### PPCBUG-429 — vrsqrtefp / vrsqrtefp128: full-precision 1/sqrt(x) instead of ~12-bit estimate
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1862` (`r[i] = 1.0 / b[i].sqrt()`)
- **Symptom**: Same class as PPCBUG-428 for reciprocal square root.
### PPCBUG-430 — vexptefp / vexptefp128: full-precision exp2(x) instead of ~12-bit estimate
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3934` (`r[i] = b[i].exp2()`)
- **Symptom**: Same class as PPCBUG-428. NaN/Inf edge cases may diverge.
### PPCBUG-431 — vlogefp / vlogefp128: full-precision log2(x) instead of ~12-bit estimate
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3944` (`r[i] = b[i].log2()`)
- **Symptom**: Same class as PPCBUG-428.
### PPCBUG-432 — vrfin / vrfin128: Rust `round()` is round-half-away-from-zero; ISA requires round-to-nearest-even
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2172` (`r[i] = b[i].round()`)
- **Symptom**: `vrfin(0.5)` → ISA = 0.0; Rust = 1.0. `vrfin(2.5)` → ISA = 2.0; Rust = 3.0.
Canary uses `ROUNDPS` (an SSE4.1 instruction) in round-to-nearest-even mode.
- **Fix**: Use `f32::round_ties_even()` (stable since Rust 1.77).
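A sketch contrasting the two standard-library rounding methods on the tie cases listed above. `vrfin_lane` is a name chosen for this example, not the interpreter's handler.

```rust
/// Per-lane round-to-nearest with ties to even, as the fix proposes.
fn vrfin_lane(x: f32) -> f32 {
    x.round_ties_even()
}

/// Current behaviour for comparison: ties round away from zero.
fn round_half_away(x: f32) -> f32 {
    x.round()
}
```

Only exact `.5` ties differ between the two, so a test needs values such as 0.5, 1.5 and 2.5 rather than arbitrary fractions.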
### PPCBUG-433 — vctsxs / vcfpsxws128: NaN input returns 0 instead of saturating to INT_MIN (0x80000000)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `vmx.rs:217` (`if x.is_nan() { return (0, true); }`)
- **Symptom**: AltiVec ISA: NaN in vctsxs saturates to INT_MIN (0x80000000). Xenia-rs returns 0.
- **Fix**: `if x.is_nan() { return (i32::MIN, true); }`
### PPCBUG-434 — vctuxs NaN → 0 is correct; informational
- **Severity**: LOW (wontfix)
- **Status**: wontfix
- **Location**: `vmx.rs:225`
- **Note**: Unsigned NaN saturates to 0 per ISA. Xenia-rs is correct. Add a comment.
### PPCBUG-435 — vaddfp / vsubfp / vmulfp128: subnormal inputs not flushed when VSCR.NJ=1
- **Severity**: MEDIUM (latent — Xbox 360 always boots with NJ=1)
- **Status**: open
- **Location**: `interpreter.rs:1713`, `1729`, `1812`
- **Symptom**: VSCR.NJ=1 requires flush-to-zero for subnormal inputs. vmaddfp family correctly
calls `vmx::flush_denorm()`; plain add/sub/mul do not check VSCR.
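A sketch of what input and output flushing could look like for a plain add lane. The body of `flush_denorm` here is a stand-in (the real `vmx::flush_denorm()` signature is not shown in this tracker), and preserving the sign of the flushed zero is an assumption.

```rust
/// Stand-in flush: subnormal values become a zero of the same sign (assumption).
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() { 0.0_f32.copysign(x) } else { x }
}

/// Add lane with NJ=1 semantics: flush both inputs, then flush the result.
fn vaddfp_lane_nj(a: f32, b: f32) -> f32 {
    flush_denorm(flush_denorm(a) + flush_denorm(b))
}
```

A useful test input is `MIN_POSITIVE + (-0.5 * MIN_POSITIVE)`: without input flushing the sum is subnormal, with it the second operand becomes `-0.0` and the result is `MIN_POSITIVE`.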
### PPCBUG-436 — vmsum3fp128 / vmsum4fp128: per-product intermediates not individually flushed
- **Severity**: MEDIUM (latent)
- **Status**: open
- **Location**: `interpreter.rs:4076`, `4083`
- **Symptom**: `flush_denorm` on final sum only. Per-lane products can be subnormal and
accumulate before the final flush.
### PPCBUG-437 — vmaddfp / vmaddfp128 / vmaddcfp128 / vnmsubfp128: subnormal output not flushed
- **Severity**: MEDIUM (latent)
- **Status**: open
- **Location**: `interpreter.rs:1752-1754`, `1771-1773`, `4064-4067`, `1803-1805`
- **Symptom**: VSCR.NJ=1 requires flushing subnormal results. Inputs flushed; outputs are not.
### PPCBUG-438 — Zero tests for vcmpeqfp / vcmpgefp / vcmpgtfp / vcmpbfp and dot forms
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
### PPCBUG-439 — Zero tests for vrfiz / vrfin / vrfip / vrfim and 128-bit variants
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs:2158-2192`
### PPCBUG-440 — Zero tests for vctsxs / vctuxs / vcfsx / vcfux and 128-bit variants
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs:3842-3923`
IDs PPCBUG-441 through PPCBUG-479 are unallocated — no further bugs found in group 36.
---
## Batch 8 — VMX integer multiply-sum / multiply-half / sums / special (group 37)
Per-group report: `audit-out/group-37-vmx-mulsum.md`.
**Note**: All opcodes in this group are `XEINSTRNOTIMPLEMENTED()` stubs in xenia-canary; correctness is derived from the IBM ISA and `ppc-manual/vmx/` snapshots. `vrlimi128` is already tracked as PPCBUG-315.
### PPCBUG-482 — `vmhaddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)
- **Severity**: WITHDRAWN
- **Status**: no bug
- **Note**: Draft analysis suggested >>16; the spec snapshot `ppc-manual/vmx/vmhaddshs.md`
explicitly shows `prod = (VA[i]*VB[i]) >> 15` and the pathological-case example confirms
`0x8000*0x8000 >> 15 = 32768`. Xenia-rs matches the spec exactly. No code change.
### PPCBUG-483 — `vmhraddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)
- **Severity**: WITHDRAWN
- **Status**: no bug
- **Note**: `ppc-manual/vmx/vmhraddshs.md` explicitly shows `(product + 0x4000) >> 15`.
Xenia-rs matches. No code change needed.
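The two `>>15` formulas quoted in PPCBUG-482 and PPCBUG-483 can be sketched per lane as follows. Function names and the `(value, saturated)` return shape are assumptions for this example; the arithmetic follows the quoted spec lines.

```rust
/// Clamp a 32-bit intermediate to i16 and report whether clamping occurred.
fn sat_i16(x: i32) -> (i16, bool) {
    let clamped = x.clamp(i16::MIN as i32, i16::MAX as i32);
    (clamped as i16, clamped != x)
}

/// vmhaddshs lane: ((a * b) >> 15) + c, saturated to i16.
fn mhaddshs(a: i16, b: i16, c: i16) -> (i16, bool) {
    sat_i16(((a as i32 * b as i32) >> 15) + c as i32)
}

/// vmhraddshs lane: ((a * b + 0x4000) >> 15) + c, saturated to i16.
fn mhraddshs(a: i16, b: i16, c: i16) -> (i16, bool) {
    sat_i16(((a as i32 * b as i32 + 0x4000) >> 15) + c as i32)
}
```

The `0x8000 * 0x8000` case produces 32768 after the shift, one past `i16::MAX`, so it saturates even with a zero addend. The rounding term only matters when the product's bit 14 is set, as in `1 * 0x4000`.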
### PPCBUG-487 — vsumsws/vsum2sws/vsum4sbs/vsum4ubs/vsum4shs: VB operand mis-named as "c"/"VC"
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3249-3307`
- **Symptom**: All five vsum* handlers use a VX-form instruction (two operands: VA and VB).
The code names the VB source `c` and the comment references "vC" — implying a non-existent
third register operand. Only `instr.ra()` and `instr.rb()` are valid for VX form; there is
no `rc()`. The arithmetic is correct (rb() is called), but the naming misleads maintainers
into thinking there is a VA-form three-operand encoding.
- **Fix**: Rename `c` → `b` and update comments to say `VB` instead of `vC` in all five
handler bodies.
### PPCBUG-490 — Zero tests for all six vmsum* opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No unit test for `vmsumubm`, `vmsummbm`, `vmsumuhm`, `vmsumuhs`, `vmsumshm`,
`vmsumshs`. Critical missing: saturation + VSCR.SAT for `vmsumuhs`/`vmsumshs`; mixed-sign
byte product for `vmsummbm`; modulo wrap for `vmsumshm`.
### PPCBUG-491 — Zero tests for `vmhaddshs` and `vmhraddshs`
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for either multiply-high-add instruction. Key cases: `VA = 0x8000`,
`VB = 0x8000` (minus-one-times-minus-one saturating case); `VA = VB = 0x7FFF, VC = 0x7FFF`
(add post-shift result to max accumulator). Verify VSCR.SAT is set on saturation and clear
on non-saturating inputs.
### PPCBUG-492 — Zero tests for `vmladduhm`
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
### PPCBUG-493 — Zero tests for all eight `vmule*` / `vmulo*` opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for `vmuleub`, `vmuloub`, `vmulesb`, `vmulosb`, `vmuleuh`, `vmulouh`,
`vmulesh`, `vmulosh`. Key: even vs odd lane distinction (`vmulesh` vs `vmulosh`) is untested.
### PPCBUG-494 — Zero tests for all five vsum* opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for `vsumsws`, `vsum2sws`, `vsum4sbs`, `vsum4ubs`, `vsum4shs`.
Missing: zero-output-lanes verification for `vsumsws` (w[0..2] must be 0) and `vsum2sws`
(w[0], w[2] must be 0); VSCR.SAT on overflow for all signed/unsigned variants.
### PPCBUG-495 — `vsumsws` comment says "vC[3]" should say "VB[3]"
- **Severity**: LOW (cosmetic)
- **Status**: open
- **Location**: `interpreter.rs:3248`
IDs PPCBUG-480, PPCBUG-481, PPCBUG-482 (withdrawn), PPCBUG-483 (withdrawn), PPCBUG-484,
PPCBUG-485, PPCBUG-486, PPCBUG-488, PPCBUG-489, PPCBUG-496, PPCBUG-497, PPCBUG-498 are
either withdrawn (no bug found after re-examination), informational, or references to
existing IDs. IDs PPCBUG-499 through PPCBUG-509 are unallocated — no further bugs found
in group 37.
---
## Batch 8 — VMX load/store (group 38)
Per-group report: `audit-out/group-38-vmx-loadstore.md`.
**Opcodes**: lvebx, lvehx, lvewx, lvewx128, lvlx, lvlx128, lvlxl, lvlxl128, lvrx, lvrx128,
lvrxl, lvrxl128, lvsl, lvsl128, lvsr, lvsr128, lvx, lvx128, lvxl, lvxl128, stvebx, stvehx,
stvewx, stvewx128, stvlx, stvlx128, stvlxl, stvlxl128, stvrx, stvrx128, stvrxl, stvrxl128,
stvx, stvx128, stvxl, stvxl128.
Group 38 summary: The load family (lvx, lvxl, lvlx, lvrx, lvsl, lvsr, lvebx, lvehx, lvewx,
lvewx128 and all 128/LRU-hint variants) is arithmetically correct. EA computation, alignment
masking, big-endian byte ordering, RA=0 special cases, and lane indexing all match the ISA and
the `ea_indexed` helper. **5 HIGH bugs found** — the systemic `invalidate_for_write` gap
(PPCBUG-107 family) applies to ALL 16 VMX store opcodes, and `stvewx128` has an additional
severe memory-corruption bug (writes 16 bytes instead of 1 word). **1 MEDIUM** (behavioral
divergence between lvebx/lvehx/lvewx and canary's full-line simplification — xenia-rs is
architecturally more correct). **1 MEDIUM** (lvsr sh=0 edge-case correctness, documentation
gap). **3 LOW** test-coverage gaps.
### PPCBUG-510 — `stvewx128` stores all 16 bytes instead of one word; 12-byte memory corruption (HIGH)
- **Severity**: HIGH
- **Status**: applied (cedee3c, 2026-05-02)
- **Location**: interpreter.rs:2776-2781
- **Symptom**: Uses `& !0xF` (16-byte alignment) then stores all 16 bytes of the vector.
ISA semantics: word-align EA, extract the word lane `(EA & 0xF) >> 2`, store 4 bytes only.
The non-128 `stvewx` (interpreter.rs:1675-1687) is correct — `stvewx128` was not updated
to match. Corrupts 12 adjacent bytes on every execution.
- **Canary reference**: `InstrEmit_stvewx_` (cc:170-185) — `ea & ~3`, extract lane, `ByteSwap`,
store 4 bytes only. `stvewx128` routes through the same helper as `stvewx`.
- **Fix**: mirror the `stvewx` body with `instr.vs128()` substituted for `instr.rs()`.
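A sketch of the word-store semantics described in this finding, using a flat byte slice as a stand-in for guest memory. The function name, the slice-based memory model, and the register held as 16 bytes in guest (big-endian) order are all assumptions for illustration.

```rust
/// Store one 32-bit lane of `vr` at the word-aligned effective address.
/// `mem` is a flat byte slice standing in for guest memory (assumption).
fn stvewx_sketch(mem: &mut [u8], ea: u32, vr: &[u8; 16]) {
    let ea = (ea & !3) as usize; // word-align the effective address
    let lane = (ea & 0xF) >> 2; // which of the four words to store
    // Copy exactly four bytes; the other twelve bytes of the line stay untouched.
    mem[ea..ea + 4].copy_from_slice(&vr[lane * 4..lane * 4 + 4]);
}
```

A regression test can fill memory with a sentinel, issue one store at an unaligned address, and assert that exactly four bytes changed.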
### PPCBUG-511 — `stvx`, `stvx128`, `stvxl`, `stvxl128` missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:1598-1603 (stvx), 1605-1610 (stvx128), 1699-1705 (stvxl/stvxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Symptom**: Under `--parallel`, a 16-byte stvx to a reserved line does not clear the
reservation table slot. The reserving thread's `stwcx.` spuriously succeeds.
- **Fix**: per PPCBUG-107 pattern — add `invalidate_for_write(ea)` guard before the byte loop.
### PPCBUG-512 — `stvebx`, `stvehx`, `stvewx`, `stvewx128` missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:1655 (stvebx), 1664 (stvehx), 1675 (stvewx), 2776 (stvewx128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: `stvewx128` must also fix PPCBUG-510 before adding the invalidation call (or the
invalidation covers the wrong, over-wide address range).
### PPCBUG-513 — `stvlx`, `stvlx128`, `stvlxl`, `stvlxl128` missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:2746-2749 (stvlx/stvlxl), 2751-2754 (stvlx128/stvlxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: partial stores can span a 128-byte line boundary when `ea & 0xF != 0` and
`n = 16 - shift` crosses the line; two `invalidate_for_write` calls may be needed.
### PPCBUG-514 — `stvrx`, `stvrx128`, `stvrxl`, `stvrxl128` missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:2756-2759 (stvrx/stvrxl), 2761-2764 (stvrx128/stvrxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: stvrx at shift=0 is a no-op (no bytes written); guard can skip the call in
that case. Otherwise invalidate `ea & !0xF` (the preceding aligned block).
### PPCBUG-515 — `lvebx`, `lvehx`, `lvewx` implement element semantics; canary uses full-line load (MEDIUM)
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:1613-1653
- **Symptom**: xenia-rs places the loaded byte/halfword/word into the correct lane and preserves
other lanes from VD (ISA-correct for the "undefined" lanes). Canary does a full aligned
16-byte `lvx`-style load that overwrites all lanes. Both are valid because the ISA leaves
the remaining lanes undefined, but code that happens to depend on the full-line result
behaves differently under xenia-rs than under canary. The divergence is documented; no code
change is required unless canary compatibility becomes an explicit goal.
### PPCBUG-516 — `lvsr` sh=0 produces {16,17,...,31}; correct per ISA but undocumented (MEDIUM)
- **Severity**: MEDIUM (documentation gap — computation is correct)
- **Status**: open
- **Location**: interpreter.rs:2218-2226
- **Symptom**: When EA is 16-byte aligned, `lvsr` produces byte values all >= 16 (the "select
entirely from VB" identity for `vperm`). The formula `(16 - sh) + i` cannot overflow u8
because `sh <= 15` guarantees `(16 - sh) + 15 <= 31`. No computation bug — but there is no
comment explaining why values > 15 are correct. Add a comment and a `debug_assert!(sh <= 15)`.
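
A sketch of the computation with the suggested comment and assertion in place. The function name and the plain `[u8; 16]` return type are assumptions; the formula is the one quoted in the Symptom.

```rust
/// Sketch of the `lvsr` permute-control vector for an effective address.
pub fn lvsr_control(ea: u64) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    debug_assert!(sh <= 15); // guaranteed by the mask above
    let mut out = [0u8; 16];
    for (i, byte) in out.iter_mut().enumerate() {
        // Largest value is (16 - 0) + 15 = 31, so u8 cannot overflow.
        // Values >= 16 tell `vperm` to select from its second source (VB).
        *byte = (16 - sh) + i as u8;
    }
    out
}
```

For a 16-byte-aligned address this yields 16 through 31, the "select entirely from VB" identity described above.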
### PPCBUG-517 — Zero test coverage for lvlx/lvrx/stvlx/stvrx boundary edge cases (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: vmx.rs tests (lines 756-792); interpreter.rs test module
- **Missing**: shift=15 for lvlx (1 byte loaded), shift=1 for lvrx (1 byte loaded, 15 zero-filled), stvlx/stvrx
round-trip, stvrx at shift=0 confirmed no-op, full lvlx+lvrx+vor unaligned memcpy idiom
verified byte-exact.
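
The unaligned-load idiom named above can be modelled in a few lines, which may help when writing the missing tests. These are hypothetical stand-ins over a flat byte slice, not the `vmx.rs` helpers; registers are treated as big-endian byte images.

```rust
/// Model of `lvlx`: bytes from `ea` to the end of its 16-byte block,
/// left-justified, zero-filled on the right.
pub fn lvlx(mem: &[u8], ea: usize) -> [u8; 16] {
    let shift = ea & 0xF;
    let mut v = [0u8; 16];
    v[..16 - shift].copy_from_slice(&mem[ea..ea + 16 - shift]);
    v
}

/// Model of `lvrx`: the `shift` bytes below `ea` in its block,
/// right-justified. At shift = 0 nothing is loaded (all zeroes).
pub fn lvrx(mem: &[u8], ea: usize) -> [u8; 16] {
    let shift = ea & 0xF;
    let base = ea & !0xF;
    let mut v = [0u8; 16];
    v[16 - shift..].copy_from_slice(&mem[base..base + shift]);
    v
}

/// Unaligned 16-byte load: `lvlx(ea) | lvrx(ea + 16)`.
pub fn load_unaligned(mem: &[u8], ea: usize) -> [u8; 16] {
    let (l, r) = (lvlx(mem, ea), lvrx(mem, ea + 16));
    core::array::from_fn(|i| l[i] | r[i])
}
```

A byte-exact test would compare `load_unaligned(mem, ea)` against `mem[ea..ea + 16]` for every alignment 0 through 15.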
### PPCBUG-518 — Zero interpreter-level execution tests for all 36 VMX load/store opcodes (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Missing**: lvx alignment masking, stvx byte-order verification, lvebx lane placement,
lvsl/lvsr permute index values, stvewx128 after PPCBUG-510 fix. 17 recommended minimum tests
enumerated in per-group report.
### PPCBUG-519 — `stvrx` aligned no-op is silent; no debug trace (LOW)
- **Severity**: LOW
- **Status**: open
- **Location**: vmx.rs:284-292 (`store_vector_right`)
- **Symptom**: shift=0 returns immediately with no trace event. Confusing during memory-
visibility debugging. Add `tracing::trace!` in debug builds.
IDs PPCBUG-520 through PPCBUG-559 are unallocated — no further bugs found in group 38.
---
## Phase C1 — Decoder field extractors
Per-group report: `audit-out/phase-c1-decoder-fields.md`.
Comprehensive audit of all `DecodedInstr` field accessors in `decoder.rs` lines 21-165, cross-checked against ISA form specs, Canary `FormatXxx` structs, and the interpreter's inline re-extraction. Phase B already found PPCBUG-040/046/275/315/360-363/420-422. Phase C1 adds 8 new findings (PPCBUG-560..567).
**Confirmed-clean** (no new finding): `op`, `rd`/`rs`/`rt`, `ra`, `rb`, `rc`, `simm16`, `uimm16`, `d`, `ds`, `li`, `bd`, `bo`, `bi`, `aa`, `lk`, `oe`, `to`, `mb`/`me` (M-form only), `sh`, `spr`, `crm`, `crfd`/`crfs`, `l`, `crbd`/`crba`/`crbb`, `nb`, `va128`/`vb128`/`vd128`/`vs128`, `extract_vx128_uimm5`.
### PPCBUG-560 — sh64() test helper wrong bit order; masks PPCBUG-040 from unit tests (HIGH)
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `xenia-rs/crates/xenia-cpu/tests/disasm_goldens.rs:160-176` (function `rldicl`)
- **Symptom**: The `rldicl` test helper encodes `sh[5:1]` at PPC bits 16-20 and `sh[0]` at PPC bit 30. The ISA encodes `sh[4:0]` at PPC bits 16-20 and `sh[5]` at PPC bit 30. The wrong `sh64()` formula `(sh_lo << 1) | sh_hi` exactly inverts the wrong encoding, so the test passes, but the accessor fails on real binary code.
**Counterexamples** (ISA-encoded input → `sh64()` output):
| True shift | sh64() result | Error |
|-----------|--------------|-------|
| 1 | 2 | +1 |
| 16 | 32 | +16 |
| 32 | 1 | -31 |
| 33 | 3 | -30 |
| 63 | 63 | 0 (coincidence) |
Only `sh=0` and `sh=63` decode correctly. All other shifts (1-62) are wrong against real code.
- **Fix for `sh64()`** (per PPCBUG-040):
```rust
pub fn sh64(&self) -> u32 {
(extract_bits(self.raw, 30, 30) << 5) | extract_bits(self.raw, 16, 20)
}
```
- **Fix for test helper** (must be in same commit):
```rust
// Correct: sh_lo = sh & 0x1F → PPC bits 16-20; sh_hi = sh >> 5 → PPC bit 30
(30 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11)
| (mb_lo << 6) | (mb_hi << 5) | (0 << 2) | ((sh >> 5) << 1) | rc
```
- **Cross-reference**: PPCBUG-040 (primary finding). PPCBUG-560 is the test-infrastructure companion.
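
The counterexample table can be reproduced with a small standalone sketch. The local `extract_bits` is a stand-in whose signature is assumed (PPC numbering, bit 0 is the MSB), and `encode_sh` is a hypothetical ISA-correct encoder for the shift field only.

```rust
/// Extract PPC bits `start..=end` (bit 0 is the MSB of the 32-bit word).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (u32::MAX >> (32 - width))
}

/// ISA-correct MD-form shift: sh[4:0] in PPC bits 16-20, sh[5] in bit 30.
pub fn sh64_fixed(raw: u32) -> u32 {
    (extract_bits(raw, 30, 30) << 5) | extract_bits(raw, 16, 20)
}

/// The pre-fix formula from this finding, kept only for comparison.
pub fn sh64_buggy(raw: u32) -> u32 {
    (extract_bits(raw, 16, 20) << 1) | extract_bits(raw, 30, 30)
}

/// Hypothetical ISA-correct encoder for the shift field of an MD-form word.
pub fn encode_sh(sh: u32) -> u32 {
    ((sh & 0x1F) << 11) | ((sh >> 5) << 1)
}
```

Round-tripping every shift 0 through 63 via `encode_sh` and `sh64_fixed` is the kind of ISA-anchored check that would have exposed the original helper.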
### PPCBUG-561 — Missing `mb_md()` accessor on `DecodedInstr`; interpreter inlines wrong formula at 6 sites (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessor absent; `disasm.rs:1256` has correct local helper; `interpreter.rs` lines 696, 706, 716, 726, 736, 746 each inline the wrong formula
- **Symptom**: Interpreter uses `(instr.mb() << 1) | ((instr.raw >> 1) & 1)` which: (a) reads `SH5` (PPC bit 30, host bit 1) instead of `MB5` (PPC bit 26, host bit 5) as the high bit; (b) places the high bit at position 0 instead of position 5. `disasm.rs` has the correct version already — expose it as `DecodedInstr::mb_md()`.
- **Cross-reference**: PPCBUG-046 (primary finding).
- **Fix**:
```rust
// Add to decoder.rs:
#[inline] pub fn mb_md(&self) -> u32 {
extract_bits(self.raw, 21, 25) | (extract_bits(self.raw, 26, 26) << 5)
}
```
Replace all 6 inline sites in `interpreter.rs` with `instr.mb_md()`.
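
A standalone comparison of the two formulas, for reference. The local `extract_bits` and the `encode_mb` encoder are hypothetical helpers; the buggy formula assumes `instr.mb()` returns PPC bits 21-25, as the M-form accessor does.

```rust
/// Extract PPC bits `start..=end` (bit 0 is the MSB of the 32-bit word).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (u32::MAX >> (32 - width))
}

/// MD-form mask begin: low five bits in PPC bits 21-25, high bit in bit 26.
pub fn mb_md_fixed(raw: u32) -> u32 {
    extract_bits(raw, 21, 25) | (extract_bits(raw, 26, 26) << 5)
}

/// The inlined pre-fix formula: wrong source bit and wrong position.
pub fn mb_md_buggy(raw: u32) -> u32 {
    (extract_bits(raw, 21, 25) << 1) | ((raw >> 1) & 1)
}

/// Hypothetical encoder for the MD-form `mb` field only.
pub fn encode_mb(mb: u32) -> u32 {
    ((mb & 0x1F) << 6) | ((mb >> 5) << 5)
}
```

In this model `mb = 32` decodes as 0 under the old formula, since the high bit at PPC bit 26 is never read.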
### PPCBUG-562 — Missing `vc_rc_bit()` and `vx128r_rc_bit()` per-form Rc accessors (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — no per-form Rc accessors; `interpreter.rs` uses generic `rc_bit()` (bit 31) for both VC and VX128_R forms
- **Symptom**: Generic `rc_bit()` reads PPC bit 31 (LSB). VC-form Rc is at PPC bit 21 = `(raw >> 10) & 1`. VX128_R-form Rc is at PPC bit 27 = `(raw >> 4) & 1`. Using bit 31 for these forms means the CR6 update gate is permanently disabled for all dot-form VMX vector compares — root cause of PPCBUG-275/420/421/422.
- **Fix**:
```rust
/// Rc for VC-form vector compare (vcmpeqfp, vcmpgefp, vcmpgtfp, vcmpbfp, etc.) — PPC bit 21.
#[inline] pub fn vc_rc_bit(&self) -> bool { extract_bits(self.raw, 21, 21) != 0 }
/// Rc for VX128_R-form compare (vcmpeqfp128, vcmpgefp128, etc.) — PPC bit 27.
#[inline] pub fn vx128r_rc_bit(&self) -> bool { extract_bits(self.raw, 27, 27) != 0 }
```
- **Cross-reference**: PPCBUG-275 / PPCBUG-420 / PPCBUG-421 / PPCBUG-422.
### PPCBUG-563 — Missing `vx128_4_z()` and `vx128_4_imm()` for VX128_4 form (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessors absent; `interpreter.rs:3551-3552` (vrlimi128) reads wrong bit positions
- **Symptom**: VX128_4 form has `IMM` (5-bit) at PPC bits 11-15 (host bits 16-20) and `z` (2-bit) at PPC bits 24-25 (host bits 6-7). Interpreter `vrlimi128` uses `(raw >> 16) & 0x3` for shift (reads VB128l partial) and `(raw >> 2) & 0xF` for mask (reads VD128h region).
- **Fix**:
```rust
#[inline] pub fn vx128_4_imm(&self) -> u32 { extract_bits(self.raw, 11, 15) }
#[inline] pub fn vx128_4_z(&self) -> u32 { extract_bits(self.raw, 24, 25) }
```
- **Cross-reference**: PPCBUG-315.
### PPCBUG-564 — Missing `vx128_p_perm()` for VX128_P form; PERMh reads XO bits (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessor absent; `interpreter.rs:4089` (vpermwi128) uses `(raw >> 16) & 0xFF` which reads PERMl (correct) but uses XO/reserved bits 21-23 for PERMh instead of PPC bits 23-25
- **Symptom**: Top 3 bits of the 8-bit PERM selector are wrong for every `vpermwi128` instruction. Lane selections for words 0 and 1 are garbage.
- **Fix**:
```rust
#[inline] pub fn vx128_p_perm(&self) -> u32 {
extract_bits(self.raw, 11, 15) | (extract_bits(self.raw, 23, 25) << 5)
}
```
- **Cross-reference**: PPCBUG-362.
### PPCBUG-565 — Missing `vx128_5_sh()` for VX128_5 form; vsldoi128 MSB reads reserved bit (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessor absent; `interpreter.rs:2012` (vsldoi128) uses `(raw >> 4) & 0x1` for the shift MSB (reads PPC bit 27 = reserved) instead of PPC bit 22 = host bit 9 = `(raw >> 9) & 1`
- **Symptom**: vsldoi128 shift amounts ≥ 8 (where the 4th bit matters) use a garbage bit. The correct 4-bit SH is at PPC bits 22-25 (host bits 6-9) = `(raw >> 6) & 0xF`.
- **Fix**:
```rust
#[inline] pub fn vx128_5_sh(&self) -> u32 { extract_bits(self.raw, 22, 25) }
```
- **Cross-reference**: PPCBUG-361.
### PPCBUG-566 — Missing XER TBC field accessor documentation for lswx/stswx (LOW)
- **Severity**: LOW
- **Status**: open
- **Location**: `decoder.rs` — XER[25:31] (7-bit transfer byte count) is runtime state, not an instruction field; no accessor exists and no documentation notes the gap
- **Symptom**: `lswx`/`stswx` use XER[25:31] as their byte count. The interpreter has no way to read this via the normal accessor pattern. Not a bit-position error, but a structural gap.
- **Recommendation**: add `ctx.xer_tbc() -> u8` to `PpcContext` returning `ctx.xer() & 0x7F` (in PPC numbering bit 31 is the LSB, so XER[25:31] is the low 7 bits). Document that these are the only instructions that read XER as a count operand.
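
A minimal sketch of such an accessor, written as a free function over a raw 32-bit XER image because the `PpcContext` API is not shown here. Under the PPC bit numbering used throughout this file (bit 31 is the LSB), XER[25:31] is the low 7 bits of the register, so no shift is involved.

```rust
/// Hypothetical accessor: transfer byte count held in XER[25:31].
/// In PPC numbering bit 31 is the least-significant bit, so the field is
/// the low 7 bits of the 32-bit XER image.
pub fn xer_tbc(xer: u32) -> u8 {
    (xer & 0x7F) as u8
}
```

The SO/OV/CA flags live in the top bits of XER and must not leak into the count, which is why the mask is applied to the low end.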
### PPCBUG-567 — Zero unit tests pin any scalar field accessor (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `decoder.rs` unit tests; `tests/disasm_goldens.rs`
- **Symptom**: Phase 4 tests pin `va128`/`vb128`/`vd128`/`vs128` only. No test verifies: `sh64()` against ISA-encoded instructions (existing test validates wrong round-trip — PPCBUG-560), `mb_md()` (absent), `vc_rc_bit()`/`vx128r_rc_bit()` (absent), `ds()` for negative displacement, `spr()` for LR/CTR/XER beyond DEC.
- **Recommended additions**: decoder-level unit tests using ISA-correct encodings for `sh64`, `mb_md`, the two new Rc accessors, `ds` negative, `spr` for LR=8 and CTR=9. See phase-c1-decoder-fields.md for concrete encoding examples.
IDs PPCBUG-568 through PPCBUG-599 are unallocated — no further bugs found in Phase C1 scope.
---
## Phase C2 — Decoder opcode-lookup tables
Per-group report: `audit-out/phase-c2-decoder-lookup.md`.
**Methodology**: complete line-by-line comparison of all `decode_opNN` functions in
`xenia-rs/crates/xenia-cpu/src/decoder.rs` against
`xenia-canary/src/xenia/cpu/ppc/ppc_opcode_lookup_gen.cc`, plus cross-reference of
`ppc-manual/forms/` for VC, VX128_R, VX128_5, VA, VX128_3, VX128_4 forms.
**Overall verdict**: the decoder is structurally sound and entry-by-entry matches
Canary for all real Xbox 360 instructions, with one pre-known exception (PPCBUG-600 =
PPCBUG-423). Zero new wrong-entry bugs. One medium-severity cross-reference entry
(dot-form gap), one medium maintainability risk (key-ordering dependency), four LOWs
(test gaps, reserved-encoding misidentification, undocumented primary opcode 9,
undocumented fast-path).
### PPCBUG-600 — `decode_op6` key4: VMX128 compare dot-forms decode as Invalid (MEDIUM)
- **Severity**: MEDIUM (cross-reference for PPCBUG-423; same root cause, Phase C2 ID)
- **Status**: applied (52b05b1, 2026-05-01) (dup-of:423 for the fix; this ID is for Phase C2 tracking)
- **Location**: `decoder.rs:640-648` (`decode_op6`, key4 match table)
- **Symptom**: The VX128_R form places `Rc` at PPC bit 27. The key4 formula is
`(bits 22-24 << 3) | bit27`. When Rc=1 (dot-form), bit27=1 and key4 is odd.
Only even key4 values are in the table. Five dot-form encodings fall through to
`PpcOpcode::Invalid`:
- `vcmpeqfp128.` → key4=0b000001 (1), decodes as Invalid
- `vcmpgefp128.` → key4=0b001001 (9), decodes as Invalid
- `vcmpgtfp128.` → key4=0b010001 (17), decodes as Invalid
- `vcmpbfp128.` → key4=0b011001 (25), decodes as Invalid
- `vcmpequw128.` → key4=0b100001 (33), decodes as Invalid
- **Contrast**: standard VMX VC-form compares (op=4 key3) are correct because their
Rc bit (bit21) is outside the key3 window (bits 22-31). VMX128_R uses a different
form where Rc is at bit27, which is inside the key4 window.
- **Fix**: Add 5 dot-form entries to key4 in `decode_op6`:
```rust
0b000001 => return PpcOpcode::vcmpeqfp128,
0b001001 => return PpcOpcode::vcmpgefp128,
0b010001 => return PpcOpcode::vcmpgtfp128,
0b011001 => return PpcOpcode::vcmpbfp128,
0b100001 => return PpcOpcode::vcmpequw128,
```
Once the decoder emits the right opcode, the interpreter's Rc check drives the CR6 update
for dot-forms. That check must use the per-form accessor from PPCBUG-562, since generic `rc_bit()` reads PPC bit 31.
- **See also**: PPCBUG-423 (Phase B original finding) for impact assessment and
full context.
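
The key arithmetic in the Symptom can be checked with a short sketch. It follows the key4 formula exactly as quoted in this entry; the local `extract_bits` and the function name are assumptions, and the real `decode_op6` may differ.

```rust
/// Extract PPC bits `start..=end` (bit 0 is the MSB of the 32-bit word).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (u32::MAX >> (32 - width))
}

/// key4 as described in this finding: (bits 22-24 << 3) | bit 27.
pub fn op6_key4(raw: u32) -> u32 {
    (extract_bits(raw, 22, 24) << 3) | extract_bits(raw, 27, 27)
}
```

With bits 22-24 set to `0b010`, the key is 16 when bit 27 is clear and 17 when it is set, so a table holding only even keys misses the second word.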
### PPCBUG-601 — `decode_op6` key ordering creates undocumented correctness dependency (MEDIUM)
- **Severity**: MEDIUM (maintainability risk; no current wrong-decode for real code)
- **Status**: open
- **Location**: `decoder.rs:603-637` (`decode_op6`, key1/key2/key3 dispatch)
- **Symptom**: key1 (bits 21-22 << 5 | bits 26-27), key2 (bits 21-23 << 4 | bits 26-27),
and key3 (bits 21-27) all overlap. Correctness depends on an implicit invariant:
vpkd3d128 and vrlimi128 (matched by key2) always have bits 26-27 = `01`, while all
15 key3 unary entries always have bits 26-27 = `11`. If a future instruction were
added to key2 with bits 26-27 = `11`, it would shadow a key3 entry. No comment in
the source documents this constraint.
- **Fix**: Add a comment block above the key2/key3 dispatches explaining the invariant:
```rust
// key2 matches bits 26-27 == 01 only (vpkd3d128, vrlimi128).
// key3 entries all have bits 26-27 == 11. No overlap is possible
// for any currently-defined Xbox 360 instruction.
```
### PPCBUG-602 — `decode_op4` vsldoi128 fallback: over-broad single-bit catch-all (LOW)
- **Severity**: LOW (only fires for reserved/undefined encodings in practice)
- **Status**: open
- **Location**: `decoder.rs:558-561`
- **Symptom**: The VX128_5 form for vsldoi128 is identified by op=4, bit27=1. The
dispatch uses a bare `if extract_bits(code, 27, 27) == 1` after the other tables,
rather than an exact VX128_5-form check. Reserved VA extended opcodes that happen
to have their key4 bit4 (= word bit27) set decode as vsldoi128 instead of Invalid.
Example: VA XO=0b100011 (35, reserved gap between vmladduhm=34 and vmsumubm=36)
— key4 misses, bit27=1 fires → decoded as vsldoi128. ISA specifies reserved
encodings should trap; this silently assigns a meaning.
- **Fix (optional)**: Strengthen to an exact match:
```rust
// VX128_5 form: SH@22-25, VA128h@26, XO=bit27. Bits 28-31 carry VD128h/VB128h.
// Only vsldoi128 uses this form. Verify the XO bit and absence of load/store marker.
if extract_bits(code, 27, 27) == 1 && extract_bits(code, 30, 31) != 0b11 {
return PpcOpcode::vsldoi128;
}
```
Alternatively, accept current behavior and add a comment.
### PPCBUG-603 — Primary opcode 9 maps to Invalid; correct but undocumented (LOW)
- **Severity**: LOW (test gap / documentation only)
- **Status**: open
- **Location**: `decoder.rs:369` (the `_ => PpcOpcode::Invalid` arm of `lookup_opcode`)
- **Symptom**: Primary opcode 9 (`dozi` in original POWER ISA) is undefined on
Xenon/750CL and correctly decodes as Invalid. Canary also returns `PPC_DECODER_MISS`.
No comment documents this intentional absence.
- **Fix**: Add `// 9 = dozi (POWER-only, not present on Xenon)` comment near the
match, or explicitly add `9 => PpcOpcode::Invalid` with a comment.
### PPCBUG-604 — Zero decoder unit tests for decode_op5, decode_op6, decode_op30, decode_op63 (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `decoder.rs:897-1107` (test module)
- **Symptom**: The 10 existing decoder tests cover addi, lwz, branch, stw, ori, and
cache mechanics. None exercise VMX128 (op=5, op=6), rotate-doubleword (op=30), or
FPU (op=63) opcode paths. In particular, no test would have caught PPCBUG-600
(vcmpeqfp128 dot-form decodes as Invalid) before it caused a runtime trap.
- **Recommended minimum additions** (8 tests):
1. `vcmpeqfp128` (Rc=0) → decodes as `vcmpeqfp128`.
2. `vcmpeqfp128.` (Rc=1) → decodes as `vcmpeqfp128` (tests PPCBUG-600 fix).
3. `vcmpeqfp` (op=4, Rc=0) → key3 check, bit21=0.
4. `vcmpeqfp.` (op=4, Rc=1) → key3 check, bit21=1, same decode.
5. `vsldoi128` (op=4, bit27=1) → fallback fires.
6. `rldicl` (op=30) → decode_op30.
7. `fadd` (op=63, Rc=0) → arithmetic table.
8. `fadd.` (op=63, Rc=1) → same decode as fadd.
### PPCBUG-605 — `decode_op31` sradix fast-path is correct but undocumented (LOW)
- **Severity**: LOW (documentation gap only)
- **Status**: open
- **Location**: `decoder.rs:702-705`
- **Symptom**: The sradix pre-check uses bits 21-29 (9 bits). The subsequent main
table uses bits 21-30 (10 bits). Because no main-table entry has bits 21-29 =
0b110011101, the fast-path cannot shadow a legitimate main-table entry. However,
this is not documented in the source, and a reader might worry that `sradix` (whose
bit 30 carries `sh[5]`, so bits 21-30 are 0b1100111010 or 0b1100111011 regardless of
Rc) could conflict with a future entry at either key.
- **Fix**: Add a comment: `// sradix: XS-form, XO=413 (bits 21-29=0b110011101).`
`// No main-table entry uses bits 21-30 starting with 0b110011101x.`
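
The relationship between the 9-bit pre-check and the 10-bit main key can be sketched as below. The encoder and function names are hypothetical; the field positions follow the XS-form layout (XO = 413 in PPC bits 21-29, `sh[5]` in bit 30, Rc in bit 31).

```rust
/// Extract PPC bits `start..=end` (bit 0 is the MSB of the 32-bit word).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (u32::MAX >> (32 - width))
}

/// Hypothetical encoder for `sradi` / `sradi.` (XS-form, XO = 413).
pub fn encode_sradi(rs: u32, ra: u32, sh: u32, rc: bool) -> u32 {
    (31 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11)
        | (413 << 2) | ((sh >> 5) << 1) | rc as u32
}

/// The 9-bit pre-check described in this finding.
pub fn is_sradi_fast_path(raw: u32) -> bool {
    extract_bits(raw, 21, 29) == 0b1_1001_1101
}

/// The 10-bit key the main table would see for the same word.
pub fn main_table_key(raw: u32) -> u32 {
    extract_bits(raw, 21, 30)
}
```

Every `sradi` word therefore maps to main-table key 826 or 827 depending on `sh[5]`, which is the pair of keys the comment should reserve.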
IDs PPCBUG-606 through PPCBUG-639 are unallocated — no further bugs found in Phase C2.
---
## Phase C3 — Disassembler formatter parity
Per-group report: `audit-out/phase-c3-disasm.md`.
**Methodology**: Full line-by-line audit of `disasm.rs:format()` and all ~70 per-class helpers.
Cross-referenced against `xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm_gen.cc`,
`tests/golden/extended_mnemonics.json`, and `tests/golden/base_mnemonics.json`.
Checked: mnemonic correctness (Rc/OE/LK/AA/L-field), operand formatting (signed vs unsigned,
hex vs decimal), simplified-mnemonic priority, branch-condition extended forms, VMX register
naming, VX128 field extraction, and golden test coverage.
**Overall verdict**: The formatter is structurally sound. All OE/Rc/LK/AA suffix handling, the
simplified mnemonic priority order, VMX 5-bit and VMX128 7-bit register naming, SPR mnemonics,
and CR-logical extended forms are correct. Two HIGH bugs found: the `bdnz`/`bdz` extended
mnemonic appends a spurious condition suffix, and the pre-existing `sync`/`lwsync` bug
(PPCBUG-088) is re-assessed as HIGH in disassembler scope. Three MEDIUM bugs: a missing
`fmt_bcctr` extended form, and decimal vs hex for SIMM immediates and for D-form
displacements (diverges from every real PPC disassembler).
Several LOW findings for golden fixture correctness and edge cases.
**Key finding**: the disassembler's VX128 field extraction (vperm128 VC, vsldoi128 SH,
vpermwi128 PERM) is CORRECT in all three cases where the interpreter (PPCBUG-360/361/362)
has the wrong extraction. The disassembler was written independently and got them right.
### PPCBUG-640 — `fmt_bc`: pure `bdnz`/`bdz` emits `bdnzge`/`bdzge` (spurious condition suffix) (HIGH)
- **Severity**: HIGH
- **Status**: applied (d4f6ea7, 2026-05-02)
- **Location**: `disasm.rs:829-834`
- **Symptom**: For `bcx` with BO=16 (`bdnz`: decrement CTR, branch if CTR≠0, CR ignored):
- `decr = (16 & 4) == 0` = true
- `uncond = (16 & 16) != 0` = true
- Code falls into the `if decr` branch and computes `cond_name_opt` from `(cr_bit=0, cond_true=false)` → `Some("ge")`
- Emits: **`bdnzge`** — WRONG. ISA simplified form is `bdnz`.
For BO=18 (`bdz`): same path → **`bdzge`** — WRONG.
The bug is absent in `fmt_bclr` which has an explicit `if decr && uncond` guard at line 872
producing `bdnzlr`/`bdzlr` correctly. `fmt_bc` lacks this guard.
The golden fixture "bdnz 0x82000040" (PPCBUG-650 companion) pins the wrong output.
- **Fix**: In `fmt_bc`, inside the `if decr` block, gate the condition string on `!uncond`:
```rust
if decr {
let z = if bo & 0x02 != 0 { "z" } else { "nz" };
let cond_str = if uncond { "" } else { cond_name_opt.unwrap_or("") };
let ext_mnem = format!("bd{z}{cond_str}{a}{l}");
let ext_ops = format!("{cr}0x{target:08X}");
with_ext(&base_mnem, base_ops, 8, &ext_mnem, ext_ops, 8)
}
```
Also update the golden fixtures (PPCBUG-650).
- **Impact**: All analysis-DB queries for `bdnz` loops (common in pixel-shader and vertex
processing loops) return zero rows; they are stored as `bdnzge`. Developers inspecting
loop structures see a misleading condition name on a CTR-only branch.
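
The BO decoding behind this finding can be isolated in a small sketch. The function name and signature are assumptions; `cond` stands for the condition suffix the formatter has already derived (for example `"ge"`), and only the CTR-decrementing stem is modelled.

```rust
/// Sketch of the extended-mnemonic stem for a CTR-decrementing `bc`.
/// Returns `None` when BO does not request a CTR decrement.
pub fn bd_mnemonic(bo: u32, cond: &str) -> Option<String> {
    let decr = bo & 0x04 == 0; // mask 0x04 clear: decrement CTR
    let uncond = bo & 0x10 != 0; // mask 0x10 set: CR is ignored
    if !decr {
        return None;
    }
    let z = if bo & 0x02 != 0 { "z" } else { "nz" };
    // Only append the condition when the branch actually tests CR.
    let cond = if uncond { "" } else { cond };
    Some(format!("bd{z}{cond}"))
}
```

BO=16 and BO=18 produce `bdnz` and `bdz` here, while BO=0 (decrement and test CR) still yields `bdnzge` when the derived condition is `ge`.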
### PPCBUG-641 — `sync` emits `"sync"` for `lwsync` (L=1) — re-assessment of PPCBUG-088 (HIGH)
- **Severity**: HIGH (disassembler scope; PPCBUG-088 was LOW for interpreter scope)
- **Status**: open (see PPCBUG-088 for fix)
- **Location**: `disasm.rs:364`
- **Symptom**: `PpcOpcode::sync` always emits `"sync"`. The L-field at PPC bit 10 selects
`lwsync` (L=1, encoding `0x7C2004AC`). `lwsync` is the acquire barrier in every Xbox 360
spinlock. Every `lwsync` in the disassembly DB is stored as `mnemonic='sync'`.
`SELECT * WHERE mnemonic='lwsync'` returns zero rows regardless of binary content.
- **Note**: the golden fixture for lwsync (PPCBUG-649) currently pins the wrong output.
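
A minimal sketch of the L-field check, assuming the formatter has the raw instruction word available. The function name is hypothetical, and only PPC bit 10 (the bit named in the Symptom) is inspected.

```rust
/// Pick the mnemonic for a `sync`-family word from its L field.
pub fn sync_mnemonic(raw: u32) -> &'static str {
    // PPC bit 10 is host bit 31 - 10 = 21.
    let l = (raw >> 21) & 1;
    if l == 1 { "lwsync" } else { "sync" }
}
```

With this check `0x7C2004AC` is reported as `lwsync` and `0x7C0004AC` stays `sync`.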
### PPCBUG-642 — `fmt_bcctr` missing extended form for CTR-decrement/ignore-CR BO values (MEDIUM)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:880-902`
- **Symptom**: `bcctrx` with BO=16 (decrement CTR, ignore CR) falls through to `base()` with
no extended form. `fmt_bclr` (the equivalent for bclrx) correctly handles the same case with
an explicit `decr && uncond` check at line 872, producing `bdnzlr`.
Note: `bcctr` with CTR-decrement is undefined by PowerISA; this encoding should never appear
in valid compiled code. The inconsistency is a maintenance concern rather than a runtime bug.
- **Fix**: Add a `decr && uncond` check before the `cond_branch_ext` call in `fmt_bcctr`,
mirroring lines 872-876 in `fmt_bclr`. Or add a comment explaining the ISA undefined status.
### PPCBUG-643 — SIMM immediate display: decimal diverges from Canary and real disassemblers (MEDIUM)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:946` (addi), `976` (addic), `989` (subfic), `990` (mulli),
`1003` (cmpi), `1048-1061` (fmt_ld/fmt_st), and all similar SIMM sites
- **Symptom**: SIMM immediates are formatted via Rust's `{imm}` (decimal). Canary uses
`"-0x{:X}"` / `"0x{:X}"` (signed hex) for every SIMM field. GNU objdump, IDA Pro,
and all standard PPC disassemblers use hex. The inconsistency is internal to xenia-rs:
`addis`/`oris`/`xoris` use hex (`0x{imm_u:X}`), but `addi`/`addic`/`mulli` use decimal.
This misleads analysis-DB queries that mix instructions (e.g. `addi r3, r1, -4` vs
`addis r3, r0, 0x8000`).
- **Impact**: Medium — the output is not *wrong* (the value is correctly computed), but
cross-referencing with Canary output or objdump requires manual conversion.
### PPCBUG-644 — D-form load/store displacement uses decimal instead of hex (MEDIUM)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:1053` (`fmt_ld`), `1061` (`fmt_st`), `1069` (`fmt_ds`)
- **Symptom**: `format!("{rn}, {d}({})", gpr(ra))` outputs decimal for the displacement.
Canary outputs `"-0x8(r1)"` not `"-8(r1)"`. Every standard PPC disassembler uses hex.
Affects 25+ D-form and DS-form opcodes. Negative displacements (-8, -16, etc.) are
especially confusing in decimal when reading stack frame accesses.
- **Fix**:
```rust
// `unsigned_abs` avoids the overflow that `-d` hits at the type's minimum value.
let d_str = if d < 0 { format!("-0x{:X}", d.unsigned_abs()) } else { format!("0x{:X}", d) };
base(mnem, format!("{rn}, {d_str}({})", gpr(ra)), 8)
```
Update all golden fixture rows with displacement values.
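
The signed-hex formatting can be factored into a helper so that `fmt_ld`, `fmt_st`, and `fmt_ds` share it. The helper name and the `i16` parameter type are assumptions; the actual displacement type in `disasm.rs` is not shown here.

```rust
/// Format a signed displacement as signed hex, e.g. -8 -> "-0x8".
/// `unsigned_abs` avoids the overflow that negating `i16::MIN` would cause.
pub fn signed_hex(d: i16) -> String {
    if d < 0 {
        format!("-0x{:X}", d.unsigned_abs())
    } else {
        format!("0x{:X}", d)
    }
}
```

A stack access such as `stw r31, -8(r1)` would then render its displacement as `-0x8`.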
### PPCBUG-645 — `cntlzdx` Rc suffix: moot for valid encodings, but WONTFIX (LOW)
- **Severity**: LOW
- **Status**: wontfix
- **Location**: `disasm.rs:286`
- **Note**: `fmt_x_unary_rc` would emit `cntlzd.` for Rc=1, but valid `cntlzd` encodings
always have Rc=0. Canary emits `cntlzd` always. No impact for valid code.
### PPCBUG-646 — `fmt_rlwimi` inslwi/insrwi priority overlap: confirmed correct (LOW)
- **Severity**: LOW
- **Status**: wontfix
- **Note**: After careful analysis, the `inslwi` guard excludes `insrwi` overlap cases
(`sh != 31u32.wrapping_sub(me)`). Priority is correct. Informational only.
### PPCBUG-647 — `fmt_rlwinm` `extrwi` uses `wrapping_sub` which can give misleading results for invalid encodings (LOW)
- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1137`
- **Symptom**: `let b = sh.wrapping_sub(n) % 32;`: for invalid `sh < n` encodings,
`wrapping_sub` yields a large u32 and `% 32` turns it into a misleading `b`. For all
compiler-emitted encodings `sh >= n` holds. Add `&& sh >= 32 - mb` (that is, `sh >= n`) to
the guard so invalid encodings fall back to the base form.
### PPCBUG-648 — `fmt_mftb` TBR=268: ext mnemonic identical to base mnemonic (LOW)
- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1443`
- **Symptom**: `268 => with_ext("mftb", base_ops, 8, "mftb", gpr(rd), 8)` — base is `mftb`,
extended is also `mftb`. `display()` picks the extended form (omitting the `268` operand),
making it ambiguous vs. `mftbu`. Consider: either emit base-only (`mftb r3, 268`) or rename
the base to `mftb.raw` for disambiguation.
### PPCBUG-649 — Golden fixture for `lwsync` pins wrong output (no ext_mnemonic) (LOW)
- **Severity**: LOW (test coverage gap)
- **Status**: applied (2be25bd, 2026-05-02)
- **Location**: `tests/golden/extended_mnemonics.json`, entry "lwsync"
- **Symptom**: Fixture has `mnemonic: "sync"` and no `ext_mnemonic`. After PPCBUG-088/641
fix, expected output is `mnemonic: "sync"`, `ext_mnemonic: "lwsync"`. Current fixture
defeats regression detection — the test passes with wrong output.
### PPCBUG-650 — Golden fixtures for `bdnz`/`bdz` pin wrong extended mnemonic (LOW)
- **Severity**: LOW (companion to PPCBUG-640)
- **Status**: applied (d4f6ea7, 2026-05-02)
- **Location**: `tests/golden/extended_mnemonics.json`, rows "bdnz 0x82000040" and "bdz 0x82000040"
- **Symptom**: The rows pin `ext_mnemonic: "bdnzge"` and `ext_mnemonic: "bdzge"` respectively.
After the PPCBUG-640 fix, the correct values are `"bdnz"` and `"bdz"`.
### PPCBUG-651 — `fmt_vmx128_pack_d3d` shared by `vpkd3d128` and `vrlimi128`: confirmed correct (LOW)
- **Severity**: LOW
- **Status**: wontfix
- **Note**: Both opcodes use VX128_4 form. Shared formatter outputs identical operand lists
(`vd, vb, imm, z`) which is correct for both. Informational only.
### PPCBUG-652 — Zero golden fixtures for any VMX128 opcode disassembly (LOW)
- **Severity**: LOW (test coverage gap)
- **Status**: open
- **Location**: `tests/golden/` — all three JSON files
- **Symptom**: No fixture pins the formatted output of any VMX128 instruction. Regressions
in VMX128 field extraction (e.g. a re-introduction of PPCBUG-360/361/362 in the disassembler)
would be invisible. Recommend adding at minimum: `vaddfp128`, `vperm128`, `vsldoi128`,
`vpkd3d128`, `vcmpeqfp128.`, `vmaddfp128`.
### PPCBUG-653 — `fmt_trap_imm` unconditional trap extended form: confirmed not-a-bug (LOW)
- **Severity**: LOW
- **Status**: wontfix
- **Note**: `twi 31, rA, IMM` (to=31) has no ISA simplified mnemonic unless RA=0 and IMM=0
(which matches `tw 31, r0, r0 = trap`). The `fmt_trap_imm` correctly emits base-only for
`twi 31, rA, N`. Informational.
### PPCBUG-654 — `fmt_rldimi` `insrdi` guard excludes valid `mb=0` (b=0) case (LOW)
- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1220`
- **Symptom**: Guard `if mb > 0` excludes `insrdi rA, rS, n, 0` (b=0 → mb=0). A valid
compiler-emitted `rldimi` with sh+mb+n=64 and mb=0 falls through to base form instead of
displaying the `insrdi` simplified mnemonic.
- **Fix**: Remove the `mb > 0` guard; the inner `n > 0` guard is sufficient to avoid
degenerate cases.
IDs PPCBUG-655 through PPCBUG-679 are unallocated — no further bugs found in Phase C3.
---
## Phase C4 — Post-merge audit corrections (2026-05-02)
### PPCBUG-700 — VMX128 register accessors disagreed with canary's bitfield layout (HIGH)
- **Severity**: HIGH (silent mis-decoding of any VMX128 instruction with a register >= 32)
- **Status**: applied
- **Locations**: `decoder.rs:138-160` (`va128`/`vb128`/`vd128`), `decoder.rs:80` (`vx128r_rc_bit`)
- **Discovery**: independent reviewer of the P3 phase merge, comparing our Rust accessors
against canary's `FormatVX128`/`VX128_2`/`VX128_4`/`VX128_5`/`VX128_R` bitfield struct
in `xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663`.
- **Symptom**: this entry contradicts the audit's own line 2958 ("confirmed-clean")
assessment. The previous audit miscounted bit-field offsets — under x86_64 LSB-first
C++ bitfield packing, the canary fields land at:
- `VA128 = VA128l(5) | VA128h(1)<<5 | VA128H(1)<<6` = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6 (3 fields, 7 bits)
- `VB128 = VB128l(5) | VB128h(2)<<5` = PPC[16-20] | PPC[30-31]<<5 (2 fields, 7 bits)
- `VD128 = VD128l(5) | VD128h(2)<<5` = PPC[6-10] | PPC[28-29]<<5 (2 fields, 7 bits)
- `Rc` (VX128_R only) = PPC[25] (host bit 6) — not PPC[27] as PPCBUG-422/562 prescribed.
Rust code instead used va128: PPC[11-15] | PPC[29]<<5 (one bit, wrong position); vb128:
PPC[16-20] | PPC[28]<<5 | PPC[30]<<6 (wrong positions); vd128: PPC[6-10] | PPC[21]<<5 |
PPC[22]<<6 (wrong positions); vx128r_rc_bit at PPC[27].
- **Why it lurked**: the buggy convention was internally consistent with hand-crafted
test fixtures (which set bit 29 / 21 / 22 to encode "high" registers, matching the
buggy accessor). Real Xbox 360 game code follows canary's convention, so any production
encoding with VR >= 32 was silently mis-decoded — but no unit test exercised that path.
- **Fix**: rewrite the four accessors to canary's bit positions; rewrite the
`vmx128_test_word` helper and unit tests; re-encode the goldens for vmaddfp128/
vmaddcfp128/vnmsubfp128/vperm128/vsrw128/vpermwi128/vrlimi128. Drop the speculative
`key4_dt` dot-form dispatch in `decode_op6` (canary has no separate dot-form opcodes
for VX128_R compute ops; Rc is a runtime modifier). Update `encode_vpkd3d128` test
helper for canary's VD128h placement.
- **Cross-reference**: invalidates the audit's confirmed-clean note at line 2958.
Subsumes the partial fix-shape proposed in PPCBUG-422 (Rc-bit position).
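
The corrected bit positions listed in the Symptom translate into accessors along these lines. This is a sketch over a raw `u32`, not the actual `DecodedInstr` methods, and the local `extract_bits` is a stand-in whose signature is assumed (PPC numbering, bit 0 is the MSB).

```rust
/// Extract PPC bits `start..=end` (bit 0 is the MSB of the 32-bit word).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (u32::MAX >> (32 - width))
}

/// VA128 = PPC[11-15] | PPC[26] << 5 | PPC[21] << 6.
pub fn va128(raw: u32) -> u32 {
    extract_bits(raw, 11, 15)
        | (extract_bits(raw, 26, 26) << 5)
        | (extract_bits(raw, 21, 21) << 6)
}

/// VB128 = PPC[16-20] | PPC[30-31] << 5.
pub fn vb128(raw: u32) -> u32 {
    extract_bits(raw, 16, 20) | (extract_bits(raw, 30, 31) << 5)
}

/// VD128 = PPC[6-10] | PPC[28-29] << 5.
pub fn vd128(raw: u32) -> u32 {
    extract_bits(raw, 6, 10) | (extract_bits(raw, 28, 29) << 5)
}

/// Rc for the VX128_R form = PPC[25].
pub fn vx128r_rc_bit(raw: u32) -> bool {
    extract_bits(raw, 25, 25) != 0
}
```

A unit test that encodes registers at or above 32 in each field, using these positions rather than the accessor's own inverse, would exercise the path the hand-crafted fixtures missed.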