21 IDs (040, 046, 275, 276, 315, 360, 361, 362, 363, 369, 420, 421, 422,
423, 560, 561, 562, 563, 564, 565, 600) marked applied (52b05b1, 2026-05-01)
in audit-findings.md. P2 progress section appended to audit-report-2026-04-29.md.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3417 lines
188 KiB
Markdown
# PPC Instruction Audit — Findings Tracker

**Started**: 2026-04-29 (single session, audit-only)
**Trigger**: `addis` 32-bit-ABI sign-extension fix surfaced a likely systemic class of bugs.
**Status**: in flight. Per-group reports live in `audit-out/`. This file is the consolidated, stable-ID index.
**Workflow**: audit only this session; fix session(s) reference these IDs.

## Conventions

- Every finding has an ID `PPCBUG-NNN` for cross-referencing.
- **Status**: `open` (audit found it, not yet fixed) | `applied` (fix landed) | `wontfix` (intentional) | `dup-of:NNN` (collapsed into another finding).
- **Severity**:
  - **HIGH** = wrong arithmetic / control flow on plausible Xbox 360 user code.
  - **MEDIUM** = wrong status flag / latent under broken upstream invariants / edge case.
  - **LOW** = test gap / cosmetic / dead-code-only.
- All file:line refs are `xenia-rs/crates/xenia-cpu/src/interpreter.rs` unless otherwise noted.
- Suggested fixes are written as one-line patches where possible; see the per-group report for full context.

## Cross-cutting recommendation

The single recurring root cause is **violating the 32-bit ABI invariant that all GPR writes truncate to 32 bits**. The cleanest fix is to systematically apply `as u32 as u64` at every GPR writeback in every integer ALU op. The existing CA/CR0/OE helpers will then be correct without further changes (because their inputs become guaranteed-clean). The audit reports list each fix individually; the fix session may choose to apply them as one sweep or one-at-a-time.

A defensive secondary recommendation: even after the writeback truncation, instructions whose CA computation does its own internal arithmetic on 64-bit operands (`subfcx`, `subfex`, `addic`, `addicx`, `subficx`) should additionally truncate their compare operands. This guards against any future regression that re-pollutes the GPR file.
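
One way to make the sweep mechanical is to route every integer writeback through a single truncating helper. The sketch below is illustrative only: `Gprs`, `write32`, and `addi_sketch` are hypothetical names, not the interpreter's actual types, and the real context struct may look different.

```rust
/// Hypothetical register file, used only for this sketch.
pub struct Gprs {
    pub gpr: [u64; 32],
}

impl Gprs {
    /// Write `value` to GPR `index`, keeping only the low 32 bits
    /// (the `as u32 as u64` pattern from the recommendation above).
    pub fn write32(&mut self, index: usize, value: u64) {
        self.gpr[index] = value as u32 as u64;
    }
}

/// `addi`-style add of a sign-extended 16-bit immediate, routed through `write32`.
pub fn addi_sketch(regs: &mut Gprs, rd: usize, ra_val: u64, simm16: i16) {
    regs.write32(rd, ra_val.wrapping_add(simm16 as i64 as u64));
}
```

With a helper like this, `li rT, -1` leaves `0x00000000_FFFFFFFF` in the register regardless of how the intermediate 64-bit sum was formed.
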

---

## Batch 1 — integer ALU (groups 1-5)

Per-group reports: `audit-out/group-01-add-imm.md`, `group-02-add-reg.md`, `group-03-sub-reg.md`, `group-04-multiply.md`, `group-05-divide.md`.

### PPCBUG-001 — addi sign-extension, no truncation

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:114-118
- **Symptom**: `addi rT, r0, -1` (= `li rT, -1`) writes `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`. Identical shape to addis.
- **Fix**:

```rust
ctx.gpr[instr.rd()] = ra_val.wrapping_add(instr.simm16() as i64 as u64) as u32 as u64;
```

- **Test gap**: existing `test_addi` only covers positive simm16. Add a test for `li rT, -1` and verify the upper 32 bits are zero.

### PPCBUG-002 — addic untruncated writeback + 64-bit CA compare

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:133-140
- **Symptom**: (a) GPR writeback not truncated (same shape as addi). (b) CA computed via 64-bit `result < ra`, whereas Canary's `AddDidCarry` explicitly truncates both operands to int32 first.
- **Fix**:

```rust
let ra32 = ra as u32;
let imm = instr.simm16() as i32 as u32;
let result32 = ra32.wrapping_add(imm);
// Carry out of bit 31: the 32-bit sum wrapped iff it is smaller than an addend.
ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```

- **Test gap**: zero unit tests for addic.
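
To make the 32-bit carry rule concrete, here is a minimal standalone sketch. `addic32` is a hypothetical free function (the real code works on `ctx` and `instr`), and it uses `u32::overflowing_add` instead of the `result32 < ra32` compare; the two are equivalent for unsigned addition.

```rust
/// 32-bit `addic` model: returns (GPR value, CA bit).
/// Only the low 32 bits of `ra` participate, so poisoned upper bits are ignored.
pub fn addic32(ra: u64, simm16: i16) -> (u64, u8) {
    let ra32 = ra as u32;
    // Sign-extend the immediate to 32 bits, then view it as unsigned.
    let imm = simm16 as i32 as u32;
    let (result32, carried) = ra32.overflowing_add(imm);
    (result32 as u64, carried as u8)
}
```

For example, `addic32(0xFFFF_FFFF, 1)` wraps to zero with CA set, while an input with poisoned upper bits such as `0xFFFF_FFFF_0000_0001` behaves exactly like the clean value `1`.
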

### PPCBUG-003 — addicx untruncated writeback + 64-bit CA + CR0 regression

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:141-150
- **Symptom**: same as PPCBUG-002 plus a CR0 regression: live code uses `update_cr_signed(0, result as i64)` (64-bit signed). The frozen snapshot in `ppc-manual/alu/addicx.md` shows the previously-correct `result as i32 as i64` form. Live code has drifted.
- **Fix**: PPCBUG-002 fix plus `update_cr_signed(0, result32 as i32 as i64)`.
- **Test gap**: zero unit tests.
- **Note**: confirms the manual's frozen snapshots are useful drift detectors — see if other opcodes have similarly regressed.

### PPCBUG-004 — mulli untruncated 64-bit signed product

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:159-164
- **Symptom**: RA is read as a full `i64` and the product is stored as `u64` without truncation. Under the 32-bit ABI, both factors should be read as i32 and only the low 32 bits of the product kept (overflow silently wraps per ISA).
- **Fix**:

```rust
let ra = ctx.gpr[instr.ra()] as i32 as i64;
let imm = instr.simm16() as i64;
ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64;
```

- **Test gap**: zero unit tests.

### PPCBUG-005 — subficx untruncated writeback + 64-bit CA compare

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:151-158
- **Symptom**: (a) `imm.wrapping_sub(ra)` on 64-bit values writes poisoned upper bits; sign-extended `imm` for negative SIMM has bits 32-63 set. (b) CA `imm >= ra` is a 64-bit unsigned compare, which is wrong relative to Canary's 32-bit form.
- **Fix**:

```rust
let ra32 = ra as u32;
let imm32 = instr.simm16() as i32 as u32;
let result32 = imm32.wrapping_sub(ra32);
// CA is the carry out of `!ra + imm + 1`, i.e. "no borrow": imm >= ra (unsigned, 32-bit).
ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```

- **Test gap**: zero unit tests.

### PPCBUG-006 — negx active GPR poisoning + 64-bit OE overflow check

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:319-330
- **Symptom**: (a) `(!ra).wrapping_add(1)` unconditionally sets upper 32 bits to all-ones because `!ra` flips them. Even a clean `r3 = 5` produces `0xFFFFFFFF_FFFFFFFB` instead of `0x00000000_FFFFFFFB`. **This is active, not latent — every neg in 32-bit-ABI code poisons the GPR.** (b) `neg_ov_64` overflow predicate tests `ra == 0x8000_0000_0000_0000` (64-bit INT_MIN) instead of `ra == 0x0000_0000_8000_0000` (32-bit INT_MIN).
- **Fix**:

```rust
let result = (!(ra as u32)).wrapping_add(1);
ctx.gpr[instr.rd()] = result as u64;
if instr.oe() {
    overflow::apply(ctx, (ra as u32) == 0x8000_0000);
}
if instr.rc_bit() { ctx.update_cr_signed(0, result as i32 as i64); }
```

- **Test gap**: existing `nego_sets_ov_only_on_int_min` tests 64-bit INT_MIN — add a 32-bit INT_MIN case.
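
The difference between the two forms can be reproduced outside the interpreter. This is a sketch under assumptions: `neg64_buggy` and `neg32` are hypothetical free functions that mirror the expressions quoted above, not the live handler.

```rust
/// The 64-bit shape described in the symptom: `!ra` flips bits 32-63 as well.
pub fn neg64_buggy(ra: u64) -> u64 {
    (!ra).wrapping_add(1)
}

/// 32-bit form: returns (GPR value, overflow flag).
/// Overflow is signalled only when the low word is the 32-bit INT_MIN.
pub fn neg32(ra: u64) -> (u64, bool) {
    let ra32 = ra as u32;
    let result = (!ra32).wrapping_add(1);
    (result as u64, ra32 == 0x8000_0000)
}
```

A test for the 32-bit INT_MIN case could assert that `neg32(0x8000_0000)` returns the same word back with the overflow flag set.
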

### PPCBUG-007 — subfcx CA via 64-bit unsigned compare

- **Severity**: HIGH (defensive — same shape as the compare that broke addis)
- **Status**: open
- **Location**: interpreter.rs:258
- **Symptom**: `if rb >= ra { 1 } else { 0 }` is the exact 64-bit unsigned compare that the addis bug exploited. Wrong CA when either operand has poisoned upper 32 bits. Apply defensively even if all upstream sources are cleaned, because a wrong CA bit is unrecoverable downstream.
- **Fix**:

```rust
let ra32 = ra as u32;
let rb32 = rb as u32;
let result32 = rb32.wrapping_sub(ra32);
ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```

- **Test gap**: zero dedicated unit tests for subfcx — the most critical opcode in Group 3 had no coverage. Add 6+ tests including the exact 0x828F3F98 / 0x828F3F68 case from the addis incident.
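
A small model shows how a poisoned operand flips CA. The function names are hypothetical and the operand values are made up for illustration; they are not the values from the addis incident.

```rust
/// CA as computed by the 64-bit compare quoted in the symptom.
pub fn subfc_ca64(ra: u64, rb: u64) -> u8 {
    (rb >= ra) as u8
}

/// 32-bit `subfc` model: returns (GPR value, CA bit), ignoring bits 32-63.
pub fn subfc32(ra: u64, rb: u64) -> (u64, u8) {
    let ra32 = ra as u32;
    let rb32 = rb as u32;
    (rb32.wrapping_sub(ra32) as u64, (rb32 >= ra32) as u8)
}
```

With `ra = 0xFFFF_FFFF_0000_0001` (low word 1, upper word poisoned) and `rb = 2`, the 64-bit compare reports CA=0 while the 32-bit form reports the correct CA=1.
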

### PPCBUG-008 — subfex CA via 64-bit unsigned compare + `!ra` poisons writeback

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:268-284
- **Symptom**: (a) CA `if rb > ra || (rb == ra && ca != 0)` is 64-bit; same shape as PPCBUG-007. (b) Writeback uses `(!ra).wrapping_add(rb).wrapping_add(ca)`, where `!ra` always sets the upper 32 bits, so the GPR is poisoned even with clean inputs (same shape as PPCBUG-006).
- **Fix**:

```rust
let ra32 = ra as u32;
let rb32 = rb as u32;
let ca = ctx.xer_ca as u32;
let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca);
// CA = carry out of `!ra + rb + ca`, evaluated on the 32-bit operands.
ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```

### PPCBUG-009 — mullwx untruncated 64-bit signed product

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:331-344
- **Symptom**: 32x32 multiply produces 64-bit signed `i64` product, written to GPR via `as u64` without truncation. When product overflows i32 (which `mullw_ov` correctly detects), upper 32 bits are non-zero and corrupt downstream 64-bit unsigned compares — same class as addis.
- **Fix** (one line; OE handler unchanged):

```rust
ctx.gpr[instr.rd()] = product as u32 as u64;
```

### PPCBUG-010 — divwx quotient sign-extended to 64 bits

- **Severity**: HIGH
- **Status**: open (must be applied in same commit as PPCBUG-011)
- **Location**: interpreter.rs:373
- **Symptom**: `(ra / rb) as i64 as u64` sign-extends a negative i32 quotient. `-10 / 3 = -3` writes `0xFFFFFFFF_FFFFFFFD` instead of `0x00000000_FFFFFFFD`. Canary's `InstrEmit_divwx` uses `f.ZeroExtend(v, INT64_TYPE)` — explicit zero-extension.
- **Fix**: `ctx.gpr[instr.rd()] = (ra / rb) as u32 as u64;`

### PPCBUG-011 — divwx CR0 update breaks after PPCBUG-010 fix

- **Severity**: MEDIUM (coupled to PPCBUG-010 — must land together)
- **Status**: open
- **Location**: interpreter.rs:379
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.rd()] as i64)` accidentally works today because the sign-extended GPR has consistent sign in i64 view. After PPCBUG-010, GPR holds `0x00000000_FFFFFFFD` for `-3` and `as i64` reads positive — CR0.LT will be wrong for negative quotients.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64);`
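
The coupling between PPCBUG-010 and PPCBUG-011 can be seen in a standalone model. This is a sketch, not the interpreter's code: `divw32` is a hypothetical function, and returning `None` for division by zero and for `INT_MIN / -1` is an assumption made here for simplicity (the ISA leaves the quotient undefined in those cases).

```rust
/// 32-bit `divw.` model: returns (zero-extended quotient, CR0.LT),
/// or `None` where the quotient is undefined.
pub fn divw32(ra: u64, rb: u64) -> Option<(u64, bool)> {
    let dividend = ra as u32 as i32;
    let divisor = rb as u32 as i32;
    // `checked_div` yields None for a zero divisor and for i32::MIN / -1.
    let quotient = dividend.checked_div(divisor)?;
    let gpr = quotient as u32 as u64; // zero-extend, as in the PPCBUG-010 fix
    let lt = (gpr as u32 as i32) < 0; // read the sign at bit 31, as in PPCBUG-011
    Some((gpr, lt))
}
```

For `-10 / 3` the model stores `0x00000000_FFFFFFFD` and still reports LT, whereas reading the same register value with a plain `as i64` would see a positive number.
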

### PPCBUG-012 — addx writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:167-179
- **Symptom**: 64-bit `wrapping_add` result written to GPR untruncated. Latent: only triggers if upstream operands have poisoned upper 32 bits. With PPCBUG-001 etc. unfixed, that invariant is broken — addx amplifies the poison.
- **Fix**: `ctx.gpr[instr.rd()] = result as u32 as u64;`

### PPCBUG-013 — addcx writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:180-193
- **Fix**: same shape as PPCBUG-012.

### PPCBUG-014 — addex writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:194-209
- **Fix**: same shape as PPCBUG-012.

### PPCBUG-015 — addzex writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:210-224
- **Fix**: same shape as PPCBUG-012.

### PPCBUG-016 — addmex writeback not truncated (latent + edge case)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:225-240
- **Symptom**: same writeback issue. In addition, `wrapping_sub(1)` produces all-ones upper 32 bits when the low 32 bits underflow, which guarantees poison even if inputs are clean (same shape as PPCBUG-006/008).
- **Fix**: truncate operands and result to 32 bits.

### PPCBUG-017 — subfx writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:241-253
- **Fix**: same shape as PPCBUG-012.

### PPCBUG-018 — subfzex writeback not truncated + `!ra` poisons

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:285-302
- **Symptom**: `(!ra).wrapping_add(ca)` flips upper 32 bits — guaranteed poison.
- **Fix**: truncate ra to u32, do arithmetic on u32, write `as u64`.

### PPCBUG-019 — subfmex writeback poisoning + always-true CA edge

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:303-318
- **Symptom**: (a) writeback poisoned via `(!ra)`. (b) CA predicate `(!ra) != 0` is always true when ra has clean upper 32 bits (because `!ra` flips them) — so CA is always 1, even in the documented edge case where 32-bit `ra == 0xFFFFFFFF && ca == 0` should yield CA=0.
- **Fix**: operate on u32, then `xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 }`.
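
The `subfmex` edge case is easy to pin down in isolation. The sketch below is a hypothetical standalone model (`subfme32` is not a function in the codebase) of `!ra + ca + 0xFFFF_FFFF` on 32-bit operands, with CA taken from the predicate in the fix above.

```rust
/// 32-bit `subfme` model: returns (GPR value, CA bit).
pub fn subfme32(ra: u64, ca_in: u8) -> (u64, u8) {
    let not_ra = !(ra as u32);
    let ca = (ca_in & 1) as u32;
    // RT = !RA + CA + (-1), all in 32-bit wrapping arithmetic.
    let result = not_ra.wrapping_add(ca).wrapping_add(0xFFFF_FFFF);
    // Adding 0xFFFF_FFFF carries out of bit 31 unless `!ra + ca` is zero.
    let ca_out = (not_ra != 0 || ca != 0) as u8;
    (result as u64, ca_out)
}
```

The documented edge case is `subfme32(0xFFFF_FFFF, 0)`, which must yield CA=0; the 64-bit predicate cannot produce that, because `!0x0000_0000_FFFF_FFFFu64` is never zero.
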

### PPCBUG-020 — CR0 update uses 64-bit signed compare in all sub-register ops

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:250, 264, 281, 299, 315, 327, 341, 379, 396, 410, 419, 428, 445, 462 (every Rc=1 path in groups 2-5)
- **Symptom**: `update_cr_signed(0, result as i64)` views result as 64-bit signed. In 32-bit ABI, bit 31 determines LT/GT, not bit 63. A result like `0x00000000_80000000` is negative in 32-bit but positive in 64-bit — CR0.LT inverted.
- **Fix (catch-all)**: change to `result as u32 as i32 as i64` everywhere. Once PPCBUG-001..-019 truncate writebacks, the upper 32 bits of `result` are zero and this distinction becomes moot — but applying both is cheap and provides defense in depth.
- **Note**: this is one logical fix duplicated across all rc paths; the fix session should grep `update_cr_signed(0, .* as i64)` to find them all.
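
The two readings of the same register value can be compared directly. This is a minimal sketch with hypothetical helper names; it models only the LT/GT/EQ decision, not the real `update_cr_signed` (which presumably also handles the SO bit).

```rust
use std::cmp::Ordering;

/// CR0 ordering as the current code derives it: sign taken from bit 63.
pub fn cr0_order_64(gpr: u64) -> Ordering {
    (gpr as i64).cmp(&0)
}

/// CR0 ordering for the 32-bit ABI: sign taken from bit 31.
pub fn cr0_order_32(gpr: u64) -> Ordering {
    (gpr as u32 as i32).cmp(&0)
}
```

For `0x00000000_80000000` the first function answers `Greater` and the second `Less`, which is the inversion described in the symptom.
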

### PPCBUG-021 — OE overflow checks at bit 63 in all sub-register ops

- **Severity**: LOW
- **Status**: open
- **Locations**: throughout — `add_ov_64`, `sub_ov_64`, `sum_overflow_64`, `mullw_ov`, etc. (defined in `xenia-cpu/src/overflow.rs`)
- **Symptom**: the signed-overflow check operates on the 64-bit boundary. For 32-bit-ABI ops (`addo`, `subfo`, `subfco`, etc.) it should check at bit 31. PPCBUG-006 already gives a tighter form for `negx`. The pattern probably needs systematic review across overflow.rs.
- **Fix**: open a follow-up audit of overflow.rs after batch B completes.

### PPCBUG-022 — mulld_ov missing INT_MIN * -1 edge case

- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-cpu/src/overflow.rs` (`mulld_ov` helper)
- **Symptom**: 64-bit signed multiply overflow check doesn't handle `i64::MIN * -1`.
- **Fix**: add the special case to the helper.
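
One possible shape for the `mulld_ov` special case is to lean on the standard library rather than hand-rolling it. The function below is a sketch with an assumed signature (the real helper's parameters are not shown in this tracker): `i64::checked_mul` returns `None` for every overflowing product, including `i64::MIN * -1`.

```rust
/// Hypothetical overflow predicate for a 64-bit signed multiply.
pub fn mulld_ov_sketch(a: i64, b: i64) -> bool {
    a.checked_mul(b).is_none()
}
```

This trades an explicit special case for a single library call, so there is no separate edge condition to forget.
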

### PPCBUG-023 — andisx CR0 update uses 64-bit signed compare; should use 32-bit

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:475
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` interprets the result as 64-bit signed. The `andisx` result is bounded by `0x0000_0000_FFFF_0000`, which is always non-negative in 64-bit view. In 32-bit ABI, bit 31 is the sign bit — results with bit 31 set (e.g. `andis. rA, rS, 0x8000` with rS=0x80000000 → result=0x80000000) should yield CR0.LT=1, but xenia-rs gives CR0.GT=1. The ppc-manual frozen snapshot for `andisx` shows the correct `as i32 as i64` form; the live code has drifted. Common trigger: `andis. rA, rS, 0x8000` to test the sign bit of a 32-bit word.
- **Fix**:

```rust
ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);
```

- **Test gap**: zero tests for `andisx`. Add at minimum: result with bit 31 set (expect LT=1), result with bits 0–30 set (expect GT=1), result=0 (expect EQ=1).

---

## Batch 2 — logical immediate (group 6)

Per-group report: `audit-out/group-06-logic-imm.md`.

Group 6 summary: only 1 new bug found. The `simm16` sign-extension pattern does not apply (all ops use `uimm16`). `ori`, `oris`, `xori`, `xoris`, and `andix` are ISA-correct; `andisx` has a CR0 interpretation bug (PPCBUG-023). All 6 opcodes have inadequate test coverage (LOW gaps for 5 of them, MEDIUM gap for `andisx` tied to the bug).

---

## Batch 3 — word rotate-and-mask (group 9)

Per-group report: `audit-out/group-09-word-rotate.md`.

Group 9 summary: core arithmetic is clean — `rlw_mask`, rotate logic, and result write are all ISA-correct. The single recurring defect is the Rc=1 CR0 path using `as i64` instead of `as u32 as i32 as i64` (instances of PPCBUG-020 specific to these three opcodes). `rlwimix` zeroes the upper 32 bits of RA instead of preserving them per ISA, but this is safe under the 32-bit ABI invariant and is classified LOW. Test coverage is poor: 1 partial test for `rlwinmx`, zero for the other two.

### PPCBUG-024 — rlwinmx CR0 update uses 64-bit signed compare; should use 32-bit

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:667
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` — result is a zero-extended u32, so bit 31 set yields +2147483648 in 64-bit signed view but -2147483648 in 32-bit ABI. CR0.LT/GT inverted for results with bit 31 set. `rlwinm.` is the most common dot-form instruction in compiler output (all `slwi.`, `srwi.`, `clrlwi.`, bitfield-test-and-branch idioms).
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: `test_rlwinm` exists but non-Rc only, result has bit 31 clear. Add Rc=1 tests with bit 31 set in result.

### PPCBUG-025 — rlwimix CR0 update uses 64-bit signed compare; should use 32-bit

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:679
- **Symptom**: same class as PPCBUG-024. `rlwimi.` is compiler-generated for struct bitfield writes; when the inserted value occupies or sets bit 31 of RA, CR0.LT is wrong.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: zero tests for `rlwimix`. Add basic insert (non-Rc) + Rc=1 with bit-31-set case.

### PPCBUG-026 — rlwnmx CR0 update uses 64-bit signed compare; should use 32-bit

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:690
- **Symptom**: same class as PPCBUG-024. `rlwnm.` is less frequent but used in variable-shift normalisation patterns.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: zero tests for `rlwnmx`.

### PPCBUG-027 — rlwimix zeroes upper 32 bits of RA instead of preserving them (ISA deviation, LOW)

- **Severity**: LOW
- **Status**: open (no fix action required for 32-bit ABI emulation)
- **Location**: interpreter.rs:677-678
- **Symptom**: `let ra = ctx.gpr[instr.ra()] as u32` discards upper 32 bits; result written as `as u64` zero-extends. Per ISA, `(RA) & ¬MASK(MB+32, ME+32)` preserves upper 32 bits of RA. Canary confirms: `f.And(f.LoadGPR(i.M.RA), f.LoadConstantUint64(~m))` with `~m` non-zero in upper half.
- **Impact**: under 32-bit ABI, if the 32-bit GPR invariant holds, upper 32 bits of RA are already zero before `rlwimix`, so both behaviours are identical. The deviation is only observable if an upstream bug (PPCBUG-001..023) has leaked non-zero upper bits into RA — in which case `rlwimix` would silently clean them (beneficial side-effect). No isolated fix needed; resolves automatically when upstream bugs are fixed.
- **Note**: if 64-bit mode support is ever added, this will become a HIGH bug.

---

## Batch 2 — logical register (group 7) [renumbered from collision]

Per-group report: `audit-out/group-07-logic-reg.md` (note: report uses original IDs PPCBUG-023..029 from the subagent's local numbering; tracker uses PPCBUG-028..033 here to avoid collision with groups 6 and 9).

The group 7 subagent also flagged a CR0 regression across all 8 opcodes — that is an extension of PPCBUG-020 (catch-all for CR0 64-bit-signed regressions). Adding andx, andcx, orx, orcx, xorx, norx, nandx, eqvx Rc=1 paths to PPCBUG-020's scope rather than creating a new ID.

### PPCBUG-028 — orcx active GPR poisoning

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:509-513
- **Symptom**: writes `rs | !rb`. Rust's `!` on `u64` flips all 64 bits — the upper 32 bits of `!rb` are unconditionally all-ones, OR'd into the result. With clean inputs `orc r5, r3, r4` writes `0xFFFFFFFF_xxxxxxxx`. Active poisoning, same shape as PPCBUG-006/008.
- **Fix**: operate on u32, write `as u64`:

```rust
let result = (ctx.gpr[instr.rs()] as u32) | !(ctx.gpr[instr.rb()] as u32);
ctx.gpr[instr.ra()] = result as u64;
```

- **Test gap**: zero tests.

### PPCBUG-029 — norx active GPR poisoning (the `not` simplified mnemonic)

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:519-523
- **Symptom**: writes `!(rs | rb)` — outer `!` flips upper 32 bits unconditionally. **`nor rA, rS, rS` is the canonical `not` simplified mnemonic** used pervasively in PPC code; every `not` in 32-bit-ABI Xbox 360 binaries actively poisons the GPR.
- **Fix**: u32 arithmetic, write `as u64`.

### PPCBUG-030 — nandx active GPR poisoning

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:524-528
- **Symptom**: writes `!(rs & rb)` — same shape as norx. `nand rA, rS, rS` also computes a bitwise NOT of rS, so it hits the same poisoning as the `nor`-based `not` idiom.
- **Fix**: u32 arithmetic.

### PPCBUG-031 — eqvx active GPR poisoning

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:529-533
- **Symptom**: writes `!(rs ^ rb)` — same shape. The idiom `eqv rA, rS, rS` "set rA to all-ones (i.e. -1 in 32-bit ABI)" produces `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`.
- **Fix**: u32 arithmetic.

### PPCBUG-032 — andx / orx / xorx writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:494-498 (andx), 504-508 (orx), 514-518 (xorx)
- **Symptom**: 64-bit bitwise on full GPR values. Latent — clean if both operands are clean; pollutes if either is poisoned upstream.
- **Fix**: `as u32 as u64` truncation at writeback. Once all upstream poison sources are fixed, these become unnecessary; until then, defensive truncation.

### PPCBUG-033 — andcx active poisoning via `!rb` sub-expression

- **Severity**: MEDIUM (the `!rb` always poisons; outer `&` masks it away when rs is clean — fully active when rs is poisoned)
- **Status**: open
- **Location**: interpreter.rs:499-503
- **Symptom**: writes `rs & !rb`. The `!rb` sub-expression always has all-ones upper bits. If rs has clean (zero) upper bits, the `&` masks them away and the result is clean. If rs is poisoned upstream, every poisoned upper bit of rs passes through unchanged whenever rb's upper bits are clean, so the poison always propagates. This is closer to active than latent.
- **Fix**: `(rs as u32) & !(rb as u32)` then `as u64`.
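
The "u32 arithmetic, write `as u64`" fixes for the complementing logical ops all follow one pattern. The functions below are a hypothetical standalone sketch (the interpreter's handlers take `ctx` and `instr` rather than raw values), shown only to spell the pattern out for each opcode.

```rust
/// `orc`: rs | !rb on the low 32 bits.
pub fn orc32(rs: u64, rb: u64) -> u64 {
    ((rs as u32) | !(rb as u32)) as u64
}

/// `nor` (and therefore the `not` simplified mnemonic).
pub fn nor32(rs: u64, rb: u64) -> u64 {
    (!((rs as u32) | (rb as u32))) as u64
}

/// `nand`.
pub fn nand32(rs: u64, rb: u64) -> u64 {
    (!((rs as u32) & (rb as u32))) as u64
}

/// `eqv`.
pub fn eqv32(rs: u64, rb: u64) -> u64 {
    (!((rs as u32) ^ (rb as u32))) as u64
}

/// `andc`: rs & !rb.
pub fn andc32(rs: u64, rb: u64) -> u64 {
    ((rs as u32) & !(rb as u32)) as u64
}
```

Because the complement is taken on a `u32`, bits 32-63 of the result are zero by construction, even when an input arrives with poisoned upper bits.
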

## Batch 2 — sign-extend / count-leading-zeros (group 8) [renumbered]

Per-group report: `audit-out/group-08-extend-clz.md` (report uses local IDs PPCBUG-023..030; tracker uses PPCBUG-034..039).

### PPCBUG-034 — extsbx writeback sign-extends to 64 bits

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:537
- **Symptom**: `as i8 as i64 as u64` — a byte with high bit set (0x80) writes `0xFFFFFFFF_FFFFFF80` instead of `0x00000000_FFFFFF80`. Active poisoning on every negative byte. `extsb` is emitted by compilers to canonicalize signed-byte arguments — common code path.
- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i8 as i32 as u32 as u64;`
- **Test gap**: zero unit tests.
- **Note**: Canary's JIT does the same sign-extension but is rescued by x86's 32-bit-write zeroing the upper 32 bits of host registers. A pure interpreter has no such escape.

### PPCBUG-035 — extshx writeback sign-extends to 64 bits

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:542
- **Symptom**: `as i16 as i64 as u64` — same shape as PPCBUG-034 for halfwords.
- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i16 as i32 as u32 as u64;`

### PPCBUG-036 — extsbx CR0 coupling

- **Severity**: MEDIUM (must land in same commit as PPCBUG-034)
- **Status**: open
- **Location**: interpreter.rs:538
- **Symptom**: `update_cr_signed(0, ra as i64)` — currently latent because the unfixed sign-extended value's i64 sign matches bit 7 of the byte. After PPCBUG-034 lands, the truncated value's i64 view becomes always non-negative — CR0.LT will never fire for negative byte results.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);` — must land with PPCBUG-034.

### PPCBUG-037 — extshx CR0 coupling

- **Severity**: MEDIUM (must land with PPCBUG-035)
- **Status**: open
- **Location**: interpreter.rs:543
- **Symptom**: same coupling shape as PPCBUG-036 for halfwords.
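
The writeback fix and the CR0 coupling for `extsb` / `extsh` can be checked together in a small model. The helper names here are hypothetical; the cast chains are the ones given in the PPCBUG-034, -035 and -036 fixes.

```rust
/// `extsb` with 32-bit writeback: sign-extend bit 7 to bit 31, zero above.
pub fn extsb32(rs: u64) -> u64 {
    rs as i8 as i32 as u32 as u64
}

/// `extsh` with 32-bit writeback: sign-extend bit 15 to bit 31, zero above.
pub fn extsh32(rs: u64) -> u64 {
    rs as i16 as i32 as u32 as u64
}

/// CR0.LT read at bit 31, as required once the writeback is truncated.
pub fn cr0_lt32(gpr: u64) -> bool {
    (gpr as u32 as i32) < 0
}
```

After truncation the register holds `0x00000000_FFFFFF80` for the byte `0x80`; its plain `as i64` view is non-negative, so only the bit-31 reading reports LT.
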

### PPCBUG-038 — extswx ISA-correct, document asymmetry

- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- **Location**: interpreter.rs:547
- **Symptom**: `as i32 as i64 as u64` produces full 64-bit sign-extension. This IS the documented purpose of extsw — argument-register canonicalization in 64-bit mode. Behavior is intentional. After PPCBUG-034/035 land, document the asymmetry with extsb/extsh in a comment.

### PPCBUG-039 — cntlzdx counts upper 32 always-zero bits in 32-bit ABI

- **Severity**: LOW
- **Status**: open (probably dead code in Xbox 360 binaries)
- **Location**: interpreter.rs:556-562
- **Symptom**: counts leading zeros in the full 64-bit register. If a 32-bit-ABI binary emits cntlzd, the result is `32 + cntlzw(low32)`, not `cntlzw(low32)`. ISA-correct for 64-bit mode; only matters if the binary actually emits it.
- **Test gap**: zero tests.

#### Clean opcodes from group 8

- `cntlzwx` (interpreter.rs:551-555) — `(rs as u32).leading_zeros()` reads only low 32 bits, result range 0..=32, upper 32 zero. CR0 path benign because result is small. **Test gap only**, LOW.
- `extswx` CR0 path is correct per ISA (PPCBUG-038 wontfix).

## Batch 2 — shift (group 11) [renumbered]

Per-group report: `audit-out/group-11-shift.md` (uses local IDs PPCBUG-050..055; tracker uses PPCBUG-040..045).

### PPCBUG-040 — DECODER BUG: `sh64()` wrong bit order for sradi (HIGH)

- **Severity**: HIGH (this is a decoder-level bug, file:line is in `decoder.rs` not `interpreter.rs`)
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `xenia-rs/crates/xenia-cpu/src/decoder.rs:91-93` (the `sh64()` accessor method on `DecodedInstr`)
- **Symptom**: the XS-form `sradix` (sradi) shift amount is assembled as `SH[4:0] << 1 | SH[5]` instead of the correct `SH[5] << 5 | SH[4:0]`. **Every `sradi rA, rS, N` instruction where N is not 0 or 63 executes with a completely wrong shift count.** Example: `sradi rA, rS, 32` shifts by 1 instead. This is a silent, structural mis-decoding — none of the interpreter changes can paper over it.
- **Cross-reference**: Canary's `(i.XS.SH5 << 5) | i.XS.SH` pattern is the correct ISA encoding.
- **Fix**: in `decoder.rs:sh64()` body, swap the bit order:

```rust
pub fn sh64(&self) -> u32 {
    // SH5 is at bit 30 of the encoded word; SH[4:0] is at bits 16-20.
    let sh_lo = extract_bits(self.raw, 16, 20);
    let sh_hi = extract_bits(self.raw, 30, 30);
    (sh_hi << 5) | sh_lo
}
```

- **Impact**: `sradi` is used by compilers for arithmetic right shifts on 64-bit values. In Xbox 360 32-bit-ABI binaries it should not be common, but it's emitted by some compilers for sign-magnitude conversions and 64-bit fixed-point arithmetic. **This is the kind of silent decoder bug the user explicitly wanted the audit to catch.**
- **Test gap**: no decoder unit test pins `sh64()` for non-trivial SH values. Add fixture cases in `disasm_goldens.rs` for `sradi rA, rS, 1`, `sradi rA, rS, 32`, `sradi rA, rS, 63`.
- **Note**: any other instruction that uses the same XS-form SH split-encoding is suspect. Phase C decoder audit must verify `sradi` and `sradix` are the only consumers of `sh64()`.
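
A decoder test for the split SH field does not need the full `DecodedInstr` machinery. The sketch below works on the raw 32-bit word with plain shifts; all three functions are hypothetical stand-ins, and `encode_sradi` assumes the standard XS-form layout (primary opcode 31, extended opcode 413) with Rc=0. PPC bit numbers are big-endian, so PPC bit `n` is raw bit `31 - n`.

```rust
/// Correct XS-form shift amount: SH[5] (PPC bit 30) is the high bit.
pub fn sh64_from_raw(raw: u32) -> u32 {
    let sh_lo = (raw >> 11) & 0x1F; // PPC bits 16-20
    let sh_hi = (raw >> 1) & 1; // PPC bit 30
    (sh_hi << 5) | sh_lo
}

/// The mis-assembled form described in the symptom.
pub fn sh64_buggy(raw: u32) -> u32 {
    let sh_lo = (raw >> 11) & 0x1F;
    let sh_hi = (raw >> 1) & 1;
    (sh_lo << 1) | sh_hi
}

/// Test-only encoder for `sradi rA, rS, sh` (Rc=0), assumed layout as noted above.
pub fn encode_sradi(ra: u32, rs: u32, sh: u32) -> u32 {
    (31 << 26)
        | ((rs & 0x1F) << 21)
        | ((ra & 0x1F) << 16)
        | ((sh & 0x1F) << 11)
        | (413 << 2)
        | (((sh >> 5) & 1) << 1)
}
```

Round-tripping the three shift amounts named in the test gap (1, 32, 63) through `encode_sradi` and `sh64_from_raw` pins the accessor; the buggy form returns 1 for a shift of 32, matching the example in the symptom.
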

### PPCBUG-041 — srawx writeback sign-extends to 64 bits

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:583, 588 (two writeback paths for the count<32 and count>=32 branches)
- **Symptom**: `result as i64 as u64` violates the 32-bit-ABI zero-extension convention. A negative shifted value writes `0xFFFFFFFF_xxxxxxxx` instead of `0x00000000_xxxxxxxx`.
- **Fix**: `result as u32 as u64` in both writeback paths.
- **Note**: subagent verified the CA computation is **independently correct** — uses `(rs as u32) << (32 - sh) != 0` which is the canonical ISA shifted-out-bits test on 32-bit operands. **Do not change CA logic.**

### PPCBUG-042 — srawix writeback sign-extends to 64 bits

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:600, 605 (same shape as PPCBUG-041 for srawi)
- **Fix**: `result as u32 as u64`.

### PPCBUG-043 — srawx / srawix CR0 coupling

- **Severity**: MEDIUM (must land with PPCBUG-041 and PPCBUG-042)
- **Status**: open
- **Locations**: interpreter.rs:593, 607
- **Symptom**: currently masked by the sign-extended writeback (sign-extension makes the 64-bit and 32-bit sign agree). After truncating the writeback, `as i64` will misread the sign for negative results.
- **Fix**: `as u32 as i32 as i64` in both Rc=1 paths, applied with PPCBUG-041/042.

### PPCBUG-044 — slwx / srwx CR0 misclassifies negative 32-bit results

- **Severity**: LOW (zero-extended results have bit 31 set in low 32, but always positive in i64 view → CR0.LT never fires for slw/srw with bit-31-set results)
- **Status**: open
- **Locations**: interpreter.rs:568, 576
- **Fix**: `as u32 as i32 as i64`.

### PPCBUG-045 — Zero unit tests for any shift opcode

- **Severity**: LOW (test gap only)
- **Status**: open
- **Locations**: interpreter.rs:563-658 (entire shift group: slwx, srwx, srawx, srawix, sldx, srdx, sradx, sradix)
- **Recommendation**: add at least one functional test per opcode. Especially: `srawix r3, r3, 1` with rs=0xFFFFFFFE (CA should be 0), `srawix r3, r3, 1` with rs=0x80000001 (CA should be 1, result=0xC0000000); `sradix r3, r3, 32` (currently wrong per PPCBUG-040).
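
The two `srawix` cases from the recommendation can be expressed against a standalone model. `srawi32` is a hypothetical function, not the interpreter's handler; it combines the truncated writeback from PPCBUG-042 with the shifted-out-bits CA test quoted in PPCBUG-041.

```rust
/// 32-bit `srawi` model: returns (zero-extended result, CA bit).
pub fn srawi32(rs: u64, sh: u32) -> (u64, u8) {
    let value = rs as u32 as i32;
    let sh = sh & 0x1F; // srawi encodes a 5-bit shift amount
    let result = value >> sh; // arithmetic shift on i32 replicates the sign bit
    // Bits shifted out on the right; none when sh == 0.
    let shifted_out = if sh == 0 { 0 } else { (rs as u32) << (32 - sh) };
    // CA is set only for a negative source that lost at least one 1-bit.
    let ca = (value < 0 && shifted_out != 0) as u8;
    (result as u32 as u64, ca)
}
```

The `sh == 0` guard is there because shifting a `u32` left by 32 is not allowed in Rust; with no bits shifted out, CA is simply 0.
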

#### Clean opcodes from group 11

- `slwx` writeback at line 568 (zero-ext 32-bit result via `(rs as u32 << count) as u64`) — clean.
- `srwx` writeback at line 576 — clean.
- `sldx`, `srdx`, `sradx` — 64-bit ops, ISA-correct (probably dead in 32-bit-ABI binaries).
- `sradix` body logic is structurally correct; failure is solely from PPCBUG-040 giving it a wrong shift count.

## Batch 2 — doubleword rotate (group 10) [renumbered]

Per-group report: `audit-out/group-10-dword-rotate.md` (uses local IDs PPCBUG-027/028; tracker uses PPCBUG-046/047).

### PPCBUG-046 — DECODER BUG: wrong bit position for MB[5] in all 6 doubleword-rotate opcodes (HIGH)

- **Severity**: HIGH (decoder-level; impacts the canonical zero-extend-to-32 idiom)
- **Status**: applied (52b05b1, 2026-05-01)
- **Locations**: interpreter.rs — every arm of `rldiclx`, `rldicrx`, `rldicx`, `rldimix`, `rldclx`, `rldcrx` (lines 693-754)
- **Symptom**: each arm computes `let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1)`. The bit at `(instr.raw >> 1) & 1` is **PPC bit 30**, which in MD form is `sh[0]` (the low bit of the shift amount) — NOT `mb[5]`. The high bit of the 6-bit MB field lives at PPC bit 26 = `(instr.raw >> 5) & 1`.

  As written, the code computes `(mb[4:0] << 1) | sh[0]`. Ironically `disasm.rs:1256` (the `mb_md()` helper) has the correct formula. The interpreter was written independently with the wrong bit position — probably a copy-error from `sh64()` where bit 30 really is the split bit.
- **Concrete impact**:
  - `clrldi r3, r4, 32` is the canonical "zero-extend low 32 bits" idiom emitted constantly in 32-bit-ABI PPC code. Encoded as `rldicl r3, r4, 0, mb=32`. With mb=32, `mb[5]=1, mb[4:0]=0`. The interpreter decodes mb=0 → mask is all-ones → instruction becomes a no-op. Any downstream 64-bit compare (subfcx CA, cmpld) on that register sees a polluted 64-bit value instead of a clean 32-bit zero-extended one. **This is the same class of bug that caused the addis/BST incident.**
  - For `rldcr` (MDS form), the XO field's LSB at bit 30 is always 1 (Rc=0 opcode), so `me[5]` is forcibly set to 1 for every non-record-form invocation — effectively adding 32 to all me values.
- **Fix** (one line per opcode):

```rust
// Replace in all 6 arms:
let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1);
// With:
let mb = instr.mb() | (((instr.raw >> 5) & 1) << 5);
```

Or, cleaner: expose `mb_md()` (currently in disasm.rs:1256) as a method on `DecodedInstr` in `decoder.rs` and have the interpreter call `instr.mb_md()` — single source of truth for MD-form mb extraction.

- **Test gap**: zero execution tests for any of the 6 opcodes; only disasm-golden string-output tests.
- **Note**: this is the second decoder bug found by the audit (PPCBUG-040 / `sh64()` for `sradi` is the first). Phase C decoder audit must verify whether other MD/MDS/XS form accessors have similar bit-position errors.
|
||
|
||
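The bit positions above can be sanity-checked on a concrete encoding. The sketch below works on the raw instruction word only; `mb_md`, `mb_buggy`, and `clrldi` are illustrative free functions (the real accessor lives in `disasm.rs` and may differ in shape), and `0x78830020` is the encoding of `clrldi r3, r4, 32`.

```rust
/// Reassemble the 6-bit MB field of an MD-form instruction.
/// PPC bits 21..=25 hold mb[4:0]; PPC bit 26 holds mb[5].
fn mb_md(raw: u32) -> u32 {
    let low5 = (raw >> 6) & 0x1F;
    let high = (raw >> 5) & 1;
    low5 | (high << 5)
}

/// The formula the finding describes as buggy, kept for comparison.
fn mb_buggy(raw: u32) -> u32 {
    (((raw >> 6) & 0x1F) << 1) | ((raw >> 1) & 1)
}

/// `rldicl` with sh = 0: keep bits mb..=63 in MSB-0 numbering.
fn clrldi(value: u64, mb: u32) -> u64 {
    value & (u64::MAX >> mb)
}
```

With the correct decode the idiom yields a clean 32-bit value; with the buggy decode the mask is all-ones and the upper half survives.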
### PPCBUG-047 — Zero execution tests for any doubleword-rotate opcode

- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: interpreter.rs:693-754 (all 6 opcodes)
- **Recommendation**: at minimum, a `clrldi r3, r4, 32` test verifying the result is exactly the low 32 bits of r4. Had such a test existed, it would have caught the MB-reconstruction bug (PPCBUG-046); with that fix applied, it serves as a regression guard.

#### What's correct in group 10

- `sh64()` accessor — correctly reconstructs 6-bit shift from MD split encoding (cross-check: `disasm.rs` agrees).
- `rld_mask_left()` / `rld_mask_right()` mask helpers — verified against Canary's XEMASK.
- `rldicx`/`rldimix` mask formulas (`63 - sh` for right edge) — correct.
- `rldimix` read-modify-write merge — correct 64-bit mask-insert.
- CR0 `as i64` — correct here because these ARE genuine 64-bit ops (unlike word rotate).
- `rldcl`/`rldcr` register-shift extraction (`gpr[rb] & 0x3F`) — correct.
- No 32-bit writeback truncation needed: these are intentionally 64-bit; 32-bit-ABI compilers only emit them with masks that yield 32-bit-clean results.

## Batch 3 — branch (group 13)

Per-group report: `audit-out/group-13-branch.md`.

Group 13 summary: the branch implementation is substantively correct. All BO/BI bit masks,
CTR decrement-before-test ordering, AA absolute vs relative dispatch, LK unconditional write
(including not-taken path in `bcx`), LR-read-before-LR-write atomicity in `bclrx`, and
`get_cr_bit()` field indexing are all ISA-correct and match Canary. The only execution bugs
are a latent 64-bit CTR zero-test (PPCBUG-053/054, active under current GPR-pollution environment)
and severely thin test coverage (PPCBUG-055).

### PPCBUG-053 — CTR zero-test uses 64-bit compare; should use 32-bit in `bcx`/`bclrx`

- **Severity**: MEDIUM (effectively HIGH given unfixed PPCBUG-001..031 GPR pollution)
- **Status**: open
- **Locations**: `interpreter.rs:849` (`bcx` `ctr_ok`), `interpreter.rs:879` (`bclrx` `ctr_ok`)
- **Symptom**: `ctx.ctr != 0` compares all 64 bits. In 32-bit ABI the CTR is logically 32-bit.
  Canary explicitly truncates to 32 bits: `ctr = f.Truncate(ctr, INT32_TYPE)`. When CTR upper
  32 bits are non-zero (due to upstream GPR pollution flowing through `mtspr CTR, rN`), the
  64-bit test disagrees with the 32-bit ISA semantic. Most dangerous with `neg; mtctr; bdnz`:
  `negx` (PPCBUG-006) always sets upper 32 bits, so the 32-bit CTR counter can reach zero
  while the 64-bit CTR is still non-zero → effectively infinite loop.
- **Fix**:

```rust
// Replace in both bcx and bclrx:
let ctr_ok = (bo & 0b00100) != 0
    || (((ctx.ctr as u32) != 0) ^ ((bo & 0b00010) != 0));
```

  Or, alternatively, truncate at decrement:

```rust
if (bo & 0b00100) == 0 {
    ctx.ctr = ctx.ctr.wrapping_sub(1) as u32 as u64;
}
```

- **Test gap**: zero tests for CTR-decrement branches (bdnz, bdz, bdnzt, bdnzf, bdzt, bdzf).

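The effect of the 32-bit test on a polluted CTR can be illustrated with a standalone model. This is a sketch under stated assumptions: `ctr_ok` mirrors the proposed expression, while `bdnz_taken` is a hypothetical loop driver (with a safety `limit`) that exists only for this example.

```rust
/// CTR condition for `bc`-family branches, tested on the low 32 bits only.
fn ctr_ok(bo: u32, ctr: u64) -> bool {
    (bo & 0b00100) != 0 || (((ctr as u32) != 0) ^ ((bo & 0b00010) != 0))
}

/// Count how often a `bdnz` (BO = 0b10000) is taken before falling through.
/// `limit` only guards the example against running away.
fn bdnz_taken(mut ctr: u64, limit: u32) -> u32 {
    let bo = 0b10000;
    let mut taken = 0;
    while taken < limit {
        // Decrement first, then test, as the ISA specifies.
        ctr = ctr.wrapping_sub(1);
        if !ctr_ok(bo, ctr) {
            break;
        }
        taken += 1;
    }
    taken
}
```

A clean CTR of 3 and a polluted CTR of `0xFFFF_FFFF_0000_0003` both fall through after two taken branches, which is the behaviour a `bdnz` boundary test should pin down.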
### PPCBUG-054 — `mtspr CTR` writeback not truncated to 32 bits

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1411`
- **Symptom**: `crate::context::spr::CTR => ctx.ctr = val` writes the full 64-bit GPR to CTR.
  Acts as a firewall gap: any upstream 64-bit GPR pollution flows directly into CTR, where it
  will be tested by PPCBUG-053's 64-bit comparison. Defensive fix prevents CTR from ever
  acquiring non-zero upper 32 bits independently of the GPR-pollution fix.
- **Note**: the `bcctrx` branch-target read (`(ctx.ctr as u32) & !3`) already truncates
  correctly; the bug is confined to the `ctr != 0` zero-test in `bcx`/`bclrx`.
- **Fix**: `crate::context::spr::CTR => ctx.ctr = val as u32 as u64,`
- **Cross-reference**: Group 16 (SPR/MSR) subagent should verify this write-point.

### PPCBUG-055 — Severely inadequate test coverage for all four branch opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: `interpreter.rs` test module (lines 4455–4491)
- **Current coverage**: `bx` forward (1 test), `bl` LR update (1 test), `bcx` taken beq (1 test via `test_cmp_and_bc`). Zero tests for: `bclrx`, `bcctrx`, any CTR-decrement variant, not-taken path, backward branch, AA=1 absolute, `bcl` LR-write-on-not-taken.
- **Recommended minimum**: blr, bctr, bdnz (taken and not-taken at boundary CTR=1), bclrl old-LR-as-target, bcl LK-write-on-not-taken. See per-group report for concrete encoding patterns.

---

## Batch 3 — trap + system call (group 14)

Per-group report: `audit-out/group-14-trap-sc.md`.

Group 14 summary: the core trap evaluation (`trap.rs`) is correct — TO bit constants, signed/unsigned
comparison dispatch, and word-vs-doubleword width handling are all ISA-conformant. The live interpreter
arm properly evaluates the TO field (replacing the old unconditional-trap stub). Three MEDIUM issues
found: PC ordering on trap return, missing LEV dispatch for `sc`, and the Xbox 360 typed-trap
convention (`twi 31, r0, IMM`) not handled. Two LOW findings for stale manual snapshots and test gaps.

### PPCBUG-063 — `ctx.pc` already at CIA+4 when `StepResult::Trap` returns

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:1543 (`ctx.pc += 4`) before interpreter.rs:1549 (`return StepResult::Trap`)
- **Symptom**: any trap handler that reads `ctx.pc` to find the faulting instruction sees CIA+4 instead
  of CIA. The existing `tracing::warn!` compensates with `.wrapping_sub(4)`, confirming the asymmetry.
  On real hardware, SRR0 = CIA (trapping instruction address). Current risk LOW (no handler inspects
  pc), but HIGH if any SEH/exception-delivery path is added (critical for the C++ throw investigation).
- **Fix**: save CIA before incrementing, restore it when firing the trap:

```rust
let trap_pc = ctx.pc;
ctx.pc += 4;
if fired { ctx.pc = trap_pc; return StepResult::Trap; }
```

  Alternatively store CIA in a separate `ctx.srr0`-equivalent field and leave `ctx.pc` at NIA.
- **Note**: `sc` correctly leaves `ctx.pc` at NIA (the return address) — that is a different and
  correct design choice. The inconsistency between sc and trap is the bug.

### PPCBUG-064 — `sc` ignores `LEV` field; `sc 2` (HVcall) silently misdispatched

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:915-918
- **Symptom**: `sc 2` (Xbox 360 hypervisor call) returns `StepResult::SystemCall` identically to
  `sc 0`. Canary dispatches LEV=0 to `syscall_handler` and LEV=2 to `f.function()` (the HVcall
  path). For pure game-title code (LEV=0 only) this is invisible; XDK kernel-mode components and
  some HV-aware titles may use `sc 2`.
- **Fix**: decode the 7-bit LEV field (bits 20-26 of SC-form encoding), add a `HypervisorCall`
  variant to `StepResult`, and dispatch accordingly.

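As a rough illustration of the LEV decode, the sketch below extracts PPC bits 20-26 from the raw word and classifies the call. `ScKind` and `classify_sc` are hypothetical names invented for this example; the real fix would add a variant to `StepResult` instead, and the LEV=2 meaning follows the description in this finding.

```rust
/// Hypothetical classification used only in this example.
#[derive(Debug, PartialEq)]
enum ScKind {
    SystemCall,
    HypervisorCall,
    Other(u32),
}

/// LEV occupies PPC bits 20..=26, i.e. bits 5..=11 counting from the LSB.
fn sc_lev(raw: u32) -> u32 {
    (raw >> 5) & 0x7F
}

fn classify_sc(raw: u32) -> ScKind {
    match sc_lev(raw) {
        0 => ScKind::SystemCall,
        2 => ScKind::HypervisorCall,
        n => ScKind::Other(n),
    }
}
```

Here `0x44000002` is plain `sc` and `0x44000042` is `sc 2`.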
### PPCBUG-065 — `twi 31, r0, IMM` typed-trap not handled; SIMM type code discarded

- **Severity**: MEDIUM
- **Status**: open
- **Location**: interpreter.rs:1532-1551 (trap arm)
- **Symptom**: `twi 31, r0, IMM` (TO=31=unconditional, RA=r0) is used by the Xbox 360 CRT/kernel
  to encode typed C++ exceptions — the 16-bit SIMM carries the exception type discriminator. xenia-rs
  fires the trap correctly but discards SIMM. The caller sees a generic `StepResult::Trap` with no
  type information, preventing correct C++ SEH dispatch.
- **Canary reference**: `ppc_emit_control.cc:611-616` special-cases `RA==0 && TO==31` and calls
  `f.Trap(type)` with the SIMM as the type code.
- **Fix**: add a `trap_type: Option<u16>` payload to `StepResult::Trap`. Detect `twi` with `to()==31`
  and `ra()==0` and populate it with `instr.simm16() as u16`.
- **Note**: directly relevant to the Sylpheed `std::runtime_error` throw investigation
  (project_xenia_rs_sylpheed_throw_2026_04_28.md) — the typed-trap SIMM carries the CRT exception
  class that the kernel uses to route to the correct handler.

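The detection rule in the proposed fix can be sketched directly on the raw D-form word. `typed_trap_code` is a hypothetical free function for illustration; the interpreter would use its own `to()`, `ra()`, and `simm16()` accessors.

```rust
/// Returns the 16-bit type code when `raw` is `twi 31, r0, IMM`, else `None`.
/// D-form layout: opcode = PPC bits 0..=5, TO = 6..=10, RA = 11..=15, SIMM = 16..=31.
fn typed_trap_code(raw: u32) -> Option<u16> {
    let opcode = raw >> 26;
    let to = (raw >> 21) & 0x1F;
    let ra = (raw >> 16) & 0x1F;
    if opcode == 3 && to == 31 && ra == 0 {
        Some(raw as u16) // low 16 bits are the SIMM field
    } else {
        None
    }
}
```

For example, `0x0FE00016` encodes `twi 31, r0, 0x16`, whereas `0x0C830000` (`twi 4, r3, 0`) is an ordinary conditional trap.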
### PPCBUG-066 — Stale frozen snapshots in ppc-manual for td/tdi/tw/twi

- **Severity**: LOW
- **Status**: open
- **Location**: `ppc-manual/branch/td.md`, `tdi.md`, `tw.md`, `twi.md`
- **Symptom**: all four show the old unconditional-trap stub (`// For now, just trace and continue`)
  instead of the current TO-field-evaluating implementation.
- **Fix**: regenerate after PPCBUG-063 and PPCBUG-065 are resolved.

### PPCBUG-067 — Test gaps for trap and sc

- **Severity**: LOW
- **Status**: open
- **Location**: interpreter.rs `#[cfg(test)] mod tests`
- **Missing coverage**: `sc` smoke test (fires SystemCall, advances PC); `td` vs `tw` on 64-bit-clean
  operands (width discrimination); `tdi`/`td` signed/unsigned LT/GT conditions; `tw 31, r0, r0`
  unconditional `trap` encoding; `twi 31, r0, N` typed-trap; negative simm16 in `twi`.

---

## Batch 3 — SPR / MSR / TB / FPSCR / VSCR moves (group 16)

Per-group report: `audit-out/group-16-spr-msr.md`.

Group 16 summary: the core paths are clean — `mfcr`, `mtcrf`, `mfspr`, `mtspr`, `mftb`, `mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`, `mfvscr`, `mtvscr` are all functionally ISA-correct. The `spr()` decoder accessor correctly inverts the PPC XFX half-swap encoding. The one MEDIUM finding is `mtmsrd` silently ignoring the `L=1` partial-MSR-write semantics. Five LOW test-gap findings cover near-total absence of unit tests for this entire group.

### PPCBUG-078 — `mtmsrd` L=1 partial-MSR-write not modelled

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1458-1461`
- **Symptom**: xenia-rs merges `mtmsr` and `mtmsrd` into a single body that unconditionally writes `ctx.msr = ctx.gpr[instr.rs()]`. PowerISA specifies that `mtmsrd` with instruction bit 15 (`L`) = 1 performs a partial update: only `MSR[EE]` (MSB-0 bit 48, u64 bit 15) and `MSR[RI]` (MSB-0 bit 62, u64 bit 1) are modified; all other MSR bits are preserved. Kernel code using `mtmsrd L=1` to re-enable external interrupts silently corrupts the entire MSR in xenia-rs. Canary acknowledges the same TODO.
- **Fix**:

```rust
PpcOpcode::mtmsrd => {
    // L is PPC instruction bit 15.
    let l = (instr.raw >> (31 - 15)) & 1;
    if l == 1 {
        // MSR[EE] = 0x8000 (u64 bit 15), MSR[RI] = 0x2 (u64 bit 1).
        let mask: u64 = (1u64 << 15) | (1u64 << 1);
        let rs = ctx.gpr[instr.rs()];
        ctx.msr = (ctx.msr & !mask) | (rs & mask);
    } else {
        ctx.msr = ctx.gpr[instr.rs()];
    }
    ctx.pc += 4;
}
```

- **Test gap**: zero tests for `mtmsr` or `mtmsrd`.

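For a test oracle, the partial update can be modelled as a pure function. This is a minimal sketch that assumes the Power ISA MSR layout in which EE is `0x8000` and RI is `0x2`; the names `MSR_EE`, `MSR_RI`, and `mtmsrd_model` are hypothetical and do not exist in the interpreter.

```rust
const MSR_EE: u64 = 1 << 15; // external-interrupt enable (MSB-0 bit 48)
const MSR_RI: u64 = 1 << 1; // recoverable interrupt (MSB-0 bit 62)

/// New MSR value after `mtmsrd rs, L`.
fn mtmsrd_model(msr: u64, rs: u64, l: bool) -> u64 {
    if l {
        // Partial form: only EE and RI are taken from RS.
        let mask = MSR_EE | MSR_RI;
        (msr & !mask) | (rs & mask)
    } else {
        // Full form: the whole register is replaced.
        rs
    }
}
```

A unit test for the interpreter arm can compare `ctx.msr` against this model for both values of L.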
### PPCBUG-079 — `mtspr` silent drop of unknown-SPR writes without value logging

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1430-1433`
- **Symptom**: Unknown SPR writes are silently discarded with only a `tracing::warn!()` that omits the value being written. Reduces debuggability; no correctness impact for known Xbox 360 titles.
- **Fix** (optional): `tracing::warn!("mtspr: unimplemented SPR {} <= 0x{:016x}", spr, val)`.

### PPCBUG-080 — `mfvscr` does not zero the upper 96 bits of VD per ISA

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2198-2201`
- **Symptom**: ISA requires `mfvscr VD` to place VSCR in the rightmost word of VD and zero bytes 0-11. xenia-rs copies the full 128-bit `ctx.vscr` into `ctx.vr[VD]`, leaving stale data in bytes 0-11 if `ctx.vscr` was populated from a non-zeroed vector. Canary explicitly zero-extends.
- **Fix**:

```rust
PpcOpcode::mfvscr => {
    let vscr_word = ctx.vscr.as_u32x4()[3];
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array([0, 0, 0, vscr_word]);
    ctx.pc += 4;
}
```

### PPCBUG-081 — Zero unit tests for `mfcr` / `mtcrf`

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1436-1453`
- **Recommended additions**: full mfcr round-trip; `mtcrf 0xFF`; `mtcrf 0x80` (CR0 only); `mtcrf 0x38` (ABI CR2|CR3|CR4 restore).

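Expected values for those `mtcrf` cases can be derived from a small field-mask model. The sketch treats CR as a flat `u32`; `mtcrf_model` is a hypothetical helper, and the interpreter's own CR representation may be structured differently.

```rust
/// New CR after `mtcrf FXM, rs`, with CR modelled as a flat 32-bit value.
/// FXM bit 0x80 selects CR0 (the most significant nibble), 0x01 selects CR7.
fn mtcrf_model(cr: u32, rs: u32, fxm: u8) -> u32 {
    let mut mask = 0u32;
    for field in 0..8 {
        if fxm & (0x80 >> field) != 0 {
            mask |= 0xF000_0000u32 >> (4 * field);
        }
    }
    (cr & !mask) | (rs & mask)
}
```

With `fxm = 0x38` only the CR2, CR3, and CR4 nibbles change, which matches the ABI restore pattern named in the recommendation.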
### PPCBUG-082 — Minimal unit tests for `mfspr` / `mtspr`

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1376-1435`
- **Note**: only DEC and TBL_WRITE covered; add LR, CTR, XER, TBL/TBU, VRSAVE.

### PPCBUG-083 — Zero unit tests for `mftb`

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1462-1470`

### PPCBUG-084 — Zero interpreter-level round-trip tests for FPSCR move instructions

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2678-2720`
- **Note**: `fpscr.rs` helper-level tests exist; interpreter dispatch (`mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`) is untested end-to-end.

### PPCBUG-085 — Zero unit tests for `mfvscr` / `mtvscr`

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2198-2205`

IDs PPCBUG-086 and PPCBUG-087 are unallocated — reserved for group 16 follow-up findings.

---

## Batch 3 — cache + sync (group 17)

Per-group report: `audit-out/group-17-cache-sync.md`.

Group 17 summary: the cleanest group audited so far. Both `dcbz` and `dcbz128` have correct EA computation (ra=0 special case, 64-bit→u32 truncation, alignment masks `& !31` / `& !127`, byte counts 32/128). The nine no-op opcodes (dcbf, dcbi, dcbst, dcbt, dcbtst, icbi, sync, eieio, isync) are all listed in one arm and complete. The `dcbz128` Xbox 360 specific opcode (RT=1 bit distinguishes from dcbz) dispatches correctly. **0 HIGH, 0 MEDIUM, 2 LOW** findings.

### PPCBUG-088 — sync disasm ignores L field; `lwsync` (L=1) shows as "sync"

- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-rs/crates/xenia-cpu/src/disasm.rs:364`
- **Symptom**: The `PpcOpcode::sync` disasm arm outputs `"sync"` unconditionally regardless of the L field (PPC bit 10). When L=1 (word `0x7C2004AC`), the instruction should disassemble as `"lwsync"`. The `extended_mnemonics.json` golden already accepts `"sync"` as output for the lwsync case, meaning the test currently passes with the wrong string.
- **Impact**: Disassembly output for `lwsync` (very common in Xbox 360 acquire-barrier idioms) shows as `sync`. No interpreter impact; both L=0 and L=1 are correctly treated as no-op PC advance.
- **Fix**:

```rust
PpcOpcode::sync => {
    // L field at PPC bit 10
    if extract_bits(instr.raw, 10, 10) == 1 {
        base("lwsync", String::new(), 0)
    } else {
        base("sync", String::new(), 0)
    }
}
```

  Update `extended_mnemonics.json` golden to add `"ext_mnemonic": "lwsync"` for that entry.

### PPCBUG-089 — Zero interpreter execution tests for group 17

- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-rs/crates/xenia-cpu/src/interpreter.rs` (test module)
- **Symptom**: No `#[test]` covers `dcbz`, `dcbz128`, or any no-op (sync/isync/eieio/dcbf/icbi). A regression in dcbz byte count or alignment would go undetected.
- **Recommended additions**: `dcbz` with misaligned address (verifies 32-byte aligned zero), `dcbz128` with misaligned address (verifies 128-byte aligned zero), both ra=0 and ra!=0 cases, `sync`/`isync`/`dcbf` no-op PC-advance smoke tests.

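The alignment and EA rules those tests should assert can be written down as a tiny model. Both functions are hypothetical helpers for deriving expected values, not interpreter code; the line sizes 32 and 128 come from the group summary above.

```rust
/// Half-open byte range `[start, end)` zeroed by `dcbz` (line_size = 32)
/// or `dcbz128` (line_size = 128) for a given effective address.
fn zeroed_range(ea: u32, line_size: u32) -> (u32, u32) {
    debug_assert!(line_size.is_power_of_two());
    let start = ea & !(line_size - 1);
    (start, start.wrapping_add(line_size))
}

/// X-form EA: (rA|0) + rB, truncated to 32 bits.
fn dcbz_ea(ra: usize, ra_val: u64, rb_val: u64) -> u32 {
    let base = if ra == 0 { 0 } else { ra_val };
    base.wrapping_add(rb_val) as u32
}
```

A misaligned address such as `0x1234` then maps to `0x1220..0x1240` for `dcbz` and `0x1200..0x1280` for `dcbz128`.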
---

## Batch 3 — CR logical + CR moves (group 15)

Per-group report: `audit-out/group-15-cr-logical.md`.

Group 15 summary: **cleanest group audited to date**. All 8 CR logical ops (`crand`, `crandc`,
`creqv`, `crnand`, `crnor`, `cror`, `crorc`, `crxor`), `mcrf`, and `mcrxr` are ISA-correct.
The `cr_logical` helper's use of `fn(bool, bool) -> bool` prevents the `!u64` bit-pollution class
(PPCBUG-028–031 in group 7). CR bit indexing in `get_cr_bit`/`set_cr_bit` is correct (bit/4 =
field, bit%4 = within-field sub-index matching PPC MSB-0 numbering, with sub `{0=LT, 1=GT, 2=EQ,
3=SO}`). `mcrxr` correctly maps XER{SO,OV,CA} to CR{LT,GT,EQ} with SO=false and unconditionally
clears the XER bits. `mcrfs` nibble extraction, field shift formula (`28 - crfs*4`), and
CLEARABLE_MASK (all 14 ISA-clearable exception bits, no FEX/VX) are all correct. One MEDIUM ISA
violation: `mcrfs` omits VX summary recomputation. Two LOW findings: a misleading test comment and
zero coverage for all 8 CR logical ops + `mcrf`.

### PPCBUG-068 — `mcrfs` does not recompute VX summary bit after clearing VX* exception bits

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:4250` (`ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK)`)
- **Symptom**: When `mcrfs` clears VX* exception bits (VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ,
  VXVC, VXSOFT, VXSQRT, VXCVI) from any source field, the VX summary bit (FPSCR[2], `fpscr::VX
  = 1<<29`) is left stale. If those VX* bits were the only contributors to VX, it should become
  0 but remains 1. A subsequent `mcrfs cr0, 0` will then report VX=1 in CR0.EQ, misleading the
  caller into thinking an invalid-operation exception is still active.
- **Fix**:

```rust
// After ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK); add:
if (ctx.fpscr & fpscr::VX_ALL) != 0 {
    ctx.fpscr |= fpscr::VX;
} else {
    ctx.fpscr &= !fpscr::VX;
}
// FEX recomputation omitted — xenia doesn't model enabled-exception dispatch.
```

- **Test gap**: existing test only covers crfS=0 (FX+OX) — no VX* bits involved. Add a test
  that sets VX with VXSNAN as its only contributor, runs `mcrfs cr0, 1`, then verifies VX is now 0.

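A standalone model of the corrected behaviour is sketched below to make the expected test outcome concrete. The constants use standard FPSCR bit positions (MSB-0 bit n maps to `1 << (31 - n)`); the names mirror those in this finding but are redefined locally, and `mcrfs_model` is a hypothetical helper.

```rust
const FX: u32 = 1 << 31;
const VX: u32 = 1 << 29;
const OX: u32 = 1 << 28;
const VXSNAN: u32 = 1 << 24;
const VXCVI: u32 = 1 << 8;
/// All VX* cause bits: FPSCR bits 7..=12 and 21..=23 (MSB-0).
const VX_ALL: u32 = 0x01F8_0700;
/// Bits mcrfs may clear: FX, OX, UX/ZX/XX, plus every VX* cause bit.
const CLEARABLE: u32 = FX | OX | 0x0E00_0000 | VX_ALL;

/// Returns (nibble copied to the CR field, FPSCR after the instruction).
fn mcrfs_model(fpscr: u32, crfs: u32) -> (u8, u32) {
    let shift = 28 - 4 * crfs;
    let nibble = ((fpscr >> shift) & 0xF) as u8;
    let mut out = fpscr & !((0xF << shift) & CLEARABLE);
    // Recompute the VX summary from the remaining cause bits.
    if out & VX_ALL != 0 {
        out |= VX;
    } else {
        out &= !VX;
    }
    (nibble, out)
}
```

Clearing the last remaining cause bit drops VX, while a surviving cause bit in another field keeps it set.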
### PPCBUG-069 — `mcrfs` test comment claims OX(so)=0 but OX is set in the test

- **Severity**: LOW (cosmetic; the assert is correct, only the comment is wrong)
- **Status**: open
- **Location**: `interpreter.rs:5402`
- **Symptom**: Comment reads `"FX(lt)=1 and OX(so)=0"`. FPSCR was set to `(1<<31)|(1<<28)`,
  which sets both FX and OX. The nibble is `0b1001`, so `so=true`. The assert `cr[2].as_u8()
  == 0b1001` is correct; only the comment is wrong.
- **Fix**: `// FX(lt)=1, FEX(gt)=0, VX(eq)=0, OX(so)=1 → 0b1001 = 9`

### PPCBUG-070 — Zero execution tests for all 8 CR logical ops and `mcrf`

- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: `interpreter.rs:1473–1484`
- **Missing minimum**: `crclr` idiom (`crxor BT,BT,BT`, BT=1 → 0), `crset` idiom
  (`creqv BT,BT,BT`, BT=0 → 1), `crmove` idiom (`cror BT,BA,BA`), `crnot` idiom
  (`crnor BT,BA,BA`, BA=1 → 0), cross-field `crand`/`crandc`, and a full `mcrf
  cr0, cr3` field-copy + source-field-intact test.

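The four idioms can be exercised against a flat-`u32` model of CR. This is an assumption-laden sketch: the helper names echo the ones mentioned in the group summary, but these free functions are written for illustration and take CR as a plain value rather than the interpreter's context.

```rust
/// Read CR bit `bit` in PPC MSB-0 numbering (bit 0 is CR0.LT).
fn get_cr_bit(cr: u32, bit: u32) -> bool {
    (cr >> (31 - bit)) & 1 != 0
}

fn set_cr_bit(cr: u32, bit: u32, value: bool) -> u32 {
    let mask = 1u32 << (31 - bit);
    if value { cr | mask } else { cr & !mask }
}

/// Generic CR logical op; `op` works on bools, so no stray bits can leak in.
fn cr_logical(cr: u32, bt: u32, ba: u32, bb: u32, op: fn(bool, bool) -> bool) -> u32 {
    set_cr_bit(cr, bt, op(get_cr_bit(cr, ba), get_cr_bit(cr, bb)))
}
```

Passing non-capturing closures such as `|a, b| a ^ b` for `crxor` keeps each test to a single line.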
---

## Pre-pass hints REFUTED by audit

These were flagged by the orchestrator's regex scan but the subagents found them to be safe:

- **`divwux` writeback** (interpreter.rs:390) — both operands cast to `u32` before division, `as u64` zero-extends correctly. **Clean.**
- **`mulhwx` intermediate cast** (interpreter.rs:349) — `((result >> 32) as i32 as i64 as u64) & 0xFFFF_FFFF` is redundant but the trailing mask saves correctness. Cosmetic only.
- **`mulhwux` writeback** (interpreter.rs:359) — `(result >> 32) & 0xFFFF_FFFF` clean unsigned. Clean.
- **CR0 stale-prepass-claim**: the pre-pass document quoted the CR0 input as `result as i32 as i64`, but the live code actually uses `result as i64`. The pre-pass was therefore wrong to imply that an i32 form was already in place; the real finding (the live form is i64) is tracked as PPCBUG-020.

---

## Batch 4 — load float (group 23)

Per-group report: `audit-out/group-23-load-float.md`.

Group 23 summary: the double-precision load family (`lfd`, `lfdu`, `lfdux`, `lfdx`) is fully
ISA-correct — EA computation, endianness, update-form writeback, and bit-pattern fidelity are
all clean. The single-precision family (`lfs`, `lfsu`, `lfsux`, `lfsx`) has one HIGH bug:
Rust's `as f64` float cast compiles to x86 `CVTSS2SD`, which sets the IEEE quiet bit in the
output for any NaN input, silently converting f32 SNaN loads to f64 QNaN. The ISA requires the SNaN
to pass through unchanged. FPSCR.NI does not apply to loads (correct by omission). One LOW
test-gap finding. **2 IDs used (PPCBUG-128, PPCBUG-129). 8 IDs unallocated (PPCBUG-130..137).**

### PPCBUG-128 — lfs/lfsu/lfsx/lfsux silently quieten SNaN via `as f64` Rust float cast

- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1064 (lfs), 1070 (lfsx), 1087 (lfsu), 1093 (lfsux)
- **Symptom**: All four single-precision load arms use `mem.read_f32(ea) as f64` where
  `read_f32` = `f32::from_bits(read_u32(ea))`. The `as f64` Rust float cast compiles to x86
  `CVTSS2SD`, which unconditionally sets bit 51 of the f64 mantissa (the IEEE quiet/signalling
  discriminator bit) for any NaN input. An f32 SNaN (e.g. `0x7F800001`) is loaded and written
  to the FPR as the f64 QNaN `0x7FF8000020000000` instead of the SNaN `0x7FF0000020000000`
  (the 23-bit f32 mantissa lands 29 bits higher in the 52-bit f64 mantissa).

  **ISA requirement**: "A signalling NaN passes through unchanged into the FPR — it will signal
  at the next FP arithmetic instruction." (lfs.md Special Cases). The FPR must hold the SNaN;
  VXSNAN fires at the consuming arithmetic op, not at the load.

  **Impact**: (a) Game code storing f32 SNaN sentinels (physics engines mark unset float slots
  with SNaN) and then loading+inspecting them: `fpscr::is_snan(ctx.fpr[rd])` returns false
  after the load, breaking sentinel detection. (b) Arithmetic ops consuming the loaded value
  see a QNaN rather than SNaN, so VXSNAN is never set; games relying on VXSNAN to detect
  uninitialized-read bugs get false negatives.

- **Canary parity**: Canary's JIT also uses CVTSS2SD via `f.Convert()`. Both emulators share
  this deviation. The bug is a structural consequence of using semantic float widening rather
  than a bit-pattern-preserving widening routine.
- **Fix**: replace the float cast with a bit-manipulation widening that preserves the SNaN bit:

```rust
fn widen_f32_bits_to_f64(raw32: u32) -> u64 {
    let sign = ((raw32 >> 31) as u64) << 63;
    let exp32 = ((raw32 >> 23) & 0xFF) as i32;
    let mant32 = (raw32 & 0x007F_FFFF) as u64;
    if exp32 == 0xFF {
        // NaN or Infinity — propagate mantissa left-shifted by 29 bits.
        // SNaN (bit22=0) stays SNaN (bit51=0); QNaN (bit22=1) stays QNaN (bit51=1).
        sign | (0x7FFu64 << 52) | (mant32 << 29)
    } else if exp32 == 0 {
        // ±Zero or subnormal f32.
        if mant32 == 0 { return sign; } // ±zero
        // Subnormal: normalize by finding leading bit, then adjust exponent.
        // A leading bit at mantissa bit 22 (shift = 0) means 1.f × 2^-127.
        let shift = mant32.leading_zeros() - (64 - 23);
        let exp64 = (1023u64 - 127) - shift as u64;
        let mant64 = (mant32 << (shift + 1 + 29)) & 0x000F_FFFF_FFFF_FFFF;
        sign | (exp64 << 52) | mant64
    } else {
        // Normal f32 → normal f64: rebias (add before casting to avoid underflow).
        let exp64 = (exp32 + (1023 - 127)) as u64;
        sign | (exp64 << 52) | (mant32 << 29)
    }
}
// In each lfs* arm:
ctx.fpr[instr.rd()] = f64::from_bits(widen_f32_bits_to_f64(mem.read_u32(ea)));
```

  This function also correctly handles subnormal f32 → normal f64 widening (which the `as f64`
  cast already gets right numerically, but now goes through a consistent code path).
- **Test gap**: add a test loading an f32 SNaN (`0x7F800001`) via `lfs` and asserting
  `fpscr::is_snan(ctx.fpr[rd])` is `true` and bit 51 of `ctx.fpr[rd].to_bits()` is 0.

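One way to gain confidence in a bit-level widening routine is to cross-check it against `as f64` on non-NaN inputs, where the cast is exact, and to pin the NaN cases separately. The block below is a self-contained sketch of that idea; it restates the widening function with a subnormal exponent base of `1023 - 127` so that it can run on its own, and the function name is the one proposed in this finding rather than an existing interpreter symbol.

```rust
/// Bit-pattern-preserving f32 -> f64 widening (standalone restatement).
fn widen_f32_bits_to_f64(raw32: u32) -> u64 {
    let sign = ((raw32 >> 31) as u64) << 63;
    let exp32 = ((raw32 >> 23) & 0xFF) as u64;
    let mant32 = (raw32 & 0x007F_FFFF) as u64;
    if exp32 == 0xFF {
        // Infinity or NaN: keep the payload, including the quiet bit as-is.
        sign | (0x7FFu64 << 52) | (mant32 << 29)
    } else if exp32 == 0 {
        if mant32 == 0 {
            return sign; // signed zero
        }
        // Subnormal: shift the leading 1 out to become the implicit bit.
        let shift = mant32.leading_zeros() - (64 - 23);
        let exp64 = (1023u64 - 127) - shift as u64;
        let mant64 = (mant32 << (shift + 1 + 29)) & 0x000F_FFFF_FFFF_FFFF;
        sign | (exp64 << 52) | mant64
    } else {
        // Normal number: rebias the exponent.
        sign | ((exp32 + (1023 - 127)) << 52) | (mant32 << 29)
    }
}
```

Comparing `widen_f32_bits_to_f64(bits)` with `(f32::from_bits(bits) as f64).to_bits()` over zeros, subnormals, normals, and infinities covers every non-NaN branch.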
### PPCBUG-129 — Zero interpreter execution tests for all 8 float-load opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: interpreter.rs test module; `tests/disasm_goldens.rs:249-250` (disasm-only)
- **Symptom**: No `#[test]`-decorated function exercises any float-load interpreter arm.
  A regression in EA computation, endianness, f32→f64 widening, or update-form writeback
  would go undetected. The SNaN bug (PPCBUG-128) was undetected partly due to this gap.
- **Recommended minimum**:
  1. `lfs` normal: `0x3F800000` (1.0f32) → assert `fpr[rd] == 1.0f64` exact.
  2. `lfs` negative displacement: base minus 4.
  3. `lfs` ra=0 path (absolute addressing).
  4. `lfd` normal: store PI bits, assert exact bit equality via `.to_bits()`.
  5. `lfd` SNaN: store `0x7FF0_0000_0000_0001u64`, assert exact bit equality after load.
  6. `lfsu` / `lfsux` / `lfdu` / `lfdux`: verify loaded FPR value AND rA update address.
  7. After PPCBUG-128 fix: `lfs` SNaN round-trip test.

IDs PPCBUG-130 through PPCBUG-137 are unallocated — no further bugs found in group 23.

---

## Files modified by the audit

- `xenia-rs/audit-prepass-findings.md` — Phase A pre-pass red flags (orchestrator regex output).
- `xenia-rs/audit-out/group-01-add-imm.md` — Group 1 report (Sonnet subagent).
- `xenia-rs/audit-out/group-02-add-reg.md` — Group 2 report.
- `xenia-rs/audit-out/group-03-sub-reg.md` — Group 3 report.
- `xenia-rs/audit-out/group-04-multiply.md` — Group 4 report.
- `xenia-rs/audit-out/group-05-divide.md` — Group 5 report.
- `xenia-rs/audit-out/group-06-logic-imm.md` — Group 6 report.
- `xenia-rs/audit-out/group-09-word-rotate.md` — Group 9 report.
- `xenia-rs/audit-out/group-13-branch.md` — Group 13 report.
- `xenia-rs/audit-out/group-14-trap-sc.md` — Group 14 report.
- `xenia-rs/audit-out/group-15-cr-logical.md` — Group 15 report.
- `xenia-rs/audit-out/group-16-spr-msr.md` — Group 16 report.
- `xenia-rs/audit-out/group-17-cache-sync.md` — Group 17 report.
- `xenia-rs/audit-out/group-18-load-byte.md` — Group 18 report.
- `xenia-rs/audit-out/group-19-load-halfword.md` — Group 19 report.
- `xenia-rs/audit-out/group-21-load-doubleword.md` — Group 21 report.
- `xenia-rs/audit-out/group-22-load-mlsr.md` — Group 22 report.
- `xenia-rs/audit-out/group-23-load-float.md` — Group 23 report.
- `xenia-rs/audit-out/group-24-store-byte-half.md` — Group 24 report.
- `xenia-rs/audit-out/group-26-store-doubleword.md` — Group 26 report.
- `xenia-rs/audit-findings.md` — this consolidated tracker.

**No source code under `xenia-rs/crates/` has been modified.**

---

## Batch 4 — load byte (group 18)

Per-group report: `audit-out/group-18-load-byte.md`.

Group 18 summary: **cleanest group audited to date — zero HIGH or MEDIUM bugs.** All four opcodes
(`lbz`, `lbzu`, `lbzx`, `lbzux`) are ISA-correct: EA computation (rA=0 special case, D-field
sign-extension, 32-bit EA truncation), zero-extension of the byte result to 64 bits, and
update-form writeback all match the ISA spec and Canary cross-reference. Two LOW findings only.

### PPCBUG-090 — lbzu/lbzux: rD==rA "invalid form" silently misloads rD

- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
- **Status**: open
- **Location**: interpreter.rs:951-956 (lbzu), 963-968 (lbzux)
- **Symptom**: When `rD == rA` (invalid form, UISA undefined), the byte load into `gpr[rD]` at
  line 953/965 is immediately overwritten by the EA writeback at line 954/966. Net result:
  `gpr[rD]` holds the EA, not the loaded byte. Canary has the same behaviour. No practical impact
  under normal compiler output.
- **Recommendation**: add `debug_assert!(instr.rd() != instr.ra())` in debug builds.

### PPCBUG-091 — Zero interpreter execution tests for all four lbz* opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module; disasm_goldens.rs:247 (disasm-only, no execution)
- **Symptom**: No `#[test]` exercises lines 945-968. A regression in EA computation,
  zero-extension, or the update writeback would go undetected.
- **Recommended minimum**: `lbz` with ra=0 + negative displacement; `lbzu` normal case (verify
  both byte result and rA update); `lbzx` with ra=0; `lbzux` normal case. Each test should
  assert `gpr[rD] <= 0xFF` to catch any future accidental sign-extension.

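The EA rules these tests rely on (rA=0 means a literal zero base, the displacement is sign-extended, and the sum is truncated to 32 bits) can be captured in a short model. `ea_d_form` and `lbz_result` are hypothetical helpers for computing expected values; they are not part of the interpreter.

```rust
/// D-form effective address: (rA|0) + sign-extended D, truncated to 32 bits.
fn ea_d_form(ra: usize, gpr: &[u64; 32], d: i16) -> u32 {
    // rA = 0 selects the constant 0, not the contents of r0.
    let base = if ra == 0 { 0 } else { gpr[ra] as u32 };
    base.wrapping_add(d as i32 as u32)
}

/// Byte loads zero-extend into the 64-bit register.
fn lbz_result(byte: u8) -> u64 {
    byte as u64
}
```

Note that a polluted upper half in `gpr[ra]` must not affect the resulting address.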
IDs PPCBUG-092, PPCBUG-093, PPCBUG-094 are unallocated — no further bugs found in group 18.

---

## Batch 4 — load halfword (group 19)

Per-group report: `audit-out/group-19-load-halfword.md`.

Group 19 summary: **4 HIGH bugs confirmed — all pre-pass flags validated.** The four `lha*` opcodes
(`lha`, `lhax`, `lhau`, `lhaux`) all use `as i16 as i64 as u64`, sign-extending a negative halfword
to 64 bits in violation of the 32-bit ABI. Every negative halfword load (common for `int16_t` PCM
samples, packed vertex deltas, `short[]` arrays) actively poisons the upper 32 bits of the
destination GPR — identical shape to the `addis` bug. The four `lhz*` opcodes and `lhbrx` are all
clean (`as u64` zero-extension; `swap_bytes() as u64` byte-reversal; correct endian handling; correct
EA computation and update writebacks). Two LOW findings: rD==rA invalid-form in update variants,
and zero unit tests for all nine opcodes.

### PPCBUG-095 — `lha`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:990
- **Symptom**: `mem.read_u16(ea) as i16 as i64 as u64` — memory `0x8000` writes
  `0xFFFFFFFF_FFFF8000` instead of `0x00000000_FFFF8000`. Active GPR poisoning for every
  negative halfword. Common trigger: `int16_t` struct fields, PCM samples, packed vertex deltas.
- **Fix**:
  ```rust
  ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64;
  ```
- **Test gap**: zero unit tests. Add: memory `0x8000` → `gpr[rD] == 0x00000000_FFFF8000`;
  memory `0x7FFF` → `gpr[rD] == 0x00000000_00007FFF`.

### PPCBUG-096 — `lhax`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:996
- **Symptom**: identical to PPCBUG-095. Indexed form emitted for array access with GPR index.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests.

### PPCBUG-097 — `lhau`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:1007
- **Symptom**: identical to PPCBUG-095. Update form emitted for auto-incrementing `short[]` loops;
  every iteration that loads a negative halfword re-poisons rD.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests. Add: verify both `gpr[rD]` (upper-32 = 0) and `gpr[rA]` (EA update).

### PPCBUG-098 — `lhaux`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:1013
- **Symptom**: identical to PPCBUG-095, update+indexed form.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests.
- **Note**: PPCBUG-095..098 are the same one-line fix at four sites. Fix session sweep:
  `rg -n 'as i16 as i64 as u64' interpreter.rs` finds exactly these four lines.

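The difference between the two cast chains in PPCBUG-095..098 can be checked in isolation. The sketch below is a standalone illustration, not interpreter code; the function names `lha_current` and `lha_fixed` are hypothetical.

```rust
/// Cast chain currently used by the `lha*` arms: sign-extends all the way to 64 bits.
fn lha_current(halfword: u16) -> u64 {
    halfword as i16 as i64 as u64
}

/// Proposed chain: sign-extend to 32 bits, then zero-extend to 64 bits.
fn lha_fixed(halfword: u16) -> u64 {
    halfword as i16 as i32 as u32 as u64
}
```

For a negative halfword such as `0x8000` the two functions differ only in the upper 32 bits; for a non-negative halfword such as `0x7FFF` they agree.
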
### PPCBUG-099 — `lhau`/`lhaux`: rD==rA invalid-form silently destroys load result
- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
- **Status**: open
- **Location**: interpreter.rs:1005-1016
- **Symptom**: same as PPCBUG-090 (`lbzu`/`lbzux`) — EA writeback overwrites `gpr[rD]` when
  `rD == rA`. Net: `gpr[rD]` holds EA, not the loaded value.
- **Recommendation**: `debug_assert!(instr.rd() != instr.ra())` in both arms.

### PPCBUG-100 — Zero execution tests for all nine halfword-load opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Symptom**: No `#[test]` exercises any of the 9 opcodes. The HIGH sign-extension bug would
  have been caught by any `lha*` test that loads a negative halfword and checks
  `gpr[rD] <= 0x0000_0000_FFFF_FFFF`.
- **Recommended minimum**: `lha` with negative halfword (assert upper 32 zero), `lhz` same,
  `lhau` verify both rD and rA, `lhzux` verify both rD and rA, `lhbrx` verify byte-swap.

IDs PPCBUG-101, PPCBUG-102, PPCBUG-103, PPCBUG-104 are unallocated — no further bugs found in group 19.

---

## Batch 4 — load word (group 20)

Per-group report: `audit-out/group-20-load-word.md`.

Group 20 summary: **1 HIGH bug (reservation invalidation never called), 1 MEDIUM (cross-thread
reservation isolation), 1 MEDIUM (lwa 64-bit sign-extension hazard), 4 LOW (3 test gaps,
1 informational).** The
zero-extending family (`lwz`/`lwzu`/`lwzx`/`lwzux`) is entirely correct — `mem.read_u32(ea) as u64`
cleanly zero-extends; EA computation, update writebacks, and RA0 handling all match ISA and Canary.
`lwbrx` is correct: the double-swap (`from_be_bytes` then `swap_bytes()`) correctly produces a
little-endian word read, zero-extended. The sign-extending family (`lwa`/`lwax`/`lwaux`) is
ISA-correct for 64-bit mode but a 32-bit-ABI hazard — classified MEDIUM because `lwa` is a
64-bit-mode instruction unlikely to appear in Xbox 360 32-bit-ABI binaries. The HIGH finding is
that `ReservationTable::invalidate_for_write` is defined and unit-tested but **never called** from
any store instruction, breaking multi-threaded `lwarx`/`stwcx.` atomicity under `--parallel`.

### PPCBUG-105 — lwa / lwax / lwaux sign-extend to 64 bits; 32-bit-ABI hazard

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:1032 (lwa), 1038 (lwax), 1043 (lwaux)
- **Symptom**: `mem.read_u32(ea) as i32 as i64 as u64` — a word with high bit set (e.g. `0x8000_0000`)
  writes `0xFFFF_FFFF_8000_0000` to rD. ISA-correct for 64-bit-mode `lwa`. In 32-bit ABI, the poisoned
  upper 32 bits produce wrong CA / CR results in downstream 64-bit unsigned compares — same shape as
  the `addis` bug.
- **Likelihood**: LOW on real Xbox 360 32-bit-ABI binaries (compilers use `lwz` for word loads; `lwa`
  is a 64-bit-mode instruction). Risk elevated if the binary contains 64-bit-mode kernel code.
- **Note**: Canary also uses `SignExtend(..., INT64_TYPE)` — both are ISA-correct. Pre-pass flagged
  HIGH; audit downgrades to MEDIUM because `lwa` is unlikely in 32-bit-ABI Xbox 360 code.

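To make the downstream hazard in PPCBUG-105 concrete, the sketch below models an unsigned compare performed on full 64-bit register values versus one performed on the low 32 bits. All function names are hypothetical; the real compare helpers in `interpreter.rs` may be structured differently.

```rust
use std::cmp::Ordering;

/// ISA-correct `lwa` writeback: sign-extend the word to 64 bits.
fn lwa_isa(word: u32) -> u64 {
    word as i32 as i64 as u64
}

/// `lwz` writeback: zero-extend the word to 64 bits.
fn lwz(word: u32) -> u64 {
    word as u64
}

/// Hypothetical unsigned compare that looks at all 64 register bits.
fn cmpl_64(a: u64, b: u64) -> Ordering {
    a.cmp(&b)
}

/// Hypothetical unsigned compare that truncates both operands to 32 bits first.
fn cmpl_32(a: u64, b: u64) -> Ordering {
    (a as u32).cmp(&(b as u32))
}
```

With `a = lwa_isa(0x8000_0000)` and `b = lwz(0x9000_0000)`, the 64-bit compare reports `a > b` because of the sign-extended upper half, while the 32-bit compare reports `a < b`, which is the result 32-bit-ABI code expects.
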
### PPCBUG-106 — lwa no-update-form undocumented (LOW / informational)

- **Severity**: LOW
- **Status**: open
- **Location**: interpreter.rs:1029-1034
- **Symptom**: `lwa` arm has no RA writeback. Correct per ISA (no `lwau` in PowerISA). Undocumented.
- **Fix**: add comment `// No lwau in PowerISA; lwa is DS-form non-update only.`

### PPCBUG-107 — `invalidate_for_write` never called from stores; lwarx/stwcx. atomicity broken under `--parallel` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `reservation.rs:234` (definition, never called from interpreter); `interpreter.rs:1182-1278` (all store arms, none call it)
- **Symptom**: `ReservationTable::invalidate_for_write(addr)` is defined and correctly unit-tested but
  no interpreter store arm calls it. Under M3 `--parallel` with the table enabled, a plain `stw` by
  thread B to a cache line reserved by thread A does NOT clear thread A's table slot. Thread A's
  subsequent `stwcx.` calls `t.try_commit()`, which succeeds — spurious success, violating
  store-conditional atomicity. All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic
  counters) built on `lwarx`/`stwcx.` are broken in multi-threaded mode.
- **Concrete scenario**: thread A: `lwarx r3, 0, r4` (reserves line). Thread B: `stw r5, 0(r4)`
  (same address; should invalidate). Thread A: `stwcx. r6, 0, r4` → should fail (CR0.EQ=0) but
  succeeds (CR0.EQ=1). Thread A's store silently overwrites thread B's store.
- **Fix**: in every store arm, before `mem.write_*`, add:
  ```rust
  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```
  `has_active_reservers()` is a single `Relaxed` atomic load — negligible cost for non-atomic code
  (common case returns false immediately). Alternative: inject the table into the memory layer so
  `write_u32`/`write_u64` call it automatically.
- **Test gap**: add interpreter-level test: `lwarx` reserve a line, intervening `stw` to the same
  line, `stwcx.` must fail (CR0.EQ=0).

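The concrete scenario in PPCBUG-107 can be replayed against a toy model. The sketch below is a single-threaded stand-in, not the real `ReservationTable`: the type `ToyReservations`, the 128-byte `LINE_MASK`, and the per-thread bookkeeping are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Assumed reservation granule of 128 bytes; the real `RESERVATION_MASK` may differ.
const LINE_MASK: u32 = 127;

/// Toy stand-in for the reservation table: maps a reserved line to its owning thread id.
#[derive(Default)]
struct ToyReservations {
    by_line: HashMap<u32, u32>,
}

impl ToyReservations {
    /// `lwarx`: a thread holds at most one reservation, so drop any earlier one.
    fn reserve(&mut self, thread: u32, ea: u32) {
        self.by_line.retain(|_, owner| *owner != thread);
        self.by_line.insert(ea & !LINE_MASK, thread);
    }

    /// Plain store: clear whatever reservation covers the written line.
    fn invalidate_for_write(&mut self, ea: u32) {
        self.by_line.remove(&(ea & !LINE_MASK));
    }

    /// `stwcx.`: succeeds only if the thread still owns the line; always clears its reservation.
    fn try_commit(&mut self, thread: u32, ea: u32) -> bool {
        let ok = self.by_line.get(&(ea & !LINE_MASK)) == Some(&thread);
        self.by_line.retain(|_, owner| *owner != thread);
        ok
    }
}

/// Replays: thread A `lwarx`, thread B `stw`, thread A `stwcx.`. Returns A's CR0.EQ.
fn replay(store_invalidates: bool) -> bool {
    let mut table = ToyReservations::default();
    let ea = 0x1000;
    table.reserve(0, ea); // thread A: lwarx
    if store_invalidates {
        table.invalidate_for_write(ea); // thread B: stw with the guard in place
    }
    table.try_commit(0, ea) // thread A: stwcx.
}
```

`replay(false)` models the pre-fix behaviour (the store never touches the table, so the conditional store succeeds), and `replay(true)` models the guarded store, after which the conditional store fails as the ISA requires.
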
### PPCBUG-108 — Legacy per-ctx reservation path: cross-thread invalidation impossible (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: interpreter.rs:1148-1153 (stwcx legacy path)
- **Symptom**: When table is None/disabled, reservation state lives in per-thread `PpcContext` fields.
  A store by thread B cannot clear `ctx_A.has_reservation`. Safe in strict lockstep (one host thread).
  Broken under real parallelism with the table inadvertently disabled.
- **Fix**: add a `debug_assert!` in `lwarx`/`stwcx.` that table is enabled when multiple host threads
  are active. The M3 scheduler should always enable the table before spawning a second host thread.

### PPCBUG-109 — Zero unit tests for lwa / lwax / lwaux

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Recommended minimum**:
  - `lwa` with `0x8000_0000` → `gpr[rD] == 0xFFFF_FFFF_8000_0000`.
  - `lwa` with `0x7FFF_FFFF` → `gpr[rD] == 0x0000_0000_7FFF_FFFF`.
  - `lwax` with ra=0.
  - `lwaux`: verify loaded value and rA update.

### PPCBUG-110 — Zero unit tests for lwbrx

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Recommended minimum**: memory `[0x11, 0x22, 0x33, 0x44]` at EA → `gpr[rD] == 0x4433_2211`; ra=0;
  assert `gpr[rD] <= 0xFFFF_FFFF`.

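The expected value in the PPCBUG-110 recommendation follows from the double-swap described in the group 20 summary. A minimal sketch, assuming the data path is a big-endian decode followed by `swap_bytes()` (the function name `lwbrx_model` is hypothetical):

```rust
/// Model of the `lwbrx` data path: big-endian decode, byte swap, zero-extend.
fn lwbrx_model(bytes_at_ea: [u8; 4]) -> u64 {
    u32::from_be_bytes(bytes_at_ea).swap_bytes() as u64
}
```

The same function doubles as a check for the zero-extension assertion, since the result can never exceed `0xFFFF_FFFF`.
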
### PPCBUG-111 — lwarx / stwcx test suite missing key cases

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:5167-5207 (two tests exist)
- **Missing**: `lwarx` ra=0; `stwcx.` without prior `lwarx` → CR0.EQ=0; second `lwarx` displaces
  first; post-PPCBUG-107-fix store-invalidation test; `lwarx` zero-extension assertion.

IDs PPCBUG-112, PPCBUG-113, PPCBUG-114 are unallocated — reserved for group 20 follow-up.

---

## Batch 4 — load doubleword (group 21)

Per-group report: `audit-out/group-21-load-doubleword.md`.

Group 21 summary: **cleanest load group audited — zero HIGH bugs.** All six instructions (`ld`,
`ldu`, `ldux`, `ldx`, `ldbrx`, `ldarx`) are ISA-correct: 64-bit load, big-endian byte order,
EA computation (RA=0, DS-form, u32 truncation), update-form writebacks, and reservation tracking
all pass scrutiny against Canary and the ISA spec. `ldbrx`'s double-swap pattern was investigated
and confirmed correct (PPCBUG-115 informational). One MEDIUM documentation finding, two LOW findings.

### PPCBUG-115 — `ldbrx` byte-swap confirmed correct (informational)

- **Severity**: LOW (confirmed clean, informational only)
- **Status**: wontfix
- **Location**: `interpreter.rs:4157-4159`
- **Analysis**: `mem.read_u64` uses `u64::from_be_bytes` internally (confirmed in `heap.rs:404`
  and interpreter's `TestMem`), so it returns the BE-decoded value. Calling `.swap_bytes()`
  re-reverses to give the LE interpretation, which is exactly what `ldbrx` specifies. Canary
  achieves the same result by skipping `ByteSwap` at the HIR level. Both approaches are correct.
  See per-group report for full byte-level worked example.

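The equivalence claimed in the PPCBUG-115 analysis (big-endian decode followed by `swap_bytes()` equals a little-endian decode) can be checked with asymmetric data. This is a standalone sketch; `ld_model` and `ldbrx_model` are hypothetical names, not functions in the interpreter.

```rust
/// Plain doubleword load: big-endian decode of the eight bytes at EA.
fn ld_model(bytes_at_ea: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes_at_ea)
}

/// Byte-reversed load: big-endian decode, then reverse the byte order.
fn ldbrx_model(bytes_at_ea: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes_at_ea).swap_bytes()
}
```

Using eight distinct byte values makes the two results differ, which is the property a unit test needs to tell `ldbrx` apart from `ldx`.
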
### PPCBUG-116 — `ld`/`ldx`/`ldu`/`ldux` as 32-bit-ABI poison sources (documentation)

- **Severity**: MEDIUM (awareness/documentation; no change to load instructions themselves)
- **Status**: open
- **Location**: `interpreter.rs:1017-1058`
- **Symptom**: These instructions correctly write full 64-bit values to the destination GPR.
  Xbox 360 32-bit-ABI binaries legitimately emit them for TOC loads, vtable loads, and kernel
  structure accesses — all of which may have non-zero upper 32 bits. Until PPCBUG-001..089
  arithmetic truncation fixes land, such values can flow into 64-bit compares and corrupt CA
  bits and CR fields — the inverse of the `addis` bug (pollution from memory side vs. sign-ext).
- **Key guard already in place**: PPCBUG-007's `subfcx` CA fix truncates operands to u32 before
  the compare, correctly handling `ld`-originated 64-bit values. This is the most critical
  downstream consumer and the fix is already specified.

### PPCBUG-117 — Stale frozen snapshot in `ppc-manual/memory/ldarx.md`

- **Severity**: LOW
- **Status**: open
- **Location**: `ppc-manual/memory/ldarx.md` (frozen snapshot section)
- **Symptom**: Snapshot uses old field name `ctx.reserved_addr`; live code uses
  `ctx.reserved_line = ea & !RESERVATION_MASK` (M3 refactor). Cosmetic only.
- **Fix**: Regenerate snapshot after M3 field names settle.

### PPCBUG-118 — Zero functional tests for `ld`, `ldx`, `ldu`, `ldux`, `ldbrx`

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: `test_ldarx_stdcx_pair` covers `ldarx`/`stdcx` only. Five doubleword load
  variants are untested. Recommended minimum: `ld` with positive DS, negative DS, and RA=0;
  `ldx` basic; `ldu` with RA writeback check; `ldux` with RA writeback check; `ldbrx` with
  asymmetric data to distinguish output from plain `ldx`.

IDs PPCBUG-119 through PPCBUG-122 are unallocated — reserved for group 21 follow-up.

---

## Batch 4 — load multiple/string (group 22)

Per-group report: `audit-out/group-22-load-mlsr.md`.

Group 22 summary: one structural HIGH bug (`lswx` is always a no-op due to missing XER TBC field),
one MEDIUM coupling bug (the write path discards TBC on `mtspr XER`), one MEDIUM ISA-form deviation
(`lmw` does not skip the register write when RA is in the destination range, unlike Canary), and
two LOW findings. The `lswi` body itself
is correct; `lmw` core logic (loop bound, zero-extension, byte-packing, register wraparound) is clean.
Zero unit tests across all three opcodes.

### PPCBUG-123 — `lswx` XER TBC field not modeled; always loads 0 bytes

- **Severity**: HIGH
- **Status**: open
- **Location**: `context.rs:235-237` (`xer()` method) + `interpreter.rs:4172`
- **Symptom**: `ctx.xer()` assembles only SO[31], OV[30], CA[29] — bits 0–28 are always zero.
  `lswx` reads `ctx.xer() & 0x7F` expecting the XER TBC byte-count field at bits 0–6, but always
  gets 0. The `while bytes_left > 0` loop never executes; **`lswx` is permanently a no-op** —
  no bytes are loaded, no destination registers are written. The companion `stswx` at
  `interpreter.rs:4191` has the identical pattern and is equally broken.
- **Root cause**: `PpcContext` has no `xer_tbc` field. Neither `xer()` nor `set_xer()` models
  XER[25:31] (IBM bit numbering; the same field as bits 0–6 in the LSB-0 numbering used above).
  Any `mtspr XER, rN` that sets a non-zero byte count silently discards it (PPCBUG-124).
- **Cross-reference**: Canary marks `lswx` as `XEINSTRNOTIMPLEMENTED()` — xenia-rs implemented the
  body but left the XER infrastructure incomplete.
- **Fix**:
  1. Add `pub xer_tbc: u8` to `PpcContext`.
  2. In `xer()`: `| (self.xer_tbc as u32)` for bits 0–6.
  3. In `set_xer()`: `self.xer_tbc = (val & 0x7F) as u8`.

  The `lswx` body is then correct as-is.
- **Test gap**: zero unit tests. After fix: `mtspr XER, r3` (r3=4) then `lswx r5, 0, r4` should
  load exactly 4 bytes into r5 (the first byte at EA lands in the most-significant byte of the
  low 32-bit word).

### PPCBUG-124 — `set_xer()` discards TBC on `mtspr XER` (structural coupling to PPCBUG-123)

- **Severity**: MEDIUM (must land with PPCBUG-123)
- **Status**: open
- **Location**: `context.rs:239-244`
- **Symptom**: `set_xer()` writes only SO/OV/CA from the 32-bit value, silently discarding bits 0–28
  (including the 7-bit TBC field). Any guest `mtspr XER, rN` with a non-zero byte count loses that
  count; subsequent `lswx`/`stswx` see TBC=0. Fix is the same three-line change as PPCBUG-123.

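The three-step fix shared by PPCBUG-123 and PPCBUG-124 amounts to making the XER round-trip preserve its low seven bits. The sketch below uses a hypothetical `ToyXer` struct in place of `PpcContext`; the field names and `bool` flag representation are assumptions, and only the bit positions come from the findings above.

```rust
/// Toy stand-in for the XER-related fields of `PpcContext`.
#[derive(Default)]
struct ToyXer {
    so: bool,
    ov: bool,
    ca: bool,
    tbc: u8, // 7-bit transfer byte count, bits 0..=6 (LSB-0)
}

impl ToyXer {
    /// Assemble the 32-bit XER image: SO[31], OV[30], CA[29], TBC[6:0].
    fn xer(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | (self.tbc as u32 & 0x7F)
    }

    /// Scatter a 32-bit XER image back into the fields, keeping TBC.
    fn set_xer(&mut self, val: u32) {
        self.so = (val & (1 << 31)) != 0;
        self.ov = (val & (1 << 30)) != 0;
        self.ca = (val & (1 << 29)) != 0;
        self.tbc = (val & 0x7F) as u8;
    }
}

/// What `lswx`/`stswx` read as their byte count.
fn string_byte_count(xer: &ToyXer) -> u32 {
    xer.xer() & 0x7F
}
```

With TBC modeled, writing `0x2000_0004` through `set_xer` sets CA and leaves a byte count of 4 for the next string instruction, and `xer()` returns the same image that was written.
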
### PPCBUG-125 — `lmw` missing RA-in-destination-range skip

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1515`
- **Symptom**: PowerISA declares `lmw rT, D(rA)` invalid when `rA` is in `[rT..31]`. Canary skips
  the register write to `rA` in that case (`if (i.D.RT + j == i.D.RA) continue`). xenia-rs
  pre-computes EA before the loop (so EA values remain correct), but overwrites `rA` with the
  loaded word instead of preserving it. Result differs from Canary for this invalid encoding. Any
  program that relies on RA surviving a nominally invalid `lmw` will see the wrong value.
- **Fix**:
  ```rust
  for r in instr.rd()..32 {
      // Skip the write to rA (invalid form), but still advance EA past its slot.
      if r == instr.ra() { ea = ea.wrapping_add(4); continue; }
      ctx.gpr[r] = mem.read_u32(ea as u32) as u64;
      ea = ea.wrapping_add(4);
  }
  ```
- **Test gap**: zero tests. Add: `lmw r28, 0(r28)` (RA=RT=28) — after fix, gpr[28] unchanged.

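The skip behaviour proposed for PPCBUG-125 can be exercised without the interpreter. The sketch below is a toy model under stated assumptions: `lmw_model` is a hypothetical function, guest memory is a plain byte slice, and `ea` is an index into that slice rather than a guest address.

```rust
/// Toy `lmw`: loads big-endian words from `mem` starting at `ea` into gpr[rt..=31],
/// leaving gpr[ra] untouched while still consuming its 4-byte slot.
fn lmw_model(gpr: &mut [u64; 32], mem: &[u8], rt: usize, ra: usize, mut ea: usize) {
    for r in rt..32 {
        if r != ra {
            let word = u32::from_be_bytes([mem[ea], mem[ea + 1], mem[ea + 2], mem[ea + 3]]);
            gpr[r] = word as u64; // zero-extend, matching the 32-bit ABI
        }
        ea += 4;
    }
}
```

Running the model for `lmw r28, 0(r28)` over sixteen bytes leaves `gpr[28]` at its old value and fills `gpr[29]` through `gpr[31]` from the second, third, and fourth words.
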
### PPCBUG-126 — `lswi` uses `instr.rb()` instead of `instr.nb()` for the NB field

- **Severity**: LOW (maintenance hazard, not a correctness bug)
- **Status**: open
- **Location**: `interpreter.rs:1340`
- **Symptom**: `instr.rb()` and `instr.nb()` both extract bits 16–20 and return identical values.
  Using `rb()` misrepresents the operand as a register reference rather than a 5-bit immediate count.
  The companion `stswi` at line 1359 has the same pattern. A future `rb()` type-system refactor
  could break `lswi`/`stswi` silently.
- **Fix**: `instr.nb()` at both sites.

### PPCBUG-127 — Zero execution tests for lmw, lswi, lswx

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: No `#[test]` exists for any of the three opcodes. A regression in loop bounds,
  byte-packing, EA computation, or the NB=0 special case would go undetected.
- **Recommended minimum**: `lmw r30, 0(r1)` (2-word load); `lswi r3, r4, 8` (2-word byte pack);
  `lswi r31, r4, 8` (register wraparound → r31 and r0); `lswi r3, r4, 0` (NB=0→32 special case);
  post-PPCBUG-123 fix: `lswx` with XER TBC=4 (1-word load), TBC=0 (no-op), TBC=5 (partial word).

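The `lswi` cases listed in PPCBUG-127 (byte packing, register wraparound, NB=0 meaning 32, partial final word) can be pinned down with a small reference model. This is a sketch only: `lswi_model` is a hypothetical function, and `mem` stands for the bytes starting at the already-computed EA.

```rust
/// Toy `lswi`: packs `nb` bytes from `mem` into successive registers starting at `rt`,
/// four bytes per register, most-significant byte first, wrapping from r31 to r0.
fn lswi_model(gpr: &mut [u64; 32], mem: &[u8], rt: usize, nb_field: u32) {
    // An NB field of 0 encodes a 32-byte transfer.
    let nb = if nb_field == 0 { 32 } else { nb_field as usize };
    let mut reg = rt;
    for (i, &byte) in mem[..nb].iter().enumerate() {
        let lane = i % 4;
        if lane == 0 {
            gpr[reg] = 0; // a register that receives any byte is otherwise zero-filled
        }
        gpr[reg] |= (byte as u64) << (24 - 8 * lane);
        if lane == 3 {
            reg = (reg + 1) % 32;
        }
    }
}
```

With input bytes `1..=32`, an 8-byte load at `rt = 31` fills r31 and r0, a 5-byte load leaves the second register with one byte in its top lane, and `nb_field = 0` touches eight registers.
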
---

## Batch 5 — store byte/halfword (group 24)

Per-group report: `audit-out/group-24-store-byte-half.md`.

Group 24 summary: **3 findings: 1 HIGH (cross-cutting reservation invalidation), 1 LOW/informational
(update-form zero-extension correct but undocumented), 1 LOW (zero test coverage).** EA computation,
value truncation (`as u8`, `as u16`), RA=0 special cases, update-form writeback zero-extension,
big-endian `mem.write_u16` path, and `sthbrx` byte-reverse logic are all ISA-correct. The single
HIGH finding is the systemic absence of `invalidate_for_write` calls — same class as PPCBUG-107,
now documented for all 9 byte/halfword store opcodes.

### PPCBUG-130 — All 9 store-byte/halfword opcodes missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1207` (stb), `1213` (stbu), `1219` (stbx), `1225` (stbux),
  `1231` (sth), `1237` (sthu), `1243` (sthx), `1249` (sthux), `1337` (sthbrx)
- **Class**: same root cause as PPCBUG-107 (stw/stdcx family — `invalidate_for_write` never called
  from any store arm).
- **Symptom**: Under `--parallel`, a `stb`, `sth`, or `sthbrx` (or any variant in this group) to a
  cache line reserved by another thread via `lwarx`/`ldarx` does NOT clear the table slot.
  The reserving thread's subsequent `stwcx.`/`stdcx.` spuriously succeeds even though an
  intervening sub-word store has modified the line — violating store-conditional atomicity. Affects
  any lock-free protocol that uses byte or halfword stores adjacent to or inside a `lwarx`/`stwcx.`
  loop (e.g. byte-level spinlocks, tagged-pointer updates, audio ring-buffer flags).
- **Fix** (per PPCBUG-107 pattern): before each `mem.write_u8/u16`, add:
  ```rust
  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```
- **Note**: PPCBUG-107 is the canonical parent finding. PPCBUG-130 documents that the byte/halfword
  group must be included in the same fix sweep.

### PPCBUG-131 — Update-form rA zero-extension correct but undocumented (LOW / informational)

- **Severity**: LOW (informational — behavior is correct)
- **Status**: open (documentation gap)
- **Locations**: `interpreter.rs:1216` (stbu), `1228` (stbux), `1240` (sthu), `1252` (sthux)
- **Symptom**: Each update-form arm writes `ctx.gpr[instr.ra()] = ea as u64` where `ea: u32`.
  This zero-extends to 64 bits — correct in the 32-bit ABI (addresses are 32-bit; upper half must
  be zero). No bug, but there is no comment explaining the deliberate zero-extension. A maintainer
  who computes EA as `u64` throughout and drops the `as u32` intermediate would silently
  sign-extend negative displacements into rA, mirroring the `addis` bug shape.
- **Fix**: add comment `// EA is u32; zero-extend into rA (32-bit ABI: upper 32 bits must be 0).`
  at each update-form writeback line.

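The maintenance hazard described in PPCBUG-131 shows up as soon as the displacement is negative. The sketch below contrasts the current `u32` EA computation with the hypothetical all-`u64` variant a refactor might introduce; both function names are illustrative only.

```rust
/// Current shape: compute EA in 32 bits, then zero-extend into rA.
fn update_ea_u32(ra_val: u64, displacement: i16) -> u64 {
    let ea: u32 = (ra_val as u32).wrapping_add(displacement as i32 as u32);
    ea as u64
}

/// Hazardous shape: compute EA in 64 bits with a sign-extended displacement.
fn update_ea_u64_hazard(ra_val: u64, displacement: i16) -> u64 {
    ra_val.wrapping_add(displacement as i64 as u64)
}
```

The two agree whenever the sum stays non-negative (for example `0x1000 - 8`), but diverge when the 32-bit address wraps: `0x10 - 0x20` yields `0x0000_0000_FFFF_FFF0` in the first form and an all-ones upper half in the second.
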
### PPCBUG-132 — Zero unit tests for all 9 store-byte/halfword opcodes (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: No `test_stb*` or `test_sth*` functions exist. Any regression in EA computation,
  value truncation, update-form writeback order, or `sthbrx` byte-swap logic would be invisible.
- **Recommended minimum**: `stb` basic + ra=0; `stbu`/`stbux` with rA writeback check; `stbx`
  ra=0; `sth` big-endian byte check (`0xDEAD` → `[0xDE, 0xAD]`); `sthu`/`sthux` writeback;
  `sthbrx` byte-reversed check (`0xDEAD` → `[0xAD, 0xDE]`); post-PPCBUG-130 fix: `lwarx` + `stb`
  to same line + `stwcx.` → CR0.EQ=0.

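The byte patterns quoted in the PPCBUG-132 recommendations follow directly from truncation plus big-endian encoding. A minimal sketch, with hypothetical helper names that model only the value path of each store (not EA computation or the memory interface):

```rust
/// `stb`: the low byte of rS.
fn stb_byte(rs: u64) -> u8 {
    rs as u8
}

/// `sth`: the low halfword of rS, written big-endian.
fn sth_bytes(rs: u64) -> [u8; 2] {
    (rs as u16).to_be_bytes()
}

/// `sthbrx`: the low halfword of rS, byte-reversed before the big-endian write.
fn sthbrx_bytes(rs: u64) -> [u8; 2] {
    (rs as u16).swap_bytes().to_be_bytes()
}
```

Feeding a register value with garbage in the upper bits is a useful test input here, since it also confirms that only the low 8 or 16 bits reach memory.
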
IDs PPCBUG-133 through PPCBUG-139 are unallocated — reserved for group 24 follow-up.

---

## Batch 5 — store word (group 25)

Per-group report: `audit-out/group-25-store-word.md`.

Group 25 summary: **8 findings: 5 HIGH (reservation invalidation per opcode), 0 MEDIUM, 3 LOW.**
Core arithmetic and semantics are entirely clean for all 6 opcodes. EA computation (RA=0 guards,
simm16 sign-extend, u32 truncation), value truncation (`as u32`), update-form writebacks
(`ea as u64` zero-extension), big-endian `mem.write_u32`, `stwbrx` byte-reversal, and `stwcx`
conditional-store logic (cache-line reservation check, CAS, CR0 update, reservation always
cleared) all match the ISA and Canary exactly. The `stwcx` manual snapshot is stale (uses old
`reserved_addr` field name; live code correctly uses `reserved_line` at cache-line granularity —
actually MORE correct than the snapshot). Dominant finding is the same systemic miss as PPCBUG-107
and PPCBUG-130: `invalidate_for_write` is never called from any plain store arm.

### PPCBUG-140 — stw: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1183-1188`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Under `--parallel` with the ReservationTable enabled, a plain `stw` by thread B
  to a cache line reserved by thread A does not clear thread A's table slot. Thread A's
  subsequent `stwcx.` spuriously succeeds (CR0.EQ=1) even though thread B has written the line.
  All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic counters) built on
  `lwarx`/`stwcx.` are broken in multi-threaded mode. `stw` is the most common store instruction —
  every stack write, pointer store, and integer field write is affected.
- **Fix**: Before `mem.write_u32(ea, ...)`:
  ```rust
  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```
  `has_active_reservers()` is a single `Relaxed` load — negligible cost in the common non-atomic case.

### PPCBUG-141 — stwu: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1189-1194`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. `stwu r1, -N(r1)` is the canonical function-prologue
  stack-allocation idiom emitted by every compiled function. A thread holding a reservation on
  the stack region would see spurious `stwcx.` success after any prologue store.
- **Fix**: Same pattern as PPCBUG-140, inserted before `mem.write_u32`.

### PPCBUG-142 — stwx: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1195-1200`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. `stwx` is the indexed store used for array writes and
  indirect dereferences — common in loops that may run concurrently with reservation holders.
- **Fix**: Same pattern as PPCBUG-140.

### PPCBUG-143 — stwux: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1201-1206`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. Less common than stw/stwu but still a plain store
  that must participate in reservation invalidation.
- **Fix**: Same pattern as PPCBUG-140.

### PPCBUG-144 — stwbrx: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1568-1573`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. Byte-reversed stores (used for LE-payload GPU command
  buffers, file format fields) are still plain stores with respect to the reservation protocol.
- **Fix**: Same pattern as PPCBUG-140. `ea` is already a `u32` at this point (line 1570).

### PPCBUG-145 — stwcx: stale manual snapshot uses `reserved_addr` (LOW)

- **Severity**: LOW (documentation only; live code is correct)
- **Status**: open
- **Location**: `ppc-manual/memory/stwcx.md` (frozen snapshot section)
- **Symptom**: The frozen snapshot shows `ctx.reserved_addr == ea` (exact-word comparison).
  The live code at `interpreter.rs:1137-1153` uses `ctx.reserved_line == line` where
  `line = ea & !RESERVATION_MASK` (cache-line comparison). The live code is MORE correct per
  ISA (PowerISA 2.07B defines reservation at cache-line granularity). Snapshot reflects an
  earlier implementation before M3 introduced `RESERVATION_MASK` and `reserved_line`.
  Tests confirm live behavior is correct (`stwcx_succeeds_within_same_cache_line`).
- **Fix**: Regenerate the `stwcx.md` snapshot to show current field names and add a note on
  the ISA cache-line granule.

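The behavioural difference between the stale snapshot and the live code in PPCBUG-145 is the difference between exact-address and line-granular matching. The sketch below assumes a 128-byte granule (`ASSUMED_RESERVATION_MASK = 127`); the actual value of `RESERVATION_MASK` is not stated in this tracker, and the function names are hypothetical.

```rust
/// Assumed mask for a 128-byte reservation granule; the real constant may differ.
const ASSUMED_RESERVATION_MASK: u32 = 127;

/// Line address as computed by the live code: `ea & !RESERVATION_MASK`.
fn reserved_line(ea: u32) -> u32 {
    ea & !ASSUMED_RESERVATION_MASK
}

/// Snapshot-era check: the store-conditional must hit the exact reserved address.
fn matches_exact(reserved_ea: u32, ea: u32) -> bool {
    reserved_ea == ea
}

/// Live check: the store-conditional must hit the same reservation line.
fn matches_line(reserved_ea: u32, ea: u32) -> bool {
    reserved_line(reserved_ea) == reserved_line(ea)
}
```

Under these assumptions a reservation taken at `0x1000` matches a conditional store at `0x1004` only in the line-granular model, which is the case the existing `stwcx_succeeds_within_same_cache_line` test covers.
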
### PPCBUG-146 — Zero unit tests for stwu / stwx / stwux / stwbrx (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: Four of the six group-25 opcodes have zero dedicated unit tests.
- **Recommended minimum**:
  - `stwu r3, -8(r1)`: verify memory at `r1-8` and `gpr[1]` updated to `old_r1 - 8`.
  - `stwx ra=0`: store at `gpr[rb]`, verify memory and no RA writeback.
  - `stwux`: indexed update — verify store and RA writeback.
  - `stwbrx 0x11223344`: bytes at EA should be `[0x44, 0x33, 0x22, 0x11]`.

### PPCBUG-147 — stwcx test suite missing key cases (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs:5167-5208` (two existing tests)
- **Missing**:
  - `stwcx.` without prior `lwarx` → CR0.EQ=0, memory not written.
  - Post-PPCBUG-140-fix: `lwarx` then `stw` to same line then `stwcx.` → CR0.EQ=0.
  - RA=0 form: `stwcx. rS, 0, rB`.
  - Explicit memory check on failure path (assert memory unchanged).

IDs PPCBUG-148 and PPCBUG-149 are unallocated — reserved for group 25 follow-up.

---

## Batch 5 (continued) — store multiple/string (group 27)

Per-group report: `audit-out/group-27-store-mlsr.md`.

Group 27 summary: **4 findings: 2 HIGH, 1 MEDIUM, 1 LOW.** `stswx` is a permanent no-op (identical
root cause as PPCBUG-123 for `lswx` — XER TBC field not modeled; fixed as side effect of
PPCBUG-123/124). `stmw`, `stswi`, and `stswx` all omit `invalidate_for_write`, aggravated vs.
single-word stores because a single `stmw` can dirty multiple cache lines. `stswi` uses `instr.rb()`
instead of `instr.nb()` (maintenance hazard, same shape as PPCBUG-126 for `lswi`). Zero unit tests
across all three opcodes.

### PPCBUG-160 — stmw, stswi, stswx missing `invalidate_for_write`; multi-line atomicity exposure (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1521` (stmw), `interpreter.rs:1357` (stswi), `interpreter.rs:4189` (stswx)
- **Extends**: PPCBUG-107. The prior stated range `1182-1278` does not cover these three arms.
  Multi-word instructions (stmw up to 128 bytes = 2 lines; stswx up to 127 bytes = ~2 lines) make
  the probability of missing a reservation invalidation much higher than single-word stores.
- **Symptom**: thread B's `stmw` saves 18+ non-volatile registers across two cache lines. Thread A's
  `lwarx` reservation on the second line is not cleared. Thread A's `stwcx.` spuriously succeeds.
  Because `stmw` is the ABI-standard non-volatile register save, this is triggered constantly in
  function prologues — any lock-free primitive inside a prologue/epilogue window is at risk.
- **Fix** (same pattern as PPCBUG-107): before each `mem.write_u32`/`mem.write_u8` call, add the
  `invalidate_for_write` guard. See group-27 report for per-opcode code snippets.
- **Test gap**: `lwarx` reserve a line, `stmw` across that line, `stwcx.` must return CR0.EQ=0.

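The multi-line exposure in PPCBUG-160 comes from a single instruction writing a byte range that straddles a line boundary. The sketch below lists the line addresses a store of `len` bytes at `ea` touches, assuming a 128-byte line (`ASSUMED_LINE_BYTES`) and ignoring wraparound at the top of the 32-bit address space; `lines_touched` is a hypothetical helper, not part of the reservation module.

```rust
/// Assumed reservation line size in bytes; the real granule may differ.
const ASSUMED_LINE_BYTES: u32 = 128;

/// Base addresses of every line overlapped by the byte range [ea, ea + len).
/// Address-space wraparound is not modeled (the end address saturates).
fn lines_touched(ea: u32, len: u32) -> Vec<u32> {
    if len == 0 {
        return Vec::new();
    }
    let first = ea / ASSUMED_LINE_BYTES;
    let last = ea.saturating_add(len - 1) / ASSUMED_LINE_BYTES;
    (first..=last).map(|line| line * ASSUMED_LINE_BYTES).collect()
}
```

An 18-register `stmw` (72 bytes) starting at `0x1040` touches both `0x1000` and `0x1080`, so a reservation on either line has to be invalidated. Guarding every individual `mem.write_u32` inside the loop, as the fix above proposes, covers this without computing the range up front.
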
### PPCBUG-161 — `stswx` is a permanent no-op: XER TBC not modeled (HIGH)

- **Severity**: HIGH
- **Status**: open
- **Location**: `interpreter.rs:4189` (`stswx` arm) + `context.rs:235-243` (`xer()`/`set_xer()`)
- **Companion**: PPCBUG-123 (lswx), PPCBUG-124 (mtspr XER). This finding covers the store side.
- **Symptom**: `ctx.xer() & 0x7F` always returns 0 (no `xer_tbc` field). `stswx` unconditionally
  stores zero bytes. The byte-loop body is otherwise correct and requires no further changes.
- **Fix**: same three-line fix as PPCBUG-123 (add `xer_tbc: u8` to `PpcContext`; update `xer()`
  and `set_xer()`). The `stswx` body is correct once TBC is live.
- **Test gap**: `mtspr XER` (TBC=5) + `stswx r3, 0, r4` → 5 bytes written big-endian.

### PPCBUG-162 — `stswi` uses `instr.rb()` instead of `instr.nb()` for NB field (MEDIUM)

- **Severity**: MEDIUM (maintenance hazard; not a runtime correctness bug today)
- **Status**: open
- **Location**: `interpreter.rs:1359`
- **Companion**: PPCBUG-126 (`lswi` identical pattern at line 1340).
- **Symptom**: `instr.rb()` and `instr.nb()` extract the same bits 16-20, so values are equal now.
  If `rb()` is ever given a newtype wrapper (e.g. `RegIdx`) to enforce register semantics, the cast
  `instr.rb() as u32` will either fail or yield wrong semantics — silently treating a register index
  as a byte count.
- **Fix**: `let nb = if instr.nb() == 0 { 32 } else { instr.nb() };`

### PPCBUG-163 — Zero unit tests for stmw, stswi, stswx (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: No `#[test]` exists for any of the three opcodes. Regressions in loop bounds, byte
  order, EA computation, NB=0 handling, or register wraparound are invisible.
- **Recommended minimum**: stmw 2-word and 32-word cases; stswi 4-byte / 0 to 32 / wraparound /
  partial; stswx (post PPCBUG-123 fix) TBC=4, TBC=0, TBC=5. See group-27 report for full list.

ID PPCBUG-164 is unallocated — reserved for group 27 follow-up.

---

## Batch 5 (continued) — store doubleword (group 26)

Per-group report: `audit-out/group-26-store-doubleword.md`.

Group 26 summary: **0 HIGH, 2 MEDIUM, 2 LOW.** The core semantics of all six opcodes are
ISA-correct: `ds()` decoder extracts the DS-form displacement correctly; `mem.write_u64` handles
big-endian byte ordering; update-form writebacks are zero-extended and in the right order; `stdcx.`
CR0 encoding, reservation check, and table-path interaction all match the ISA. `stdbrx` correctly
applies `swap_bytes()`. No 32-bit writeback truncation issues (these are store ops, not ALU ops).
Two MEDIUM findings: (1) PPCBUG-150 extends PPCBUG-107 to the doubleword stores (same gap —
`invalidate_for_write` never called); (2) PPCBUG-151 identifies that `stwcx.` and `stdcx.` share
the same reservation slot without a width discriminator, allowing a `lwarx`+`stdcx.` or
`ldarx`+`stwcx.` cross-pair to succeed when it should fail. Four IDs used (PPCBUG-150..153).

### PPCBUG-150 — `std`/`stdu`/`stdx`/`stdux`/`stdbrx` do not call `invalidate_for_write` (scope extension of PPCBUG-107)

- **Severity**: MEDIUM (same classification as PPCBUG-107)
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**:
  - `interpreter.rs:1258` (`std`)
  - `interpreter.rs:1264` (`stdx`)
  - `interpreter.rs:1269` (`stdu`)
  - `interpreter.rs:1275` (`stdux`)
  - `interpreter.rs:4163` (`stdbrx`)
- **Symptom**: When `--parallel` is active and the `ReservationTable` is enabled, any of these
  five stores to an address another HW thread has reserved via `ldarx` will NOT invalidate that
  thread's reservation. The `ldarx`-holding thread's `stdcx.` can subsequently succeed even though
  the memory was overwritten — a classic LL/SC ABA gap. Fix session for PPCBUG-107 must include
  these five sites.
- **Fix**: in each arm, after `mem.write_u64(ea, ...)`, add:
  ```rust
  if let Some(t) = &ctx.reservation_table {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```

### PPCBUG-151 — `stdcx.`/`stwcx.` reservation width not discriminated: cross-width pair silently succeeds

- **Severity**: MEDIUM
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:4119-4155` (`stdcx`) vs `interpreter.rs:1134-1180` (`stwcx`)
- **Symptom**: Both `stwcx.` and `stdcx.` match reservations using only `(has_reservation,
  reserved_line)`. A `lwarx` reservation can be spuriously committed by `stdcx.`, or a `ldarx`
  reservation by `stwcx.`, as long as the cache line matches. The ISA requires pairing — `lwarx`
  must be committed by `stwcx.`, and `ldarx` by `stdcx.`. Cross-width commit reads the wrong width
  from memory and writes back the wrong width, with no failure indication (CR0.EQ=1).
- **Fix**: add a `reservation_width: u8` field (4 or 8) to `PpcContext`. `stwcx.` requires
  `reservation_width==4`; `stdcx.` requires `reservation_width==8`. In the table path, pack the
  1-bit width flag into one of the spare bits of the 64-bit slot (bits 39–32 are always zero for
  line addresses in the 32-bit guest address space).

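The width-discriminated check proposed for PPCBUG-151 can be sketched in isolation as follows. This is a minimal illustration, not the code that landed in ca5b90b: the `Reservation` and `ResWidth` types, the method names, and the 128-byte reservation granule are all assumptions made for the example, and the real state lives in `PpcContext`.

```rust
/// Hypothetical reservation width tag (illustrative; the finding proposes a `u8` of 4 or 8).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ResWidth {
    Word,  // set by lwarx, accepted only by stwcx.
    Dword, // set by ldarx, accepted only by stdcx.
}

/// Assumed reservation granule: one 128-byte line.
const LINE_MASK: u64 = 127;

/// Hypothetical per-thread reservation state.
#[derive(Default)]
struct Reservation {
    line: u64,
    width: Option<ResWidth>, // None means "no reservation held"
}

impl Reservation {
    /// Record a reservation, as lwarx/ldarx would.
    fn reserve(&mut self, ea: u64, width: ResWidth) {
        self.line = ea & !LINE_MASK;
        self.width = Some(width);
    }

    /// Return true if a conditional store of `width` to `ea` may commit.
    /// The reservation is consumed whether or not the store succeeds.
    fn try_commit(&mut self, ea: u64, width: ResWidth) -> bool {
        let ok = self.width == Some(width) && self.line == (ea & !LINE_MASK);
        self.width = None;
        ok
    }
}
```

Folding the "has reservation" flag into `Option<ResWidth>` means a cross-width commit fails the same comparison that already rejects a missing reservation, so no separate branch is needed.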
### PPCBUG-152 — `stdu`/`stdux` no invalid-form guard for RS==RA (LOW)

- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this)
- **Status**: open
- **Location**: `interpreter.rs:1267-1278`
- **Symptom**: When `RA==RS`, the store writes the original RS value, then RA (==RS) is
  overwritten with EA, destroying the source. ISA marks this invalid-form. Consistent with
  policy of other update-form stores in groups 18-22.
- **Fix**: add `debug_assert!(instr.ra() != 0 && instr.ra() != instr.rs())`, which is checked in
  debug builds only.

### PPCBUG-153 — Zero unit tests for std/stdu/stdx/stdux/stdbrx; stdcx. happy-path only (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module (only `test_ldarx_stdcx_pair` at line 4629)
- **Missing coverage**: `std` with negative DS; `std` with RA=0; `stdu` update writeback; `stdx`
  with RA=0; `stdux` indexed update; `stdbrx` byte-reversed output; `stdcx.` failure path (no
  prior reservation or EA mismatch); `stdcx.` `has_reservation` cleared on failure.
- **Recommended minimum**: 6 tests — see per-group report for encodings.

IDs PPCBUG-154 through PPCBUG-159 are unallocated — reserved for group 26 follow-up.

---

## Batch 5 (continued) — store float (group 28)

Per-group report: `audit-out/group-28-store-float.md`.

Group 28 summary: **7 findings: 3 HIGH, 1 MEDIUM, 3 LOW.** EA computation, endianness, update-form
writebacks, and `stfiwx` integer-word extraction are all correct. Critical bugs: (1) `stfs*` never
raises FPSCR exception bits (VXSNAN, XX, OX, UX) required by PowerISA for double→single narrowing;
(2) `stfs*` ignores FPSCR.RN rounding mode, always using round-to-nearest-even; (3) all 9 FP store
arms omit `invalidate_for_write` (same class as PPCBUG-107). The `stfd*` family and `stfiwx` are
clean (bit-pattern stores with no conversion). Zero unit tests across all 9 opcodes.
**7 IDs used (PPCBUG-165..171). 3 IDs unallocated (PPCBUG-172..174).**

### PPCBUG-165 — stfs* does not raise FPSCR exception bits (VXSNAN, XX, OX, UX)

- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux)
- **Symptom**: PowerISA requires that `stfs` double→single narrowing raises FPSCR[VXSNAN] for SNaN
  input, FPSCR[OX] on overflow to ±∞, FPSCR[UX] on underflow to ±0/denormal, and FPSCR[XX] when the
  result is inexact. None of these bits are ever set. The narrowing is done via `ctx.fpr[instr.rs()] as f32`
  (x86 `CVTSD2SS`); no FPSCR inspection or update follows. Games that poll FPSCR[OX] to detect
  overflow (physics engines clamping large velocities), or FPSCR[VXSNAN] after sentinel SNaN writes,
  get false negatives.
- **Canary parity**: Canary also omits these FPSCR updates for `stfs*`. Both share the deviation.
- **Fix**: after the narrowing, check `fpscr::is_snan(src)` → set `VXSNAN`; compare source vs.
  f64 round-trip of narrowed value for inexact; compare src.is_finite() && f32.is_infinite() for
  overflow. See group-28 report for illustrative code sketch.

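The checks listed in the PPCBUG-165 fix can be sketched as a pure function that returns the narrowed value together with the flags to raise. This is a hedged illustration only: `NarrowFlags`, `is_snan_f64`, and `narrow_with_flags` are hypothetical names rather than existing `fpscr` helpers, the flag conditions follow this finding's description, and the underflow test (inexact result that is zero or subnormal) is an assumption.

```rust
/// Hypothetical flag bundle; the real code would set FPSCR bits instead.
#[derive(Debug, Default, PartialEq, Eq)]
struct NarrowFlags {
    vxsnan: bool,
    ox: bool,
    ux: bool,
    xx: bool,
}

/// An f64 NaN is signalling when the top mantissa bit (bit 51) is clear.
fn is_snan_f64(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & (1 << 51)) == 0
}

/// Narrow with round-to-nearest-even (`as f32`) and report which flags apply.
fn narrow_with_flags(src: f64) -> (f32, NarrowFlags) {
    let narrowed = src as f32;
    let mut flags = NarrowFlags::default();
    if src.is_nan() {
        flags.vxsnan = is_snan_f64(src);
        return (narrowed, flags);
    }
    // Inexact: widening the narrowed value back does not reproduce the source.
    let inexact = (narrowed as f64) != src;
    flags.xx = inexact;
    // Overflow: a finite source became an infinity.
    flags.ox = src.is_finite() && narrowed.is_infinite();
    // Underflow (assumed definition): inexact and the result is zero or subnormal.
    flags.ux = inexact && (narrowed == 0.0 || narrowed.is_subnormal());
    (narrowed, flags)
}
```

Keeping the detection separate from the FPSCR write makes it straightforward to unit-test without constructing a `PpcContext`.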
### PPCBUG-166 — stfs* ignores FPSCR.RN; always uses round-to-nearest-even

- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
- **Symptom**: `ctx.fpr[instr.rs()] as f32` uses the host MXCSR rounding mode, never consulting
  `ctx.fpscr & fpscr::RN_MASK`. Any game that configures FPSCR.RN to truncate/ceil/floor and then
  stores via `stfs` gets the wrong f32 in memory (wrong by at most 1 ULP). The stfs.md spec
  explicitly acknowledges this gap.
- **Canary parity**: Canary also ignores FPSCR.RN for stfs. Both share the deviation.
- **Fix**: read `ctx.fpscr & fpscr::RN_MASK` and set host MXCSR before narrowing, then restore.
  Minimum viable: `debug_assert_eq!(ctx.fpscr & fpscr::RN_MASK, 0)` for debug-build visibility.

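As an alternative to switching the host MXCSR, directed rounding for the double→single narrowing can be done in software: narrow with round-to-nearest, then step one ULP if the nearest value lies on the wrong side of the source. The sketch below is a swapped-in technique, not the fix proposed above; the `Rn` enum and `narrow_directed` are hypothetical names, and the existing `to_single()` helper may already cover this.

```rust
/// Hypothetical rounding-mode enum mirroring the four FPSCR.RN settings.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Rn {
    NearestEven,
    TowardZero,
    TowardPosInf,
    TowardNegInf,
}

/// Narrow f64 -> f32 under the given rounding mode without touching MXCSR.
fn narrow_directed(x: f64, rn: Rn) -> f32 {
    let n = x as f32; // round-to-nearest-even
    let back = n as f64; // exact widening
    if x.is_nan() || back == x || rn == Rn::NearestEven {
        return n;
    }
    // f32 is sign-magnitude, so +/-1 on the bit pattern moves one ULP
    // toward or away from zero regardless of sign.
    let toward_zero = |v: f32| f32::from_bits(v.to_bits() - 1);
    let away_from_zero = |v: f32| f32::from_bits(v.to_bits() + 1);
    let negative = n.is_sign_negative();
    match rn {
        Rn::NearestEven => n,
        // Nearest overshot in magnitude: pull back by one ULP.
        Rn::TowardZero => {
            if back.abs() > x.abs() { toward_zero(n) } else { n }
        }
        // Nearest landed below the source: move up by one ULP.
        Rn::TowardPosInf => {
            if back < x {
                if negative { toward_zero(n) } else { away_from_zero(n) }
            } else {
                n
            }
        }
        // Nearest landed above the source: move down by one ULP.
        Rn::TowardNegInf => {
            if back > x {
                if negative { away_from_zero(n) } else { toward_zero(n) }
            } else {
                n
            }
        }
    }
}
```

Because the adjustment only ever moves by one ULP from the nearest value, the fast path (exact conversion or RN = nearest) costs one extra compare.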
### PPCBUG-167 — All 9 FP store arms missing `invalidate_for_write` (PPCBUG-107 class)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux),
  1308 (stfd), 1313 (stfdu), 1320 (stfdx), 1325 (stfdux), 1333 (stfiwx)
- **Symptom**: Same class as PPCBUG-107. Under M3 `--parallel`, a FP store by thread B to a
  cache line reserved by thread A via `lwarx` does not clear thread A's reservation table slot.
  Thread A's subsequent `stwcx.` spuriously succeeds. Rendering workers using FP stores to shared
  transform/particle buffers co-located with spinlock sites are at risk.
- **Fix**: before each `mem.write_f32`/`write_f64`/`write_u32` in every FP store arm:
  ```rust
  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```
  Recommend a single sweep of all store groups (PPCBUG-107, 130, 150, 160, 167) to avoid further drift.

### PPCBUG-168 — stfs* SNaN narrowing: `as f32` quietens SNaN without raising FPSCR.VXSNAN

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
- **Symptom**: When FRS holds an f64 SNaN (bit 51 = 0), `CVTSD2SS` sets the f32 quiet bit (bit 22),
  producing a QNaN in memory, without raising FPSCR[VXSNAN]. The stored memory bytes are correct per
  IEEE-754 (narrowing an SNaN produces a QNaN). The bug is the missing FPSCR signal, a subset of
  PPCBUG-165. **Contrast with PPCBUG-128** (lfs stores wrong FPR bits — HIGH severity; here memory
  bytes are right, only the flag is missing).
- **Note**: will be fixed as a side effect of the PPCBUG-165 fix. No independent code change needed.

### PPCBUG-169 — stfd* bit-pattern store: confirmed correct (informational)

- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Locations**: interpreter.rs:1305, 1311, 1317, 1323
- **Analysis**: `write_f64(ea, fpr)` → `write_u64(ea, fpr.to_bits())` → `val.to_be_bytes()`. Pure
  bit-pattern, correct big-endian. SNaN preserved. EA computation and update-form writebacks all
  correct. Canary parity confirmed. No bugs.

### PPCBUG-170 — stfiwx: confirmed correct (informational)

- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Location**: interpreter.rs:1329-1335
- **Analysis**: `write_u32(ea, fpr.to_bits() as u32)` correctly extracts the low 32 bits of the
  64-bit FPR as a raw bit pattern (the integer word produced by `fctiw`/`fctiwz`) and stores
  big-endian. RA=0 handled correctly. No FPSCR effects required. Canary parity confirmed. No bugs.

### PPCBUG-171 — Zero unit tests for all 9 store-float opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Symptom**: No `#[test]` covers any of the 9 FP store arms. Regressions in EA computation,
  endianness, update-form writeback order, or double→single narrowing are invisible.
- **Recommended minimum** (10 tests): `stfd` normal + SNaN bit-exact; `stfdu` update writeback;
  `stfs` round-trip (1.0); `stfs` overflow (→ ±∞); `stfsx` ra=0; `stfsux` update; `stfiwx` integer
  word extract; post-PPCBUG-165 fix: SNaN → FPSCR.VXSNAN set; post-PPCBUG-166 fix: RN=truncate.

IDs PPCBUG-172 through PPCBUG-174 are unallocated — reserved for group 28 follow-up.

---

## Batch 6 — FPU single-precision (group 29)

Per-group report: `audit-out/group-29-fpu-single.md`.

**Context**: The live implementation is substantially more capable than the frozen ppc-manual
snapshots indicated. `to_single()` correctly dispatches on FPSCR.RN; `check_invalid_*` helpers
correctly set VXSNAN, VXISI, VXIMZ, VXZDZ, VXIDI, ZX; `update_after_op` sets OX, UX, and
FPRF. The remaining bugs are: (1) XX/FI/FR (inexact) never set anywhere; (2) the fused
multiply-add `*sx` variants miss the VXISI check for the add-phase infinity collision (of the
`*x` double siblings, `fmsubx`/`fnmaddx`/`fnmsubx` share the gap, while `fmaddx` already has the
check); (3) fnmadd/fnmsub NaN sign bit incorrectly flipped by Rust `-`;
(4) fresx produces a full IEEE 1/b instead of the ~12-bit hardware estimate; (5) FPSCR.NI
flush-to-zero not modelled; (6) SNaN→QNaN propagation relies on host SSE behavior rather than
the ISA-canonical derivation.

**8 IDs used (PPCBUG-180..187). 12 IDs unallocated (PPCBUG-188..199).**

### PPCBUG-180 — XX / FI / FR bits never set across all FPU `*sx` opcodes (and double siblings)

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: `fpscr.rs:184-194` (`update_after_op`); affects interpreter.rs:2252-2494
- **Symptom**: `FPSCR[XX]` (inexact) should be set whenever the mathematical result of an
  FP operation cannot be represented exactly in the destination format (single or double) and
  a rounding step occurs. `FPSCR[FI]` (fraction inexact) reports that the last operation was
  inexact, and `FPSCR[FR]` (fraction rounded) reports that rounding incremented the fraction.
  `update_after_op` sets `OX` (overflow to ±∞) and `UX` (subnormal
  result) but has no inexact-detection logic. Since most `*sx` operations on arbitrary inputs
  require rounding to single precision, XX is almost always wrong (it reads 0 when it should be
  1). Games using FPSCR polling to check exactness receive false "exact" results.
- **Canary parity**: Canary's `UpdateFPSCR` also does not set XX/FI/FR. Both share this gap.
- **Fix**: In `update_after_op` (or a post-`to_single` helper), compare the pre-round f64
  result with the post-round result widened back to f64. If they differ, set `FI` and the
  sticky `XX`; set `FR` when rounding increased the magnitude (|rounded| > |pre-round|).
  Otherwise clear `FI` and `FR`.

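The comparison described in the PPCBUG-180 fix can be sketched as a small pure helper. It assumes the f64 intermediate is the best available stand-in for the exact result, so it will not see inexactness that was already introduced when the f64 operation itself rounded. `rounding_status` is a hypothetical name, not an existing `fpscr` function.

```rust
/// Return `(fi, fr)` for a value rounded from `pre_round` to `rounded`.
/// fi: the rounding changed the value (inexact).
/// fr: the rounding increased the magnitude (fraction incremented).
fn rounding_status(pre_round: f64, rounded: f32) -> (bool, bool) {
    let widened = rounded as f64; // widening f32 -> f64 is always exact
    if pre_round.is_nan() || widened.is_nan() {
        // NaN results carry no rounding information.
        return (false, false);
    }
    let fi = widened != pre_round;
    let fr = fi && widened.abs() > pre_round.abs();
    (fi, fr)
}
```

A caller would OR `fi` into the sticky `XX` bit and overwrite `FI`/`FR` with the returned pair after each arithmetic operation.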
### PPCBUG-181 — fmaddsx / fnmaddsx missing VXISI check for add-phase ±∞ collision

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:2339-2348 (fmaddsx), 2383-2392 (fnmaddsx)
- **Symptom**: When `FRA × FRC = +∞` and `FRB = -∞` (or vice versa), PowerISA §4.3.4
  requires `FPSCR[VXISI]` to be set and the result to be a QNaN. The double-precision sibling
  `fmaddx` (line 2327) correctly calls `fpscr::check_invalid_add(ctx, a * c, b, false)` after
  the multiply-check. `fmaddsx` omits this call entirely — only `check_invalid_mul` runs.
  Games using fused-madd in dot-product accumulators that might overflow to ±∞ (e.g. lighting
  accumulators with very large normals) lose the VXISI signal.
- **Fix**:
  ```rust
  // inside fmaddsx arm, after check_invalid_mul:
  fpscr::check_invalid_add(ctx, a * c, b, false);
  ```
  Same for fnmaddsx (same operand pair, same `false` sense for the add).

### PPCBUG-182 — fmsubsx / fnmsubsx missing VXISI check for subtract-phase ±∞ collision

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:2361-2370 (fmsubsx), 2405-2414 (fnmsubsx)
- **Symptom**: When `FRA × FRC = ±∞` and `FRB = ±∞` with the same sign, `(±∞) − (±∞)`
  should fire `FPSCR[VXISI]`. Neither `fmsubsx` nor `fnmsubsx` calls `check_invalid_add`.
- **Fix**:
  ```rust
  // inside fmsubsx arm, after check_invalid_mul:
  fpscr::check_invalid_add(ctx, a * c, -b, false);
  ```
  Same for fnmsubsx. The negated `b` turns the subtract into the add-form so that
  `check_invalid_add(..., false)` uses the correct infinity-sign comparison.

### PPCBUG-183 — fnmaddsx / fnmsubsx NaN sign bit incorrectly flipped by Rust unary `-`

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:2388 (fnmaddsx), 2410 (fnmsubsx)
- **Symptom**: `to_single(ctx, -(a.mul_add(c, b)))` — Rust's unary `-f64` always flips the
  IEEE sign bit, including when the value is NaN. PowerISA §4.3.2 specifies that the final
  negation in `fnmadd`/`fnmsub` is NOT applied to a QNaN result: if the fused computation
  yields a NaN (due to SNaN input, VXIMZ, or VXISI), the negation is skipped and the NaN is
  propagated with its canonical sign unchanged. xenia-rs flips the sign bit of any NaN result,
  producing a QNaN with the wrong sign. Observable by storing via `stfd` and inspecting bits.
  Games using sign-bit NaN tagging (e.g. `0xFFC00000` vs `0x7FC00000` as distinct sentinels)
  are affected.
- **Fix**:
  ```rust
  // fnmaddsx arm:
  let inner = a.mul_add(c, b);
  let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
  // fnmsubsx arm:
  let inner = a.mul_add(c, -b);
  let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
  ```

### PPCBUG-184 — fresx produces full-precision IEEE 1/b instead of ~12-bit hardware estimate

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:2481-2494
- **Symptom**: `fres` on Xenon hardware produces a reciprocal approximation via a 256-entry
  LUT with linear interpolation, accurate to roughly 1/4096 relative error (~12 mantissa
  bits). xenia-rs computes `to_single(1.0 / b)` — the fully IEEE-754 correctly-rounded
  single-precision reciprocal. The result is up to ~4096× more accurate than hardware.
  Newton-Raphson refinement code `x = fres(d); x = x*(2 - d*x)` is not broken by this (NR
  converges even from an accurate seed), but code that checks the seed's error magnitude for
  convergence termination, or that relies on `fres(d)*d ≠ 1.0` to decide whether to refine,
  may take the wrong branch. Also, `fres(d)*d` on xenia is much closer to 1.0 than on hardware,
  so a "was the estimate good enough?" check based on the residual will give wrong answers.
- **Canary parity**: Canary uses `f.Recip(f.Convert(frB, FLOAT32_TYPE))` — approximates by
  first converting to f32 (quantizing the input), then applying the host reciprocal. Still
  produces a fully-accurate IEEE single reciprocal rather than the 12-bit table estimate.
  Both emulators share the deviation. Canary's conversion-first approach is slightly closer to
  hardware (the input is quantized before the reciprocal), so if a future fix is desired,
  Canary's approach is the better reference.
- **Fix (minimal viable)**: Pre-convert input to f32 to match Canary's quantization:
  `let b32 = b as f32; to_single(ctx, 1.0_f64 / b32 as f64)`. This matches Canary but still
  does not emulate the 12-bit LUT. Full fix requires an `fres` LUT matching Xenon's hardware
  table (documented in Xbox 360 SDK / GamePPCLisa docs).

### PPCBUG-185 — FPSCR.NI flush-to-zero not modelled; subnormal results propagate through `*sx`

- **Severity**: MEDIUM
- **Status**: open
- **Location**: All `*sx` arms in interpreter.rs; `fpscr.rs` does not define `NI` as a constant
- **Symptom**: Xenon firmware sets `FPSCR.NI = 1` at boot. With NI=1, the Xenon FPU flushes
  subnormal inputs and results to the appropriate signed zero before and after every
  floating-point operation. xenia-rs inherits the host x86 default (flush-to-zero and
  denormals-are-zero both off, the equivalent of NI=0), which
  propagates subnormals. Subnormal differences: (a) subnormal FPR inputs are used as-is by
  xenia vs. treated as ±0 by hardware; (b) subnormal results are stored by xenia vs. flushed
  to ±0 by hardware. `update_after_op` sets `UX` when the result is subnormal, but does NOT
  flush it. Games with NI-dependent behavior — most Xbox 360 titles compiled with default
  Xenon ABI settings — may see different float results in subnormal-touching paths.
- **Canary parity**: Canary also inherits host IEEE NI=0 semantics. Both share this gap.
- **Fix**: After `to_single` (or the double-precision result), check `ctx.fpscr & fpscr::NI_BIT`
  (the constant must be added first) and if set, flush subnormals: `if result.is_subnormal() {
  result = result.signum() * 0.0 }`. Apply to inputs as well for strict correctness.

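The flush step from the PPCBUG-185 fix can be sketched as two sign-preserving helpers, one per destination format. The function names and the `ni` parameter are illustrative assumptions; in the interpreter the flag would be read from `ctx.fpscr`.

```rust
/// Flush a subnormal f64 to a zero of the same sign; pass everything else through.
fn flush_subnormal_f64(x: f64) -> f64 {
    if x.is_subnormal() { 0.0_f64.copysign(x) } else { x }
}

/// Same for single-precision results, where the subnormal threshold is f32's.
fn flush_subnormal_f32(x: f32) -> f32 {
    if x.is_subnormal() { 0.0_f32.copysign(x) } else { x }
}

/// Apply the flush only when the (assumed) NI flag is set.
fn apply_ni(x: f64, ni: bool) -> f64 {
    if ni { flush_subnormal_f64(x) } else { x }
}
```

`copysign` is used rather than `signum() * 0.0` so that the sign handling is explicit and NaN inputs are never touched by arithmetic.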
### PPCBUG-186 — SNaN → QNaN propagation relies on host SSE; not ISA-canonical for all `*sx`

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:2252-2414 (all arithmetic `*sx` arms without explicit SNaN guard)
- **Symptom**: When an SNaN input reaches `faddsx`/`fsubsx`/`fmulsx`/`fdivsx`, the code calls
  `check_invalid_add/mul/div` (correctly sets VXSNAN) but then performs the operation on the
  raw SNaN value: `a + b`, `a * c`, etc. On x86-64 SSE2, the hardware `ADDSD`/`MULSD` ops
  produce a QNaN from the first SNaN operand (bit 51 set, other mantissa bits preserved). This
  matches ISA §4.3.2.2 for the common case. However, for `mul_add` (VFMADD231SD on AVX), the
  SNaN propagation priority may differ: the ISA specifies FRA takes priority over FRB, but
  hardware FMA may use a different priority for the three-operand form. The `fsqrtsx` and
  `fresx` arms handle SNaN explicitly (via `is_snan` check) but do not synthesize the correct
  QNaN result — they rely on `b.sqrt()` / `1.0/b` to produce a NaN, which the host does.
  This is a latent risk; active wrong-result cases require bit-level NaN inspection.

### PPCBUG-187 — Zero interpreter execution tests for all 10 group-29 opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module (no `#[test]` covers any `*sx` or fresx)
- **Symptom**: Regressions in rounding, FPSCR side effects, or operand-field decoding are
  invisible to CI. The existing fpscr unit tests cover helper functions in isolation; no test
  exercises the full `step()` path for any single-precision FPU opcode.
- **Recommended minimum** (12 tests — see group-29 report for encodings):
  `fadds` exact; `fadds` VXISI; `fsubs` VXISI; `fmuls` 0×∞; `fdivs` ZX;
  `fmadds` VXISI regression (PPCBUG-181); `fmsubs` VXISI regression (PPCBUG-182);
  `fnmadds` NaN-sign (PPCBUG-183); `fnmsubs` NaN-sign (PPCBUG-183);
  `fsqrts` negative input VXSQRT; `fsqrts` round-trip; `fres` basic reciprocal.

IDs PPCBUG-188 through PPCBUG-199 are unallocated — reserved for group 29 follow-up.

---

## Batch 6 (continued) — FPU arithmetic double (group 30)

Per-group report: `audit-out/group-30-fpu-double.md`.

Group 30 summary: **9 findings (PPCBUG-200..208). 2 MEDIUM cross-cutting, 3 MEDIUM
opcode-specific, 4 LOW.** Result arithmetic is correct for all 10 opcodes. FPSCR infrastructure
is partially wired: VXSNAN, OX, UX, ZX, VXISI (add/sub), VXIMZ, VXZDZ, VXIDI, VXSQRT all set
correctly for the opcodes that need them. Critical gaps: (1) XX/FR/FI bits never set by any
opcode — same gap as PPCBUG-180 but now confirmed on the double-precision path; (2) FPSCR.RN not
honored for double arithmetic — single-precision has `round_to_single` but double has no
equivalent; (3) fmsubx/fnmaddx/fnmsubx omit the VXISI check for ∞-collision in the add step;
(4) fnmaddx/fnmsubx flip NaN sign bit via Rust `-` operator but ISA requires NaN sign preserved.
frsqrtex uses full-precision 1/sqrt(b) instead of the hardware estimate — acceptable. All FMA
forms use `f64::mul_add` for correct single-rounding semantics.

**9 IDs used (PPCBUG-200..208). 11 IDs unallocated (PPCBUG-209..219).**

### PPCBUG-200 — All group-30 opcodes: XX, FR, FI bits never set

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `fpscr.rs:184-194` (`update_after_op`); `interpreter.rs:2248,2268,2289,2310,2335,2357,2379,2401,2463,2510`
- **Symptom**: Same gap as PPCBUG-180 but confirmed for the double-precision path. `update_after_op` only tracks OX (overflow to infinity) and UX (subnormal). FPSCR[XX] (inexact sticky), FPSCR[FR] (round direction), and FPSCR[FI] (inexact for current op) are never updated by any group-30 opcode. Every double-precision arithmetic operation that rounds a non-representable result silently omits these bits.
- **Fix**: Same goal as PPCBUG-180, but the mechanism differs. For double there is no `to_single` step to compare against, so inexactness must be detected either by reading the host MXCSR exception flags after each f64 operation and mapping them to FI/XX/FR, or by a post-op bit-level comparison of inputs vs. result.
- **Test gap**: Zero tests verify XX set after any inexact double-precision operation.

### PPCBUG-201 — All group-30 opcodes: FPSCR.RN not honored for double arithmetic

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2242-2512` (all 10 arms)
- **Symptom**: Host f64 operators always use nearest-even (host MXCSR default). `fpscr.rs` has a complete `rounding_mode(ctx)` helper and directed rounding helpers for single-precision (`round_to_single`), but no equivalent for double arithmetic. Guest `mtfsfi` RN changes have no effect on faddx/fsubx/fdivx/fsqrtx etc.
- **Fix**: Wrap each double-precision arithmetic arm with an MXCSR round-mode set/restore when `ctx.fpscr & fpscr::RN_MASK != 0`. Fast path (RN=0) stays zero-cost.
- **Test gap**: No test changes RN and verifies directed rounding on any double arithmetic opcode.

### PPCBUG-202 — fmaddx: non-FMA `a * c` used in check_invalid_add can spuriously raise/miss VXISI

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2332`
- **Symptom**: `check_invalid_add(ctx, a * c, b, false)` uses a separate two-rounding multiply to approximate the FMA intermediate product. When the true FMA intermediate is finite but the standalone product overflows to ±∞, VXISI fires spuriously. When the true intermediate is ±∞ but the standalone product is finite (extreme cancellation), VXISI is missed.
- **Fix**: Derive VXISI from input-value properties directly: if `a` or `c` is infinite and the other factor is non-zero and not NaN (so the product is mathematically infinite), and `b.is_infinite()` with the opposing sign → VXISI. The 0×∞ case stays with `check_invalid_mul` (VXIMZ).
- **Test gap**: No test covers the large-value cancellation case in fmaddx.

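The input-derived VXISI test from the PPCBUG-202 fix can be sketched as a standalone predicate. It is an illustration under stated assumptions: `fma_vxisi` is a hypothetical name, the `subtract` flag selects the `fmsub`-style form (`a*c - b`), and NaN and 0×∞ inputs are assumed to be reported elsewhere (VXSNAN, VXIMZ).

```rust
/// True when `a*c + b` (or `a*c - b` if `subtract`) is an infinity-minus-infinity case.
/// Decided from the inputs only, so overflow of a standalone `a * c` cannot
/// produce a false positive.
fn fma_vxisi(a: f64, c: f64, b: f64, subtract: bool) -> bool {
    if a.is_nan() || c.is_nan() || b.is_nan() {
        return false; // NaN inputs are handled by the SNaN/QNaN paths
    }
    // The exact product is infinite only if one factor is infinite and the
    // other is non-zero (0 * inf is VXIMZ, not VXISI).
    let product_infinite =
        (a.is_infinite() && c != 0.0) || (c.is_infinite() && a != 0.0);
    if !product_infinite || !b.is_infinite() {
        return false;
    }
    let product_negative = a.is_sign_negative() != c.is_sign_negative();
    // For the subtract form the effective addend is -b.
    let addend_negative = b.is_sign_negative() != subtract;
    product_negative != addend_negative
}
```

The same predicate would serve the four double-precision FMA arms and their single-precision siblings, with `subtract = true` for the `fmsub`/`fnmsub` forms.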
### PPCBUG-203 — fmsubx, fnmaddx, fnmsubx: VXISI never raised for ∞-collision in add/sub step

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: `interpreter.rs:2354` (fmsubx), `2376` (fnmaddx), `2398` (fnmsubx)
- **Symptom**: Same pattern as PPCBUG-181/182 for the double-precision variants. These three arms call only `check_invalid_mul` and omit `check_invalid_add`. Per ISA, all four FMA variants must raise VXISI when the add step is `(±∞) + (∓∞)`. Example for fmsub: `A×C = +∞`, `B = +∞` → `+∞ − +∞` → VXISI. Currently the result NaN propagates silently with no FPSCR update. The fnmsub pattern is the canonical Newton-Raphson step — the most common FPU path in Xbox 360 graphics code.
- **Fix**: Add `fpscr::check_invalid_add(ctx, a * c, b, true)` for `fmsubx`/`fnmsubx` and `fpscr::check_invalid_add(ctx, a * c, b, false)` for `fnmaddx` (apply PPCBUG-202 sign-fix simultaneously).
- **Test gap**: Zero tests for VXISI on any of the three opcodes.

### PPCBUG-204 — fmaddx check_invalid_add sub-issue (sign logic reliant on imprecise product)

- **Severity**: LOW (sub-issue of PPCBUG-202)
- **Status**: open
- **Location**: `interpreter.rs:2332`
- **Symptom**: VXISI logic is internally consistent with the passed `a * c` value, but that value can have the wrong sign in extreme overflow/underflow cases. Resolve as part of PPCBUG-202.

### PPCBUG-205 — fnmaddx / fnmsubx: Rust `−` flips NaN sign bit; ISA requires NaN sign preserved

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: `interpreter.rs:2377` (fnmaddx), `interpreter.rs:2399` (fnmsubx)
- **Symptom**: Same pattern as PPCBUG-183 for the double-precision variants. Rust's unary `-` applied to a NaN result flips the IEEE-754 sign bit. PowerISA Book I §4.3.4 states the negation is not applied to NaN results. Title code using NaN sentinels (audio middleware, debug fills) receives sign-flipped NaN payloads.
- **Fix**:
  ```rust
  let fma = a.mul_add(c, b); // fnmaddx
  let result = if fma.is_nan() { fma } else { -fma };
  // and analogously for fnmsubx
  ```
- **Test gap**: No test exercises fnmaddx/fnmsubx with NaN-producing inputs to check sign of result NaN.

### PPCBUG-206 — frsqrtex edge cases correct; no code change needed (informational)

- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Location**: `interpreter.rs:2496-2512`
- **Analysis**: ZX fires for ±0. VXSQRT guard correctly excludes -0.0. frsqrte(+∞)=+0 correct. Full-precision is acceptable over-precision.
- **Fix**: Add comment: `// Full-precision: hardware gives ~12-14 bit estimate. NR converges identically.`
- **Test gap**: Zero frsqrtex unit tests — add 4 (±0 inputs, negative input+VXSQRT, SNaN, +∞).

### PPCBUG-207 — FMA opcode OX logic correct, OX edge cases untested (informational)

- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Location**: `interpreter.rs:2335,2357,2379,2401`
- **Analysis**: `inputs_were_finite` correctly suppresses OX when an input is already infinite. OX fires when all inputs are finite but the FMA result overflows — ISA-correct.
- **Test gap**: Zero tests for OX scenario in any FMA opcode.

### PPCBUG-208 — Zero tests for fsubx, fdivx, fmsubx, fnmaddx, fnmsubx, fsqrtx, frsqrtex

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Symptom**: 7 of 10 group-30 opcodes have zero tests. `faddx` has 1 happy-path test; `fmulx` has 1; `fmaddx` has 1. None have FPSCR/Rc=1/edge-case coverage.
- **Recommended minimum** (12 tests): `fsubx` normal; `fsubx` VXISI; `fdivx` normal; `fdivx` ZX; `fdivx` VXZDZ; `fmsubx` normal; `fnmaddx` normal; `fnmsubx` normal; `fnmaddx` NaN-sign regression (PPCBUG-205); `fsqrtx` normal; `fsqrtx` negative+VXSQRT; `frsqrtex` positive.

IDs PPCBUG-209 through PPCBUG-219 are unallocated — reserved for group 30 follow-up.

---

## Pending batches

- Batch 2: groups 6-11 — logical immediate, logical register, sign-extend/CLZ, word rotate, doubleword rotate, shift.
- Batch 3: groups 12-17 — compare, branch, trap+sc, CR logical, SPR/MSR, cache+sync.
- Batch 4: groups 18-23 — loads (byte, halfword, word, doubleword, multiple/string, float).
- Batch 5 (partial): groups 24, 26, 27, 28 done; group 25 (store word) pending.
- Batch 6 (partial): groups 29, 30 done; group 31 (FPU convert/compare) pending.
- Batch 7: groups 32-34 — VMX integer (add/sub, compare/min/max, logical/shift).
- Batch 8: groups 35-38 — VMX permute/pack, VMX float, VMX multiply-sum, VMX load/store.
- Phase C: decoder field extractors, decoder opcode-lookup, disassembler formatter parity.
- Phase D: this file gets re-sorted by severity and finalized.

---

## Batch 6 (continued) — FPU sign/move/compare/convert/round (group 31)

Per-group report: `audit-out/group-31-fpu-misc.md`.

Group 31 summary: **9 findings (PPCBUG-221..231; IDs 220/222/226 retracted after analysis).
1 HIGH, 3 MEDIUM, 5 LOW.** The sign-bit manipulation family (`fabsx`, `fnegx`, `fnabsx`, `fmrx`)
and `fselx` are all ISA-correct — Rust arithmetic maps to bit-level operations that preserve SNaN
payloads. `fcmpu` is correct (FPRF and VXSNAN set; no spurious VXVC). The conversion group is
mostly correct for result values and overflow sentinels; the main gaps are FPSCR inexact/FR/FI
tracking (shared with groups 29/30) and one subtle NearestEven tie-breaking defect in
`round_to_i64` that affects `fctidx`. `fcmpo` silently omits VXSNAN/VXVC despite having a
comment acknowledging the gap.

**9 IDs used (PPCBUG-221, 223, 224, 225, 227, 228, 229, 230, 231). IDs 220/222/226 retracted.
IDs PPCBUG-232..239 unallocated.**

### PPCBUG-221 — `fctidx` / `round_to_i64` NearestEven tie-breaking uses f64::EPSILON; broken for |v| > 2^52

- **Severity**: HIGH
- **Status**: open
- **Location**: `fpscr.rs:220–238` (`round_to_i64`, `NearestEven` case)
- **Symptom**: The tie-breaking code computes `diff = (v - v.trunc()).abs()` and tests
  `(diff - 0.5).abs() < f64::EPSILON` to detect a half-integer. Above `|v| = 2^52`,
  `v.trunc() == v` for all representable f64 values (all are exact integers), so `diff == 0.0`
  and the tie-breaking branch is never taken — the code falls through to `v.round() as i64`,
  which is round-half-away-from-zero instead of round-half-to-even. Every fctid call on a
  large odd half-integer (e.g. `(2^52 + 1).5`) produces the wrong integer. In practice these
  exact 0.5 cases are rare for large values but can appear in audio sample-count arithmetic
  and physics fixed-point pipelines.
- **Fix**: replace the NearestEven arm with a fractional-part-only tie check that is exact for
  |v| <= 2^52 and degenerates correctly to truncation above 2^52:
  ```rust
  RoundingMode::NearestEven => {
      let t = v.trunc();
      let frac = v - t;          // exact for |v| <= 2^52; ==0 above (already integer)
      let fa = frac.abs();
      if fa > 0.5 { t as i64 + if v >= 0.0 { 1 } else { -1 } }
      else if fa < 0.5 { t as i64 }
      else {
          // Exact 0.5 tie — round to even.
          let fi = t as i64;
          if fi & 1 == 0 { fi } else { fi + if v >= 0.0 { 1 } else { -1 } }
      }
  }
  ```
- **Test gap**: add `round_to_i64` tests in `fpscr.rs:tests`: 0.5→0, 1.5→2, 2.5→2, 3.5→4,
  -0.5→0, -1.5→-2. Existing tests cover 2.5→2 and 3.5→4 (currently accidentally correct).

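
The tie cases listed in the test gap can be checked against a stand-alone version of the proposed arm. The sketch below is a hypothetical free function (`round_half_even` is not a name from `fpscr.rs`) and assumes a finite input within `i64` range; it also shows that a value one ulp above a tie must not be treated as a tie.

```rust
/// Round `v` to the nearest integer, ties to even.
/// Hypothetical stand-alone helper; assumes `v` is finite and fits in i64.
fn round_half_even(v: f64) -> i64 {
    let t = v.trunc();
    let frac = (v - t).abs(); // the fractional part of an f64 is exactly representable
    let away = if v >= 0.0 { 1 } else { -1 };
    let ti = t as i64;
    if frac > 0.5 {
        ti + away // closer to the neighbour away from zero
    } else if frac < 0.5 {
        ti // closer to the truncated value
    } else if ti & 1 == 0 {
        ti // exact tie, truncated value is already even
    } else {
        ti + away // exact tie, the even neighbour is one step away from zero
    }
}
```

Comparing `frac` directly against `0.5` avoids any tolerance constant, so the result does not depend on the magnitude of `v`.
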
### PPCBUG-223 — `fcmpo` omits FPSCR[VXSNAN] and FPSCR[VXVC] on NaN operands

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2645–2675`
- **Symptom**: The `fcmpo` body mirrors `fcmpu`: it sets FPRF and the CR field correctly, but
  unlike `fcmpu` it never calls `fpscr::set_exception`. PowerISA requires: QNaN → `FPSCR[VXVC, VX, FX]`;
  SNaN → additionally `FPSCR[VXSNAN]`. `fcmpu` correctly sets VXSNAN for SNaN; `fcmpo` does
  not. A comment in the source acknowledges "not modeled yet."
- **Impact**: `fcmpo` has no record form, so the missing flags surface indirectly: a later
  Rc=1 FP instruction (which copies FX into CR1) or `mcrfs` after a NaN compare will see FX=0
  instead of FX=1. `mffsx` after `fcmpo` will not reflect VXVC. Xbox 360 CRT comparison primitives
  (`islessgreater`, ordered relational operators) use `fcmpo`.
- **Fix**:
  ```rust
  if fra.is_nan() || frb.is_nan() {
      ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: false, so: true };
      if fpscr::is_snan(fra) || fpscr::is_snan(frb) {
          fpscr::set_exception(ctx, fpscr::VXSNAN | fpscr::VXVC);
      } else {
          fpscr::set_exception(ctx, fpscr::VXVC);
      }
  }
  ```

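
The flag selection in the fix can be exercised in isolation. The following is a minimal sketch that works on raw bit patterns so that NaN payloads are never touched by host arithmetic; the mask values and function names are hypothetical and are not the constants defined in `fpscr.rs`.

```rust
// Hypothetical flag masks, chosen only for this sketch.
const VXSNAN: u32 = 0b01;
const VXVC: u32 = 0b10;

const EXP_MASK: u64 = 0x7FF0_0000_0000_0000;
const FRAC_MASK: u64 = 0x000F_FFFF_FFFF_FFFF;
const QUIET_BIT: u64 = 1 << 51;

/// NaN: exponent all ones and a non-zero fraction.
fn is_nan_bits(bits: u64) -> bool {
    (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0
}

/// Signalling NaN: a NaN whose top fraction bit (the quiet bit) is clear.
fn is_snan_bits(bits: u64) -> bool {
    is_nan_bits(bits) && (bits & QUIET_BIT) == 0
}

/// Invalid-operation flags for an ordered compare; 0 when neither operand is a NaN.
fn fcmpo_invalid_flags(a: u64, b: u64) -> u32 {
    if is_snan_bits(a) || is_snan_bits(b) {
        VXSNAN | VXVC
    } else if is_nan_bits(a) || is_nan_bits(b) {
        VXVC
    } else {
        0
    }
}
```

Infinity has an all-ones exponent but a zero fraction, so it falls through to the no-flag case.
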
### PPCBUG-224 — `fcfidx` does not set FPSCR[XX/FX] for inexact i64→f64 conversion

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2528–2536`
- **Symptom**: Only FPRF is updated. Per ISA, `fcfid` sets `FPSCR[XX, FX]` (and FR/FI) when
  the i64 value has more than 53 significant bits and precision is lost. An i64 with
  `|v| > 2^53` triggers inexact unless its discarded low bits are all zero. Common trigger:
  large frame/sample counters, address values.
- **Fix**: after the conversion, compare `(result as i64) != (bits as i64)` and call
  `fpscr::set_exception(ctx, fpscr::XX)` if inexact.

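
A round-trip comparison is enough to detect the inexact case. The sketch below uses a hypothetical helper name and widens to `i128` for the comparison, because Rust's float-to-integer `as` cast saturates: `i64::MAX` rounds up to 2^63 as an f64, and casting that back with `as i64` yields `i64::MAX` again, which would hide the rounding.

```rust
/// True when converting `v` to f64 loses precision.
/// Hypothetical helper; compares in i128 so a result of 2^63 stays distinguishable.
fn i64_to_f64_is_inexact(v: i64) -> bool {
    let converted = v as f64; // rounds to nearest, ties to even
    (converted as i128) != (v as i128)
}
```

Values at or below 2^53 in magnitude, and larger values whose low bits are zero, report exact.
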
### PPCBUG-225 — `frspx` does not set FPSCR[XX/FX/FR/FI] on inexact rounding

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2516–2527`
- **Symptom**: `update_after_op` sets OX/UX only. The ISA requires FR/FI/XX/FX on any f64→f32
  rounding that is not exact. `frsp` is the canonical double→single-precision narrowing idiom
  in compiler output — virtually every call is inexact.
- **Fix**: after `to_single`, compare result vs b; if different and both finite, call
  `fpscr::set_exception(ctx, fpscr::XX | fpscr::FI | ...)` with FR set if magnitude increased.

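
The compare-after-narrowing check described in the fix could look like the sketch below. It is a hypothetical helper, not the `to_single` path itself, and it models only the host's round-to-nearest `as f32` cast; the other FPSCR rounding modes are out of scope here.

```rust
/// Returns `(fi, fr)` for narrowing `b` to single precision:
/// `fi` = the result is inexact, `fr` = rounding increased the magnitude.
/// Hypothetical helper; models round-to-nearest only.
fn frsp_inexact(b: f64) -> (bool, bool) {
    let narrowed = (b as f32) as f64; // f32 -> f64 widening is always exact
    let fi = b.is_finite() && narrowed.is_finite() && narrowed != b;
    let fr = fi && narrowed.abs() > b.abs();
    (fi, fr)
}
```

`0.1` rounds up when narrowed, while `1.0 + 2^-30` rounds down to `1.0`, so the two inputs exercise both values of `fr`.
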
### PPCBUG-227 — `fctiwx` rounding: `round_to_i32` inherits NearestEven defect via `round_to_i64`

- **Severity**: LOW
- **Status**: open
- **Location**: `fpscr.rs:241–243`
- **Symptom**: `round_to_i32` calls `round_to_i64` then clamps. The PPCBUG-221 defect in
  `round_to_i64` does not manifest for i32-range values (the epsilon check accidentally works
  at this scale), but the structural fragility is inherited. Fixing PPCBUG-221 cures this.
- **Recommendation**: add unit tests `round_to_i32(0.5)==0`, `round_to_i32(1.5)==2`,
  `round_to_i32(2.5)==2` to verify correct round-to-even behavior.

### PPCBUG-228 — Zero interpreter execution tests for fabsx/fnegx/fnabsx/fmrx/fselx/fcmpo/fcfidx/fctidx/fctidzx/frspx

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: 10 of the 13 group-31 opcodes have zero dedicated tests. `test_fcmpu` covers
  only the ordered comparison `5.0 > 3.0`. `test_fctiwzx` covers one positive truncation.
  `test_fadd`/`test_fmul` are group-30 tests, not group-31.
- **Recommended minimum**: SNaN-preservation test for fabsx/fnegx/fnabsx; fselx with NaN/−0/−1;
  fcmpo QNaN→VXVC (after PPCBUG-223 fix); fcfidx exact and inexact; fctidx tie cases; frspx
  inexact → XX set (after PPCBUG-225 fix); fctiwx nearest-even tie; fctiwzx NaN sentinel.

### PPCBUG-229 — `fctidx` / `fctidzx` do not set FPSCR[XX/FX] for inexact inputs

- **Severity**: LOW
- **Status**: open
- **Locations**: `interpreter.rs:2537–2574`
- **Symptom**: Per ISA, float-to-integer conversions set `FPSCR[XX, FX]` when the source
  value is not an integer (the fractional part is discarded). Neither opcode sets XX.
  Shared root cause with PPCBUG-224/225.

### PPCBUG-230 — `fctiwx` / `fctiwzx` do not set FPSCR[XX/FX] for inexact inputs

- **Severity**: LOW
- **Status**: open
- **Locations**: `interpreter.rs:2575–2612`
- **Symptom**: Same omission as PPCBUG-229 for the word-width conversion pair.

### PPCBUG-231 — `frspx` SNaN input result written as QNaN (host platform dependency)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2519–2524`
- **Symptom**: Rust's `as f32` (CVTSD2SS) can set the quiet bit on SNaN input, producing a
  QNaN in the FPR. Per ISA, `frsp` on SNaN should quieten it — so the QNaN result is
  correct in kind. The risk is that the exact QNaN bit-pattern may differ from PPC's
  canonical quietening (which ORs bit 22 into the f32 mantissa). Game code inspecting the
  NaN payload after frsp may see a different payload. Same structural root cause as
  PPCBUG-128 (`lfs` SNaN quietening), but lower severity because frsp IS arithmetic.

IDs PPCBUG-232 through PPCBUG-239 are unallocated — no further bugs found in group 31.

---

## Batch 7 — VMX integer add/sub (group 32)

Per-group report: `audit-out/group-32-vmx-int-addsub.md`.

**Scope**: `vaddubm`, `vaddubs`, `vadduhm`, `vadduhs`, `vadduwm`, `vadduws`, `vaddsbs`, `vaddshs`,
`vaddsws`, `vaddcuw`, `vsububm`, `vsububs`, `vsubuhm`, `vsubuhs`, `vsubuwm`, `vsubuws`, `vsubsbs`,
`vsubshs`, `vsubsws`, `vsubcuw`.

**Overall verdict**: All 20 opcodes are arithmetically correct. No HIGH-severity bugs found.
Lane indexing (big-endian, PPC element 0 = `Vec128::bytes[0]`), saturation arithmetic, VSCR.SAT
sticky-set, and vaddcuw/vsubcuw carry/borrow semantics are all implemented correctly.
4 LOW-severity findings (2 test gaps, 1 code organization, 1 API hazard).

### PPCBUG-240 — 18 of 20 group-32 opcodes have zero interpreter-level tests

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: Only `test_vaddubs_saturates_and_sets_vscr_sat` covers any group-32 opcode.
  `vaddubm`, `vsububm`, `vadduhm`, `vsubuhm`, `vadduwm`, `vsubuwm`, `vaddsbs`, `vsubsbs`,
  `vadduhs`, `vsubuhs`, `vaddshs`, `vsubshs`, `vadduws`, `vsubuws`, `vaddsws`, `vsubsws`,
  `vaddcuw`, `vsubcuw` — all 18 have no tests. No high risk today but no regression guard.
- **Recommended minimum**: wrap-around test (byte, halfword, word); sat-at-max and sat-at-min tests;
  VSCR.SAT sticky-set across two successive saturating instructions; vaddcuw carry lane; vsubcuw
  no-borrow lane.

### PPCBUG-241 — `vadduwm` / `vsubuwm` stranded in a separate section from the rest of group-32

- **Severity**: LOW (maintenance hazard)
- **Status**: open
- **Location**: `interpreter.rs:2090–2104` (stranded) vs. `interpreter.rs:2784` (§4a group-32 section)
- **Symptom**: The two word-modulo opcodes are matched 700 lines above the rest of the group, with
  only a comment at line 2819 as a cross-reference. A future sweep of §4a for group-32 changes
  would miss them.
- **Fix**: Move both arms into §4a and remove the comment at line 2819.

### PPCBUG-242 — `set_vscr_sat(false)` can non-stickily clear SAT from arithmetic handlers

- **Severity**: LOW (API hazard)
- **Status**: open
- **Location**: `context.rs:252–259`
- **Symptom**: `set_vscr_sat(bool)` accepts `false`, which would clear the sticky SAT bit. All
  current arithmetic callers pass `true` only (inside `if sat { ... }` guards), so no mis-clear
  occurs today. But the API is misleading — a future saturating handler that writes
  `set_vscr_sat(lane_sat)` with `lane_sat = false` would silently clear a previously-set bit.
- **Fix**: Rename to `sticky_set_vscr_sat()` (no bool argument, always ORs). Retain
  `force_vscr_sat(bool)` for `mtvscr`.

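
The split between a sticky setter and a forcing setter can be sketched as follows. `Vscr`, `sticky_set_sat`, `force_sat`, and `add_lane` are stand-ins invented for this example and do not reproduce the layout in `context.rs`; only the SAT bit is modelled.

```rust
/// Minimal stand-in for the vector status register; only SAT is modelled.
#[derive(Default)]
struct Vscr {
    sat: bool,
}

impl Vscr {
    /// For arithmetic handlers: can only set the bit, never clear it.
    fn sticky_set_sat(&mut self) {
        self.sat = true;
    }

    /// For `mtvscr`-style writes that replace the register contents.
    fn force_sat(&mut self, value: bool) {
        self.sat = value;
    }
}

/// Saturating unsigned byte add that also reports whether it clamped.
fn sat_add_u8(a: u8, b: u8) -> (u8, bool) {
    match a.checked_add(b) {
        Some(sum) => (sum, false),
        None => (u8::MAX, true),
    }
}

/// One lane of a `vaddubs`-like handler.
fn add_lane(vscr: &mut Vscr, a: u8, b: u8) -> u8 {
    let (result, saturated) = sat_add_u8(a, b);
    if saturated {
        vscr.sticky_set_sat();
    }
    result
}
```

Because the sticky setter takes no argument, a handler cannot pass a per-lane `false` and clear a bit that an earlier lane or instruction set.
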
### PPCBUG-243 — `vmx.rs` saturation helpers: u16/i16/u32/i32 variants have zero unit tests

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `crates/xenia-cpu/src/vmx.rs:705–799`
- **Symptom**: `vmx.rs` tests cover 5 cases of `sat_add/sub_i8/u8`. The 8 helpers for wider
  types (`sat_add_u16`, `sat_sub_u16`, `sat_add_i16`, `sat_sub_i16`, `sat_add_u32`, `sat_sub_u32`,
  `sat_add_i32`, `sat_sub_i32`) are mathematically correct but unguarded by any test. Recommended
  additions listed in the per-group report.

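
As an illustration of the boundary cases such tests would pin down, here is a sketch of two wider helpers written against a widened reference computation. The `(value, saturated)` return shape is an assumption made for this example; the real signatures in `vmx.rs` may differ.

```rust
/// Signed halfword saturating add; returns the clamped value and whether it clamped.
/// Assumed signature, for illustration only.
fn sat_add_i16(a: i16, b: i16) -> (i16, bool) {
    let wide = a as i32 + b as i32; // cannot overflow in i32
    let clamped = wide.clamp(i16::MIN as i32, i16::MAX as i32);
    (clamped as i16, clamped != wide)
}

/// Unsigned word saturating subtract; clamps at zero on borrow.
/// Assumed signature, for illustration only.
fn sat_sub_u32(a: u32, b: u32) -> (u32, bool) {
    match a.checked_sub(b) {
        Some(diff) => (diff, false),
        None => (0, true),
    }
}
```

The interesting inputs are the ones that sit exactly on a type limit: one step past `MAX`, one step past `MIN`, and a subtraction that just borrows.
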
IDs PPCBUG-244 through PPCBUG-274 are unallocated — no further bugs found in group 32.

---

## Batch 7 — VMX integer compare / min / max / avg (group 33)

Per-group report: `audit-out/group-33-vmx-int-compare.md`.

### PPCBUG-275 — All VC-form vector compare dot forms: `rc_bit()` reads wrong bit; CR6 never updated

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpequb.`, `vcmpequh.`, `vcmpgtsb.`, `vcmpgtsh.`, `vcmpgtub.`, `vcmpgtuh.`
- **Location**: `decoder.rs:75` + `interpreter.rs:3318`, `3331`, `3344`, `3357`, `3370`, `3383`
- **Symptom**: `rc_bit()` is implemented as `self.raw & 1 != 0` (reads LSB = bit 0 of the word).
  For VC-form instructions the Rc flag is at **PPC bit 21 = LSB bit 10**, not bit 0. Bit 0 is
  the LSB of the 10-bit XO field. All integer compare XO values are even (XO=6, 70, 518, 774, 582, 838),
  so their bit 0 is always 0. The CR6 update block is **unconditionally dead** regardless of
  whether the programmer wrote the dot form. `vcmpequb. vMask, vData, vNeedle` + `bc 12,26`
  (BI=26 is CR6.EQ, set when no lane matched) is the canonical AltiVec memchr idiom; with CR6
  never written, the branch tests a stale value and the search misbehaves.
- **Fix**:
  ```rust
  // decoder.rs — add:
  /// Rc bit for VC-form vector compare instructions (PPC bit 21 = LSB bit 10).
  #[inline] pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }
  ```
  Replace `instr.rc_bit()` with `instr.vc_rc_bit()` at interpreter.rs:3318, 3331, 3344, 3357,
  3370, 3383.

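
The bit-position mismatch is easy to reproduce on a hand-assembled instruction word. In this sketch `encode_vc` is a hypothetical helper that lays out the VC form (primary opcode 4, three 5-bit register fields, Rc, 10-bit XO); the two accessors are free-function versions of the ones named in the finding.

```rust
/// Rc as read by the generic accessor: instruction bit 0 (LSB-0 numbering).
fn rc_bit(raw: u32) -> bool {
    raw & 1 != 0
}

/// Rc for VC-form vector compares: PPC bit 21, i.e. LSB-0 bit 10.
fn vc_rc_bit(raw: u32) -> bool {
    (raw >> 10) & 1 != 0
}

/// Assemble a VC-form word (hypothetical helper for this sketch).
fn encode_vc(vd: u32, va: u32, vb: u32, rc: bool, xo: u32) -> u32 {
    (4 << 26)
        | ((vd & 0x1F) << 21)
        | ((va & 0x1F) << 16)
        | ((vb & 0x1F) << 11)
        | ((rc as u32) << 10)
        | (xo & 0x3FF)
}
```

With XO=6 (`vcmpequb`) the generic accessor reports Rc=0 for both the plain and the dot form, which is exactly the dead-gate behaviour described above.
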
### PPCBUG-276 — `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`: same VC-form Rc bug

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`
- **Location**: `interpreter.rs:2237`, `3396`, `3406`
- **Symptom**: Same root cause as PPCBUG-275. XO for vcmpequw=134, vcmpgtuw=646, vcmpgtsw=902 —
  all even, bit 0 always 0. Word-compare dot forms never update CR6. `vcmpequw128` uses the
  VMX128_R Rc encoding which also likely reads the wrong bit.
- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:2237, 3396, 3406. Separately verify
  VMX128_R Rc bit position for `vcmpequw128` (may require its own extractor).

### PPCBUG-277 — Zero tests for all `vcmp*` dot forms and CR6 correctness

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: No test exercises any of the 10 integer vector compare opcodes. Critical missing:
  `vcmpequb.` all-true → CR6.LT=1; `vcmpequb.` all-false → CR6.EQ=1; `vcmpgtsb` signed
  boundary (0x80 vs 0x7F must yield false, not true); `vcmpgtsh` at 0x8000 vs 0x7FFF.

### PPCBUG-278 — Zero tests for all 12 `vmax*` / `vmin*` opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: None of vmaxub/uh/uw/sb/sh/sw, vminub/uh/uw/sb/sh/sw are tested. Critical missing:
  `vmaxsb(0x80, 0x7F)` = 0x7F (signed max of -128 and +127); `vminsb(0x80, 0x7F)` = 0x80.
  Without these, signed vs unsigned confusion in min/max would not be caught.

### PPCBUG-279 — Zero tests for all 6 `vavg*` opcodes; no signed-boundary or rounding coverage

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module; `vmx.rs` test module
- **Symptom**: `avg_u8` through `avg_i32` helpers have no unit tests. Key rounding case:
  `avg_u8(0, 1)` must be 1 (round up), not 0 (truncation). `avg_i32(i32::MIN, i32::MIN)` must
  be `i32::MIN` without overflow.

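
Both key cases follow from computing `(a + b + 1) >> 1` in a wider type. The sketch below assumes that formulation; the function names match the helpers mentioned above, but the bodies are illustrative and not copied from `vmx.rs`.

```rust
/// Rounded unsigned byte average, computed in u16 so the sum cannot wrap.
fn avg_u8(a: u8, b: u8) -> u8 {
    ((a as u16 + b as u16 + 1) >> 1) as u8
}

/// Rounded signed word average, computed in i64 so the sum cannot overflow.
/// The arithmetic right shift floors, so adding 1 first rounds halves upward.
fn avg_i32(a: i32, b: i32) -> i32 {
    ((a as i64 + b as i64 + 1) >> 1) as i32
}
```

Widening before the add is what keeps `avg_i32(i32::MIN, i32::MIN)` from overflowing; the final narrowing cast is lossless because the average of two in-range values is itself in range.
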
IDs PPCBUG-280 through PPCBUG-314 are unallocated — no further bugs found in group 33.

---

## Batch 6 — VMX integer logical / shift / rotate / select (group 34)

Per-group report: `audit-out/group-34-vmx-logic-shift.md`.

Group 34 summary: the bitwise logical ops (vand/vandc/vor/vxor/vnor and their 128 variants)
are all ISA-correct — Vec128 is `[u8; 16]` with no padding bits, so `!(u32)` flips exactly
32 bits per lane with no upper-bit pollution (the PPCBUG-029/030/031 class does not apply to
VMX register files). The per-lane shifts (vslb/vsrb/vsrab, vslh/vsrh/vsrah, vslw/vsrw/vsraw
and their 128 variants) all correctly mask the shift count to the lane width before shifting;
vsraw uses i32 arithmetic right shift which is correctly defined in Rust for shift-by-31.
The per-lane rotates (vrlb/vrlh/vrlw and 128 variants) are correct. The whole-register bit
shifts (vsl/vsr) and whole-register byte shifts (vslo/vsro and 128 variants) correctly
extract the shift count from VB.b[15] with the proper bit masks. vsel and vsel128 are correct
including the read-before-write ordering on vsel128's vc=vd aliasing.

**One HIGH bug found**: vrlimi128 extracts both the rotate-amount (z) field and the
blend-mask (IMM) field from the wrong bit positions of the instruction word.

**0 MEDIUM bugs with code change needed. 1 HIGH. 10 LOW (test gaps and informational).**

### PPCBUG-315 — vrlimi128 z and IMM fields extracted from wrong bit positions

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: interpreter.rs:3551–3552
- **Symptom**: `shift = ((instr.raw >> 16) & 0x3)` reads integer bits 16–17 — the low 2 bits
  of the 5-bit IMM (blend-mask) field — instead of the 2-bit `z` (rotate) field at integer
  bits 6–7. `mask = (instr.raw >> 2) & 0xF` reads integer bits 2–5 — VD128h extension bits
  and a reserved field — instead of the low 4 bits of IMM at integer bits 16–19.
  **Every `vrlimi128` executes with a wrong rotate amount and a wrong per-word select mask.**
  The only benign case is the degenerate encoding where `z == IMM[1:0]` and the garbage mask
  happens to equal the intended mask — unlikely in real code.
- **VX128_4 field layout** (LSB-0 integer bit numbering after PPC big-endian byte-swap to host):
    - `VD128l : 5` at integer bits 21–25 (PPC bits 6–10)
    - `IMM    : 5` at integer bits 16–20 (PPC bits 11–15) — blend mask, 4 bits used
    - `VB128l : 5` at integer bits 11–15 (PPC bits 16–20)
    - `z      : 2` at integer bits 6–7   (PPC bits 24–25) — rotate amount 0..3
    - `VD128h : 2` at integer bits 2–3   (PPC bits 28–29)
- **Fix**:
  ```rust
  let shift = ((instr.raw >> 6) & 0x3) as usize;   // z field: integer bits 6-7
  let mask  = (instr.raw >> 16) & 0xF;              // IMM low 4 bits: integer bits 16-19
  ```
- **Canary reference**: `ppc_decode_data.h:585–608` `FormatVX128_4`; `ppc_emit_altivec.cc:1318,1324`.
- **Note**: the rotate logic (`b[(shift + i) % 4]`) and mask-select logic (`(mask >> (3-i)) & 1`)
  in the interpreter body are ISA-correct — only the field extraction is wrong.
- **Test gap**: no interpreter execution test for vrlimi128 (PPCBUG-325).

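
A word built from the field layout above makes the difference between the old and new extraction concrete. This is a sketch: the free functions are hypothetical names for the four expressions quoted in the finding, and the test word sets only `IMM`, `z`, and `VD128h`.

```rust
/// Corrected extraction: z (rotate amount) at integer bits 6-7.
fn vrlimi128_z(raw: u32) -> u32 {
    (raw >> 6) & 0x3
}

/// Corrected extraction: low 4 bits of IMM (blend mask) at integer bits 16-19.
fn vrlimi128_mask(raw: u32) -> u32 {
    (raw >> 16) & 0xF
}

/// Pre-fix extraction of the rotate amount (reads IMM[1:0]).
fn old_shift(raw: u32) -> u32 {
    (raw >> 16) & 0x3
}

/// Pre-fix extraction of the mask (reads integer bits 2-5).
fn old_mask(raw: u32) -> u32 {
    (raw >> 2) & 0xF
}

/// Build a word with the given IMM, z, and VD128h fields (all other bits zero).
fn encode_fields(imm: u32, z: u32, vd128h: u32) -> u32 {
    ((imm & 0x1F) << 16) | ((z & 0x3) << 6) | ((vd128h & 0x3) << 2)
}
```

With `IMM = 0b01011`, `z = 2`, and `VD128h = 0b11`, the corrected accessors return 2 and `0b1011`, while the pre-fix expressions return 3 and `0b0011`.
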
### PPCBUG-316 — Zero interpreter execution tests for vslb/vsrb/vsrab (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3440–3463

### PPCBUG-317 — Zero interpreter execution tests for vslh/vsrh/vsrah (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3472–3503

### PPCBUG-318 — vslo/vsro byte-shift count max is 15 (correct; informational)

- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- `N` is a 4-bit field; max shift is 15 bytes = 120 bits (not 128). VD retains
  the 8 LSBs of VA in position [127:120] at N=15. ISA-correct.

### PPCBUG-319 — vsel128 vc=vd read-before-write ordering (correct; informational)

- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- `c = ctx.vr[vc]` is read before `ctx.vr[vd] = result`. Correctly sequenced.

### PPCBUG-320 — Zero interpreter execution tests for vslw/vsrw/vsraw/vrlw (+128 variants)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:2108–2155

### PPCBUG-321 — Zero interpreter execution tests for vsl/vsr

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3508–3521

### PPCBUG-322 — Zero interpreter execution tests for vslo/vsro (+128 variants)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3523–3541

### PPCBUG-323 — Zero interpreter execution tests for vand/vandc/vor/vxor/vnor (+128 variants)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:1900–1944

### PPCBUG-324 — Zero interpreter execution tests for vsel/vsel128

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:1945–1967

### PPCBUG-325 — Zero interpreter execution tests for vrlb/vrlh/vrlw/vrlimi128 (+128 variants)

- **Severity**: LOW (test gap; fix PPCBUG-315 before writing vrlimi128 tests)
- **Status**: open
- **Location**: interpreter.rs:3464–3503, 2144–2155, 3550–3565

IDs PPCBUG-326 through PPCBUG-354 are unallocated — no further bugs found in group 34.

---

## Batch 8 — VMX permute / merge / splat / pack / unpack (group 35)

Per-group report: `audit-out/group-35-vmx-permute.md`.

**Summary**: 5 HIGH, 3 MEDIUM, 9 LOW. Four VX128_* field-extraction bugs; one missing post-pack permutation logic; VSCR.SAT and pack saturation semantics are all correct. Zero interpreter tests for any group-35 opcode.

### PPCBUG-360 — vperm128: VC register read from wrong field (vd128() instead of VX128_2 VC bits 23-25)

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:1979`
- **Symptom**: `vperm128` uses the VX128_2 instruction form. The permute-control register VC is a 3-bit field at PPC bits 23-25 (LSB integer bits 6-8). The code does `vc = instr.vd128()` which reads PPC bits 6-10 + 21-22 — a completely different set of bits. Every `vperm128` therefore permutes with a control vector read from the wrong register, producing garbage output. `vperm128` is one of the most-used VMX128 ops in Xbox 360 graphics code (texture/vertex data layout).
- **Fix**:
  ```rust
  // decoder.rs — add accessor:
  #[inline] pub fn vc128_2(&self) -> usize { ((self.raw >> 6) & 0x7) as usize }
  // interpreter.rs:1979 — replace:
  vc = instr.vc128_2();  // VX128_2 VC field at PPC bits 23-25
  ```
- **ISA ref**: `ppc-manual/vmx/vperm.md`, VX128_2 encoding; `ppc_decode_data.h:541-561`; `ppc_emit_altivec.cc:1203-1204` (`VX128_2_VC`).

### PPCBUG-361 — vsldoi128: SH field MSB reads bit 4 (reserved) instead of bit 9

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:2012`
- **Symptom**: VX128_5 SH is a 4-bit field at LSB integer bits 6-9. Code does `((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)`. This reads bit 4 (a reserved field, always 0 in valid encodings) as the MSB of SH instead of bit 9. Shifts of 8-15 bytes silently resolve as shifts of 0-7 bytes. `vsldoi128` with `SH >= 8` (common in vector rotation patterns) always produces the wrong result.
- **Fix**:
  ```rust
  let sh = ((instr.raw >> 6) & 0xF) as usize;  // SH field: integer bits 6-9
  ```
- **ISA ref**: `ppc-manual/vmx/vsldoi.md`, VX128_5 encoding; `ppc_decode_data.h:609-634`; canary `VX128_5_SH`.

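
The effect on shift counts of 8 and above can be shown by running both expressions over a word that carries only the SH field. The function names in this sketch are hypothetical; the expressions are the ones quoted in the finding.

```rust
/// Pre-fix extraction: low 3 bits from integer bits 6-8, MSB taken from bit 4.
fn old_sh(raw: u32) -> usize {
    (((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)) as usize
}

/// Corrected extraction: the full 4-bit SH field at integer bits 6-9.
fn fixed_sh(raw: u32) -> usize {
    ((raw >> 6) & 0xF) as usize
}

/// Place a shift count into the SH field (all other bits zero).
fn encode_sh(sh: u32) -> u32 {
    (sh & 0xF) << 6
}
```

For every count below 8 the two agree; from 8 upward the pre-fix version drops the top bit, so a 12-byte shift is executed as a 4-byte shift.
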
### PPCBUG-362 — vpermwi128: PERMh (high 3 bits of 8-bit PERM immediate) read from VD128l bits instead of bits 6-8

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:4089`
- **Symptom**: VX128_P PERM = `PERMl[4:0] | (PERMh[2:0] << 5)` where PERMl is at integer bits 16-20 and PERMh is at integer bits 6-8. Code does `(raw >> 16) & 0xFF` which reads bits 16-23. Bits 21-23 are the low 3 bits of VD128l (which occupies integer bits 21-25), not PERMh. The top 3 bits of the 8-bit PERM immediate are wrong; output word lane selections for lanes 0 and 1 are controlled by garbage bits. Same pattern as PPCBUG-315.
- **Fix**:
  ```rust
  let imm = ((instr.raw >> 16) & 0x1F) | (((instr.raw >> 6) & 0x7) << 5);  // VX128_P PERM
  ```
- **ISA ref**: `ppc_decode_data.h:664-686`; `ppc_emit_altivec.cc:1214`.

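
Reassembling the split immediate can be checked on a word where the destination-register bits are deliberately non-zero. The helper names below are hypothetical; the bit positions are the ones given in the symptom.

```rust
/// Corrected PERM: PERMl at integer bits 16-20, PERMh at integer bits 6-8.
fn vpermwi128_perm(raw: u32) -> u32 {
    ((raw >> 16) & 0x1F) | (((raw >> 6) & 0x7) << 5)
}

/// Pre-fix PERM: eight contiguous bits starting at bit 16.
fn old_perm(raw: u32) -> u32 {
    (raw >> 16) & 0xFF
}

/// Build a word from an 8-bit PERM value and a 5-bit VD128l field.
fn encode_perm(perm: u32, vd128l: u32) -> u32 {
    let perm_l = perm & 0x1F;
    let perm_h = (perm >> 5) & 0x7;
    ((vd128l & 0x1F) << 21) | (perm_l << 16) | (perm_h << 6)
}
```

With `PERM = 0xB3` and `VD128l = 2`, the corrected accessor returns `0xB3`, while the contiguous read returns `0x53` because its top three bits come from the register number.
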
### PPCBUG-363 — vpkd3d128: post-pack permutation (pack + z fields) entirely absent; output always placed in wrong lane when pack != 0

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3783-3808`
- **Symptom**: Canary's `vpkd3d128` does three things: (1) pack VB by type, (2) permute the result with the existing VD register using a control determined by `pack` (IMM[1:0]) and `shift` (z field at integer bits 6-7), (3) store. Xenia-rs does only (1) and (3), skipping the entire lane-placement permutation. When `pack != 0`, the packed value must be merged into a specific 32-bit or 64-bit slot of VD — this merge never happens. `pack=0` is the only safe case. Most D3D vertex pack sequences use `pack=1` (32-bit slot) with varying `shift`.
- **Fix**: Extract `pack = uimm & 3` and `shift = (instr.raw >> 6) & 3` (z field), read existing `ctx.vr[vd]`, apply the permutation table from `ppc_emit_altivec.cc:2125-2188`, write back.
- **ISA ref**: `ppc_emit_altivec.cc:2088-2191`.

### PPCBUG-364 — vsldoi (non-128): correct; PPCBUG-365 — vsplt*: correct; informational

- **Severity**: LOW (wontfix)
- **Status**: wontfix
- `vsldoi` correctly extracts SH as `(raw >> 6) & 0xF`. `vspltb/vsplth/vspltw` correctly read UIMM from the VA position (integer bits 16-20, masked to lane width). No bugs.

### PPCBUG-366 — vspltisb / vspltish: sign-extension idiom is correct but non-obvious; future regression risk

- **Severity**: MEDIUM
- **Status**: open (clarity fix recommended)
- **Location**: `interpreter.rs:2059-2060`, `2064-2066`
- **Symptom**: `simm | !0x1F` where `simm` is typed `i8`/`i16` is functionally correct (Rust narrows `!0x1F` to the target type), but the pattern is fragile under refactoring. Recommend:
  ```rust
  let simm = (((instr.raw >> 16) & 0x1F) as i32).wrapping_shl(27).wrapping_shr(27) as i8;
  ```

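
The shift-left-then-arithmetic-shift-right idiom in the recommendation can be verified over the four boundary values of a 5-bit field. `sign_extend_5` is a hypothetical stand-alone helper written for this sketch.

```rust
/// Sign-extend the low 5 bits of `field` to a full i32.
/// Moves bit 4 into the sign position, then shifts back arithmetically.
fn sign_extend_5(field: u32) -> i32 {
    (((field & 0x1F) << 27) as i32) >> 27
}
```

Narrowing the result with `as i8` or `as i16` afterwards is lossless, because the value always lies in -16..=15.
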
### PPCBUG-367 — vupkhpx / vupklpx: channel replication vs zero-extend divergence; canary is unimplemented

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `vmx.rs:318-330`
- **Symptom**: `unpack_pixel_555` replicates 5-bit RGB channels (`r << 3 | r >> 2`) to fill 8 bits. AltiVec defines each channel as zero-extended to 8 bits (`0b000 || channel`), so the 5-bit value stays in bits 4:0 and bits 7:5 are zero. The replicate approach produces larger values than hardware does.
- **Fix**: `let r8 = r;` (drop both the `<< 3` shift and the `| r >> 2` replication term).

### PPCBUG-368 — vpkpx: pack_pixel_555 channel assignment unverified against hardware; canary comparison inconclusive

- **Severity**: MEDIUM
- **Status**: open (needs hardware trace or more detailed canary analysis)
- **Location**: `vmx.rs:310-316`
- **Symptom**: The xenia-rs layout comment says R=bits 8-15, G=16-23, B=24-31. Canary's `vkpkx_in_low` uses different shift amounts (`>> 9` for R, `>> 6` for G, `>> 3` for B), suggesting either a different input layout assumption or the channels are swapped. Without a hardware reference, cannot determine which is authoritative.

### PPCBUG-369 — vpkd3d128 z-field not extracted (sub-issue of PPCBUG-363)

- **Severity**: LOW (tracked under PPCBUG-363)
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3785`
- The `z` field (VX128_4, integer bits 6-7) is never extracted. Correct extraction: `(instr.raw >> 6) & 0x3`.

### PPCBUG-370 — Zero interpreter tests for vperm / vperm128 (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1970-1995`

### PPCBUG-371 — Zero interpreter tests for vsldoi / vsldoi128 (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1997-2020`

### PPCBUG-372 — Zero interpreter tests for vpermwi128 (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:4087-4099`

### PPCBUG-373 — Zero interpreter tests for vmrghb / vmrglb / vmrghh / vmrglh (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3570-3600`

### PPCBUG-374 — Zero interpreter tests for vspltb / vsplth / vspltw / vspltw128 (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2022-2048`

### PPCBUG-375 — Zero interpreter tests for vspltisb / vspltish / vspltisw / vspltisw128 (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2050-2068`

### PPCBUG-376 — Zero interpreter tests for all vpk* (16 ops) + VSCR.SAT coverage (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3607-3718`

### PPCBUG-377 — Zero interpreter tests for vupkhsb / vupklsb / vupkhsh / vupklsh (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3722-3754`

### PPCBUG-378 — Zero interpreter tests for vpkd3d128 / vupkd3d128 (test gap; blocked on PPCBUG-363)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3783-3835`

IDs PPCBUG-379 through PPCBUG-419 are unallocated — no further bugs found in group 35.

---

## Batch 9 — VMX float arithmetic / compare / convert / estimate (group 36)

Per-group report: `audit-out/group-36-vmx-float.md`.

Group 36 summary: **21 findings (PPCBUG-420..440). 6 HIGH, 8 MEDIUM, 7 LOW.** The most
critical bugs are: (1) four VMX float compare VC-form opcodes use `rc_bit()` (bit 0) instead
of the correct VC-form Rc bit (bit 10) — CR6 is never updated, same root cause as PPCBUG-275;
(2) vmaddfp128 and vmaddcfp128 have their multiplicand and accumulator operands swapped —
every matrix multiply / Newton-Raphson step using these opcodes produces the wrong result;
(3) VMX128_R dot-form compares (vcmpeqfp128. etc.) decode as Invalid due to missing key4
entries in decode_op6.

**6 HIGH, 8 MEDIUM, 7 LOW. 21 IDs used (PPCBUG-420..440). 39 IDs unallocated (PPCBUG-441..479).**

### PPCBUG-420 — vcmpeqfp / vcmpgefp / vcmpgtfp: `rc_bit()` reads wrong bit; CR6 never updated

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpeqfp.`, `vcmpgefp.`, `vcmpgtfp.`
- **Location**: `interpreter.rs:1875`, `1885`, `1895`
- **Symptom**: `rc_bit()` = `self.raw & 1` reads LSB bit 0. For VC-form the Rc flag is at
  PPC bit 21 = LSB bit 10. All XO values (vcmpeqfp=198, vcmpgefp=454, vcmpgtfp=710) have
  bit 0 = 0, so CR6 is never updated for any float compare dot form. `vcmpeqfp.` + `bc 12,24`
  (branch all-equal) always falls through.
- **Cross-reference**: PPCBUG-275 (identical root cause for integer vcmp). Canary reads
  `i.VXR.Rc` (ppc_emit_altivec.cc:625, 633, 641).
- **Fix**: Add `pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }` to
  `decoder.rs` and replace `instr.rc_bit()` at interpreter.rs:1875, 1885, 1895.

### PPCBUG-421 — vcmpbfp: `rc_bit()` reads wrong bit (VC-form); Rc gate permanently dead

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3428`
- **Symptom**: Same root cause as PPCBUG-420. XO=966, bit 0 = 0; CR6 update never fires
  for `vcmpbfp.`. The CR6 value logic (`eq = !any_out`) is correct; only the gate is wrong.
- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:3428.

### PPCBUG-422 — vcmpeqfp128 / vcmpgefp128 / vcmpgtfp128 / vcmpbfp128: `rc_bit()` reads wrong bit (VX128_R-form)

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:1875`, `1885`, `1895`, `3428` (shared arms with non-128 forms)
- **Symptom**: For VX128_R-form, Rc is at PPC bit 27 = LSB bit 4 (confirmed from canary's
  `VX128_R` bitfield: `uint32_t Rc : 1` at bit 4 from LSB). `rc_bit()` reads bit 0. Fix
  PPCBUG-423 first (dot forms decode as Invalid before this even matters).
- **Fix**: Add `pub fn vx128r_rc_bit(&self) -> bool { (self.raw >> 4) & 1 != 0 }` and use
  it in the VX128_R compare arms.

### PPCBUG-423 — vcmpeqfp128. / vcmpgefp128. / vcmpgtfp128. / vcmpbfp128.: dot forms decode as `Invalid`

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs:640-648` (decode_op6 VMX128 compare key4 table)
- **Symptom**: decode_op6 extracts `key4 = (bits22-24 << 3) | bit27`. When Rc=1, PPC bit 27
  is set, making key4 = non-dot value + 1. Dot-form key4 values (1, 9, 17, 25, 33) are all
  absent from the match table. Decoder returns `PpcOpcode::Invalid`. Any game shader using a
  VMX128-form float compare dot form traps with unimplemented opcode.
- **Fix**: Add dot-form entries to the key4 match table mapping to the same opcodes (the
  interpreter arm uses `instr.vx128r_rc_bit()` to conditionally update CR6):
  ```rust
  0b000001 => return PpcOpcode::vcmpeqfp128,
  0b001001 => return PpcOpcode::vcmpgefp128,
  0b010001 => return PpcOpcode::vcmpgtfp128,
  0b011001 => return PpcOpcode::vcmpbfp128,
  0b100001 => return PpcOpcode::vcmpequw128,
  ```

### PPCBUG-424 — vmaddfp128: operand swap — computes VA×VB+VD instead of VA×VD+VB

- **Severity**: HIGH
- **Status**: open
- **Location**: `interpreter.rs:1771` (`r[i] = ai.mul_add(bi, di)`)
- **Symptom**: Canary (ppc_emit_altivec.cc:806-809) documents `(VD) <- (VA × VD) + VB` and
  routes as `MulAdd(VA, VD, VB)`. Xenia-rs reads VA, VB, VD then computes
  `ai.mul_add(bi, di)` = `VA × VB + VD` — VB and VD roles swapped. Every shader using
  vmaddfp128 for matrix multiply or Newton-Raphson accumulation accumulates the wrong value.
  The existing denorm-flush test aliases vA=vD=v2, making the swap invisible.
- **Fix**: `r[i] = ai.mul_add(di, bi);`

### PPCBUG-425 — vmaddcfp128: operand swap — computes VD×VB+VA instead of VA×VD+VB

- **Severity**: HIGH
- **Status**: open
- **Location**: `interpreter.rs:4065` (`r[i] = di.mul_add(bi, ai)`)
- **Symptom**: Canary (ppc_emit_altivec.cc:819) documents `(VD) <- (VA × VD) + VB`.
Xenia-rs computes `VD × VB + VA`. Both the first multiplicand and the addend are wrong.
- **Fix**: `r[i] = ai.mul_add(di, bi);`
- **Test gap**: zero tests for `vmaddcfp128`. Add test with distinct VA, VB, VD registers.

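
A test with three distinct operand values separates all of the operand orders discussed in PPCBUG-424 and PPCBUG-425. The following is a minimal sketch rather than interpreter code: the three helper functions are hypothetical and model a single lane, taking the `(VA × VD) + VB` form exactly as this tracker cites it from canary.

```rust
/// One lane of the form this tracker cites: (VD) <- (VA * VD) + VB.
/// Hypothetical helper for illustration; not an interpreter function.
fn madd_lane_expected(a: f32, b: f32, d: f32) -> f32 {
    a.mul_add(d, b) // a * d + b, one rounding
}

/// Operand order reported for vmaddfp128 in PPCBUG-424: a * b + d.
fn madd_lane_bug424(a: f32, b: f32, d: f32) -> f32 {
    a.mul_add(b, d)
}

/// Operand order reported for vmaddcfp128 in PPCBUG-425: d * b + a.
fn madd_lane_bug425(a: f32, b: f32, d: f32) -> f32 {
    d.mul_add(b, a)
}
```

With `a = 2.0`, `b = 3.0`, `d = 5.0` the three functions return three different values, so a single assertion on the expected form fails for either swapped order.
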
### PPCBUG-426 — vnmsubfp: two rounding steps instead of fused FMA; NaN sign may be flipped

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1786` (`r[i] = bi - ai * ci`)
- **Symptom**: `vmaddfp` uses single-rounded `ai.mul_add(ci, bi)`, but `vnmsubfp` uses
`bi - ai * ci` (two operations, two rounding steps). ISA specifies a single fused operation.
Canary acknowledges the same limitation (ppc_emit_altivec.cc:1136). Additionally, the
implicit negation in subtraction may flip the sign bit of a NaN result (see PPCBUG-183).
- **Fix**: `r[i] = -ai.mul_add(ci, -bi);` — single FMA rounding: `-(ai*ci + (-bi))` = `bi - ai*ci`.

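
The difference between the two-step and fused forms is observable with ordinary inputs. The sketch below uses two hypothetical single-lane helpers (not the interpreter's functions) and relies only on `f32::mul_add` from the Rust standard library.

```rust
/// Two roundings: the product is rounded to f32 before the subtraction.
fn nmsub_two_step(a: f32, b: f32, c: f32) -> f32 {
    b - a * c
}

/// One rounding: -(a*c - b) evaluated as a single fused multiply-add.
/// The method call binds tighter than the leading unary minus.
fn nmsub_fused(a: f32, b: f32, c: f32) -> f32 {
    -a.mul_add(c, -b)
}

/// Inputs chosen so that the exact product a*a = 1 + 2^-11 + 2^-24
/// sits exactly halfway between two f32 values and rounds down to b.
fn demo_inputs() -> (f32, f32) {
    let a = 1.0 + 1.0 / 4096.0; // 1 + 2^-12
    let b = 1.0 + 1.0 / 2048.0; // 1 + 2^-11
    (a, b)
}
```

For these inputs the two-step form loses the low-order term of the product and returns zero, while the fused form keeps it and returns `-2^-24`.
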
### PPCBUG-427 — vnmsubfp128: same two-rounding form as vnmsubfp

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1803` (`r[i] = di - ai * bi`)
- **Symptom**: Same class as PPCBUG-426 for the VMX128 form.
- **Fix**: `r[i] = -ai.mul_add(bi, -di);`

### PPCBUG-428 — vrefp / vrefp128: full-precision 1/x instead of ~12-bit hardware estimate

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1853` (`r[i] = 1.0 / b[i]`)
- **Symptom**: Same class as PPCBUG-184 (fresx). Xenon vrefp provides ~12-bit accuracy;
xenia-rs computes full IEEE-754 division. Canary also uses full precision in practice.

### PPCBUG-429 — vrsqrtefp / vrsqrtefp128: full-precision 1/sqrt(x) instead of ~12-bit estimate

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1862` (`r[i] = 1.0 / b[i].sqrt()`)
- **Symptom**: Same class as PPCBUG-428 for reciprocal square root.

### PPCBUG-430 — vexptefp / vexptefp128: full-precision exp2(x) instead of ~12-bit estimate

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3934` (`r[i] = b[i].exp2()`)
- **Symptom**: Same class as PPCBUG-428. NaN/Inf edge cases may diverge.

### PPCBUG-431 — vlogefp / vlogefp128: full-precision log2(x) instead of ~12-bit estimate

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3944` (`r[i] = b[i].log2()`)
- **Symptom**: Same class as PPCBUG-428.

### PPCBUG-432 — vrfin / vrfin128: Rust `round()` is round-half-away-from-zero; ISA requires round-to-nearest-even

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2172` (`r[i] = b[i].round()`)
- **Symptom**: `vrfin(0.5)` → ISA = 0.0; Rust = 1.0. `vrfin(2.5)` → ISA = 2.0; Rust = 3.0.
Canary uses `ROUNDPS` (an SSE4.1 instruction) in round-to-nearest-even mode.
- **Fix**: Use `f32::round_ties_even()` (stable since Rust 1.77).

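
The two rounding rules differ only on exact ties, which is why a test needs `.5` inputs. A minimal sketch follows; `vrfin_lane` and `vrfin_lane_buggy` are hypothetical single-lane helpers and require Rust 1.77 or later for `round_ties_even`.

```rust
/// Round-to-nearest-even for one lane, as the fix suggests.
fn vrfin_lane(x: f32) -> f32 {
    x.round_ties_even()
}

/// Current behavior per the finding: ties round away from zero.
fn vrfin_lane_buggy(x: f32) -> f32 {
    x.round()
}
```

Inputs such as `0.5`, `1.5`, and `2.5` make the difference visible: ties go to the even neighbor (`0.0`, `2.0`, `2.0`) under the fixed form and away from zero (`1.0`, `2.0`, `3.0`) under the current one.
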
### PPCBUG-433 — vctsxs / vcfpsxws128: NaN input returns 0 instead of saturating to INT_MIN (0x80000000)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `vmx.rs:217` (`if x.is_nan() { return (0, true); }`)
- **Symptom**: AltiVec ISA: NaN in vctsxs saturates to INT_MIN (0x80000000). Xenia-rs returns 0.
- **Fix**: `if x.is_nan() { return (i32::MIN, true); }`

### PPCBUG-434 — vctuxs NaN → 0 is correct; informational

- **Severity**: LOW (wontfix)
- **Status**: wontfix
- **Location**: `vmx.rs:225`
- **Note**: Unsigned NaN saturates to 0 per ISA. Xenia-rs is correct. Add a comment.

### PPCBUG-435 — vaddfp / vsubfp / vmulfp128: subnormal inputs not flushed when VSCR.NJ=1

- **Severity**: MEDIUM (latent — Xbox 360 always boots with NJ=1)
- **Status**: open
- **Location**: `interpreter.rs:1713`, `1729`, `1812`
- **Symptom**: VSCR.NJ=1 requires flush-to-zero for subnormal inputs. vmaddfp family correctly
calls `vmx::flush_denorm()`; plain add/sub/mul do not check VSCR.

### PPCBUG-436 — vmsum3fp128 / vmsum4fp128: per-product intermediates not individually flushed

- **Severity**: MEDIUM (latent)
- **Status**: open
- **Location**: `interpreter.rs:4076`, `4083`
- **Symptom**: `flush_denorm` on final sum only. Per-lane products can be subnormal and
accumulate before the final flush.

### PPCBUG-437 — vmaddfp / vmaddfp128 / vmaddcfp128 / vnmsubfp128: subnormal output not flushed

- **Severity**: MEDIUM (latent)
- **Status**: open
- **Location**: `interpreter.rs:1752–1754`, `1771–1773`, `4064–4067`, `1803–1805`
- **Symptom**: VSCR.NJ=1 requires flushing subnormal results. Inputs flushed; outputs are not.

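
The three flush findings above (PPCBUG-435 to PPCBUG-437) share one shape: flush each input, compute, then flush the result. The sketch below illustrates that shape for a single `vaddfp` lane. It is an assumption-laden illustration: `flush_denorm_sketch` is a hypothetical stand-in for `vmx::flush_denorm()` (whose real signature is not shown in this tracker), and it assumes a flushed value keeps the sign of the subnormal it replaces.

```rust
/// Replace a subnormal with a zero of the same sign; pass other values through.
/// Hypothetical stand-in for `vmx::flush_denorm()`.
fn flush_denorm_sketch(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// One vaddfp lane under NJ=1: inputs and the result are both flushed.
fn vaddfp_lane_nj(a: f32, b: f32) -> f32 {
    let sum = flush_denorm_sketch(a) + flush_denorm_sketch(b);
    flush_denorm_sketch(sum)
}
```

A useful output-flush test case is `1.5 * 2^-126 + (-2^-126)`: both inputs are normal, but the exact sum `2^-127` is subnormal and should be flushed to zero.
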
### PPCBUG-438 — Zero tests for vcmpeqfp / vcmpgefp / vcmpgtfp / vcmpbfp and dot forms

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module

### PPCBUG-439 — Zero tests for vrfiz / vrfin / vrfip / vrfim and 128-bit variants

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs:2158–2192`

### PPCBUG-440 — Zero tests for vctsxs / vctuxs / vcfsx / vcfux and 128-bit variants

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs:3842–3923`

IDs PPCBUG-441 through PPCBUG-479 are unallocated — no further bugs found in group 36.

---

## Batch 8 — VMX integer multiply-sum / multiply-half / sums / special (group 37)

Per-group report: `audit-out/group-37-vmx-mulsum.md`.

**Note**: All opcodes in this group are `XEINSTRNOTIMPLEMENTED()` stubs in xenia-canary; correctness is derived from the IBM ISA and `ppc-manual/vmx/` snapshots. `vrlimi128` is already tracked as PPCBUG-315.

### PPCBUG-482 — `vmhaddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)

- **Severity**: WITHDRAWN
- **Status**: no bug
- **Note**: Draft analysis suggested >>16; the spec snapshot `ppc-manual/vmx/vmhaddshs.md`
explicitly shows `prod = (VA[i]*VB[i]) >> 15` and the pathological-case example confirms
`0x8000*0x8000 >> 15 = 32768`. Xenia-rs matches the spec exactly. No code change.

### PPCBUG-483 — `vmhraddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)

- **Severity**: WITHDRAWN
- **Status**: no bug
- **Note**: `ppc-manual/vmx/vmhraddshs.md` explicitly shows `(product + 0x4000) >> 15`.
Xenia-rs matches. No code change needed.

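
The two formulas quoted in PPCBUG-482 and PPCBUG-483 can be written out per lane as follows. This is a sketch under stated assumptions: the helper names are hypothetical, each function models one signed halfword lane, and the returned flag stands for "VSCR.SAT would be set".

```rust
/// Clamp to the i16 range and report whether clamping occurred.
fn sat16(x: i32) -> (i16, bool) {
    let clamped = x.clamp(i16::MIN as i32, i16::MAX as i32);
    (clamped as i16, clamped != x)
}

/// vmhaddshs lane: ((a * b) >> 15) + c, saturated.
fn vmhaddshs_lane(a: i16, b: i16, c: i16) -> (i16, bool) {
    sat16(((a as i32 * b as i32) >> 15) + c as i32)
}

/// vmhraddshs lane: ((a * b + 0x4000) >> 15) + c, saturated.
fn vmhraddshs_lane(a: i16, b: i16, c: i16) -> (i16, bool) {
    sat16(((a as i32 * b as i32 + 0x4000) >> 15) + c as i32)
}
```

The pathological case from the note, `0x8000 * 0x8000`, produces 32768 after the shift, which does not fit in a signed halfword and therefore saturates to `0x7FFF` with the flag set.
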
### PPCBUG-487 — vsumsws/vsum2sws/vsum4sbs/vsum4ubs/vsum4shs: VB operand mis-named as "c"/"VC"

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3249-3307`
- **Symptom**: All five vsum* handlers use a VX-form instruction (two operands: VA and VB).
The code names the VB source `c` and the comment references "vC" — implying a non-existent
third register operand. Only `instr.ra()` and `instr.rb()` are valid for VX form; there is
no `rc()`. The arithmetic is correct (rb() is called), but the naming misleads maintainers
into thinking there is a VA-form three-operand encoding.
- **Fix**: Rename `c` → `b` and update comments to say `VB` instead of `vC` in all five
handler bodies.

### PPCBUG-490 — Zero tests for all six vmsum* opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No unit test for `vmsumubm`, `vmsummbm`, `vmsumuhm`, `vmsumuhs`, `vmsumshm`,
`vmsumshs`. Critical missing: saturation + VSCR.SAT for `vmsumuhs`/`vmsumshs`; mixed-sign
byte product for `vmsummbm`; modulo wrap for `vmsumshm`.

### PPCBUG-491 — Zero tests for `vmhaddshs` and `vmhraddshs`

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for either multiply-high-add instruction. Key cases: `VA = 0x8000`,
`VB = 0x8000` (minus-one-times-minus-one saturating case); `VA = VB = 0x7FFF, VC = 0x7FFF`
(add post-shift result to max accumulator). Verify VSCR.SAT is set on saturation and clear
on non-saturating inputs.

### PPCBUG-492 — Zero tests for `vmladduhm`

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section

### PPCBUG-493 — Zero tests for all eight `vmule*` / `vmulo*` opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for `vmuleub`, `vmuloub`, `vmulesb`, `vmulosb`, `vmuleuh`, `vmulouh`,
`vmulesh`, `vmulosh`. Key: even vs odd lane distinction (`vmulesh` vs `vmulosh`) is untested.

### PPCBUG-494 — Zero tests for all five vsum* opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for `vsumsws`, `vsum2sws`, `vsum4sbs`, `vsum4ubs`, `vsum4shs`.
Missing: zero-output-lanes verification for `vsumsws` (w[0..2] must be 0) and `vsum2sws`
(w[0], w[2] must be 0); VSCR.SAT on overflow for all signed/unsigned variants.

### PPCBUG-495 — `vsumsws` comment says "vC[3]" should say "VB[3]"

- **Severity**: LOW (cosmetic)
- **Status**: open
- **Location**: `interpreter.rs:3248`

IDs PPCBUG-480, PPCBUG-481, PPCBUG-482 (withdrawn), PPCBUG-483 (withdrawn), PPCBUG-484,
PPCBUG-485, PPCBUG-486, PPCBUG-488, PPCBUG-489, PPCBUG-496, PPCBUG-497, PPCBUG-498 are
either withdrawn (no bug found after re-examination), informational, or references to
existing IDs. IDs PPCBUG-499 through PPCBUG-509 are unallocated — no further bugs found
in group 37.

---

## Batch 8 — VMX load/store (group 38)

Per-group report: `audit-out/group-38-vmx-loadstore.md`.

**Opcodes**: lvebx, lvehx, lvewx, lvewx128, lvlx, lvlx128, lvlxl, lvlxl128, lvrx, lvrx128,
lvrxl, lvrxl128, lvsl, lvsl128, lvsr, lvsr128, lvx, lvx128, lvxl, lvxl128, stvebx, stvehx,
stvewx, stvewx128, stvlx, stvlx128, stvlxl, stvlxl128, stvrx, stvrx128, stvrxl, stvrxl128,
stvx, stvx128, stvxl, stvxl128.

Group 38 summary: The load family (lvx, lvxl, lvlx, lvrx, lvsl, lvsr, lvebx, lvehx, lvewx,
lvewx128 and all 128/LRU-hint variants) is arithmetically correct. EA computation, alignment
masking, big-endian byte ordering, RA=0 special cases, and lane indexing all match the ISA and
the `ea_indexed` helper. **5 HIGH bugs found** — the systemic `invalidate_for_write` gap
(PPCBUG-107 family) applies to ALL 16 VMX store opcodes, and `stvewx128` has an additional
severe memory-corruption bug (writes 16 bytes instead of 1 word). **1 MEDIUM** (behavioral
divergence between lvebx/lvehx/lvewx and canary's full-line simplification — xenia-rs is
architecturally more correct). **1 MEDIUM** (lvsr sh=0 edge-case correctness, documentation
gap). **3 LOW** (two test-coverage gaps and one missing debug trace).

### PPCBUG-510 — `stvewx128` stores all 16 bytes instead of one word; 12-byte memory corruption (HIGH)

- **Severity**: HIGH
- **Status**: open
- **Location**: interpreter.rs:2776-2781
- **Symptom**: Uses `& !0xF` (16-byte alignment) then stores all 16 bytes of the vector.
ISA semantics: word-align EA, extract the word lane `(EA & 0xF) >> 2`, store 4 bytes only.
The non-128 `stvewx` (interpreter.rs:1675-1687) is correct — `stvewx128` was not updated
to match. Corrupts 12 adjacent bytes on every execution.
- **Canary reference**: `InstrEmit_stvewx_` (cc:170-185) — `ea & ~3`, extract lane, `ByteSwap`,
store 4 bytes only. `stvewx128` routes through the same helper as `stvewx`.
- **Fix**: mirror the `stvewx` body with `instr.vs128()` substituted for `instr.rs()`.

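
The ISA semantics quoted in PPCBUG-510 (word-align the EA, pick the lane from `(EA & 0xF) >> 2`, store four bytes) can be sketched as below. This is illustrative only: `stvewx_sketch` is a hypothetical function, guest memory is modelled as a plain byte slice, and the vector register is modelled as 16 bytes already in big-endian (memory) order, so no byte swap is needed.

```rust
/// Store exactly one 32-bit element of `v` at the word-aligned effective address.
/// `v` holds the register contents in memory (big-endian) byte order.
fn stvewx_sketch(mem: &mut [u8], ea: usize, v: &[u8; 16]) {
    let ea = ea & !3; // word-align the effective address
    let lane = (ea & 0xF) >> 2; // which of the four word lanes to store
    mem[ea..ea + 4].copy_from_slice(&v[lane * 4..lane * 4 + 4]);
}
```

A regression test for the 16-byte overwrite can fill memory with a sentinel, execute one store, and check that only four bytes changed.
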
### PPCBUG-511 — `stvx`, `stvx128`, `stvxl`, `stvxl128` missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:1598-1603 (stvx), 1605-1610 (stvx128), 1699-1705 (stvxl/stvxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Symptom**: Under `--parallel`, a 16-byte stvx to a reserved line does not clear the
reservation table slot. The reserving thread's `stwcx.` spuriously succeeds.
- **Fix**: per PPCBUG-107 pattern — add `invalidate_for_write(ea)` guard before the byte loop.

### PPCBUG-512 — `stvebx`, `stvehx`, `stvewx`, `stvewx128` missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:1655 (stvebx), 1664 (stvehx), 1675 (stvewx), 2776 (stvewx128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: `stvewx128` must also fix PPCBUG-510 before adding the invalidation call (or the
invalidation covers the wrong, over-wide address range).

### PPCBUG-513 — `stvlx`, `stvlx128`, `stvlxl`, `stvlxl128` missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:2746-2749 (stvlx/stvlxl), 2751-2754 (stvlx128/stvlxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: partial stores can span a 128-byte line boundary when `ea & 0xF != 0` and
`n = 16 - shift` crosses the line; two `invalidate_for_write` calls may be needed.

### PPCBUG-514 — `stvrx`, `stvrx128`, `stvrxl`, `stvrxl128` missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:2756-2759 (stvrx/stvrxl), 2761-2764 (stvrx128/stvrxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: stvrx at shift=0 is a no-op (no bytes written); guard can skip the call in
that case. Otherwise invalidate `ea & !0xF` (the preceding aligned block).

### PPCBUG-515 — `lvebx`, `lvehx`, `lvewx` implement element semantics; canary uses full-line load (MEDIUM)

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:1613-1653
- **Symptom**: xenia-rs places the loaded byte/halfword/word into the correct lane and preserves
other lanes from VD (ISA-correct for the "undefined" lanes). Canary does a full aligned
16-byte `lvx`-style load that overwrites all lanes. Both are valid under the ISA's "undefined"
specification, but game code that has only been validated on canary may depend on the canary
behavior. The divergence is documented and no code change is required unless canary
compatibility becomes an explicit goal.

### PPCBUG-516 — `lvsr` sh=0 produces {16,17,...,31}; correct per ISA but undocumented (MEDIUM)

- **Severity**: MEDIUM (documentation gap — computation is correct)
- **Status**: open
- **Location**: interpreter.rs:2218-2226
- **Symptom**: When EA is 16-byte aligned, `lvsr` produces byte values all >= 16 (the "select
entirely from VB" identity for `vperm`). The formula `(16 - sh) + i` cannot overflow or
underflow u8: `sh` is in 0..=15, so `16 - sh` is in 1..=16 and `(16 - sh) + i` is in 1..=31.
No computation bug — but there is no comment explaining why values > 15 are correct. Add a
comment and a `debug_assert!(sh <= 15)`.

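
The permute-control vectors are small enough to write out in full. The sketch below uses hypothetical function names and models only the index generation (byte `i` of the result), following the `(16 - sh) + i` formula quoted above for `lvsr` and the usual `sh + i` form for `lvsl`.

```rust
/// lvsl: byte i of the result is sh + i, where sh = EA & 0xF.
fn lvsl_sketch(ea: u32) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    core::array::from_fn(|i| sh + i as u8)
}

/// lvsr: byte i of the result is (16 - sh) + i.
/// Values above 15 are expected: in vperm they select bytes from VB.
fn lvsr_sketch(ea: u32) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    debug_assert!(sh <= 15); // guaranteed by the mask; documents the range argument
    core::array::from_fn(|i| (16 - sh) + i as u8)
}
```

For an aligned EA, `lvsr_sketch` returns 16 through 31, which is the case the finding asks to document.
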
### PPCBUG-517 — Zero test coverage for lvlx/lvrx/stvlx/stvrx boundary edge cases (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: vmx.rs tests (lines 756-792); interpreter.rs test module
- **Missing**: shift=15 for lvlx (1 byte loaded), shift=1 for lvrx (15 bytes), stvlx/stvrx
round-trip, stvrx at shift=0 confirmed no-op, full lvlx+lvrx+vor unaligned memcpy idiom
verified byte-exact.

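
The `lvlx` + `lvrx` + `vor` idiom named in the missing-test list can be modelled on a byte slice to show what "byte-exact" means here. This is a sketch with hypothetical names, not the `vmx.rs` helpers: it assumes `lvlx` left-justifies the bytes from EA to the end of its 16-byte block, `lvrx` right-justifies the bytes from the start of the block up to EA-1 (none when EA is aligned), and unused bytes are zero.

```rust
/// lvlx model: bytes from EA to the end of its 16-byte block, left-justified.
fn lvlx_sketch(mem: &[u8], ea: usize) -> [u8; 16] {
    let sh = ea & 0xF;
    let mut v = [0u8; 16];
    v[..16 - sh].copy_from_slice(&mem[ea..ea + 16 - sh]);
    v
}

/// lvrx model: bytes from the block start up to EA-1, right-justified.
/// With sh == 0 nothing is loaded and the result is all zeros.
fn lvrx_sketch(mem: &[u8], ea: usize) -> [u8; 16] {
    let sh = ea & 0xF;
    let mut v = [0u8; 16];
    v[16 - sh..].copy_from_slice(&mem[ea - sh..ea]);
    v
}

/// Unaligned 16-byte load: lvlx(ea) OR lvrx(ea + 16).
fn load_unaligned(mem: &[u8], ea: usize) -> [u8; 16] {
    let left = lvlx_sketch(mem, ea);
    let right = lvrx_sketch(mem, ea + 16);
    core::array::from_fn(|i| left[i] | right[i])
}
```

Sweeping `ea` across every offset within a block, including the aligned case, covers the shift=0, shift=1, and shift=15 boundaries in one loop.
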
### PPCBUG-518 — Zero interpreter-level execution tests for all 36 VMX load/store opcodes (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Missing**: lvx alignment masking, stvx byte-order verification, lvebx lane placement,
lvsl/lvsr permute index values, stvewx128 after PPCBUG-510 fix. 17 recommended minimum tests
enumerated in per-group report.

### PPCBUG-519 — `stvrx` aligned no-op is silent; no debug trace (LOW)

- **Severity**: LOW
- **Status**: open
- **Location**: vmx.rs:284-292 (`store_vector_right`)
- **Symptom**: shift=0 returns immediately with no trace event. Confusing during
memory-visibility debugging. Add `tracing::trace!` in debug builds.

IDs PPCBUG-520 through PPCBUG-559 are unallocated — no further bugs found in group 38.

---

## Phase C1 — Decoder field extractors

Per-group report: `audit-out/phase-c1-decoder-fields.md`.

Comprehensive audit of all `DecodedInstr` field accessors in `decoder.rs` lines 21-165, cross-checked against ISA form specs, Canary `FormatXxx` structs, and the interpreter's inline re-extraction. Phase B already found PPCBUG-040/046/275/315/360-363/420-422. Phase C1 adds 8 new findings (PPCBUG-560..567).

**Confirmed-clean** (no new finding): `op`, `rd`/`rs`/`rt`, `ra`, `rb`, `rc`, `simm16`, `uimm16`, `d`, `ds`, `li`, `bd`, `bo`, `bi`, `aa`, `lk`, `oe`, `to`, `mb`/`me` (M-form only), `sh`, `spr`, `crm`, `crfd`/`crfs`, `l`, `crbd`/`crba`/`crbb`, `nb`, `va128`/`vb128`/`vd128`/`vs128`, `extract_vx128_uimm5`.

### PPCBUG-560 — sh64() test helper wrong bit order; masks PPCBUG-040 from unit tests (HIGH)

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `xenia-rs/crates/xenia-cpu/tests/disasm_goldens.rs:160-176` (function `rldicl`)
- **Symptom**: The `rldicl` test helper encodes `sh[5:1]` at PPC bits 16-20 and `sh[0]` at PPC bit 30. The ISA encodes `sh[4:0]` at PPC bits 16-20 and `sh[5]` at PPC bit 30. The wrong `sh64()` formula `(sh_lo << 1) | sh_hi` correctly inverts the wrong encoding, making the test pass — but fails on real binary code.

**Counterexamples** (ISA-encoded input → `sh64()` output):

| True shift | sh64() result | Error |
|-----------|--------------|-------|
| 1 | 2 | +1 |
| 16 | 32 | +16 |
| 32 | 1 | -31 |
| 33 | 3 | -30 |
| 63 | 63 | 0 (coincidence) |

Only `sh=0` and `sh=63` decode correctly. All other shifts (1-62) are wrong against real code.

- **Fix for `sh64()`** (per PPCBUG-040):
```rust
pub fn sh64(&self) -> u32 {
    (extract_bits(self.raw, 30, 30) << 5) | extract_bits(self.raw, 16, 20)
}
```
- **Fix for test helper** (must be in same commit):
```rust
// Correct: sh_lo = sh & 0x1F → PPC bits 16-20; sh_hi = sh >> 5 → PPC bit 30
(30 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11)
    | (mb_lo << 6) | (mb_hi << 5) | (0 << 2) | ((sh >> 5) << 1) | rc
```
- **Cross-reference**: PPCBUG-040 (primary finding). PPCBUG-560 is the test-infrastructure companion.

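
The counterexample table can be reproduced with a few lines of standalone code. The following is a sketch under stated assumptions: `extract_bits` here is a local stand-in that uses PPC big-endian bit numbering (bit 0 is the MSB), which is how the fix snippets in this tracker appear to use it; the decoder's real helper may differ in signature. `encode_rldicl`, `sh64_fixed`, and `sh64_buggy` are hypothetical names.

```rust
/// Extract PPC bits first..=last of a 32-bit word (bit 0 is the MSB).
/// Local stand-in for the decoder's `extract_bits`; intended for fields under 32 bits.
fn extract_bits(raw: u32, first: u32, last: u32) -> u32 {
    (raw >> (31 - last)) & ((1u32 << (last - first + 1)) - 1)
}

/// ISA-correct MD-form encoding of rldicl (primary opcode 30, XO = 0).
fn encode_rldicl(rs: u32, ra: u32, sh: u32, mb: u32, rc: u32) -> u32 {
    (30 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11)
        | ((mb & 0x1F) << 6) | ((mb >> 5) << 5) | ((sh >> 5) << 1) | rc
}

/// Fixed accessor: sh[5] from PPC bit 30, sh[4:0] from PPC bits 16-20.
fn sh64_fixed(raw: u32) -> u32 {
    (extract_bits(raw, 30, 30) << 5) | extract_bits(raw, 16, 20)
}

/// The pre-fix formula described in the symptom: (sh_lo << 1) | sh_hi.
fn sh64_buggy(raw: u32) -> u32 {
    (extract_bits(raw, 16, 20) << 1) | extract_bits(raw, 30, 30)
}
```

Looping `sh` over 0..64 against `sh64_fixed` gives a round-trip test that would have caught PPCBUG-040, because the encoder no longer shares the decoder's mistake.
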
### PPCBUG-561 — Missing `mb_md()` accessor on `DecodedInstr`; interpreter inlines wrong formula at 6 sites (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessor absent; `disasm.rs:1256` has correct local helper; `interpreter.rs` lines 696, 706, 716, 726, 736, 746 each inline the wrong formula
- **Symptom**: Interpreter uses `(instr.mb() << 1) | ((instr.raw >> 1) & 1)` which: (a) reads `SH5` (PPC bit 30, host bit 1) instead of `MB5` (PPC bit 26, host bit 5) as the high bit; (b) places the high bit at position 0 instead of position 5. `disasm.rs` has the correct version already — expose it as `DecodedInstr::mb_md()`.
- **Cross-reference**: PPCBUG-046 (primary finding).

- **Fix**:
```rust
// Add to decoder.rs:
#[inline] pub fn mb_md(&self) -> u32 {
    extract_bits(self.raw, 21, 25) | (extract_bits(self.raw, 26, 26) << 5)
}
```
Replace all 6 inline sites in `interpreter.rs` with `instr.mb_md()`.

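
A single encoded word with `mb >= 32` is enough to distinguish the two formulas. The sketch below is standalone and hypothetical: `extract_bits` is a local stand-in using PPC big-endian bit numbering, and `mb_md_buggy` assumes `instr.mb()` returns PPC bits 21-25, as the M-form accessor is described in this tracker.

```rust
/// Extract PPC bits first..=last of a 32-bit word (bit 0 is the MSB).
/// Local stand-in for the decoder's helper.
fn extract_bits(raw: u32, first: u32, last: u32) -> u32 {
    (raw >> (31 - last)) & ((1u32 << (last - first + 1)) - 1)
}

/// Fixed MD-form mask begin: low five bits from PPC 21-25, high bit from PPC 26.
fn mb_md_fixed(raw: u32) -> u32 {
    extract_bits(raw, 21, 25) | (extract_bits(raw, 26, 26) << 5)
}

/// The inlined formula reported in the symptom.
fn mb_md_buggy(raw: u32) -> u32 {
    (extract_bits(raw, 21, 25) << 1) | ((raw >> 1) & 1)
}

/// Build an op-30 word carrying only an `mb` field (all other fields zero).
fn word_with_mb(mb: u32) -> u32 {
    (30 << 26) | ((mb & 0x1F) << 6) | ((mb >> 5) << 5)
}
```

With `mb = 33` the fixed accessor returns 33, while the inlined formula returns 2 because it shifts the low field and reads the `sh` high bit instead.
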
### PPCBUG-562 — Missing `vc_rc_bit()` and `vx128r_rc_bit()` per-form Rc accessors (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — no per-form Rc accessors; `interpreter.rs` uses generic `rc_bit()` (bit 31) for both VC and VX128_R forms
- **Symptom**: Generic `rc_bit()` reads PPC bit 31 (LSB). VC-form Rc is at PPC bit 21 = `(raw >> 10) & 1`. VX128_R-form Rc is at PPC bit 27 = `(raw >> 4) & 1`. Using bit 31 for these forms means the CR6 update gate is permanently disabled for all dot-form VMX vector compares — root cause of PPCBUG-275/420/421/422.
- **Fix**:
```rust
/// Rc for VC-form vector compare (vcmpeqfp, vcmpgefp, vcmpgtfp, vcmpbfp, etc.) — PPC bit 21.
#[inline] pub fn vc_rc_bit(&self) -> bool { extract_bits(self.raw, 21, 21) != 0 }
/// Rc for VX128_R-form compare (vcmpeqfp128, vcmpgefp128, etc.) — PPC bit 27.
#[inline] pub fn vx128r_rc_bit(&self) -> bool { extract_bits(self.raw, 27, 27) != 0 }
```
- **Cross-reference**: PPCBUG-275 / PPCBUG-420 / PPCBUG-421 / PPCBUG-422.

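
The bit positions above can be checked against a hand-built `vcmpeqfp.` word. The sketch is standalone: the free functions mirror the proposed accessors but take the raw word directly, `extract_bits` is a local stand-in using PPC big-endian bit numbering, and the encoding assumes the standard VC-form layout with XO = 198 for `vcmpeqfp`.

```rust
/// Extract PPC bits first..=last of a 32-bit word (bit 0 is the MSB).
/// Local stand-in for the decoder's helper.
fn extract_bits(raw: u32, first: u32, last: u32) -> u32 {
    (raw >> (31 - last)) & ((1u32 << (last - first + 1)) - 1)
}

/// Generic Rc: PPC bit 31 (the LSB).
fn rc_bit(raw: u32) -> bool {
    extract_bits(raw, 31, 31) != 0
}

/// VC-form Rc: PPC bit 21, i.e. (raw >> 10) & 1.
fn vc_rc_bit(raw: u32) -> bool {
    extract_bits(raw, 21, 21) != 0
}

/// VX128_R-form Rc as given in this tracker: PPC bit 27, i.e. (raw >> 4) & 1.
fn vx128r_rc_bit(raw: u32) -> bool {
    extract_bits(raw, 27, 27) != 0
}

/// vcmpeqfp with register fields zeroed: primary opcode 4, XO = 198, optional Rc.
fn encode_vcmpeqfp(rc: bool) -> u32 {
    (4 << 26) | ((rc as u32) << 10) | 198
}
```

Because the XO value is even, the generic `rc_bit()` returns false for both the plain and dot forms, which is the "gate permanently disabled" symptom for VC-form compares.
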
### PPCBUG-563 — Missing `vx128_4_z()` and `vx128_4_imm()` for VX128_4 form (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessors absent; `interpreter.rs:3551-3552` (vrlimi128) reads wrong bit positions
- **Symptom**: VX128_4 form has `IMM` (5-bit) at PPC bits 11-15 (host bits 16-20) and `z` (2-bit) at PPC bits 24-25 (host bits 6-7). Interpreter `vrlimi128` uses `(raw >> 16) & 0x3` for shift (reads VB128l partial) and `(raw >> 2) & 0xF` for mask (reads VD128h region).
- **Fix**:
```rust
#[inline] pub fn vx128_4_imm(&self) -> u32 { extract_bits(self.raw, 11, 15) }
#[inline] pub fn vx128_4_z(&self)   -> u32 { extract_bits(self.raw, 24, 25) }
```
- **Cross-reference**: PPCBUG-315.

### PPCBUG-564 — Missing `vx128_p_perm()` for VX128_P form; PERMh reads XO bits (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessor absent; `interpreter.rs:4089` (vpermwi128) uses `(raw >> 16) & 0xFF`, which reads PERMl correctly (host bits 16-20 = PPC bits 11-15) but takes the top three bits from host bits 21-23 (PPC bits 8-10) instead of PERMh at PPC bits 23-25 (host bits 6-8)
- **Symptom**: Top 3 bits of the 8-bit PERM selector are wrong for every `vpermwi128` instruction. Lane selections for words 0 and 1 are garbage.
- **Fix**:
```rust
#[inline] pub fn vx128_p_perm(&self) -> u32 {
    extract_bits(self.raw, 11, 15) | (extract_bits(self.raw, 23, 25) << 5)
}
```
- **Cross-reference**: PPCBUG-362.

### PPCBUG-565 — Missing `vx128_5_sh()` for VX128_5 form; vsldoi128 MSB reads reserved bit (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessor absent; `interpreter.rs:2012` (vsldoi128) uses `(raw >> 4) & 0x1` for the shift MSB (reads PPC bit 27 = reserved) instead of PPC bit 22 = host bit 9 = `(raw >> 9) & 1`
- **Symptom**: vsldoi128 shift amounts ≥ 8 (where the 4th bit matters) use a garbage bit. The correct 4-bit SH is at PPC bits 22-25 (host bits 6-9) = `(raw >> 6) & 0xF`.
- **Fix**:
```rust
#[inline] pub fn vx128_5_sh(&self) -> u32 { extract_bits(self.raw, 22, 25) }
```
- **Cross-reference**: PPCBUG-361.

### PPCBUG-566 — Missing XER TBC field accessor documentation for lswx/stswx (LOW)

- **Severity**: LOW
- **Status**: open
- **Location**: `decoder.rs` — XER[25:31] (7-bit transfer byte count) is runtime state, not an instruction field; no accessor exists and no documentation notes the gap
- **Symptom**: `lswx`/`stswx` use XER[25:31] as their byte count. The interpreter has no way to read this via the normal accessor pattern. Not a bit-position error, but a structural gap.
- **Recommendation**: add `ctx.xer_tbc() -> u8` to `PpcContext` returning `ctx.xer() & 0x7F`. XER[25:31] is given in PPC big-endian bit numbering, so the field is the low seven bits of the 32-bit XER value; `(ctx.xer() >> 25) & 0x7F` would instead read the SO/OV/CA end of the register. Document that these are the only instructions that read XER as a count operand.

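
Because XER[25:31] uses big-endian bit numbering, the byte count is the low seven bits of the register value, while SO, OV, and CA sit at the opposite end. A minimal sketch, assuming XER is available as a `u32`; `xer_tbc` is written as a free function here, whereas the recommendation places it on `PpcContext`.

```rust
/// Transfer byte count for lswx/stswx: XER[25:31] in PPC numbering,
/// which is the low seven bits of the 32-bit register value.
fn xer_tbc(xer: u32) -> u8 {
    (xer & 0x7F) as u8
}

/// What a host-numbered shift would read instead: the top seven bits,
/// which contain SO, OV, and CA.
fn xer_top_seven_bits(xer: u32) -> u8 {
    ((xer >> 25) & 0x7F) as u8
}
```

A test value with SO, OV, and CA all set plus a small count (for example `0xE000_0005`) distinguishes the two readings.
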
### PPCBUG-567 — Zero unit tests pin any scalar field accessor (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `decoder.rs` unit tests; `tests/disasm_goldens.rs`
- **Symptom**: Phase 4 tests pin `va128`/`vb128`/`vd128`/`vs128` only. No test verifies: `sh64()` against ISA-encoded instructions (existing test validates wrong round-trip — PPCBUG-560), `mb_md()` (absent), `vc_rc_bit()`/`vx128r_rc_bit()` (absent), `ds()` for negative displacement, `spr()` for LR/CTR/XER beyond DEC.
- **Recommended additions**: decoder-level unit tests using ISA-correct encodings for `sh64`, `mb_md`, the two new Rc accessors, `ds` negative, `spr` for LR=8 and CTR=9. See phase-c1-decoder-fields.md for concrete encoding examples.

IDs PPCBUG-568 through PPCBUG-599 are unallocated — no further bugs found in Phase C1 scope.

---

## Phase C2 — Decoder opcode-lookup tables

Per-group report: `audit-out/phase-c2-decoder-lookup.md`.

**Methodology**: complete line-by-line comparison of all `decode_opNN` functions in
`xenia-rs/crates/xenia-cpu/src/decoder.rs` against
`xenia-canary/src/xenia/cpu/ppc/ppc_opcode_lookup_gen.cc`, plus cross-reference of
`ppc-manual/forms/` for VC, VX128_R, VX128_5, VA, VX128_3, VX128_4 forms.

**Overall verdict**: the decoder is structurally sound and entry-by-entry matches
Canary for all real Xbox 360 instructions, with one pre-known exception (PPCBUG-600 =
PPCBUG-423). Zero new wrong-entry bugs. One medium-severity cross-reference entry
(dot-form gap, PPCBUG-600), one medium maintainability risk (key-ordering dependency,
PPCBUG-601), and four LOWs (reserved-encoding misidentification, undocumented Invalid
mapping, test gaps, undocumented fast-path).

### PPCBUG-600 — `decode_op6` key4: VMX128 compare dot-forms decode as Invalid (MEDIUM)

- **Severity**: MEDIUM (cross-reference for PPCBUG-423; same root cause, Phase C2 ID)
- **Status**: applied (52b05b1, 2026-05-01) (dup-of:423 for the fix; this ID is for Phase C2 tracking)
- **Location**: `decoder.rs:640-648` (`decode_op6`, key4 match table)
- **Symptom**: The VX128_R form places `Rc` at PPC bit 27. The key4 formula is
`(bits 22-24 << 3) | bit27`. When Rc=1 (dot-form), bit27=1 and key4 is odd.
Only even key4 values are in the table. Five dot-form encodings fall through to
`PpcOpcode::Invalid`:
  - `vcmpeqfp128.` → key4=0b000001 (1), decodes as Invalid
  - `vcmpgefp128.` → key4=0b001001 (9), decodes as Invalid
  - `vcmpgtfp128.` → key4=0b010001 (17), decodes as Invalid
  - `vcmpbfp128.` → key4=0b011001 (25), decodes as Invalid
  - `vcmpequw128.` → key4=0b100001 (33), decodes as Invalid
- **Contrast**: standard VMX VC-form compares (op=4 key3) are correct because their
Rc bit (bit21) is outside the key3 window (bits 22-31). VX128_R uses a different
form where Rc is at bit27, which is inside the key4 window.
- **Fix**: Add 5 dot-form entries to key4 in `decode_op6`:
```rust
0b000001 => return PpcOpcode::vcmpeqfp128,
0b001001 => return PpcOpcode::vcmpgefp128,
0b010001 => return PpcOpcode::vcmpgtfp128,
0b011001 => return PpcOpcode::vcmpbfp128,
0b100001 => return PpcOpcode::vcmpequw128,
```
The decoder change alone is not sufficient: the generic `instr.rc_bit()` reads PPC
bit 31, so the interpreter arms must gate the CR6 update on `instr.vx128r_rc_bit()`
(PPCBUG-422 / PPCBUG-562).
- **See also**: PPCBUG-423 (Phase B original finding) for impact assessment and
full context.

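
The odd/even relationship between dot and non-dot keys follows directly from the key4 formula quoted above. The sketch below is standalone: `op6_key4` is a hypothetical name for that formula as stated in this tracker, and `extract_bits` is a local stand-in using PPC big-endian bit numbering.

```rust
/// Extract PPC bits first..=last of a 32-bit word (bit 0 is the MSB).
/// Local stand-in for the decoder's helper.
fn extract_bits(raw: u32, first: u32, last: u32) -> u32 {
    (raw >> (31 - last)) & ((1u32 << (last - first + 1)) - 1)
}

/// key4 as described for decode_op6: (bits 22-24 << 3) | bit 27.
fn op6_key4(raw: u32) -> u32 {
    (extract_bits(raw, 22, 24) << 3) | extract_bits(raw, 27, 27)
}

/// Build an op-6 word with a given 3-bit selector in bits 22-24 and a given bit 27.
fn op6_word(selector: u32, bit27: u32) -> u32 {
    (6 << 26) | ((selector & 0x7) << 7) | ((bit27 & 1) << 4)
}
```

Setting bit 27 always adds one to the key, so a table that lists only the even keys cannot match any dot form.
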
### PPCBUG-601 — `decode_op6` key ordering creates undocumented correctness dependency (MEDIUM)

- **Severity**: MEDIUM (maintainability risk; no current wrong-decode for real code)
- **Status**: open
- **Location**: `decoder.rs:603-637` (`decode_op6`, key1/key2/key3 dispatch)
- **Symptom**: key1 (bits 21-22 << 5 | bits 26-27), key2 (bits 21-23 << 4 | bits 26-27),
and key3 (bits 21-27) all overlap. Correctness depends on an implicit invariant:
vpkd3d128 and vrlimi128 (matched by key2) always have bits 26-27 = `01`, while all
15 key3 unary entries always have bits 26-27 = `11`. If a future instruction were
added to key2 with bits 26-27 = `11`, it would shadow a key3 entry. No comment in
the source documents this constraint.
- **Fix**: Add a comment block above the key2/key3 dispatches explaining the invariant:
```
// key2 matches bits 26-27 == 01 only (vpkd3d128, vrlimi128).
// key3 entries all have bits 26-27 == 11. No overlap is possible
// for any currently-defined Xbox 360 instruction.
```

### PPCBUG-602 — `decode_op4` vsldoi128 fallback: over-broad single-bit catch-all (LOW)

- **Severity**: LOW (only fires for reserved/undefined encodings in practice)
- **Status**: open
- **Location**: `decoder.rs:558-561`
- **Symptom**: The VX128_5 form for vsldoi128 is identified by op=4, bit27=1. The
dispatch uses a bare `if extract_bits(code, 27, 27) == 1` after the other tables,
rather than an exact VX128_5-form check. Reserved VA extended opcodes that happen
to have their key4 bit4 (= word bit27) set decode as vsldoi128 instead of Invalid.
Example: a reserved VA XO of the form 0b_1____ (bit 4 set) misses key4, then the
bit27=1 check fires and the word is decoded as vsldoi128. ISA specifies reserved
encodings should trap; this silently assigns a meaning.
- **Fix (optional)**: Strengthen to an exact match:
```rust
// VX128_5 form: SH@22-25, VA128h@26, XO=bit27. Bits 28-31 carry VD128h/VB128h.
// Only vsldoi128 uses this form. Verify the XO bit and absence of load/store marker.
if extract_bits(code, 27, 27) == 1 && extract_bits(code, 30, 31) != 0b11 {
    return PpcOpcode::vsldoi128;
}
```
Alternatively, accept current behavior and add a comment.

### PPCBUG-603 — Primary opcode 9 maps to Invalid; correct but undocumented (LOW)
|
||
|
||
- **Severity**: LOW (test gap / documentation only)
|
||
- **Status**: open
|
||
- **Location**: `decoder.rs:369` (the `_ => PpcOpcode::Invalid` arm of `lookup_opcode`)
|
||
- **Symptom**: Primary opcode 9 (`dozi` in original POWER ISA) is undefined on
|
||
Xenon/750CL and correctly decodes as Invalid. Canary also returns `PPC_DECODER_MISS`.
|
||
No comment documents this intentional absence.
|
||
- **Fix**: Add `// 9 = dozi (POWER-only, not present on Xenon)` comment near the
|
||
match, or explicitly add `9 => PpcOpcode::Invalid` with a comment.
|
||
|
||
### PPCBUG-604 — Zero decoder unit tests for decode_op5, decode_op6, decode_op30, decode_op63 (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `decoder.rs:897-1107` (test module)
- **Symptom**: The 10 existing decoder tests cover addi, lwz, branch, stw, ori, and
  cache mechanics. None exercise VMX128 (op=5, op=6), rotate-doubleword (op=30), or
  FPU (op=63) opcode paths. In particular, no test would have caught PPCBUG-600
  (vcmpeqfp128 dot-form decodes as Invalid) before it caused a runtime trap.
- **Recommended minimum additions** (8 tests):
  1. `vcmpeqfp128` (Rc=0) → decodes as `vcmpeqfp128`.
  2. `vcmpeqfp128.` (Rc=1) → decodes as `vcmpeqfp128` (tests PPCBUG-600 fix).
  3. `vcmpeqfp` (op=4, Rc=0) → key3 check, bit21=0.
  4. `vcmpeqfp.` (op=4, Rc=1) → key3 check, bit21=1, same decode.
  5. `vsldoi128` (op=4, bit27=1) → fallback fires.
  6. `rldicl` (op=30) → decode_op30.
  7. `fadd` (op=63, Rc=0) → arithmetic table.
  8. `fadd.` (op=63, Rc=1) → same decode as fadd.

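
The raw words for tests 7 and 8 can be built with a small field encoder. The sketch below is illustrative only: `encode_a_form`, `primary_opcode`, and `fadd_pair` are hypothetical helpers, not part of `decoder.rs`, and a real test would pass the resulting words to whatever entry point the decoder exposes.

```rust
/// Hypothetical test helper: packs an A-form word.
/// Layout (PowerPC bit 0 = MSB): OPCD 0-5, FRT 6-10, FRA 11-15, FRB 16-20,
/// FRC 21-25, XO 26-30, Rc 31.
fn encode_a_form(opcd: u32, frt: u32, fra: u32, frb: u32, frc: u32, xo: u32, rc: bool) -> u32 {
    (opcd << 26) | (frt << 21) | (fra << 16) | (frb << 11) | (frc << 6) | (xo << 1) | (rc as u32)
}

/// Hypothetical stand-in for the first dispatch step: the primary opcode.
fn primary_opcode(word: u32) -> u32 {
    word >> 26
}

/// `fadd f1, f2, f3` (XO=21) without and with the record bit.
fn fadd_pair() -> (u32, u32) {
    (
        encode_a_form(63, 1, 2, 3, 0, 21, false),
        encode_a_form(63, 1, 2, 3, 0, 21, true),
    )
}
```

The two words differ only in bit 31, so both should reach the same op=63 table entry and decode to the same opcode.
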
### PPCBUG-605 — `decode_op31` sradix fast-path is correct but undocumented (LOW)

- **Severity**: LOW (documentation gap only)
- **Status**: open
- **Location**: `decoder.rs:702-705`
- **Symptom**: The sradix pre-check uses bits 21-29 (9 bits). The subsequent main
  table uses bits 21-30 (10 bits). Because no main-table entry has bits 21-29 =
  0b110011101, the fast-path cannot shadow a legitimate main-table entry. However,
  this is not documented in the source, and a reader might worry that sradix
  (bits 21-30 = 0b1100111010 or 0b1100111011, because bit 30 carries sh[5]; Rc sits
  in bit 31 and is not part of the key) could conflict with a future entry at one
  of those keys.
- **Fix**: Add a comment: `// sradix: XS-form, XO=413 (bits 21-29=0b110011101).`
  `// No main-table entry uses bits 21-30 starting with 0b110011101x.`

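
The relationship between the 9-bit and 10-bit keys can be checked with a few lines of Rust. This is a sketch under assumptions: `extract_bits` here is a local stand-in that uses PowerPC bit numbering (bit 0 = MSB) and may not match the signature of the decoder's own helper, and `encode_sradi` is a hypothetical encoder written for this example.

```rust
/// Stand-in bit extractor, PowerPC numbering: bit 0 is the most significant bit.
fn extract_bits(word: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (word >> (31 - end)) & (u32::MAX >> (32 - width))
}

/// Hypothetical XS-form encoder for sradi / sradi.:
/// opcode 31, XO=413 in bits 21-29, sh[5] in bit 30, Rc in bit 31.
fn encode_sradi(rs: u32, ra: u32, sh: u32, rc: bool) -> u32 {
    (31 << 26)
        | (rs << 21)
        | (ra << 16)
        | ((sh & 0x1F) << 11)   // low five bits of the shift amount
        | (413 << 2)            // 9-bit extended opcode
        | (((sh >> 5) & 1) << 1) // high bit of the shift amount
        | (rc as u32)
}
```

Shift amounts below 32 give a 10-bit key of 826, and shift amounts of 32 or more give 827, while the 9-bit key stays 413 in both cases. A future main-table entry at either 10-bit key would be shadowed by the fast-path.
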
IDs PPCBUG-606 through PPCBUG-639 are unallocated; no further bugs found in Phase C2.

---

## Phase C3 — Disassembler formatter parity

Per-group report: `audit-out/phase-c3-disasm.md`.

**Methodology**: Full line-by-line audit of `disasm.rs:format()` and all ~70 per-class helpers.
Cross-referenced against `xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm_gen.cc`,
`tests/golden/extended_mnemonics.json`, and `tests/golden/base_mnemonics.json`.
Checked: mnemonic correctness (Rc/OE/LK/AA/L-field), operand formatting (signed vs unsigned,
hex vs decimal), simplified-mnemonic priority, branch-condition extended forms, VMX register
naming, VX128 field extraction, and golden test coverage.

**Overall verdict**: The formatter is structurally sound. All OE/Rc/LK/AA suffix handling, the
simplified mnemonic priority order, VMX 5-bit and VMX128 7-bit register naming, SPR mnemonics,
and CR-logical extended forms are correct. Two HIGH bugs found: the `bdnz`/`bdz` extended
mnemonic appends a spurious condition suffix, and the pre-existing `sync`/`lwsync` bug
(PPCBUG-088) is re-assessed as HIGH in disassembler scope. Three MEDIUM bugs: a missing
`fmt_bcctr` extended form (PPCBUG-642), and decimal instead of hex for SIMM immediates
(PPCBUG-643) and D-form displacements (PPCBUG-644), both of which diverge from Canary.
Several LOW findings for golden fixture correctness and edge cases.

**Key finding**: the disassembler's VX128 field extraction (vperm128 VC, vsldoi128 SH,
vpermwi128 PERM) is CORRECT in all three cases where the interpreter (PPCBUG-360/361/362)
has the wrong extraction. The disassembler was written independently and got them right.

### PPCBUG-640 — `fmt_bc`: pure `bdnz`/`bdz` emits `bdnzge`/`bdzge` (spurious condition suffix) (HIGH)

- **Severity**: HIGH
- **Status**: open
- **Location**: `disasm.rs:829-834`
- **Symptom**: For `bcx` with BO=16 (`bdnz`: decrement CTR, branch if CTR≠0, CR ignored):
  - `decr = (16 & 4) == 0` = true
  - `uncond = (16 & 16) != 0` = true
  - Code falls into the `if decr` branch and computes `cond_name_opt` from
    `(cr_bit=0, cond_true=false)` → `Some("ge")`
  - Emits **`bdnzge`**, which is wrong: the ISA simplified form is `bdnz`.

  For BO=18 (`bdz`) the same path emits **`bdzge`**, which is equally wrong.

  The bug is absent in `fmt_bclr`, which has an explicit `if decr && uncond` guard at line 872
  and produces `bdnzlr`/`bdzlr` correctly. `fmt_bc` lacks this guard.

  The golden fixture "bdnz 0x82000040" (PPCBUG-650 companion) pins the wrong output.

- **Fix**: In `fmt_bc`, inside the `if decr` block, gate the condition string on `!uncond`:
  ```rust
  if decr {
      let z = if bo & 0x02 != 0 { "z" } else { "nz" };
      // Pure CTR branches (BO bit 0x10 set) ignore CR, so they get no condition suffix.
      let cond_str = if uncond { "" } else { cond_name_opt.unwrap_or("") };
      let ext_mnem = format!("bd{z}{cond_str}{a}{l}");
      let ext_ops = format!("{cr}0x{target:08X}");
      with_ext(&base_mnem, base_ops, 8, &ext_mnem, ext_ops, 8)
  }
  ```
  Also update the golden fixtures (PPCBUG-650).

- **Impact**: All analysis-DB queries for `bdnz` loops (common in pixel-shader and vertex
  processing loops) return zero rows; they are stored as `bdnzge`. Developers inspecting
  loop structures see a misleading condition name on a CTR-only branch.

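
The BO decoding behind this finding can be exercised in isolation. The following is a minimal sketch, not the code in `disasm.rs`: `cond_name` and `bc_ctr_ext_mnemonic` are hypothetical names, the AA/LK suffixes and operands are left out, and the combined names such as `bdnzge` follow the formatter convention described above.

```rust
/// Hypothetical condition lookup: CR bit within a field (0=lt, 1=gt, 2=eq, 3=so)
/// plus the BO "branch if condition true" flag.
fn cond_name(cr_bit: u32, cond_true: bool) -> &'static str {
    match (cr_bit & 3, cond_true) {
        (0, true) => "lt",
        (0, false) => "ge",
        (1, true) => "gt",
        (1, false) => "le",
        (2, true) => "eq",
        (2, false) => "ne",
        (3, true) => "so",
        _ => "ns",
    }
}

/// Extended mnemonic stem for CTR-decrementing `bc` forms, or None when
/// BO says CTR is not decremented.
fn bc_ctr_ext_mnemonic(bo: u32, bi: u32) -> Option<String> {
    let decr = (bo & 0x04) == 0; // BO bit 0x04 clear: decrement CTR
    let uncond = (bo & 0x10) != 0; // BO bit 0x10 set: ignore CR
    if !decr {
        return None;
    }
    let z = if (bo & 0x02) != 0 { "z" } else { "nz" };
    // Gate the suffix on !uncond so pure CTR branches stay bare.
    let cond = if uncond { "" } else { cond_name(bi & 3, (bo & 0x08) != 0) };
    Some(format!("bd{z}{cond}"))
}
```

With the gate in place, BO=16 and BO=18 yield `bdnz` and `bdz`, while BO=0 with BI=0 still yields the conditional form `bdnzge`.
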
### PPCBUG-641 — `sync` emits `"sync"` for `lwsync` (L=1) — re-assessment of PPCBUG-088 (HIGH)

- **Severity**: HIGH (disassembler scope; PPCBUG-088 was LOW for interpreter scope)
- **Status**: open (see PPCBUG-088 for fix)
- **Location**: `disasm.rs:364`
- **Symptom**: `PpcOpcode::sync` always emits `"sync"`. The L-field at PPC bit 10 selects
  `lwsync` (L=1, encoding `0x7C2004AC`). `lwsync` is the acquire barrier in every Xbox 360
  spinlock. Every `lwsync` in the disassembly DB is stored as `mnemonic='sync'`.
  `SELECT * WHERE mnemonic='lwsync'` returns zero rows regardless of binary content.
- **Note**: the golden fixture for lwsync (PPCBUG-649) currently pins the wrong output.

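
For illustration, the L-field test amounts to one shift and mask. This is a sketch only; `sync_mnemonic` is a hypothetical helper, and the actual fix is the one tracked under PPCBUG-088.

```rust
/// Hypothetical helper: pick the mnemonic for a `sync`-family word.
/// PPC bit 10 (bit 0 = MSB) corresponds to mask 1 << (31 - 10) = 0x0020_0000.
fn sync_mnemonic(code: u32) -> &'static str {
    if (code >> 21) & 1 == 1 {
        "lwsync"
    } else {
        "sync"
    }
}
```

`0x7C0004AC` (L=0) maps to `sync` and `0x7C2004AC` (L=1) maps to `lwsync`.
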
### PPCBUG-642 — `fmt_bcctr` missing extended form for CTR-decrement/ignore-CR BO values (MEDIUM)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:880-902`
- **Symptom**: `bcctrx` with BO=16 (decrement CTR, ignore CR) falls through to `base()` with
  no extended form. `fmt_bclr` (the equivalent for bclrx) correctly handles the same case with
  an explicit `decr && uncond` check at line 872, producing `bdnzlr`.
  Note: `bcctr` with CTR-decrement is undefined by PowerISA; this encoding should never appear
  in valid compiled code. The inconsistency is a maintenance concern rather than a runtime bug.
- **Fix**: Add a `decr && uncond` check before the `cond_branch_ext` call in `fmt_bcctr`,
  mirroring lines 872-876 in `fmt_bclr`. Or add a comment explaining the ISA undefined status.

### PPCBUG-643 — SIMM immediate display: decimal diverges from Canary and real disassemblers (MEDIUM)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:946` (addi), `976` (addic), `989` (subfic), `990` (mulli),
  `1003` (cmpi), `1048-1061` (fmt_ld/fmt_st), and all similar SIMM sites
- **Symptom**: SIMM immediates are formatted via Rust's `{imm}` (decimal). Canary uses
  `"-0x{:X}"` / `"0x{:X}"` (signed hex) for every SIMM field, and IDA Pro also shows hex.
  GNU objdump prints these fields in decimal, so the parity target here is Canary.
  The inconsistency is also internal to xenia-rs:
  `addis`/`oris`/`xoris` use hex (`0x{imm_u:X}`), but `addi`/`addic`/`mulli` use decimal.
  This misleads analysis-DB queries that mix instructions (e.g. `addi r3, r1, -4` vs
  `addis r3, r0, 0x8000`).
- **Impact**: Medium. The output is not *wrong* (the value is correctly computed), but
  cross-referencing with Canary output requires manual conversion.

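
A signed-hex formatter for 16-bit SIMM fields could look like the sketch below. `fmt_simm` is a hypothetical helper name, and the assumption that the field arrives as an `i16` may not match the types used in `disasm.rs`. Widening to `i32` before taking the magnitude avoids an overflow on `-32768`.

```rust
/// Hypothetical helper: render a 16-bit signed immediate as signed hex,
/// e.g. -4 becomes "-0x4" and 16 becomes "0x10".
fn fmt_simm(imm: i16) -> String {
    // Widen first so that i16::MIN has a representable magnitude.
    let magnitude = (imm as i32).unsigned_abs();
    if imm < 0 {
        format!("-0x{magnitude:X}")
    } else {
        format!("0x{magnitude:X}")
    }
}
```

The same helper would serve every SIMM site listed under **Location**, which keeps `addi`, `addic`, `mulli`, and `cmpi` consistent with one another.
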
### PPCBUG-644 — D-form load/store displacement uses decimal instead of hex (MEDIUM)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:1053` (`fmt_ld`), `1061` (`fmt_st`), `1069` (`fmt_ds`)
- **Symptom**: `format!("{rn}, {d}({})", gpr(ra))` outputs decimal for the displacement.
  Canary outputs `"-0x8(r1)"`, not `"-8(r1)"`.
  Affects 25+ D-form and DS-form opcodes. Negative displacements (-8, -16, etc.) are
  especially confusing in decimal when reading stack frame accesses.
- **Fix**:
  ```rust
  // Render the displacement as signed hex to match Canary (-8 becomes -0x8).
  let d_str = if d < 0 { format!("-0x{:X}", -d) } else { format!("0x{:X}", d) };
  base(mnem, format!("{rn}, {d_str}({})", gpr(ra)), 8)
  ```
  Update all golden fixture rows with displacement values.

### PPCBUG-645 — `cntlzdx` Rc suffix: moot for valid encodings, but WONTFIX (LOW)

- **Severity**: LOW
- **Status**: wontfix
- **Location**: `disasm.rs:286`
- **Note**: `fmt_x_unary_rc` would emit `cntlzd.` for Rc=1, but valid `cntlzd` encodings
  always have Rc=0. Canary emits `cntlzd` always. No impact for valid code.

### PPCBUG-646 — `fmt_rlwimi` inslwi/insrwi priority overlap: confirmed correct (LOW)

- **Severity**: LOW
- **Status**: wontfix
- **Note**: After careful analysis, the `inslwi` guard excludes `insrwi` overlap cases
  (`sh != 31u32.wrapping_sub(me)`). Priority is correct. Informational only.

### PPCBUG-647 — `fmt_rlwinm` `extrwi` uses `wrapping_sub` which can give misleading results for invalid encodings (LOW)

- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1137`
- **Symptom**: `let b = sh.wrapping_sub(n) % 32;` is evaluated even for invalid `sh < n`
  encodings, where `wrapping_sub` gives a large u32 and `% 32` gives a confusing value.
  For all compiler-emitted encodings `sh >= n` holds.
- **Fix**: Add `&& sh >= 32 - mb` to the guard so that the `extrwi` form is only emitted
  when `sh >= n`.

### PPCBUG-648 — `fmt_mftb` TBR=268: ext mnemonic identical to base mnemonic (LOW)

- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1443`
- **Symptom**: `268 => with_ext("mftb", base_ops, 8, "mftb", gpr(rd), 8)`: base is `mftb`,
  extended is also `mftb`. `display()` picks the extended form (omitting the `268` operand),
  making it ambiguous vs. `mftbu`. Consider: either emit base-only (`mftb r3, 268`) or rename
  the base to `mftb.raw` for disambiguation.

### PPCBUG-649 — Golden fixture for `lwsync` pins wrong output (no ext_mnemonic) (LOW)

- **Severity**: LOW (test coverage gap)
- **Status**: open
- **Location**: `tests/golden/extended_mnemonics.json`, entry "lwsync"
- **Symptom**: Fixture has `mnemonic: "sync"` and no `ext_mnemonic`. After the PPCBUG-088/641
  fix, the expected output is `mnemonic: "sync"`, `ext_mnemonic: "lwsync"`. The current
  fixture defeats regression detection: the test passes with wrong output.

### PPCBUG-650 — Golden fixtures for `bdnz`/`bdz` pin wrong extended mnemonic (LOW)

- **Severity**: LOW (companion to PPCBUG-640)
- **Status**: open
- **Location**: `tests/golden/extended_mnemonics.json`, rows "bdnz 0x82000040" and "bdz 0x82000040"
- **Symptom**: The rows have `ext_mnemonic: "bdnzge"` and `ext_mnemonic: "bdzge"` respectively.
  After the PPCBUG-640 fix, the correct values are `"bdnz"` and `"bdz"`.

### PPCBUG-651 — `fmt_vmx128_pack_d3d` shared by `vpkd3d128` and `vrlimi128`: confirmed correct (LOW)

- **Severity**: LOW
- **Status**: wontfix
- **Note**: Both opcodes use VX128_4 form. Shared formatter outputs identical operand lists
  (`vd, vb, imm, z`) which is correct for both. Informational only.

### PPCBUG-652 — Zero golden fixtures for any VMX128 opcode disassembly (LOW)

- **Severity**: LOW (test coverage gap)
- **Status**: open
- **Location**: `tests/golden/` (all three JSON files)
- **Symptom**: No fixture pins the formatted output of any VMX128 instruction. Regressions
  in VMX128 field extraction (e.g. a re-introduction of PPCBUG-360/361/362 in the disassembler)
  would be invisible. Recommend adding at minimum: `vaddfp128`, `vperm128`, `vsldoi128`,
  `vpkd3d128`, `vcmpeqfp128.`, `vmaddfp128`.

### PPCBUG-653 — `fmt_trap_imm` unconditional trap extended form: confirmed not-a-bug (LOW)

- **Severity**: LOW
- **Status**: wontfix
- **Note**: `twi 31, rA, IMM` (to=31) has no ISA simplified mnemonic unless RA=0 and IMM=0
  (which matches `tw 31, r0, r0 = trap`). `fmt_trap_imm` correctly emits base-only for
  `twi 31, rA, N`. Informational only.

### PPCBUG-654 — `fmt_rldimi` `insrdi` guard excludes valid `mb=0` (b=0) case (LOW)

- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1220`
- **Symptom**: Guard `if mb > 0` excludes `insrdi rA, rS, n, 0` (b=0 → mb=0). A valid
  compiler-emitted `rldimi` with sh+mb+n=64 and mb=0 falls through to base form instead of
  displaying the `insrdi` simplified mnemonic.
- **Fix**: Remove the `mb > 0` guard; the inner `n > 0` guard is sufficient to avoid
  degenerate cases.

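
The operand recovery can be sketched as follows, using the identity `insrdi rA,rS,n,b` = `rldimi rA,rS,64-(b+n),b`. `insrdi_operands` is a hypothetical helper written for this example and is not the code at `disasm.rs:1220`.

```rust
/// Hypothetical helper: recover (n, b) for the `insrdi` simplified form from
/// the `rldimi` fields, with b = mb and n = 64 - sh - mb.
/// Returns None for the degenerate n = 0 case or when sh + mb exceeds 64.
fn insrdi_operands(sh: u32, mb: u32) -> Option<(u32, u32)> {
    let n = 64u32.checked_sub(sh + mb)?;
    if n > 0 {
        Some((n, mb))
    } else {
        None
    }
}
```

Only the `n > 0` check is needed here: `sh=56, mb=0` yields `(8, 0)`, which is exactly the case the `mb > 0` guard currently rejects.
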
IDs PPCBUG-655 through PPCBUG-679 are unallocated; no further bugs found in Phase C3.