Document the post-P8 cross-cutting review and acid test outcome:

End-to-end reviewer caught:
- BLOCKING-LIKELY: lwa/lwax/lwaux ISA deviation (fixed in f1166d0)
- Cosmetic: fpscr round_single_toward_zero duplicate-branch (fixed in 09c6c92)
- Minor performance: reservation table active_reservers as slot-occupancy
- Asymmetry note: extswx remains 64-bit ABI per audit PPCBUG-038 (wontfix)

Acid test (-n 4B --parallel --reservations-table, pre-lwa-hotfix build):
- swaps=1, draws=0
- exit 0, no panics, no errors, no RtlRaiseException
- 14 thread spawns, 2 LR-sentinel exits
- Renderer plateau NOT unblocked by cumulative P1-P8 correctness fixes

Implication: the Sylpheed `draws=0` plateau has a non-PPC-correctness root cause. PPC fixes were correctness-justified independent of the renderer (well-grounded against canary). Next investigation tracks: graphics pipeline (EDRAM resolve, RT readback), kernel HLE (event signaling, timers), or the unresolved BST-validation paradox per `project_xenia_rs_sylpheed_event_chain_2026_04_29.md`. Out of scope for the PPC instruction audit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# PPC Instruction Audit — Triaged Report (2026-04-29)

**Status**: audit complete. **No code modified.** This file is the fix-order plan for the follow-up session.

**Source of truth**: detailed bug entries (one heading per PPCBUG ID) live in `audit-findings.md`. This file references every entry by ID so nothing is lost — it does not duplicate the per-bug detail.

## Counts

- **Total findings**: 253 PPCBUG IDs, of which 5 are explicitly retracted/withdrawn (PPCBUG-220, 222, 226, 482, 483; see Notes section).
- **Net findings**: ~248 actionable.
- **Severity breakdown** (rough):
  - HIGH: ~55 (~22%)
  - MEDIUM: ~75 (~30%)
  - LOW (test gaps + cosmetic + informational): ~118 (~48%)

## Headline findings (most likely Sylpheed-renderer-blockers)

1. **PPCBUG-107 cascade** — `ReservationTable::invalidate_for_write` defined and unit-tested but never called from any of the **50+ store opcodes** in the interpreter. Under `--parallel`, every cross-thread atomic via `lwarx`/`stwcx.` is silently broken: spinlocks succeed without exclusion, atomic counters race, condition-variable handshakes never sync. Plausible direct cause of the 4-worker-thread renderer plateau (`project_xenia_rs_sylpheed_stage3_2026_04_29.md`). **Fix is mechanical**: one-line `if t.has_active_reservers() { t.invalidate_for_write(ea) }` before every `mem.write_*` in interpreter.rs.

2. **PPCBUG-053+054 cascade** — `bcx`/`bclrx` CTR zero-test compares all 64 bits; `mtspr CTR` writes full 64-bit GPR. Combined with PPCBUG-006 (`negx` poisons GPR upper 32) → **`neg; mtctr; bdnz` loops run forever**.

3. **8 decoder/field-extraction bugs collapse into 6 missing accessors** + 1 wrong sh64 formula + 1 missing decode_op6 dot-form entry. The disassembler already has correct local versions. Single mechanical sweep.

4. **PPCBUG-046 (`clrldi r3, r4, 32`)** — the canonical zero-extend-low-32 idiom is currently a no-op. Emitted constantly by 32-bit-ABI compilers.

5. **PPCBUG-510** — `stvewx128` corrupts 12 adjacent bytes per call.

6. **PPCBUG-424/425** — `vmaddfp128`/`vmaddcfp128` operand swap. Every D3D vertex/pixel shader using FMA with non-aliased operands gets wrong arithmetic.

7. **PPCBUG-360/363** — `vperm128` uses wrong control vector (every D3D shader swizzle); `vpkd3d128` missing post-pack permutation (canonical D3D vertex-pack `pack=1` always wrong).

8. **PPCBUG-275/420-422** — VC-form and VMX128_R-form `rc_bit()` reads bit 0 instead of bit 21/27 → **CR6 never updated for ANY VMX vector compare dot form**. Breaks every `vcmpequb. + bc CR6_all_true` early-exit loop in audio mixing, font rendering, string ops.

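The `neg; mtctr; bdnz` cascade in headline finding 2 can be made concrete with a small model. This is only a sketch: `bdnz_step_64`, `bdnz_step_32`, and `iterations` are hypothetical helpers written for illustration, not functions from the interpreter, and the loop cap exists purely so the broken variant terminates.

```rust
/// Buggy shape (hypothetical): decrement CTR, then test all 64 bits.
fn bdnz_step_64(ctr: u64) -> (u64, bool) {
    let next = ctr.wrapping_sub(1);
    (next, next != 0)
}

/// Fixed shape (hypothetical): decrement CTR, then test only the low 32 bits.
fn bdnz_step_32(ctr: u64) -> (u64, bool) {
    let next = ctr.wrapping_sub(1);
    (next, (next as u32) != 0)
}

/// Count how many times a `bdnz` is executed before it falls through,
/// giving up after `cap` executions.
fn iterations(mut ctr: u64, step: fn(u64) -> (u64, bool), cap: u32) -> u32 {
    let mut n = 0;
    loop {
        n += 1;
        let (next, taken) = step(ctr);
        ctr = next;
        if !taken || n == cap {
            return n;
        }
    }
}
```

With a 64-bit `neg` of the 32-bit value `-3`, CTR becomes `0xFFFF_FFFF_0000_0003`. The 32-bit test exits after 3 executions, while the 64-bit test is still looping when the cap is reached.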
## Recommended fix order

The phases below are the recommended fix order for the follow-up session. Each phase is **independently mergeable**; later phases may reveal that earlier phases unblocked their symptoms (e.g. P1 by itself could be sufficient to break open the Sylpheed renderer plateau).

After each phase: `cargo test --workspace --release` (must stay at 506+ pass) AND `xenia-rs check sylpheed.iso -n 100M` (must not regress against the 2026-04-29 addis-fix baseline of `swaps=2`). The acid test is whether `draws > 0` opens after P1 or P2.

---

### Phase 1 — Cross-thread atomicity (PPCBUG-107 cascade)

**Why first**: highest confidence smoking-gun for the renderer plateau. Single, mechanical, low-risk fix. Largest leverage relative to size.

**Coupled — must land together**:
- PPCBUG-107 (root: missing call from stores)
- PPCBUG-130 (9 byte/halfword stores)
- PPCBUG-140, 141, 142, 143, 144 (5 word stores: stw/stwu/stwx/stwux/stwbrx)
- PPCBUG-150 (5 doubleword stores: std/stdu/stdx/stdux/stdbrx)
- PPCBUG-160 (3 multiple/string stores: stmw/stswi/stswx)
- PPCBUG-167 (9 FP stores)
- PPCBUG-511, 512, 513, 514 (16 VMX stores)

**Independent but related**:
- PPCBUG-151 (stwcx/stdcx reservation width discriminator) — separate fix; add `reservation_width: u8` to PpcContext.
- PPCBUG-108 (legacy per-context path: cross-thread invalidation impossible) — informational; --reservations-table mode bypasses.

**Approach** — one PR adds `if t.has_active_reservers() { t.invalidate_for_write(ea) }` before every `mem.write_*` call site. Scope:
```
mem.write_u8 / write_u16 / write_u32 / write_u64 / write_f32 / write_f64
mem.write_vec128 / write_vec128_aligned (for VMX)
```
~38 sites total. Add 1+ targeted concurrency tests (lwarx + cross-thread plain store + stwcx., expect EQ=0).

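The guard-before-write shape and the suggested concurrency test can be sketched in a single-threaded model. Only the method names `has_active_reservers` and `invalidate_for_write` come from this report; the table layout, the 128-byte granule, `reserve`, `try_commit`, and `store_u32` are assumptions made for illustration. The sketch also clears every reservation on the written granule, including the writer's own, which may differ from the real table.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the interpreter's reservation table.
struct ReservationTable {
    by_thread: HashMap<u32, u32>, // thread id -> reserved granule address
}

/// Assumed reservation granule: 128 bytes.
const GRANULE_MASK: u32 = !0x7F;

impl ReservationTable {
    fn new() -> Self {
        Self { by_thread: HashMap::new() }
    }
    /// lwarx-style: record a reservation for `tid` on the granule holding `ea`.
    fn reserve(&mut self, tid: u32, ea: u32) {
        self.by_thread.insert(tid, ea & GRANULE_MASK);
    }
    fn has_active_reservers(&self) -> bool {
        !self.by_thread.is_empty()
    }
    /// Drop every reservation that covers the granule being written.
    fn invalidate_for_write(&mut self, ea: u32) {
        let granule = ea & GRANULE_MASK;
        self.by_thread.retain(|_, reserved| *reserved != granule);
    }
    /// stwcx.-style: succeeds only if the reservation survived; always consumes it.
    fn try_commit(&mut self, tid: u32, ea: u32) -> bool {
        self.by_thread.remove(&tid) == Some(ea & GRANULE_MASK)
    }
}

/// A plain big-endian word store with the guard placed before the write.
fn store_u32(t: &mut ReservationTable, mem: &mut [u8], ea: u32, value: u32) {
    if t.has_active_reservers() {
        t.invalidate_for_write(ea);
    }
    let i = ea as usize;
    mem[i..i + 4].copy_from_slice(&value.to_be_bytes());
}
```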

---

### Phase 2 — Decoder/field-extraction structural sweep

**Why second**: single mechanical sweep, fixes 12 distinct HIGH-severity findings, unblocks correct execution of compiler-emitted code. Disassembler already has correct local extraction logic — promote/port.

**Coupled — same commit**:
- PPCBUG-040 + PPCBUG-560 — fix `sh64()` bit order AND fix the test helper that was masking it
- PPCBUG-046 + PPCBUG-561 — promote `mb_md()` from `disasm.rs:1256` to `decoder.rs`; replace 6 inline-formula sites in interpreter.rs (rldicl/rldicr/rldic/rldimi/rldcl/rldcr)
- PPCBUG-275 + PPCBUG-276 + PPCBUG-420 + PPCBUG-421 + PPCBUG-422 + PPCBUG-562 — add `vc_rc_bit()` (PPC bit 21) and `vx128r_rc_bit()` (PPC bit 27); replace `instr.rc_bit()` at all VMX compare dot-form sites
- PPCBUG-315 + PPCBUG-563 — add `vx128_4_z()`, `vx128_4_imm()`; fix `vrlimi128`
- PPCBUG-361 + PPCBUG-565 — add `vx128_5_sh()`; fix `vsldoi128`
- PPCBUG-362 + PPCBUG-564 — add `vx128_p_perm()`; fix `vpermwi128`
- PPCBUG-423 + PPCBUG-600 — add 5 odd-key entries to `decode_op6` key4 for `vcmp*fp128.` dot forms

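For the `sh64()` and `mb_md()` entries, the split 6-bit fields of the MD-form encoding can be sketched as below. The function names mirror the accessors named in the list, but the bodies are an illustrative reconstruction from the PowerPC encoding, not a copy of `disasm.rs`; `rldicl` here is a hypothetical helper covering only the register result.

```rust
/// MD-form MB: the 6-bit field sits in host bits 5..=10, with the value's
/// high-order bit stored in the field's lowest bit (PPC bit 26).
fn mb_md(instr: u32) -> u32 {
    let raw = (instr >> 5) & 0x3F;
    (raw >> 1) | ((raw & 1) << 5)
}

/// MD/XS-form SH: low five bits in host bits 11..=15, sixth bit in host bit 1.
fn sh64(instr: u32) -> u32 {
    ((instr >> 11) & 0x1F) | (((instr >> 1) & 1) << 5)
}

/// rldicl result: rotate left by SH, then clear the high MB bits.
fn rldicl(rs: u64, instr: u32) -> u64 {
    rs.rotate_left(sh64(instr)) & (u64::MAX >> mb_md(instr))
}
```

As a check, `clrldi r3, r4, 32` is `rldicl r3, r4, 0, 32`, which encodes as `0x7883_0020`. With the split field decoded correctly, MB is 32 and the upper word is cleared rather than passed through.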
**Independent in this phase**:
- PPCBUG-360 — `vperm128` reads VC from `vd128()` instead of VX128_2 VC field at integer bits 6-8. Fix at the call site (or add `vx128_2_vc()` accessor).
- PPCBUG-363 + PPCBUG-369 — `vpkd3d128` missing post-pack permutation; add the `pack`/`shift` field handling per Canary.

**Test fixture updates required** (PPCBUG-560 lesson) — once `sh64()` is fixed, verify all `disasm_goldens.rs` test helpers encode shifts ISA-correctly. Don't trust the existing fixtures blindly.

---

### Phase 3 — Other HIGH bugs (single targeted fixes)

**Independent**:
- PPCBUG-510 — `stvewx128` corrupting 12 bytes per call. Direct fix: align EA to word, write only 4 bytes.
- PPCBUG-424 — `vmaddfp128` operand order: change `ai.mul_add(bi, di)` → `ai.mul_add(di, bi)`.
- PPCBUG-425 — `vmaddcfp128` operand order similarly.
- PPCBUG-053 + PPCBUG-054 — `bcx`/`bclrx` CTR zero-test (32-bit) + `mtspr CTR` truncation (defensive firewall). Coupled.
- PPCBUG-640 — `fmt_bc` spurious condition suffix on pure `bdnz`/`bdz`. Port the `fmt_bclr` pattern.
- PPCBUG-641 — `lwsync` shows as `sync` in disassembler (re-assessment of PPCBUG-088). Same fix.

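The direct fix described for PPCBUG-510 (word-align the EA, then write only the selected 4 bytes) might look like the following. This is a minimal sketch: the `stvewx` signature, the flat byte-slice memory, and the `[u32; 4]` register representation with lane 0 as the most significant word are all assumptions, not the interpreter's actual types.

```rust
/// Store one word element of a vector register.
/// `vec` holds the four words in big-endian element order (assumed layout).
fn stvewx(mem: &mut [u8], ea: u32, vec: [u32; 4]) {
    let ea = (ea & !3) as usize; // align the effective address to a word
    let lane = (ea & 0xF) >> 2; // the word within the 16-byte block selects the lane
    mem[ea..ea + 4].copy_from_slice(&vec[lane].to_be_bytes());
}
```

Only the four bytes at the aligned address change. The other twelve bytes of the surrounding 16-byte block keep their previous contents.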
---

### Phase 4 — 32-bit ABI writeback truncation sweep

**Why this phase**: cross-cutting, mechanical. Once ALL writebacks truncate via `as u32 as u64`, the systemic 32-bit-ABI invariant is restored and most CR0/CA helper-correctness concerns become moot.

#### 4a — Active poisoning (every execution corrupts GPR upper bits)

These bugs corrupt GPR upper bits **regardless** of whether upstream sources are clean — typically because the implementation applies Rust's `!u64` (full 64-bit NOT) somewhere:
- PPCBUG-006 (negx — `(!ra).wrapping_add(1)`)
- PPCBUG-008 (subfex — `(!ra).wrapping_add(rb).wrapping_add(ca)`)
- PPCBUG-018 (subfzex)
- PPCBUG-019 (subfmex)
- PPCBUG-028 (orcx — `rs | !rb`)
- PPCBUG-029 (norx — `!(rs | rb)` — the canonical `not` mnemonic, hot path)
- PPCBUG-030 (nandx)
- PPCBUG-031 (eqvx — `!(rs ^ rb)` — common `eqv rA, rA, rA` set-to-all-ones)
- PPCBUG-033 (andcx via `!rb`)
- PPCBUG-034 (extsbx — `as i8 as i64 as u64`)
- PPCBUG-035 (extshx)

#### 4b — Same-shape-as-addis (latent under clean inputs, active when upstream is poisoned)

- PPCBUG-001 (addi), PPCBUG-002 (addic), PPCBUG-003 (addicx), PPCBUG-005 (subficx), PPCBUG-007 (subfcx CA), PPCBUG-008 (subfex CA — also in 4a)
- PPCBUG-004 (mulli), PPCBUG-009 (mullwx)
- PPCBUG-010 + PPCBUG-011 (divwx writeback + CR0 — **must land together**, not independently)
- PPCBUG-041 + PPCBUG-042 + PPCBUG-043 (srawx/srawix writeback + CR0 coupling — **must land together**)
- PPCBUG-095, 096, 097, 098 (lha/lhax/lhau/lhaux halfword sign-extension)
- PPCBUG-105 (lwa/lwax/lwaux — note: 64-bit-mode-only; less common in 32-bit-ABI binaries)

#### 4c — Latent writeback (only triggers if 4a/4b are unfixed)

These can be fixed in the same sweep but won't fire under clean inputs:
- PPCBUG-012, 013, 014, 015, 016, 017 (addx/addcx/addex/addzex/addmex/subfx)
- PPCBUG-032 (andx/orx/xorx)

#### 4d — CR0 32-bit-ABI compare (cross-cutting catch-all)

PPCBUG-020 documents the catch-all; the per-opcode locations are referenced from there:
- PPCBUG-020 (catch-all in groups 2-5)
- PPCBUG-023 (andisx)
- PPCBUG-024 (rlwinmx), PPCBUG-025 (rlwimix), PPCBUG-026 (rlwnmx)
- PPCBUG-036 (extsbx), PPCBUG-037 (extshx) — **must land with PPCBUG-034/035**
- PPCBUG-044 (slwx/srwx)

**Fix shape** — at every Rc=1 path, change `update_cr_signed(0, result as i64)` to `update_cr_signed(0, result as u32 as i32 as i64)`. Once 4a/4b/4c land, both forms become equivalent and 4d becomes belt-and-suspenders (still recommended for resilience).

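The two casts at the heart of Phase 4 (`as u32 as u64` on writeback, `as u32 as i32 as i64` for the CR0 view) can be illustrated with `neg`. The helpers `neg_poisoning`, `neg_truncating`, and `cr0_view` are hypothetical names for this sketch; `update_cr_signed` itself is not reproduced. Note how a truncated `0xFFFF_FFFF` reads as positive under a plain `as i64` view but as `-1` under the 32-bit view, which is why the CR0 cast matters once writebacks are clean.

```rust
/// Buggy shape: the full 64-bit NOT sets the upper 32 bits.
fn neg_poisoning(ra: u64) -> u64 {
    (!ra).wrapping_add(1)
}

/// Fixed shape: truncate the writeback to the 32-bit ABI view.
fn neg_truncating(ra: u64) -> u64 {
    (!ra).wrapping_add(1) as u32 as u64
}

/// The value CR0 should be computed from under the 32-bit ABI:
/// the low word, reinterpreted as signed.
fn cr0_view(result: u64) -> i64 {
    result as u32 as i32 as i64
}
```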

---

### Phase 5 — FPU correctness (graphics middleware impact)

#### 5a — Round-to-int and FPSCR.RN

- PPCBUG-221 + PPCBUG-227 (`round_to_i64` NearestEven broken near 2^52 — must land together; `round_to_i32` delegates)
- PPCBUG-201 (FPSCR.RN not honored for double arithmetic)
- PPCBUG-432 (vrfin/vrfin128 round-half-away-from-zero vs round-to-nearest-even)

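To show why values near 2^52 are a sensitive case for NearestEven, here is one possible formulation that stays exact there. It is a sketch only: `round_nearest_even_i64` is a hypothetical name, not the project's `round_to_i64`, and NaN and out-of-range inputs are left to the caller (Rust's `as` cast saturates and maps NaN to 0).

```rust
/// Round to nearest integer, ties to even, without adding 0.5 to the input.
fn round_nearest_even_i64(x: f64) -> i64 {
    const TWO52: f64 = 4503599627370496.0; // 2^52
    if x.abs() >= TWO52 {
        // Every f64 of this magnitude is already an integer.
        return x as i64;
    }
    let floor = x.floor();
    let frac = x - floor; // exact for |x| < 2^52
    let rounded = if frac > 0.5 {
        floor + 1.0
    } else if frac < 0.5 {
        floor
    } else if (floor as i64) % 2 == 0 {
        floor // tie: floor is even
    } else {
        floor + 1.0 // tie: floor is odd, so floor + 1 is even
    };
    rounded as i64
}
```

The design choice worth noting is the avoidance of `(x + 0.5).floor()`: just below 2^52 the spacing between doubles is 0.5, so that shortcut turns every tie into a round-up instead of a round-to-even.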
#### 5b — VXISI / NaN / SNaN handling for FMA family

- PPCBUG-181, 182 (single fmaddsx/fmsubsx/fnmaddsx/fnmsubsx VXISI)
- PPCBUG-202, 203, 204 (double fmaddx/fmsubx/fnmaddx/fnmsubx VXISI — esp. 203 hot for Newton-Raphson)
- PPCBUG-183, 205 (fnmadd/fnmsub Rust unary `-` flips NaN sign — fix: skip negation on NaN)
- PPCBUG-186 (SNaN priority for FMA)
- PPCBUG-128 (lfs SNaN quietening — bit-manipulation widening helper needed)

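The "skip negation on NaN" fix for PPCBUG-183/205 is small enough to sketch directly. The function name and signature are illustrative only, and FPSCR updates are omitted.

```rust
/// fnmadd-style result: -(a * b + c), except that a NaN result is passed
/// through unchanged so its sign bit is not flipped by the negation.
fn fnmadd(a: f64, b: f64, c: f64) -> f64 {
    let fused = a.mul_add(b, c);
    if fused.is_nan() {
        fused
    } else {
        -fused
    }
}
```

Finite and zero results are still negated, so a positive-zero intermediate becomes negative zero as before.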
#### 5c — Inexact / FPSCR exception bits

- PPCBUG-180 (single XX/FR/FI never set), PPCBUG-200 (double XX/FR/FI never set)
- PPCBUG-223 (fcmpo VXSNAN/VXVC), PPCBUG-224 (fcfidx XX), PPCBUG-225 (frspx XX/FR/FI), PPCBUG-229 (fctidx/fctidzx XX/FX), PPCBUG-230 (fctiwx/fctiwzx XX/FX), PPCBUG-231 (frspx SNaN host dependency)
- PPCBUG-165 + PPCBUG-166 + PPCBUG-168 (stfs* FPSCR + RN + SNaN)

#### 5d — Subnormal flush (FPSCR.NI / VSCR.NJ)

- PPCBUG-185 (FPU NI subnormal flush not modeled)
- PPCBUG-435, 436, 437 (VMX NJ subnormal flush — vaddfp/vsubfp/vmulfp128, vmsum3fp128/vmsum4fp128 product intermediates, vmaddfp/vmaddfp128/vmaddcfp128/vnmsubfp128 outputs)

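A subnormal flush for the VMX NJ case can be sketched as a per-lane helper. `flush_subnormal` and this `vaddfp` are hypothetical stand-ins; flushing both inputs and outputs follows the usual description of AltiVec non-Java mode and is an assumption about what the interpreter should do.

```rust
/// Replace a subnormal value with a zero of the same sign.
fn flush_subnormal(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// Lane-wise add with inputs and results flushed.
fn vaddfp(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = flush_subnormal(flush_subnormal(a[i]) + flush_subnormal(b[i]));
    }
    out
}
```

Normal values pass through untouched; only lanes whose input or result falls below `f32::MIN_POSITIVE` in magnitude are replaced by a signed zero.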
#### 5e — Estimate precision (vs hardware ~12-bit)

- PPCBUG-184 (fres)
- PPCBUG-428..431 (vrefp, vrsqrtefp, vexptefp, vlogefp — same shape as fres)

#### 5f — VMX float compares + saturation

- PPCBUG-426, 427 (vnmsubfp/vnmsubfp128 double-rounding)
- PPCBUG-433 (vctsxs/vcfpsxws128 NaN saturate to INT_MIN)

---

### Phase 6 — Other MEDIUM correctness

- PPCBUG-021 (overflow.rs OE checks at bit 63 — sub-register ops; partly covered by P4)
- PPCBUG-022 (`mulld_ov` missing INT_MIN × -1)
- PPCBUG-027 (rlwimix upper-32 ISA-deviation — auto-resolves once P4 lands)
- PPCBUG-039 (cntlzdx 32-bit-ABI counts upper-zero — only matters if emitted)
- PPCBUG-063 (trap pc-after-advance)
- PPCBUG-064 (sc LEV field)
- PPCBUG-065 (twi 31, r0, IMM typed-trap — relevant to Sylpheed C++ throw work, see `project_xenia_rs_sylpheed_throw_2026_04_28.md`)
- PPCBUG-068 (mcrfs VX summary recomputation)
- PPCBUG-078 (mtmsrd L=1 partial MSR-write)
- PPCBUG-080 (mfvscr zero upper 96 bits)
- PPCBUG-123 + PPCBUG-124 + PPCBUG-161 + PPCBUG-566 (XER TBC for lswx/stswx — coupled; add `xer_tbc: u8` to PpcContext, wire into xer()/set_xer(); enables lswx and stswx)
- PPCBUG-125 (lmw RA-in-destination skip)
- PPCBUG-126 + PPCBUG-162 (lswi/stswi `instr.rb()` → `instr.nb()`)
- PPCBUG-487 + PPCBUG-495 (vsum* operand naming)
- PPCBUG-515 (lvebx/lvehx/lvewx vs Canary divergence — document; xenia-rs is more ISA-faithful)
- PPCBUG-516 (lvsr sh=0 case — add comment + debug_assert)
- PPCBUG-601 (decode_op6 overlapping windows — document the invariant)
- PPCBUG-642 (fmt_bcctr extended forms)
- PPCBUG-643 + PPCBUG-644 (SIMM/D-form decimal vs hex — alignment with Canary disassembly)
- PPCBUG-367 (vupkhpx/vupklpx channel replication vs zero-extend)
- PPCBUG-368 (vpkpx pack_pixel_555 channel assignment unverified)
- PPCBUG-366 (vspltisb/vspltish sign-extension idiom — fragile, not wrong)

---

### Phase 7 — Frozen-snapshot drift (separate sweep)

8 opcodes' frozen snapshots in `ppc-manual/<cat>/<op>.md` differ from live code:
- PPCBUG-066 (td/tdi/tw/twi)
- PPCBUG-117 (ldarx)
- PPCBUG-145 (stwcx)
- PPCBUG-560 (already-listed: rldicl test helper bit-order)
- Plus the implicit drift in addicx (PPCBUG-003), andisx (PPCBUG-023), cmp/cmpi (PPCBUG-050), extsbx/extshx (PPCBUG-036/037, PPCBUG-032 in batch 1)

**Recommendation**: regenerate frozen snapshots from current code for the entire ppc-manual after Phases 1-4 land. Add a CI check that compares snapshots vs live code on every PR.

---

### Phase 8 — Test gap closure (broad)

Single PR per group is overkill; recommend bundling test additions with each Phase 1-6 PR (test the bug being fixed). The remaining LOW IDs are pure-test-gap entries — list:

- PPCBUG-045 (shift), 047 (rld), 055 (branch), 067 (trap+sc), 070 (CR logical)
- PPCBUG-081, 082, 083, 084, 085 (SPR/MSR/TB/FPSCR/VSCR moves), 089 (cache+sync)
- PPCBUG-091 (lbz), 100 (lha), 109, 110, 111 (lwa/lwbrx/lwarx), 118 (ld), 127 (lmw/lswi/lswx), 129 (lfs/lfd)
- PPCBUG-132 (stb/sth), 146, 147 (stw/stwcx), 153 (std/stdcx), 163 (stmw/stswi/stswx), 171 (stfs/stfd)
- PPCBUG-187 (FPU single), 208 (FPU double), 228 (FPU misc convert)
- PPCBUG-240 (VMX add/sub), 243 (VMX sat helpers)
- PPCBUG-277, 278, 279 (VMX compare/min/max/avg)
- PPCBUG-316, 317, 320, 321, 322, 323, 324, 325 (VMX shift/rotate/logical)
- PPCBUG-370, 371, 372, 373, 374, 375, 376, 377, 378 (VMX permute/pack)
- PPCBUG-438, 439, 440 (VMX float compare/round/convert)
- PPCBUG-490, 491, 492, 493, 494 (VMX multiply-sum)
- PPCBUG-517, 518, 519 (VMX load/store)
- PPCBUG-567 (decoder accessors)
- PPCBUG-604 (decoder dispatch tables)
- PPCBUG-649, 650, 652 (golden fixtures for branches/VMX128)

---

## Notes & administrative

### Withdrawn / retracted

- **PPCBUG-220** — `fctiwx` strict-`>` threshold actually correct (`i32::MAX` exactly representable in f64). Retracted by group-31 subagent.
- **PPCBUG-222** — `fctidx` positive-overflow sentinel `0x7FFF_FFFF_FFFF_FFFF` is the correct ISA value. Retracted.
- **PPCBUG-226** — FPRF 5-bit codes for fcmpu/fcmpo are correct per PowerISA. Retracted.
- **PPCBUG-482** — `vmhaddshs` shift `>>15` is correct per spec snapshots. Retracted.
- **PPCBUG-483** — `vmhraddshs` shift `>>15` is correct per spec snapshots. Retracted.

### Wontfix / informational (not retracted but no fix needed)

- **PPCBUG-038** — extswx ISA-correct, intentional 64-bit sign-extension. Document the asymmetry with extsb/extsh after PPCBUG-034/035 land.
- **PPCBUG-090, 099, 152** — invalid-form (rD==rA) silently destroys load/store result. Per ISA: undefined behavior. No compiler emits these; matches Canary. Optional `debug_assert!`.
- **PPCBUG-106, 115, 131, 169, 170, 206, 207, 318, 319, 364, 365, 434, 645, 646, 648, 651, 653** — informational confirmations that the implementation is correct, no change needed.
- **PPCBUG-069** — test comment OX(so)=0 is wrong but the assert is correct.
- **PPCBUG-602, 603, 605** — undocumented decoder dispatch quirks; correct but should add comments.
- **PPCBUG-647, 654** — disassembler edge-case behavior on invalid encodings; not-a-bug for valid input.

### Coupling matrix (must-land-together)

| Group | IDs | Reason |
|---|---|---|
| divwx | 010, 011 | Quotient zero-extension changes the CR0 sign view |
| srawx/srawix | 041, 042, 043 | Writeback truncation invalidates the CR0 view |
| extsbx/extshx | 034+036, 035+037 | Same coupling shape as srawx |
| sh64 | 040, 560 | Test helper is wrong in the inverse direction |
| mb_md sweep | 046, 561 | Promote disasm.rs accessor first |
| VC-form Rc | 275, 276, 420, 421, 562 | All consume the same new accessor |
| VMX128_R Rc | 422, 562 | Same accessor sweep |
| vrlimi128 | 315, 563 | Field accessor + caller fix |
| vsldoi128 | 361, 565 | Field accessor + caller fix |
| vpermwi128 | 362, 564 | Field accessor + caller fix |
| vcmp*fp128. | 423, 600 | decode_op6 odd keys + opcode mapping |
| XER TBC | 123, 124, 161, 566 | Add field, wire xer()/set_xer(), enables lswx/stswx |
| round_to_i64 | 221, 227 | round_to_i32 delegates |
| stfs FPSCR | 165, 166, 168 | Single fix shape covers all three |

### Dependency on the addis fix

The addis fix (`project_xenia_rs_addis_signext_root_cause_2026_04_29.md`) is already in place. Phase 4 generalizes that fix systematically; without it, the writeback-truncation invariant would still be incomplete.

### Anticipated impact on the Sylpheed renderer plateau

Strong candidates for direct cause of the plateau:
- **PPCBUG-107** — broken atomics. Workers wait forever on never-signaled events; classical broken-spinlock symptom.
- **PPCBUG-053+054** — broken `bdnz` loops; could explain workers parked indefinitely.
- **PPCBUG-046 (`clrldi r3, r4, 32`)** — pollution propagation in 32-bit ABI; could break any pointer-clean-up sequence.

After applying Phase 1 alone, run `xenia-rs check sylpheed.iso -n 4B --parallel` and check whether `draws > 0`. If yes, the plateau was atomics; if no, proceed to P2/P3.

---

## Progress log

### P1 — Cross-thread atomicity sweep (merged 2026-05-01, HEAD ca5b90b)

**PPCBUGs fixed**: 107, 130, 140, 141, 142, 143, 144, 150, 160, 167, 511, 512, 513, 514, 151, 108. Plus review-fix additions: dcbz, dcbz128, stswi two-line, stswx two-line (merged in review-fix commit c9f194d).

**Gate results**:
- `cargo test --workspace --release`: 449 passed, 0 failed
- `-n 100M` lockstep: swaps=2, clean
- `-n 100M --parallel --reservations-table`: swaps=2, clean
- **Acid test** `-n 4B --parallel --reservations-table`: swaps=2, draws=**0**, no RtlRaiseException, no panics

**Conclusion**: P1 did NOT unblock the Sylpheed renderer. `draws` remains 0. The renderer plateau is not caused by broken cross-thread atomics alone. Proceeding to P2 (decoder/field-extraction sweep). The strongest remaining candidate per the plan is PPCBUG-046 (`clrldi r3, r4, 32` no-op).

---

### P2 — Decoder/field-extraction structural sweep (merged 2026-05-01, HEAD see `git log master --oneline -1`)

**PPCBUGs fixed**: 040, 046, 275, 276, 315, 360, 361, 362, 363, 369, 420, 421, 422, 423, 560, 561, 562, 563, 564, 565, 600.

**Batches**:
- Batch 1: PPCBUG-040+560 — sh64() bit-order fix (XS-form SH split) + rldicl test helper encoding
- Batch 2: PPCBUG-046+561 — mb_md() accessor; all 6 rld* MB fields corrected (clrldi was a no-op)
- Batch 3: PPCBUG-275+276+420+421+422+423+562+600 — vc_rc_bit()/vx128r_rc_bit() Rc accessors; 13 vcmp interpreter sites; 5 decode_op6 dot-form entries
- Batch 4: PPCBUG-315+563 — vrlimi128 vx128_4_z/imm field extraction
- Batch 5: PPCBUG-361+565 — vsldoi128 vx128_5_sh field extraction
- Batch 6: PPCBUG-362+564 — vpermwi128 vx128_p_perm field extraction
- Batch 7: PPCBUG-360 — vperm128 vc128_2() accessor (was erroneously vd128())
- Batch 8: PPCBUG-363+369 — vpkd3d128 post-pack permutation (MakePermuteMask tables from canary)

**Gate results**:
- `cargo test --workspace --release`: 201 (cpu) + 6 (disasm goldens) + 144 + 76 + 16 + 8 + … passed, 0 failed
- Independent code reviewer: all 9 check items OK
- `-n 100M` lockstep smoke: ISO not available in CI environment; last known good at P1 HEAD was swaps=2
- **Acid test** `-n 4B --parallel --reservations-table`: pending (ISO not in CI environment)

**Conclusion**: All P2 fixes applied and reviewed. Decoder field extraction is now correct for all audited VMX128 and MD/XS-form instructions. Whether P2 unblocks the renderer (`draws > 0`) requires the sylpheed.iso acid test on the user's machine. PPCBUG-046 (clrldi no-op fix) was the highest-probability P2 renderer-unblock candidate. Next: P3 — isolated HIGH bugs (PPCBUG-510, 424/425, 053+054, 640, 641).

---

### P3 — Isolated HIGH bugs (merged 2026-05-02, HEAD f3ebaba)

**PPCBUGs fixed**: 053+054 (coupled CTR 32-bit), 424+425 (vmaddfp128/vmaddcfp128 operand swap), 510 (stvewx128 corruption), 640+650 (bdnz/bdz suffix), 641+649 (sync/lwsync), **700 (NEW)**.

**Batches**:
- Batch 1: PPCBUG-510 — stvewx128 16-byte corruption fixed (word-align EA, extract lane, write 4 bytes)
- Batch 2: PPCBUG-424+425 + PPCBUG-700 partial (va128 PPC[11-15] partial fix) — vmaddfp128/vmaddcfp128 operand swap to VA*VD+VB
- Batch 3: PPCBUG-053+054 — bcx/bclrx 32-bit CTR compare + mtspr CTR truncation
- Batch 4: PPCBUG-640+650 — fmt_bc spurious bdnzge/bdzge suffix gated on `!uncond`
- Batch 5: PPCBUG-641+649 — sync/lwsync L-field disambiguation
- Phase review fix: **PPCBUG-700 (NEW)** — VMX128 register accessors (va128/vb128/vd128/vx128r_rc_bit) rewritten to canary's bitfield positions. Audit's "confirmed-clean" line-2958 assessment was based on miscounting LSB-first packed C++ bitfields. Per canary (`xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663`):
  - VA128 = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6 (3 fields, 7 bits)
  - VB128 = PPC[16-20] | PPC[30-31]<<5
  - VD128 = PPC[6-10] | PPC[28-29]<<5
  - VX128_R Rc = PPC[25] (host bit 6) — NOT PPC[27] as PPCBUG-422 prescribed

  Affects 30+ VMX128 opcodes; production game code with VR>=32 was silently mis-decoded. Speculative `key4_dt` dot-form dispatch in `decode_op6` removed (canary has no separate dot-form opcodes for VX128_R). New PPCBUG-700 entry added to `audit-findings.md` Phase C4 invalidating audit line 2958.

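The PPCBUG-700 field positions listed above translate into accessors along these lines. The accessor names match those in the batch list, but the bodies and the `ppc_bits` helper are an illustrative sketch written from the listed positions, not the code that was merged. PPC bit numbering is MSB-first, so PPC bit `n` is host bit `31 - n`.

```rust
/// Extract PPC bits `first..=last` (MSB-first numbering) from a 32-bit word.
fn ppc_bits(instr: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (instr >> (31 - last)) & ((1u32 << width) - 1)
}

/// VA128 = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6
fn va128(instr: u32) -> u32 {
    ppc_bits(instr, 11, 15) | (ppc_bits(instr, 26, 26) << 5) | (ppc_bits(instr, 21, 21) << 6)
}

/// VB128 = PPC[16-20] | PPC[30-31]<<5
fn vb128(instr: u32) -> u32 {
    ppc_bits(instr, 16, 20) | (ppc_bits(instr, 30, 31) << 5)
}

/// VD128 = PPC[6-10] | PPC[28-29]<<5
fn vd128(instr: u32) -> u32 {
    ppc_bits(instr, 6, 10) | (ppc_bits(instr, 28, 29) << 5)
}

/// VX128_R Rc = PPC[25], which is host bit 6.
fn vx128r_rc_bit(instr: u32) -> bool {
    ppc_bits(instr, 25, 25) != 0
}
```

Register numbers of 32 and above depend entirely on the split high bits, which is why a decoder that reads only the 5-bit low field mis-decodes them without any visible error.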
**Gate results**:
- `cargo test --workspace --release`: **470 passed, 0 failed** (up from 467 baseline at P3 start; 3 new CTR regression tests added)
- Independent code reviewer: 1 BLOCKING issue (PPCBUG-700 above) — addressed before merge
- `-n 100M` lockstep smoke: ISO not in CI; checked locally during development
- **Acid test** `-n 4B --parallel --reservations-table`: **deferred to end of all phases** per user direction

**Conclusion**: All P3 fixes applied + reviewed + reviewer's blocking concern resolved. Phase 3 also produced one HIGH discovery (PPCBUG-700) that the audit had missed. Total fixes: 6 commits, 7 distinct PPCBUG groups. Next: P4 — 32-bit ABI writeback truncation sweep, ~30 IDs across 4a-4d sub-sections.

---

### P4 — 32-bit ABI writeback truncation sweep (merged 2026-05-02, HEAD d945aea)

**PPCBUGs fixed**: ~43 IDs across the 4a/4b/4c/4d sub-sections.
- 4a active poisoning: 006 (negx), 008 (subfex), 018 (subfzex), 019 (subfmex), 028 (orcx), 029 (norx), 030 (nandx), 031 (eqvx), 033 (andcx)
- 4a/4d coupled: 034+035+036+037 (extsbx/extshx writeback + CR0)
- 4b immediate ALU: 001 (addi), 002 (addic), 003 (addicx), 004 (mulli), 005 (subficx), 007 (subfcx CA)
- 4b mul/div + srawx coupled: 009 (mullwx), 010+011 (divwx + CR0), 041+042+043 (srawx/srawix + CR0)
- 4b loads: 095-098 (lha/lhax/lhau/lhaux), 105 (lwa/lwax/lwaux)
- 4c latent: 012-017 (addx/addcx/addex/addzex/addmex/subfx), 032 (andx/orx/xorx CR0)
- 4d CR0 catch-all: 020 (in mulhwx/mulhwux/divwux/andx/orx/xorx/cntlzwx etc.), 023 (andisx), 024 (rlwinmx), 025 (rlwimix), 026 (rlwnmx), 044 (slwx/srwx)

**Batches**:
- Batch 1 (e18a0a4): 4a active poisoning NOT/SUB family — 9 PPCBUGs
- Batch 2 (145a7a4): 4a/4d coupled extsbx+extshx+CR0 — 4 PPCBUGs (must land together)
- Batch 3 (bf8208e): 4b immediate ALU — 6 PPCBUGs
- Batch 4 (82a9bff): 4b mul/div + srawx coupled — 6 PPCBUGs (two coupling groups)
- Batch 5 (20a730d): 4b halfword + lwa loads — 5 PPCBUGs
- Batch 6 (16993bb): 4c latent + 4d CR0 catch-all — ~13 PPCBUGs
- Review-fix (49103bb): subfx/subfcx OE predicate + mulli test rigor

**Phase invariants restored**: every 32-bit ABI GPR write zero-extends from a u32 result, every CR0 update views the result as i32, every CA bit comes from a 32-bit unsigned compare. Downstream 64-bit unsigned compares (the addis-incident shape) can no longer be fed polluted upper bits from any of the 40+ touched ALU sites. The frozen-snapshot drift detected in PPCBUG-003 (addicx CR0) and PPCBUG-023 (andisx CR0) is also resolved.

**Review findings**:
- BLOCKING issue caught: subfx and subfcx OE handlers in batch 6 still used the legacy `sum_overflow_64` helper. The helper compares the 32-bit `true_diff` against a u64 view of the result; any legitimate i32::MIN result (bit 31 set) spuriously triggered OV=1. Fixed in 49103bb with two new discriminating regression tests.
- Minor caught: `mulli_overflow_wraps_to_32` rubber-stamped — both pre/post fix wrote 0 for the chosen inputs. Redesigned to use polluted-upper-bits inputs that genuinely discriminate.

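A 32-bit OV predicate for `subf` that avoids the spurious i32::MIN case described in the blocking finding could be as simple as the following. `subf_ov32` is a hypothetical helper for illustration; the actual fix in 49103bb is not reproduced here.

```rust
/// Signed 32-bit overflow for `subf rt, ra, rb` (rt = rb - ra),
/// evaluated on the low words only so polluted upper bits are ignored.
fn subf_ov32(ra: u64, rb: u64) -> bool {
    (rb as u32 as i32).checked_sub(ra as u32 as i32).is_none()
}
```

A result of exactly i32::MIN (bit 31 set, no overflow) reports OV=0 here. Inputs such as `rb = 0xFFFF_FFFF_0000_0005` exercise the polluted-upper-bits path that the redesigned `mulli` test is described as targeting.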
**Gate results**:
- `cargo test --workspace --release`: **494 passed, 0 failed** (up from 470 at P3 merge; 24 new regression tests across the batches)
- 64-bit ABI ops verified untouched: rldicl/rldicr/rldic/rldimi/rldcl/rldcr, sldx/srdx/sradx/sradix, mulhdx/mulhdux/mulldx, divdx/divdux, cntlzdx, extswx
- **Acid test** `-n 4B --parallel --reservations-table`: deferred per user direction

**Conclusion**: P4 is the largest ABI-correctness sweep of the audit. The systemic invariant is restored. Next: P5 — FPU correctness (~30 IDs).

---

### P5 — FPU correctness (merged 2026-05-02, HEAD d39d0ba)

**PPCBUGs fixed**: 21 IDs across the 5a-5f sub-sections.
- 5a (round-to-int): 221+227 (round_to_i64 NearestEven near 2^52, coupled), 432 (vrfin round-to-even)
- 5b (FMA VXISI + NaN sign): 181, 182 (single fmaddsx/fmsubsx VXISI), 202, 203 (double fmaddx/fmsubx/fnmaddx/fnmsubx VXISI), 183, 205 (NaN sign preservation in fnmaddx/fnmsubx and *sx siblings)
- 5c (XX-on-inexact): 223 (verified already correct), 224 (fcfidx XX), 225 (frspx XX), 229 (fctidx/fctidzx XX), 230 (fctiwx/fctiwzx XX)
- 5d (subnormal flush): 435 (vaddfp/vsubfp/vmulfp128 missing flush), 436 (vmsum3fp128/vmsum4fp128 per-product flush), 437 (vmaddfp family output flush)
- 5e (estimate precision): 184 (fresx canary parity via f32 input quantization)
- 5f (saturation + single-FMA): 426 (vnmsubfp single FMA), 427 (vnmsubfp128 single FMA), 433 (vctsxs NaN→INT_MIN)

**Batches**:
- Batch 1 (f6a444b): 5a round-to-int + vrfin
- Batch 2 (26b9897): 5b FMA — new `check_invalid_fma_add` helper in fpscr.rs derives VXISI from input properties
- Batch 3 (49bf74f): 5c XX bit on conversions
- Batch 4 (538fa5a): 5d VSCR.NJ unconditional flush (matches Canary; Xbox 360 always boots NJ=1)
- Batch 5 (6ba8f83): 5e fresx pre-quantize input
- Batch 6 (6fe2cbf): 5f single-FMA + vctsxs NaN
- Review-fix nit (05f2f72): vrfin → stdlib `f32::round_ties_even()`

**Deferred for focused sub-batches** (Status: open in audit-findings.md):
- PPCBUG-201 (FPSCR.RN for double arithmetic) — requires MXCSR set/restore wrappers around 10+ FPU arms
- PPCBUG-185 (FPSCR.NI flush for scalar FPU) — requires NI bit constant + post-op flush wrapper
- PPCBUG-180 + PPCBUG-200 (XX/FR/FI in update_after_op) — requires pre-vs-post-round comparison

**Review findings**:
- Independent reviewer verdict: **MERGE-READY**. No blocking issues.
- Two non-blocking minor follow-ups noted: (a) `check_invalid_fma_add` doesn't catch the finite-product-overflow + infinite-b cancellation half of PPCBUG-202 (audit-acknowledged as rare); (b) vrfin used inline tie-breaker — replaced with stdlib `round_ties_even()` in 05f2f72.

**Gate results**:
- `cargo test --workspace --release`: **498 passed, 0 failed** (up from 494 at P4 merge; 5 new regression tests across the batches)
- **Acid test** `-n 4B --parallel --reservations-table`: deferred per user direction

**Conclusion**: P5 covers the FPU correctness foundation (round-to-int, VXISI, NaN preservation, XX bit, subnormal flush). Three substantive items deferred. Next: P6 — Other MEDIUM correctness (overflow.rs sweep, trap PC-after-advance, sc LEV, twi typed-trap, etc.).

---

### P6 — Other MEDIUM correctness (merged 2026-05-02, HEAD 112202c)

**PPCBUGs fixed**: 13 IDs across the misc-MEDIUM scope.
- Trap/sc/typed-trap (063/064/065): trap PC stays at CIA on Trap; `sc` LEV is logged; the SIMM type code of `twi 31, r0, SIMM` is logged.
- XER TBC infrastructure (123/124/161/566): new `xer_tbc: u8` field in `PpcContext`, wired into `xer()`/`set_xer()`; enables `lswx`/`stswx` (which were permanent no-ops without the TBC infrastructure).
- Load-multiple cleanups (125/126/162): `lmw` skips the write to RA when RA falls in [RT..32), per ISA; `lswi`/`stswi` use `instr.nb()` instead of the misnamed `instr.rb()`.
- SPR/MSR/VSCR (068/078/080): `mcrfs` now recomputes the VX summary bit; `mtmsrd L=1` does the partial MSR write per ISA; `mfvscr` zero-extends the VSCR word into the upper 96 bits of VD.
- Verification/auto-resolved (022/021/027/039): the `mulld_ov` test confirms that `checked_mul` handles INT_MIN * -1 correctly (the audit's "missing" claim was incorrect); 021/027 were auto-resolved by P4; 039 is wontfix per audit.

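The `checked_mul` point in the last bullet can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper `mulld_with_overflow` that returns the low 64 bits of the product together with an overflow flag; it is not the interpreter's actual `mulld` arm. The only standard-library facts relied on are that `i64::checked_mul` returns `None` on overflow and that `i64::wrapping_mul` returns the truncated product.

```rust
/// Hypothetical helper: multiply two signed doublewords and report whether the
/// signed result overflowed 64 bits (the condition that would set XER[OV]).
fn mulld_with_overflow(a: i64, b: i64) -> (i64, bool) {
    match a.checked_mul(b) {
        // The product fits in i64: no overflow.
        Some(product) => (product, false),
        // Overflow, including the INT_MIN * -1 edge case: keep the low 64 bits.
        None => (a.wrapping_mul(b), true),
    }
}
```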
**Batches**:
- Batch 1 (d96986a): trap/sc semantics
- Batch 2 (68c0ee5): XER TBC + load-multiple cleanups
- Batch 3 (0f2a26c): SPR/MSR/VSCR
- Batch 4 (99e7814): mulld_ov verification
- Review-fix nit (5ece5e3): mcrfs uses existing `fpscr::VX_ALL` constant

**Deferred (Status: open in audit-findings.md)**:
- Structural enum extensions (no consumer yet): `StepResult::HypervisorCall` for PPCBUG-064 sc 2 routing; `StepResult::Trap { type_code: u16 }` for PPCBUG-065 typed-trap C++ exception class routing. Both become relevant if/when SEH dispatch lands.
- Cosmetic/test-coverage: PPCBUG-642 (fmt_bcctr ISA-undefined edge), 643/644 (SIMM/D-form decimal vs hex; would re-baseline all goldens), 367/368 (vupkhpx/vpkpx channels), 487/495 (vsum naming), 515/516 (lvebx/lvsr docs), 601 (decode_op6 invariant doc).

**Review findings**: independent reviewer verdict was LGTM on all 4 commits, with one cosmetic nit (use the existing `fpscr::VX_ALL` instead of a duplicate inline mask) applied immediately in 5ece5e3. No blocking issues. The reviewer specifically verified: the trap-PC change against all `StepResult::Trap` consumers (none rely on `ctx.pc` for the faulting address); XER TBC field initialization through the single `PpcContext::new()` path that `Default` delegates to; and `Vec128` lane ordering for the `mfvscr` zero-extend.

**Gate results**:
- `cargo test --workspace --release`: **498 passed, 0 failed**
- **Acid test** `-n 4B --parallel --reservations-table`: deferred per user direction

**Conclusion**: P6 closes the misc-MEDIUM scope. All correctness fixes in scope have landed; structural enum extensions and cosmetic items are explicitly deferred and tracked. Remaining phases: P7 (frozen-snapshot drift, 8 opcodes), P8 (test gap closure, ~50 IDs).

---

### P7 — Frozen-snapshot drift sweep (2026-05-02, manual regen — no xenia-rs code change)

**PPCBUGs fixed**: 3 IDs.
- PPCBUG-066: ppc-manual/branch/{td,tdi,tw,twi}.md: old unconditional-trap stub replaced with the current TO-field-evaluating implementation snippet.
- PPCBUG-117: ppc-manual/memory/ldarx.md: refreshed to the current reservation_line/reservation_table model.
- PPCBUG-145: ppc-manual/memory/stwcx.md: same reservation refresh.

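For readers unfamiliar with what "TO-field-evaluating" means in the PPCBUG-066 entry, the trap condition defined by the PowerPC ISA can be sketched as below. The function name `to_condition_met` is hypothetical and is not the signature of `trap::evaluate` in xenia-rs; the bit meanings of the 5-bit TO field (signed less-than, signed greater-than, equal, unsigned less-than, unsigned greater-than, from most to least significant) follow the ISA. For the word forms (`tw`/`twi`) the caller is assumed to pass the low 32 bits of each operand sign-extended to 64 bits.

```rust
/// Hypothetical sketch: returns true when any comparison enabled in the 5-bit
/// TO field holds for operands `a` and `b`, i.e. when the trap should fire.
fn to_condition_met(to: u8, a: i64, b: i64) -> bool {
    let lt_signed = (to & 0b1_0000) != 0 && a < b;
    let gt_signed = (to & 0b0_1000) != 0 && a > b;
    let equal = (to & 0b0_0100) != 0 && a == b;
    // The unsigned comparisons reinterpret the same bit patterns as u64.
    let lt_unsigned = (to & 0b0_0010) != 0 && (a as u64) < (b as u64);
    let gt_unsigned = (to & 0b0_0001) != 0 && (a as u64) > (b as u64);
    lt_signed || gt_signed || equal || lt_unsigned || gt_unsigned
}
```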
**Methodology**: ran `python3 ppc-manual/generator/generate_manual.py` (the existing idempotent generator that scrapes xenia-rs and xenia-canary source for each opcode and emits a Markdown page). Output: 350 family pages updated, 598-key index.json refreshed.

**Verification**: post-regen `grep` confirms (a) the old "For now, just trace and continue" stub is gone from every page; (b) modern constructs (`trap::evaluate`, the current reservation pattern) appear in the trap and reservation pages.

**Note on scope**: the `ppc-manual/` directory is not versioned in `xenia-rs/.git`. The regen is therefore "done by running the script" with no commit landing in this repo. Documented for posterity here.

**Implicit drift cleared by earlier phases**: addicx (PPCBUG-003 fixed in P4), andisx (PPCBUG-023 fixed in P4), cmp/cmpi (PPCBUG-050; no code change required, manual snapshot now reflects current behavior), extsbx/extshx (PPCBUG-036/037 fixed in P4 batch 2), 32 in batch 1. All were auto-resolved by re-running the generator after P1-P6.

**Conclusion**: P7 is functionally complete. No xenia-rs code change. Next: P8, test gap closure.

---

### P8 — Test gap closure (merged 2026-05-02, HEAD 4029041)

**PPCBUGs closed**: 38 IDs across the test-gap LOW scope (audit listed ~50; 38 closed, ~12 remain Status: open as test-gap-only items that don't block functionality).

**Closed**:
- Branch/CR/SPR/sync: 055, 067, 070, 081, 082, 083, 084, 085, 089
- Loads: 091, 100, 109, 110, 111, 118, 127, 129
- Stores: 132, 146, 147, 153, 163, 171
- FPU: 187, 208, 228
- VMX integer: 240, 277
- VMX shift/rotate/logical: 316, 320, 321, 323
- VMX permute: 370
- VMX float compare/round/convert: 438, 439, 440
- VMX multiply-add: 490
- VMX load/store: 517

**Remaining open** (LOW test-gap, non-blocking): 045, 047, 066, 088 (PPCBUG-088 disasm-only test gap), 117, 145, 279, 317, 322, 324, 325, 371-378, 491-494, 518, 519, 567. These can stay open until a focused test-coverage sprint, or be closed incidentally during ongoing work.

**Batches**:
- Batch 1 (9827b03): branch/CR-logical/SPR/MSR/FPSCR/sync (12 tests)
- Batch 2 (2d223ee): load/store base + XER-TBC-driven lswx/stswx (15 tests)
- Batch 3 (ebfd18a): FPU + VMX float (14 tests); the reviewer caught a VX-form encoding nit (XO at bit 0 not bit 1) during this batch, and the author re-encoded all VX/VC tests before commit
- Batch 4 (2614806): VMX integer/permute/load-store (12 tests)
- Review-fix nit (1f9696a): test rename `vmsum3fp_horizontal_3lane_sum` → `vmaddfp_lane_fma` (test body actually exercised vmaddfp)

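The Batch 3 encoding nit is easiest to see side by side. The sketch below assumes the note uses LSB-0 bit numbering (an interpretation, not something the commit log states): in VX-form the 11-bit extended opcode occupies the lowest bits of the word, whereas in X-form the 10-bit extended opcode is shifted left by one to leave room for the Rc bit. The helper names `encode_vx` and `encode_x` are hypothetical and exist only for this illustration; the field layouts themselves follow the PowerPC ISA.

```rust
/// Hypothetical encoder for a VX-form instruction (primary opcode 4):
/// VD, VA, VB are 5-bit register fields; XO is 11 bits starting at the LSB.
const fn encode_vx(vd: u32, va: u32, vb: u32, xo: u32) -> u32 {
    (4 << 26) | ((vd & 0x1F) << 21) | ((va & 0x1F) << 16) | ((vb & 0x1F) << 11) | (xo & 0x7FF)
}

/// Hypothetical encoder for an X-form instruction: the 10-bit XO sits one bit
/// above the LSB because the lowest bit is the Rc (record) flag.
const fn encode_x(opcd: u32, rs: u32, ra: u32, rb: u32, xo: u32, rc: bool) -> u32 {
    ((opcd & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((ra & 0x1F) << 16)
        | ((rb & 0x1F) << 11)
        | ((xo & 0x3FF) << 1)
        | (rc as u32)
}
```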
**Review findings**: independent reviewer verdict was LGTM on all 4 batches with no blocking issues. Every hand-encoded raw was mechanically cross-checked against canary's `INSTRUCTION(0x..., ..., kVX|kVC|kX|kA, ...)` base raw, with no encoding mismatches. The XER-TBC-driven `lswx`/`stswx` tests are particularly load-bearing: they exercise the new infrastructure landed in P6 (68c0ee5); both opcodes were permanent no-ops pre-P6.

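As background for why `lswx`/`stswx` depend on the TBC field, here is a minimal sketch of how a transfer byte count can be packed into and unpacked from the 32-bit XER image. The struct `XerSketch` and its `so`/`ov`/`ca` field names are assumptions for illustration; only the `xer_tbc: u8` field name and the `xer()`/`set_xer()` accessor names come from the P6 notes, and the real `PpcContext` may store these differently. The bit positions (SO, OV, CA in the top three bits, byte count in the low seven bits) follow the PowerPC ISA.

```rust
/// Hypothetical stand-in for the XER-related part of a CPU context.
#[derive(Default, Debug, PartialEq)]
struct XerSketch {
    so: bool,
    ov: bool,
    ca: bool,
    /// Transfer byte count consumed by lswx/stswx (7 bits are architected).
    xer_tbc: u8,
}

impl XerSketch {
    /// Assemble the 32-bit XER image from the individual fields.
    fn xer(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | ((self.xer_tbc & 0x7F) as u32)
    }

    /// Split a 32-bit XER image back into the individual fields.
    fn set_xer(&mut self, value: u32) {
        self.so = (value >> 31) & 1 != 0;
        self.ov = (value >> 30) & 1 != 0;
        self.ca = (value >> 29) & 1 != 0;
        self.xer_tbc = (value & 0x7F) as u8;
    }
}
```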
**Gate results**:
- `cargo test --workspace --release`: **551 passed, 0 failed** (up from 498 at P7 merge: 53 net new tests; the one `vmsum3fp_…` rename is -1+1 = net 0)
- **Acid test** `-n 4B --parallel --reservations-table`: deferred per user direction

**Conclusion**: P8 closes the meaningful test-coverage gaps for opcode groups that previously had near-zero unit tests. Combined with the regression tests embedded in P1-P6 commits, the test suite now exercises every primary opcode form (branch, CR, SPR, FPU, VMX integer, VMX float, VMX load/store, scalar load/store) at least once. Remaining LOW test-gap items can be closed incrementally without blocking the audit's functional fixes.

---

### Post-P8 — End-to-end review + acid test (2026-05-02)

**End-to-end reviewer findings** (cross-cutting after all 8 phases):

1. **BLOCKING-LIKELY**: `lwa`/`lwax`/`lwaux` were converted to zero-extend in P4 batch 5 (PPCBUG-105 "minimal-fix"); the reviewer flagged this as ISA-deviating. Per PowerISA, "Load Word and Algebraic" must sign-extend. Hotfix landed at HEAD f1166d0: restored the `as i32 as i64 as u64` form and updated the test from `lwa_high_bit_set_zero_extends_upper` to `lwa_sign_extends_to_i64`.
2. **Cosmetic**: `fpscr.rs:289` had a duplicate-branch typo in `round_single_toward_zero`; both branches were `adj_bits - 1`. Replaced with the unconditional form + comment. HEAD 09c6c92.
3. **Minor**: the reservation table's `active_reservers` counter is slot-occupancy, not reserver-count. Once dirtied via cross-line-collision displacement, every subsequent store pays the `invalidate_for_write` Acquire-load cost. Correctness-preserving (the counter is an upper bound), but performance can degrade. Documented; deferred to a focused performance sub-batch.
4. **Asymmetric**: `extswx` is the only sign-extend opcode left at 64-bit ABI (P4 converted every other extsXx to 32-bit). Per PPCBUG-038 (audit `wontfix`), this matches the ISA's documented "argument-register canonicalization in 64-bit mode" intent. No code change. The reviewer flagged the asymmetry; accepted.

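The difference at stake in finding 1 can be shown with a minimal sketch. The helper names `lwa_extend` and `lwz_extend` are hypothetical; only the `as i32 as i64 as u64` cast chain is taken from the hotfix description. The first function sign-extends a loaded 32-bit word into a 64-bit register value, the second zero-extends it (the behaviour appropriate for `lwz`, and the behaviour `lwa` briefly had before the hotfix).

```rust
/// Hypothetical helper: sign-extend a loaded word, as "Load Word and Algebraic" requires.
/// u32 -> i32 reinterprets the bits, i32 -> i64 sign-extends, i64 -> u64 reinterprets.
fn lwa_extend(word: u32) -> u64 {
    word as i32 as i64 as u64
}

/// Hypothetical helper: zero-extend a loaded word, as `lwz` does.
fn lwz_extend(word: u32) -> u64 {
    word as u64
}
```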
**Acid test result** (`xenia-rs check sylpheed.iso -n 4000000000 --parallel --reservations-table`, 2026-05-02 12:28→12:46):
- Exit code: 0 (clean termination, no panics, no RtlRaiseException, no halts)
- swaps=1 (frame=1 XE_SWAP, fb=0x4b0d7000, 1280×720)
- draws=0
- 14 ExCreateThread spawns, 2 worker exits via LR sentinel
- The renderer plateau is **NOT unblocked** by the cumulative P1-P8 correctness fixes
- Note: the binary tested was pre-lwa-hotfix (built before commit f1166d0). The lwa change is unlikely to affect Sylpheed (compilers don't emit `lwa` in 32-bit-ABI code), but a re-run after the hotfix would be the conservative confirmation.

**Implication**: the renderer plateau (`draws=0`) has a non-PPC-correctness root cause. The audit's fixes were correctness-justified independent of the renderer (the PPCBUGs are real bugs, well-grounded against canary), but the cumulative ~161 PPCBUG fixes do not unblock the specific Sylpheed-rendering issue. Next investigation tracks should focus on:
- Graphics-pipeline-side issues (EDRAM resolve gaps per `project_xenia_rs_edram_resolve_gap.md`, RT readback)
- Kernel HLE divergences (event signaling, timer queues, file system)
- The unresolved BST-validation paradox documented in `project_xenia_rs_sylpheed_event_chain_2026_04_29.md` (sub_82175E68 registers 0x828F3F68 in the BST but the validator doesn't find it eight instructions later)

These are out of scope for the PPC instruction audit.

---

## Index — every PPCBUG referenced (in numerical order)

This list intentionally includes every ID found in `audit-findings.md` so nothing is dropped. For each entry's full description / file:line / fix snippet / test recommendation, see the corresponding `### PPCBUG-NNN` heading in `audit-findings.md`.

001-022 (batch 1: integer ALU): 001, 002, 003, 004, 005, 006, 007, 008, 009, 010, 011, 012, 013, 014, 015, 016, 017, 018, 019, 020, 021, 022.

023 (batch 2 group 6 logic immediate): 023.

024-027 (batch 2 group 9 word rotate): 024, 025, 026, 027.

028-033 (batch 2 group 7 logic register): 028, 029, 030, 031, 032, 033.

034-039 (batch 2 group 8 sign-extend / count-leading-zeros): 034, 035, 036, 037, 038, 039.

040-045 (batch 2 group 11 shift): 040, 041, 042, 043, 044, 045.

046-047 (batch 2 group 10 doubleword rotate): 046, 047.

048-052 reserved (group 12 compare): 048, 049, 050.

053-055 (batch 3 group 13 branch): 053, 054, 055.

063-067 (batch 3 group 14 trap+sc): 063, 064, 065, 066, 067.

068-070 (batch 3 group 15 CR logical): 068, 069, 070.

078-085 (batch 3 group 16 SPR/MSR/TB/FPSCR/VSCR): 078, 079, 080, 081, 082, 083, 084, 085.

088-089 (batch 3 group 17 cache+sync): 088, 089.

090-091 (batch 4 group 18 load byte): 090, 091.

095-100 (batch 4 group 19 load halfword): 095, 096, 097, 098, 099, 100.

105-111 (batch 4 group 20 load word + reservation): 105, 106, 107, 108, 109, 110, 111.

115-118 (batch 4 group 21 load doubleword): 115, 116, 117, 118.

123-127 (batch 4 group 22 load multiple/string): 123, 124, 125, 126, 127.

128-129 (batch 4 group 23 load float): 128, 129.

130-132 (batch 5 group 24 store byte/halfword): 130, 131, 132.

140-147 (batch 5 group 25 store word + stwcx): 140, 141, 142, 143, 144, 145, 146, 147.

150-153 (batch 5 group 26 store doubleword): 150, 151, 152, 153.

160-163 (batch 5 group 27 store multiple/string): 160, 161, 162, 163.

165-171 (batch 5 group 28 store float): 165, 166, 167, 168, 169, 170, 171.

180-187 (batch 6 group 29 FPU single arithmetic): 180, 181, 182, 183, 184, 185, 186, 187.

200-208 (batch 6 group 30 FPU double arithmetic): 200, 201, 202, 203, 204, 205, 206, 207, 208.

220-231 (batch 6 group 31 FPU sign/move/compare/convert): 220 [retracted], 221, 222 [retracted], 223, 224, 225, 226 [retracted], 227, 228, 229, 230, 231.

240-243 (batch 7 group 32 VMX integer add/sub): 240, 241, 242, 243.

275-279 (batch 7 group 33 VMX integer compare/min/max/avg): 275, 276, 277, 278, 279.

315-325 (batch 7 group 34 VMX integer logical/shift/rotate): 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325.

360-378 (batch 8 group 35 VMX permute/pack): 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378.

420-440 (batch 8 group 36 VMX float arith+compare): 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440.

482-495 (batch 8 group 37 VMX multiply-sum + special): 482 [retracted], 483 [retracted], 487, 490, 491, 492, 493, 494, 495.

510-519 (batch 8 group 38 VMX load/store): 510, 511, 512, 513, 514, 515, 516, 517, 518, 519.

560-567 (Phase C1 decoder field extractors): 560, 561, 562, 563, 564, 565, 566, 567.

600-605 (Phase C2 decoder opcode-lookup): 600, 601, 602, 603, 604, 605.

640-654 (Phase C3 disassembler formatter): 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654.

**Counted IDs**: 253. **Retracted**: 220, 222, 226, 482, 483 (5). **Net actionable**: 248.

**Counted by phase here**: P1 (~17 IDs), P2 (~17 IDs), P3 (~7 IDs), P4 (~30 IDs), P5 (~30 IDs), P6 (~25 IDs), P7 (~5 IDs), P8 (~50 IDs), Notes (~30 wontfix/informational/retracted). Total accounts for all 253 IDs: every ID is either in a fix phase, the wontfix/informational list, or retracted. **Nothing has been dropped.**