
PPC Instruction Audit — Findings Tracker

Started: 2026-04-29 (single session, audit-only)
Trigger: addis 32-bit-ABI sign-extension fix surfaced a likely systemic class of bugs.
Status: in flight. Per-group reports live in audit-out/. This file is the consolidated, stable-ID index.
Workflow: audit only this session; fix session(s) reference these IDs.

Conventions

Every finding has an ID PPCBUG-NNN for cross-referencing.
Status: open (audit found it, not yet fixed) | applied (fix landed) | wontfix (intentional) | dup-of:NNN (collapsed into another finding).
Severity:
- HIGH = wrong arithmetic / control flow on plausible Xbox 360 user code.
- MEDIUM = wrong status flag / latent under broken upstream invariants / edge case.
- LOW = test gap / cosmetic / dead-code-only.
All file:line refs are xenia-rs/crates/xenia-cpu/src/interpreter.rs unless otherwise noted.
Suggested fixes are written as one-line patches where possible; see the per-group report for full context.

Cross-cutting recommendation

The single recurring root cause is violating the 32-bit ABI invariant that all GPR writes truncate to 32 bits. The cleanest fix is to systematically apply as u32 as u64 at every GPR writeback in every integer ALU op. The existing CA/CR0/OE helpers will then be correct without further changes (because their inputs become guaranteed-clean). The audit reports list each fix individually; the fix session may choose to apply them as one sweep or one-at-a-time.

A defensive secondary recommendation: even after the writeback truncation, instructions whose CA computation does its own internal arithmetic on 64-bit operands (subfcx, subfex, addic, addicx, subficx) should additionally truncate their compare operands. This guards against any future regression that re-pollutes the GPR file.
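
One way to make the sweep hard to regress is to route every integer writeback through a single helper. The following is a minimal sketch under stated assumptions: `Ctx`, `write_gpr32`, and `li_minus_one` are hypothetical names for illustration, not existing xenia-rs items.

```rust
/// Hypothetical stand-in for the interpreter context; only the GPR file is modelled.
pub struct Ctx {
    pub gpr: [u64; 32],
}

impl Ctx {
    /// Write a GPR under the 32-bit ABI invariant:
    /// keep the low 32 bits and zero the upper 32.
    pub fn write_gpr32(&mut self, rd: usize, value: u64) {
        self.gpr[rd] = value as u32 as u64;
    }
}

/// `li rT, -1` (that is, `addi rT, 0, -1`) routed through the helper.
pub fn li_minus_one(ctx: &mut Ctx, rt: usize) {
    let simm16: i16 = -1;
    // Sign-extension to 64 bits sets bits 32..63; the helper strips them again.
    ctx.write_gpr32(rt, simm16 as i64 as u64);
}
```

With a helper of this shape, an ALU arm that forgets the truncation would have to bypass the helper explicitly, which is easy to grep for.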

Batch 1 — integer ALU (groups 1-5)

Per-group reports: audit-out/group-01-add-imm.md, group-02-add-reg.md, group-03-sub-reg.md, group-04-multiply.md, group-05-divide.md.

PPCBUG-001 — addi sign-extension, no truncation

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:114-118
Symptom: addi rT, r0, -1 (= li rT, -1) writes 0xFFFFFFFF_FFFFFFFF instead of 0x00000000_FFFFFFFF. Identical shape to addis.

Fix:

ctx.gpr[instr.rd()] = ra_val.wrapping_add(instr.simm16() as i64 as u64) as u32 as u64;

Test gap: existing test_addi only covers positive simm16. Add a test for li rT, -1 and verify the upper 32 bits are zero.

PPCBUG-002 — addic untruncated writeback + 64-bit CA compare

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:133-140
Symptom: (a) GPR writeback not truncated (same shape as addi). (b) CA computed via 64-bit result < ra — Canary's AddDidCarry explicitly truncates both operands to int32 first.

Fix:

let ra32 = ra as u32;
let imm = instr.simm16() as i32 as u32;
let result32 = ra32.wrapping_add(imm);
ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;

Test gap: zero unit tests for addic.
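
To close the test gap, the fix can be restated as a pure function and exercised directly. This is a sketch only: `addic32` is a hypothetical free function that mirrors the fix above and returns the writeback value and CA instead of mutating the context.

```rust
/// 32-bit `addic`: returns (GPR value to write back, CA bit).
/// `ra` is the raw 64-bit GPR contents, possibly with polluted upper bits.
pub fn addic32(ra: u64, simm16: i16) -> (u64, u8) {
    let ra32 = ra as u32;
    let imm = simm16 as i32 as u32;
    let result32 = ra32.wrapping_add(imm);
    // An unsigned wrap-around means a carry out of bit 31.
    let ca = (result32 < ra32) as u8;
    (result32 as u64, ca)
}
```

Useful cases are a carry out of bit 31 (`0xFFFF_FFFF + 1`), a negative immediate with no carry (`0 + -1`), and a source register whose upper 32 bits are deliberately polluted.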

PPCBUG-003 — addicx untruncated writeback + 64-bit CA + CR0 regression

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:141-150
Symptom: same as PPCBUG-002 plus a CR0 regression: live code uses update_cr_signed(0, result as i64) (64-bit signed). The frozen snapshot in ppc-manual/alu/addicx.md shows the previously-correct result as i32 as i64 form. Live code has drifted.
Fix: PPCBUG-002 fix plus update_cr_signed(0, result32 as i32 as i64).
Test gap: zero unit tests.
Note: confirms the manual's frozen snapshots are useful drift detectors — see if other opcodes have similarly regressed.

PPCBUG-004 — mulli untruncated 64-bit signed product

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:159-164
Symptom: RA read as full i64, product stored as u64 without truncation. Per ISA in 32-bit ABI, both factors should be i32 and product should fit in 32 bits (overflow silently wraps per ISA).

Fix:

let ra = ctx.gpr[instr.ra()] as i32 as i64;
let imm = instr.simm16() as i64;
ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64;

Test gap: zero unit tests.

PPCBUG-005 — subficx untruncated writeback + 64-bit CA compare

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:151-158
Symptom: (a) imm.wrapping_sub(ra) on 64-bit values writes poisoned upper bits; sign-extended imm for negative SIMM has bits 32-63 set. (b) CA imm >= ra is 64-bit unsigned compare; wrong relative to Canary's 32-bit form.

Fix:

let ra32 = ra as u32;
let imm32 = instr.simm16() as i32 as u32;
let result32 = imm32.wrapping_sub(ra32);
ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;

Test gap: zero unit tests.

PPCBUG-006 — negx active GPR poisoning + 64-bit OE overflow check

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:319-330
Symptom: (a) (!ra).wrapping_add(1) unconditionally sets upper 32 bits to all-ones because !ra flips them. Even a clean r3 = 5 produces 0xFFFFFFFF_FFFFFFFB instead of 0x00000000_FFFFFFFB. This is active, not latent — every neg in 32-bit-ABI code poisons the GPR. (b) neg_ov_64 overflow predicate tests ra == 0x8000_0000_0000_0000 (64-bit INT_MIN) instead of ra == 0x0000_0000_8000_0000 (32-bit INT_MIN).

Fix:

let result = (!(ra as u32)).wrapping_add(1);
ctx.gpr[instr.rd()] = result as u64;
if instr.oe() {
    overflow::apply(ctx, (ra as u32) == 0x8000_0000);
}
if instr.rc_bit() { ctx.update_cr_signed(0, result as i32 as i64); }

Test gap: existing nego_sets_ov_only_on_int_min tests 64-bit INT_MIN — add a 32-bit INT_MIN case.
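
The difference between the two shapes is easy to see in isolation. In this sketch, `neg_64bit` and `neg_32bit` are hypothetical free functions standing in for the pre-fix and post-fix arithmetic; the OV predicate is returned as a `bool` instead of being applied to the context.

```rust
/// Pre-fix shape: 64-bit two's complement, which flips the upper 32 bits.
pub fn neg_64bit(ra: u64) -> u64 {
    (!ra).wrapping_add(1)
}

/// Post-fix shape: returns (GPR value, OV predicate) computed on the low 32 bits.
pub fn neg_32bit(ra: u64) -> (u64, bool) {
    let ra32 = ra as u32;
    let result = (!ra32).wrapping_add(1);
    // Only 32-bit INT_MIN has no positive counterpart.
    (result as u64, ra32 == 0x8000_0000)
}
```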

PPCBUG-007 — subfcx CA via 64-bit unsigned compare

Severity: HIGH (defensive — same shape as the compare that broke addis)
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:258
Symptom: if rb >= ra { 1 } else { 0 } is the exact 64-bit unsigned compare that the addis bug exploited. Wrong CA when either operand has poisoned upper 32 bits. Apply defensively even if all upstream sources are cleaned, because a wrong CA bit is unrecoverable downstream.

Fix:

let ra32 = ra as u32;
let rb32 = rb as u32;
let result32 = rb32.wrapping_sub(ra32);
ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;

Test gap: zero dedicated unit tests for subfcx — the most critical opcode in Group 3 had no coverage. Add 6+ tests including the exact 0x828F3F98 / 0x828F3F68 case from the addis incident.
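
A regression test for this finding could take the following shape. This is a sketch: `subfc_ca_64` and `subfc32` are hypothetical free functions, and the choice of which operand carries polluted upper bits is an assumption for illustration, since the tracker records only the two low-word values.

```rust
/// Pre-fix CA predicate for `subfc`: full 64-bit unsigned compare.
pub fn subfc_ca_64(ra: u64, rb: u64) -> u8 {
    (rb >= ra) as u8
}

/// Post-fix `subfc`: returns (GPR value, CA), using only the low 32 bits.
pub fn subfc32(ra: u64, rb: u64) -> (u64, u8) {
    let ra32 = ra as u32;
    let rb32 = rb as u32;
    (rb32.wrapping_sub(ra32) as u64, (rb32 >= ra32) as u8)
}

/// Assumed inputs: RA polluted by an upstream sign-extension, RB clean.
pub const RA_POISONED: u64 = 0xFFFF_FFFF_828F_3F68;
pub const RB_CLEAN: u64 = 0x0000_0000_828F_3F98;
```

With these inputs the 64-bit predicate reports no carry, while the 32-bit form reports CA=1 and a difference of 0x30.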

PPCBUG-008 — subfex CA via 64-bit unsigned compare + `!ra` poisons writeback

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:268-284
Symptom: (a) CA if rb > ra || (rb == ra && ca != 0) is 64-bit; same shape as PPCBUG-007. (b) Writeback uses (!ra).wrapping_add(rb).wrapping_add(ca) — !ra always sets upper 32 bits, guaranteed GPR poison even with clean inputs (same shape as PPCBUG-006).

Fix:

let ra32 = ra as u32;
let rb32 = rb as u32;
let ca = ctx.xer_ca as u32;
let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca);
ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;

PPCBUG-009 — mullwx untruncated 64-bit signed product

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:331-344
Symptom: 32x32 multiply produces 64-bit signed i64 product, written to GPR via as u64 without truncation. When product overflows i32 (which mullw_ov correctly detects), upper 32 bits are non-zero and corrupt downstream 64-bit unsigned compares — same class as addis.

Fix (one line; OE handler unchanged):

ctx.gpr[instr.rd()] = product as u32 as u64;

PPCBUG-010 — divwx quotient sign-extended to 64 bits

Severity: HIGH
Status: open (must be applied in same commit as PPCBUG-011)
Location: interpreter.rs:373
Symptom: (ra / rb) as i64 as u64 sign-extends a negative i32 quotient. -10 / 3 = -3 writes 0xFFFFFFFF_FFFFFFFD instead of 0x00000000_FFFFFFFD. Canary's InstrEmit_divwx uses f.ZeroExtend(v, INT64_TYPE) — explicit zero-extension.
Fix: ctx.gpr[instr.rd()] = (ra / rb) as u32 as u64;

PPCBUG-011 — divwx CR0 update breaks after PPCBUG-010 fix

Severity: MEDIUM (coupled to PPCBUG-010 — must land together)
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:379
Symptom: update_cr_signed(0, ctx.gpr[instr.rd()] as i64) accidentally works today because the sign-extended GPR has consistent sign in i64 view. After PPCBUG-010, GPR holds 0x00000000_FFFFFFFD for -3 and as i64 reads positive — CR0.LT will be wrong for negative quotients.
Fix: ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64);
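
The coupling between PPCBUG-010 and PPCBUG-011 can be shown with two small functions. This is a sketch with hypothetical names (`divw_writeback`, `cr0_operand`); the divide-by-zero and `i32::MIN / -1` cases are returned as `None` here because their handling is outside these two findings.

```rust
/// 32-bit `divw` quotient, zero-extended for writeback (PPCBUG-010 shape).
/// Returns None for division by zero and for i32::MIN / -1.
pub fn divw_writeback(ra: u64, rb: u64) -> Option<u64> {
    let (a, b) = (ra as i32, rb as i32);
    a.checked_div(b).map(|q| q as u32 as u64)
}

/// CR0 sign view that stays correct after zero-extension (PPCBUG-011 shape).
pub fn cr0_operand(gpr: u64) -> i64 {
    gpr as u32 as i32 as i64
}
```

For -10 / 3 the writeback is 0x00000000_FFFFFFFD. Read directly as `i64` that value is positive, while `cr0_operand` recovers -3, which is why the two fixes have to land together.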

PPCBUG-012 — addx writeback not truncated (latent)

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:167-179
Symptom: 64-bit wrapping_add result written to GPR untruncated. Latent: only triggers if upstream operands have poisoned upper 32 bits. With PPCBUG-001 etc. unfixed, that invariant is broken — addx amplifies the poison.
Fix: ctx.gpr[instr.rd()] = result as u32 as u64;

PPCBUG-013 — addcx writeback not truncated (latent)

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:180-193
Fix: same shape as PPCBUG-012.

PPCBUG-014 — addex writeback not truncated (latent)

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:194-209
Fix: same shape as PPCBUG-012.

PPCBUG-015 — addzex writeback not truncated (latent)

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:210-224
Fix: same shape as PPCBUG-012.

PPCBUG-016 — addmex writeback not truncated (latent + edge case)

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:225-240
Symptom: same writeback issue plus the wrapping_sub(1) produces all-ones upper 32 bits when low 32 bits underflow — guaranteed poison even if inputs are clean (same shape as PPCBUG-006/008).
Fix: truncate operands and result to 32 bits.

PPCBUG-017 — subfx writeback not truncated (latent)

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:241-253
Fix: same shape as PPCBUG-012.

PPCBUG-018 — subfzex writeback not truncated + `!ra` poisons

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:285-302
Symptom: (!ra).wrapping_add(ca) flips upper 32 bits — guaranteed poison.
Fix: truncate ra to u32, do arithmetic on u32, write as u64.

PPCBUG-019 — subfmex writeback poisoning + always-true CA edge

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:303-318
Symptom: (a) writeback poisoned via (!ra). (b) CA predicate (!ra) != 0 is always true when ra has clean upper 32 bits (because !ra flips them) — so CA is always 1, even in the documented edge case where 32-bit ra == 0xFFFFFFFF && ca == 0 should yield CA=0.
Fix: operate on u32, then xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 }.

PPCBUG-020 — CR0 update uses 64-bit signed compare in all sub-register ops

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Locations: interpreter.rs:250, 264, 281, 299, 315, 327, 341, 379, 396, 410, 419, 428, 445, 462 (every Rc=1 path in groups 2-5)
Symptom: update_cr_signed(0, result as i64) views result as 64-bit signed. In 32-bit ABI, bit 31 determines LT/GT, not bit 63. A result like 0x00000000_80000000 is negative in 32-bit but positive in 64-bit — CR0.LT inverted.
Fix (catch-all): change to result as u32 as i32 as i64 everywhere. Truncating the writebacks (PPCBUG-001..-019) does not make this redundant: once the upper 32 bits of result are zero, result as i64 is never negative, so CR0.LT can only fire through the 32-bit reinterpretation (the same coupling as PPCBUG-011). Both changes are needed.
Note: this is one logical fix duplicated across all rc paths; the fix session should grep update_cr_signed(0, .* as i64) to find them all.
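
The two views can be compared side by side. In this sketch, `cr0_bits`, `cr0_64bit_view`, and `cr0_32bit_view` are hypothetical helpers that return the (LT, GT, EQ) bits as a tuple; the SO bit is omitted.

```rust
/// CR0 (LT, GT, EQ) bits for a signed value; SO is not modelled.
fn cr0_bits(v: i64) -> (bool, bool, bool) {
    (v < 0, v > 0, v == 0)
}

/// Pre-fix view: the GPR value is read as a 64-bit signed integer.
pub fn cr0_64bit_view(result: u64) -> (bool, bool, bool) {
    cr0_bits(result as i64)
}

/// Post-fix view: bit 31 is the sign bit.
pub fn cr0_32bit_view(result: u64) -> (bool, bool, bool) {
    cr0_bits(result as u32 as i32 as i64)
}
```

For 0x00000000_80000000 the 64-bit view reports GT and the 32-bit view reports LT; the two agree whenever bit 31 is clear and the upper 32 bits are zero.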

PPCBUG-021 — OE overflow checks at bit 63 in all sub-register ops

Severity: LOW
Status: open
Locations: throughout — add_ov_64, sub_ov_64, sum_overflow_64, mullw_ov, etc. (defined in xenia-cpu/src/overflow.rs)
Symptom: signed-overflow check operates on 64-bit boundary. For 32-bit-ABI ops (addo, subfo, subfco, etc.), should check at bit 31. With PPCBUG-006 a tighter form was given for negx. The pattern probably needs systematic review across overflow.rs.
Fix: open a follow-up audit of overflow.rs after batch B completes.

PPCBUG-022 — mulld_ov missing INT_MIN * -1 edge case

Severity: LOW
Status: open
Location: xenia-cpu/src/overflow.rs (mulld_ov helper)
Symptom: 64-bit signed multiply overflow check doesn't handle i64::MIN * -1.
Fix: add the special case to the helper.

PPCBUG-023 — andisx CR0 update uses 64-bit signed compare; should use 32-bit

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:475
Symptom: update_cr_signed(0, ctx.gpr[instr.ra()] as i64) interprets the result as 64-bit signed. The andisx result is bounded by 0x0000_0000_FFFF_0000, which is always non-negative in 64-bit view. In 32-bit ABI, bit 31 is the sign bit — results with bit 31 set (e.g. andis. rA, rS, 0x8000 with rS=0x80000000 → result=0x80000000) should yield CR0.LT=1, but xenia-rs gives CR0.GT=1. The ppc-manual frozen snapshot for andisx shows the correct as i32 as i64 form; the live code has drifted. Common trigger: andis. rA, rS, 0x8000 to test the sign bit of a 32-bit word.

Fix:

ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);

Test gap: zero tests for andisx. Add at minimum: result with bit 31 set (expect LT=1), result with bits 0–30 set (expect GT=1), result=0 (expect EQ=1).

Batch 2 — logical immediate (group 6)

Per-group report: audit-out/group-06-logic-imm.md.

Group 6 summary: only 1 new bug found. The simm16 sign-extension pattern does not apply (all ops use uimm16). ori, oris, xori, xoris, and andix are ISA-correct; andisx has a CR0 interpretation bug (PPCBUG-023). All 6 opcodes have inadequate test coverage (LOW gaps for 5 of them, MEDIUM gap for andisx tied to the bug).

Batch 3 — word rotate-and-mask (group 9)

Per-group report: audit-out/group-09-word-rotate.md.

Group 9 summary: core arithmetic is clean — rlw_mask, rotate logic, and result write are all ISA-correct. The single recurring defect is the Rc=1 CR0 path using as i64 instead of as u32 as i32 as i64 (instances of PPCBUG-020 specific to these three opcodes). rlwimix zeroes the upper 32 bits of RA instead of preserving them per ISA, but this is safe under 32-bit ABI invariant and classified LOW. Test coverage is poor: 1 partial test for rlwinmx, zero for the other two.

PPCBUG-024 — rlwinmx CR0 update uses 64-bit signed compare; should use 32-bit

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:667
Symptom: update_cr_signed(0, ctx.gpr[instr.ra()] as i64) — result is a zero-extended u32, so bit 31 set yields +2147483648 in 64-bit signed view but -2147483648 in 32-bit ABI. CR0.LT/GT inverted for results with bit 31 set. rlwinm. is the most common dot-form instruction in compiler output (all slwi., srwi., clrlwi., bitfield-test-and-branch idioms).
Fix: ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);
Test gap: test_rlwinm exists but non-Rc only, result has bit 31 clear. Add Rc=1 tests with bit 31 set in result.

PPCBUG-025 — rlwimix CR0 update uses 64-bit signed compare; should use 32-bit

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:679
Symptom: same class as PPCBUG-024. rlwimi. is compiler-generated for struct bitfield writes; when the inserted value occupies or sets bit 31 of RA, CR0.LT is wrong.
Fix: ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);
Test gap: zero tests for rlwimix. Add basic insert (non-Rc) + Rc=1 with bit-31-set case.

PPCBUG-026 — rlwnmx CR0 update uses 64-bit signed compare; should use 32-bit

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:690
Symptom: same class as PPCBUG-024. rlwnm. is less frequent but used in variable-shift normalisation patterns.
Fix: ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);
Test gap: zero tests for rlwnmx.

PPCBUG-027 — rlwimix zeroes upper 32 bits of RA instead of preserving them (ISA deviation, LOW)

Severity: LOW
Status: open (no fix action required for 32-bit ABI emulation)
Location: interpreter.rs:677-678
Symptom: let ra = ctx.gpr[instr.ra()] as u32 discards upper 32 bits; result written as as u64 zero-extends. Per ISA, (RA) & ¬MASK(MB+32, ME+32) preserves upper 32 bits of RA. Canary confirms: f.And(f.LoadGPR(i.M.RA), f.LoadConstantUint64(~m)) with ~m non-zero in upper half.
Impact: under 32-bit ABI, if the 32-bit GPR invariant holds, upper 32 bits of RA are already zero before rlwimix, so both behaviours are identical. The deviation is only observable if an upstream bug (PPCBUG-001..023) has leaked non-zero upper bits into RA — in which case rlwimix would silently clean them (beneficial side-effect). No isolated fix needed; resolves automatically when upstream bugs are fixed.
Note: if 64-bit mode support is ever added, this will become a HIGH bug.

Batch 2 — logical register (group 7) [renumbered from collision]

Per-group report: audit-out/group-07-logic-reg.md (note: report uses original IDs PPCBUG-023..029 from the subagent's local numbering; tracker uses PPCBUG-028..033 here to avoid collision with groups 6 and 9).

The group 7 subagent also flagged a CR0 regression across all 8 opcodes — that is an extension of PPCBUG-020 (catch-all for CR0 64-bit-signed regressions). Adding andx, andcx, orx, orcx, xorx, norx, nandx, eqvx Rc=1 paths to PPCBUG-020's scope rather than creating a new ID.

PPCBUG-028 — orcx active GPR poisoning

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:509-513
Symptom: writes rs | !rb. Rust's ! on u64 flips all 64 bits — the upper 32 bits of !rb are unconditionally all-ones, OR'd into the result. With clean inputs orc r5, r3, r4 writes 0xFFFFFFFF_xxxxxxxx. Active poisoning, same shape as PPCBUG-006/008.

Fix: operate on u32, write as u64:

let result = (ctx.gpr[instr.rs()] as u32) | !(ctx.gpr[instr.rb()] as u32);
ctx.gpr[instr.ra()] = result as u64;

Test gap: zero tests.

PPCBUG-029 — norx active GPR poisoning (the `not` simplified mnemonic)

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:519-523
Symptom: writes !(rs | rb) — outer ! flips upper 32 bits unconditionally. nor rA, rS, rS is the canonical not simplified mnemonic used pervasively in PPC code; every not in 32-bit-ABI Xbox 360 binaries actively poisons the GPR.
Fix: u32 arithmetic, write as u64.

PPCBUG-030 — nandx active GPR poisoning

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:524-528
Symptom: writes !(rs & rb), same shape as norx. nand rA, rS, rS computes the same bitwise complement as nor rA, rS, rS, so a not emitted in that form is poisoned in the same way.
Fix: u32 arithmetic.

PPCBUG-031 — eqvx active GPR poisoning

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:529-533
Symptom: writes !(rs ^ rb), same shape as norx. The idiom eqv rA, rS, rS ("set rA to all-ones", i.e. -1 in 32-bit ABI) produces 0xFFFFFFFF_FFFFFFFF instead of 0x00000000_FFFFFFFF.
Fix: u32 arithmetic.
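
PPCBUG-029 through PPCBUG-031 describe their fixes only as "u32 arithmetic". A minimal sketch of what that could look like, using hypothetical free functions (`nor32`, `nand32`, `eqv32`) that take raw 64-bit GPR values and return the value to write back:

```rust
/// `nor` on the low 32 bits, zero-extended for writeback.
pub fn nor32(rs: u64, rb: u64) -> u64 {
    let r: u32 = !((rs as u32) | (rb as u32));
    r as u64
}

/// `nand` on the low 32 bits, zero-extended for writeback.
pub fn nand32(rs: u64, rb: u64) -> u64 {
    let r: u32 = !((rs as u32) & (rb as u32));
    r as u64
}

/// `eqv` on the low 32 bits, zero-extended for writeback.
pub fn eqv32(rs: u64, rb: u64) -> u64 {
    let r: u32 = !((rs as u32) ^ (rb as u32));
    r as u64
}
```

Performing the complement on a `u32` keeps the flipped bits inside the low word, so the zero-extension at the end cannot reintroduce poison.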

PPCBUG-032 — andx / orx / xorx writeback not truncated (latent)

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Locations: interpreter.rs:494-498 (andx), 504-508 (orx), 514-518 (xorx)
Symptom: 64-bit bitwise on full GPR values. Latent — clean if both operands are clean; pollutes if either is poisoned upstream.
Fix: as u32 as u64 truncation at writeback. Once all upstream poison sources are fixed, these become unnecessary; until then, defensive truncation.

PPCBUG-033 — andcx active poisoning via `!rb` sub-expression

Severity: MEDIUM (the !rb always poisons; outer & masks it away when rs is clean — fully active when rs is poisoned)
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:499-503
Symptom: writes rs & !rb. The !rb term always has all-ones upper bits, so the upper half of the result is simply the upper half of rs: clean if rs is clean, and passed through unmasked if rs is poisoned upstream. The op can never clear a poisoned rs, which puts it closer to active than latent.
Fix: (rs as u32) & !(rb as u32) then as u64.

Batch 2 — sign-extend / count-leading-zeros (group 8) [renumbered]

Per-group report: audit-out/group-08-extend-clz.md (report uses local IDs PPCBUG-023..030; tracker uses PPCBUG-034..039).

PPCBUG-034 — extsbx writeback sign-extends to 64 bits

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:537
Symptom: as i8 as i64 as u64 — a byte with high bit set (0x80) writes 0xFFFFFFFF_FFFFFF80 instead of 0x00000000_FFFFFF80. Active poisoning on every negative byte. extsb is emitted by compilers to canonicalize signed-byte arguments — common code path.
Fix: ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i8 as i32 as u32 as u64;
Test gap: zero unit tests.
Note: Canary's JIT does the same sign-extension but is rescued by x86's 32-bit-write zeroing the upper 32 of host registers. Pure interpreter has no such escape.

PPCBUG-035 — extshx writeback sign-extends to 64 bits

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:542
Symptom: as i16 as i64 as u64 — same shape as PPCBUG-034 for halfwords.
Fix: ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i16 as i32 as u32 as u64;

PPCBUG-036 — extsbx CR0 coupling

Severity: MEDIUM (must land in same commit as PPCBUG-034)
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:538
Symptom: update_cr_signed(0, ra as i64) — currently latent because the unfixed sign-extended value's i64 sign matches bit 7 of the byte. After PPCBUG-034 lands, the truncated value's i64 view becomes always non-negative — CR0.LT will never fire for negative byte results.
Fix: ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); — must land with PPCBUG-034.

PPCBUG-037 — extshx CR0 coupling

Severity: MEDIUM (must land with PPCBUG-035)
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:543
Symptom: same coupling shape as PPCBUG-036 for halfwords.
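
The writeback fixes (PPCBUG-034/035) and the CR0 coupling (PPCBUG-036/037) can be checked together. This is a sketch with hypothetical helper names; `cr0_is_lt` models only the LT bit of the post-fix CR0 update.

```rust
/// `extsb` under the 32-bit ABI: sign-extend the byte to 32 bits, then zero-extend.
pub fn extsb32(rs: u64) -> u64 {
    rs as i8 as i32 as u32 as u64
}

/// `extsh` under the 32-bit ABI: sign-extend the halfword to 32 bits, then zero-extend.
pub fn extsh32(rs: u64) -> u64 {
    rs as i16 as i32 as u32 as u64
}

/// LT bit of CR0 as computed from the truncated GPR value (bit 31 is the sign).
pub fn cr0_is_lt(gpr: u64) -> bool {
    (gpr as u32 as i32) < 0
}
```

After truncation the plain `as i64` view of `extsb32(0x80)` is non-negative, so only the 32-bit reinterpretation in `cr0_is_lt` still reports LT.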

PPCBUG-038 — extswx ISA-correct, document asymmetry

Severity: LOW (informational / wontfix)
Status: wontfix
Location: interpreter.rs:547
Symptom: as i32 as i64 as u64 produces full 64-bit sign-extension. This IS the documented purpose of extsw — argument-register canonicalization in 64-bit mode. Behavior is intentional. After PPCBUG-034/035 land, document the asymmetry with extsb/extsh in a comment.

PPCBUG-039 — cntlzdx counts upper 32 always-zero bits in 32-bit ABI

Severity: LOW
Status: open (probably dead code in Xbox 360 binaries)
Location: interpreter.rs:556-562
Symptom: counts leading zeros in full 64. If a 32-bit-ABI binary emits cntlzd, the result is 32 + cntlzw(low32) not cntlzw(low32). ISA-correct for 64-bit mode; only matters if the binary actually emits it.
Test gap: zero tests.

Clean opcodes from group 8

cntlzwx (interpreter.rs:551-555) — (rs as u32).leading_zeros() reads only low 32 bits, result range 0..=32, upper 32 zero. CR0 path benign because result is small. Test gap only, LOW.
extswx CR0 path is correct per ISA (PPCBUG-038 wontfix).

Batch 2 — shift (group 11) [renumbered]

Per-group report: audit-out/group-11-shift.md (uses local IDs PPCBUG-050..055; tracker uses PPCBUG-040..045).

PPCBUG-040 — DECODER BUG: `sh64()` wrong bit order for sradi (HIGH)

Severity: HIGH (this is a decoder-level bug, file:line is in decoder.rs not interpreter.rs)
Status: applied (52b05b1, 2026-05-01)
Location: xenia-rs/crates/xenia-cpu/src/decoder.rs:91-93 (the sh64() accessor method on DecodedInstr)
Symptom: the XS-form sradix (sradi) shift amount is assembled as SH[4:0] << 1 | SH[5] instead of the correct SH[5] << 5 | SH[4:0]. Every sradi rA, rS, N instruction where N is not 0 or 63 executes with a completely wrong shift count. Example: sradi rA, rS, 32 shifts by 1 instead. This is a silent, structural mis-decoding — none of the interpreter changes can paper over it.
Cross-reference: Canary's (i.XS.SH5 << 5) | i.XS.SH pattern is the correct ISA encoding.

Fix: in decoder.rs:sh64() body, swap the bit order:

pub fn sh64(&self) -> u32 {
    // SH5 is at bit 30 of the encoded word; SH[4:0] is at bits 16-20.
    let sh_lo = extract_bits(self.raw, 16, 20);
    let sh_hi = extract_bits(self.raw, 30, 30);
    (sh_hi << 5) | sh_lo
}

Impact: sradi is used by compilers for arithmetic right shifts on 64-bit values. In Xbox 360 32-bit-ABI binaries it should not be common, but it's emitted by some compilers for sign-magnitude conversions and 64-bit fixed-point arithmetic. This is the kind of silent decoder bug the user explicitly wanted the audit to catch.
Test gap: no decoder unit test pins sh64() for non-trivial SH values. Add fixture cases in disasm_goldens.rs for sradi rA, rS, 1, sradi rA, rS, 32, sradi rA, rS, 63.
Note: any other instruction that uses the same XS-form SH split-encoding is suspect. Phase C decoder audit must verify sradi and sradix are the only consumers of sh64().
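
The two bit orders can be compared exhaustively over all 64 shift amounts. This is a sketch: `encode_sradi`, `sh64_buggy`, and `sh64_fixed` are hypothetical free functions that use plain shifts on the raw word instead of the crate's `extract_bits`, and the encoder assumes the XS-form layout (primary opcode 31, extended opcode 413, Rc = 0).

```rust
/// Encode an XS-form `sradi rA, rS, sh` word with Rc = 0. Illustrative helper only.
pub fn encode_sradi(ra: u32, rs: u32, sh: u32) -> u32 {
    let sh_lo = sh & 0x1F; // SH[4:0] goes to PPC bits 16-20
    let sh_hi = (sh >> 5) & 1; // SH[5] goes to PPC bit 30
    (31 << 26) | (rs << 21) | (ra << 16) | (sh_lo << 11) | (413 << 2) | (sh_hi << 1)
}

/// Pre-fix assembly order: SH[4:0] << 1 | SH[5].
pub fn sh64_buggy(raw: u32) -> u32 {
    let sh_lo = (raw >> 11) & 0x1F;
    let sh_hi = (raw >> 1) & 1;
    (sh_lo << 1) | sh_hi
}

/// Post-fix assembly order: SH[5] << 5 | SH[4:0].
pub fn sh64_fixed(raw: u32) -> u32 {
    let sh_lo = (raw >> 11) & 0x1F;
    let sh_hi = (raw >> 1) & 1;
    (sh_hi << 5) | sh_lo
}
```

Looping `sh` over 0..64 confirms the claim above: the two orders agree only for 0 and 63, and `sradi rA, rS, 32` decodes as a shift by 1 under the pre-fix order.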

PPCBUG-041 — srawx writeback sign-extends to 64 bits

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Locations: interpreter.rs:583, 588 (two writeback paths for the count<32 and count>=32 branches)
Symptom: result as i64 as u64 violates the 32-bit-ABI zero-extension convention. A negative shifted value writes 0xFFFFFFFF_xxxxxxxx instead of 0x00000000_xxxxxxxx.
Fix: result as u32 as u64 in both writeback paths.
Note: subagent verified the CA computation is independently correct — uses (rs as u32) << (32 - sh) != 0 which is the canonical ISA shifted-out-bits test on 32-bit operands. Do not change CA logic.

PPCBUG-042 — srawix writeback sign-extends to 64 bits

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Locations: interpreter.rs:600, 605 (same shape as PPCBUG-041 for srawi)
Fix: result as u32 as u64.

PPCBUG-043 — srawx / srawix CR0 coupling

Severity: MEDIUM (must land with PPCBUG-041 and PPCBUG-042)
Status: applied (P4 d945aea, 2026-05-02)
Locations: interpreter.rs:593, 607
Symptom: currently masked by the sign-extended writeback (sign-extension makes the 64-bit and 32-bit sign agree). After truncating the writeback, as i64 will misread the sign for negative results.
Fix: as u32 as i32 as i64 in both Rc=1 paths, applied with PPCBUG-041/042.

PPCBUG-044 — slwx / srwx CR0 misclassifies negative 32-bit results

Severity: LOW (zero-extended results have bit 31 set in low 32, but always positive in i64 view → CR0.LT never fires for slw/srw with bit-31-set results)
Status: applied (P4 d945aea, 2026-05-02)
Locations: interpreter.rs:568, 576
Fix: as u32 as i32 as i64.

PPCBUG-045 — Zero unit tests for any shift opcode

Severity: LOW (test gap only)
Status: open
Locations: interpreter.rs:563-658 (entire shift group: slwx, srwx, srawx, srawix, sldx, srdx, sradx, sradix)
Recommendation: add at least one functional test per opcode. Especially: srawix r3, r3, 1 with rs=0xFFFFFFFE (CA should be 0), srawix r3, r3, 1 with rs=0x80000001 (CA should be 1, result=0xC0000000); sradix r3, r3, 32 (currently wrong per PPCBUG-040).
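
The two srawix cases above can be pinned against a small reference model. This is a sketch: `srawi32` is a hypothetical free function that returns the writeback value and CA, and it uses a mask-based shifted-out test chosen for this example in place of the interpreter's own CA expression.

```rust
/// 32-bit `srawi`: returns (zero-extended GPR value, CA). `sh` must be in 0..=31.
pub fn srawi32(rs: u64, sh: u32) -> (u64, u8) {
    assert!(sh < 32);
    let v = rs as u32 as i32;
    let result = v >> sh; // arithmetic shift on i32 replicates the sign bit
    // CA is set only when the source is negative and a 1 bit was shifted out.
    let shifted_out = (rs as u32) & ((1u32 << sh) - 1);
    let ca = (v < 0 && shifted_out != 0) as u8;
    (result as u32 as u64, ca)
}
```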

Clean opcodes from group 11

slwx writeback at line 568 (zero-ext 32-bit result via (rs as u32 << count) as u64) — clean.
srwx writeback at line 576 — clean.
sldx, srdx, sradx — 64-bit ops, ISA-correct (probably dead in 32-bit-ABI binaries).
sradix body logic is structurally correct; failure is solely from PPCBUG-040 giving it a wrong shift count.

Batch 2 — doubleword rotate (group 10) [renumbered]

Per-group report: audit-out/group-10-dword-rotate.md (uses local IDs PPCBUG-027/028; tracker uses PPCBUG-046/047).

PPCBUG-046 — DECODER BUG: wrong bit position for MB[5] in all 6 doubleword-rotate opcodes (HIGH)

Severity: HIGH (decoder-level; impacts the canonical zero-extend-to-32 idiom)
Status: applied (52b05b1, 2026-05-01)
Locations: interpreter.rs — every arm of rldiclx, rldicrx, rldicx, rldimix, rldclx, rldcrx (lines 693-754)
Symptom: each arm computes let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1). The bit at (instr.raw >> 1) & 1 is PPC bit 30, which in MD form is sh[5] (the high bit of the 6-bit shift amount), not mb[5]. The high bit of the 6-bit MB field lives at PPC bit 26 = (instr.raw >> 5) & 1.

As written, the code computes (mb[4:0] << 1) | sh[5]. Ironically, disasm.rs:1256 (the mb_md() helper) has the correct formula. The interpreter was written independently with the wrong bit position, probably a copy error from sh64(), where bit 30 really is the split bit.
Concrete impact:
- clrldi r3, r4, 32 is the canonical "zero-extend low 32 bits" idiom emitted constantly in 32-bit-ABI PPC code. Encoded as rldicl r3, r4, 0, mb=32. With mb=32, mb[5]=1, mb[4:0]=0. The interpreter decodes mb=0 → mask is all-ones → instruction becomes a no-op. Any downstream 64-bit compare (subfcx CA, cmpld) on that register sees a polluted 64-bit value instead of a clean 32-bit zero-extended one. This is the same class of bug that caused the addis/BST incident.
- For rldcr (MDS form), PPC bit 30 is the LSB of the 4-bit XO field, which is always 1 for this opcode, so the formula forces the low bit of the decoded me to 1 on every invocation instead of reading me[5] from bit 26.
Fix (one line per opcode):
```
// Replace in all 6 arms:
let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1);
// With:
let mb = instr.mb() | (((instr.raw >> 5) & 1) << 5);
```
Or, cleaner: expose mb_md() (currently in disasm.rs:1256) as a method on DecodedInstr in decoder.rs and have the interpreter call instr.mb_md() — single source of truth for MD-form mb extraction.
Test gap: zero execution tests for any of the 6 opcodes; only disasm-golden string-output tests.
Note: this is the second decoder bug found by the audit (PPCBUG-040 / sh64() for sradi is the first). Phase C decoder audit must verify whether other MD/MDS/XS form accessors have similar bit-position errors.
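
The clrldi case can be reproduced from the raw encoding. This is a sketch under stated assumptions: `mb_lo` is a hypothetical stand-in for `instr.mb()` and is assumed to return PPC bits 21-25, and `clrldi` models only the sh = 0 case of rldicl.

```rust
/// Encoding of `clrldi r3, r4, 32`, i.e. `rldicl r3, r4, 0, 32`.
pub const CLRLDI_R3_R4_32: u32 = 0x7883_0020;

/// Assumed behaviour of `instr.mb()`: PPC bits 21-25 (the low five MB bits in MD form).
fn mb_lo(raw: u32) -> u32 {
    (raw >> 6) & 0x1F
}

/// Pre-fix reconstruction: pulls PPC bit 30 into the low bit.
pub fn mb_buggy(raw: u32) -> u32 {
    (mb_lo(raw) << 1) | ((raw >> 1) & 1)
}

/// Post-fix reconstruction: PPC bit 26 is the high bit of MB.
pub fn mb_fixed(raw: u32) -> u32 {
    mb_lo(raw) | (((raw >> 5) & 1) << 5)
}

/// `rldicl` with sh = 0 reduces to RS & MASK(mb, 63). `mb` must be in 0..=63.
pub fn clrldi(rs: u64, mb: u32) -> u64 {
    rs & (u64::MAX >> mb)
}
```

With the pre-fix reconstruction the mask is all-ones and the upper 32 bits of the source survive; with the post-fix one the result is exactly the low word.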

PPCBUG-047 — Zero execution tests for any doubleword-rotate opcode

Severity: LOW (test gap)
Status: open
Locations: interpreter.rs:693-754 (all 6 opcodes)
Recommendation: at minimum, a clrldi r3, r4, 32 test verifying the result is exactly the low 32 bits of r4. Had such a test existed, it would have caught the MB-reconstruction bug fixed under PPCBUG-046.

What's correct in group 10

sh64() accessor — correctly reconstructs 6-bit shift from MD split encoding (cross-check: disasm.rs agrees).
rld_mask_left() / rld_mask_right() mask helpers — verified against Canary's XEMASK.
rldicx/rldimix mask formulas (63 - sh for right edge) — correct.
rldimix read-modify-write merge — correct 64-bit mask-insert.
CR0 as i64 — correct here because these ARE genuine 64-bit ops (unlike word rotate).
rldcl/rldcr register-shift extraction (gpr[rb] & 0x3F) — correct.
No 32-bit writeback truncation needed: these are intentionally 64-bit; 32-bit-ABI compilers only emit them with masks that yield 32-bit-clean results.

Batch 3 — branch (group 13)

Per-group report: audit-out/group-13-branch.md.

Group 13 summary: the branch implementation is substantively correct. All BO/BI bit masks, CTR decrement-before-test ordering, AA absolute vs relative dispatch, LK unconditional write (including not-taken path in bcx), LR-read-before-LR-write atomicity in bclrx, and get_cr_bit() field indexing are all ISA-correct and match Canary. The only findings are a latent 64-bit CTR zero-test (PPCBUG-053/054, active under the current GPR-pollution environment) and severely thin test coverage (PPCBUG-055).

PPCBUG-053 — CTR zero-test uses 64-bit compare; should use 32-bit in `bcx`/`bclrx`

Severity: MEDIUM (effectively HIGH given unfixed PPCBUG-001..031 GPR pollution)
Status: applied (3d8e2ce, 2026-05-02)
Locations: interpreter.rs:849 (bcx ctr_ok), interpreter.rs:879 (bclrx ctr_ok)
Symptom: ctx.ctr != 0 compares all 64 bits. In 32-bit ABI the CTR is logically 32-bit. Canary explicitly truncates to 32 bits: ctr = f.Truncate(ctr, INT32_TYPE). When CTR upper 32 bits are non-zero (due to upstream GPR pollution flowing through mtspr CTR, rN), the 64-bit test disagrees with the 32-bit ISA semantic. Most dangerous with neg; mtctr; bdnz: negx (PPCBUG-006) always sets upper 32 bits, so the 32-bit CTR counter can reach zero while the 64-bit CTR is still non-zero → infinite loop.

Fix:

// Replace in both bcx and bclrx:
let ctr_ok = (bo & 0b00100) != 0
    || (((ctx.ctr as u32) != 0) ^ ((bo & 0b00010) != 0));

Or, alternatively, truncate at decrement:

if bo & 0b00100 == 0 {
    ctx.ctr = ctx.ctr.wrapping_sub(1) as u32 as u64;
}

Test gap: zero tests for CTR-decrement branches (bdnz, bdz, bdnzt, bdnzf, bdzt, bdzf).
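
The neg; mtctr; bdnz hazard can be reproduced outside the interpreter. The sketch below is a hypothetical stand-in for the bcx CTR handling (ctr_ok_64, ctr_ok_32 and run_bdnz are illustrative names, not crate functions) and assumes BO = 0b10000, i.e. plain bdnz:

```rust
/// Pre-fix behaviour: decrement unless BO[2] is set, then test all 64 bits.
fn ctr_ok_64(ctr: &mut u64, bo: u8) -> bool {
    if bo & 0b00100 == 0 {
        *ctr = ctr.wrapping_sub(1);
    }
    (bo & 0b00100) != 0 || ((*ctr != 0) ^ ((bo & 0b00010) != 0))
}

/// Post-fix behaviour: same decrement, but the zero-test sees only the low 32 bits.
fn ctr_ok_32(ctr: &mut u64, bo: u8) -> bool {
    if bo & 0b00100 == 0 {
        *ctr = ctr.wrapping_sub(1);
    }
    (bo & 0b00100) != 0 || (((*ctr as u32) != 0) ^ ((bo & 0b00010) != 0))
}

/// Runs a bdnz loop and returns the iteration on which it falls through,
/// or None if it is still branching after `limit` iterations.
fn run_bdnz(initial_ctr: u64, limit: u32, step: fn(&mut u64, u8) -> bool) -> Option<u32> {
    let mut ctr = initial_ctr;
    for n in 1..=limit {
        if !step(&mut ctr, 0b10000) {
            return Some(n);
        }
    }
    None
}
```

With a clean CTR of 3 both variants exit on the third iteration. With polluted upper bits only the 32-bit test exits; the 64-bit test keeps branching.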

PPCBUG-054 — `mtspr CTR` writeback not truncated to 32 bits

Severity: MEDIUM
Status: applied (3d8e2ce, 2026-05-02)
Location: interpreter.rs:1411
Symptom: crate::context::spr::CTR => ctx.ctr = val writes the full 64-bit GPR to CTR. Acts as a firewall gap: any upstream 64-bit GPR pollution flows directly into CTR, where it will be tested by PPCBUG-053's 64-bit comparison. Defensive fix prevents CTR from ever acquiring non-zero upper 32 bits independently of the GPR-pollution fix.
Note: the bcctrx branch-target read ((ctx.ctr as u32) & !3) already truncates correctly; the bug is confined to the ctr != 0 zero-test in bcx/bclrx.
Fix: crate::context::spr::CTR => ctx.ctr = val as u32 as u64,
Cross-reference: Group 16 (SPR/MSR) subagent should verify this write-point.

PPCBUG-055 — Severely inadequate test coverage for all four branch opcodes

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Locations: interpreter.rs test module (lines 4455–4491)
Current coverage: bx forward (1 test), bl LR update (1 test), bcx taken beq (1 test via test_cmp_and_bc). Zero tests for: bclrx, bcctrx, any CTR-decrement variant, not-taken path, backward branch, AA=1 absolute, bcl LR-write-on-not-taken.
Recommended minimum: blr, bctr, bdnz (taken and not-taken at boundary CTR=1), bclrl old-LR-as-target, bcl LK-write-on-not-taken. See per-group report for concrete encoding patterns.
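
For building the instruction words these tests need, a small encoder sketch may help. It assumes the standard B-form and XL-form layouts; encode_bc and encode_xl_branch are hypothetical helpers, not part of the existing test module:

```rust
/// B-form: bc BO, BI, target (primary opcode 16). `bd` is the byte
/// displacement; its low two bits are dropped by the encoding.
fn encode_bc(bo: u32, bi: u32, bd: i16, aa: bool, lk: bool) -> u32 {
    (16 << 26)
        | ((bo & 0x1F) << 21)
        | ((bi & 0x1F) << 16)
        | ((bd as u16 as u32) & 0xFFFC)
        | ((aa as u32) << 1)
        | (lk as u32)
}

/// XL-form under primary opcode 19: bclr uses XO 16, bcctr uses XO 528.
fn encode_xl_branch(bo: u32, bi: u32, xo: u32, lk: bool) -> u32 {
    (19 << 26) | ((bo & 0x1F) << 21) | ((bi & 0x1F) << 16) | (xo << 1) | (lk as u32)
}
```

For example, encode_xl_branch(20, 0, 16, false) yields the blr word and encode_bc(16, 0, -8, false, false) yields a bdnz that branches back two instructions.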

Batch 3 — trap + system call (group 14)

Per-group report: audit-out/group-14-trap-sc.md.

Group 14 summary: the core trap evaluation (trap.rs) is correct — TO bit constants, signed/unsigned comparison dispatch, and word-vs-doubleword width handling are all ISA-conformant. The live interpreter arm properly evaluates the TO field (replacing the old unconditional-trap stub). Three MEDIUM issues found: PC ordering on trap return, missing LEV dispatch for sc, and the Xbox 360 typed-trap convention (twi 31, r0, IMM) not handled. Two LOW findings for stale manual snapshots and test gaps.

PPCBUG-063 — `ctx.pc` already at CIA+4 when `StepResult::Trap` returns

Severity: MEDIUM
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:1543 (ctx.pc += 4) before interpreter.rs:1549 (return StepResult::Trap)
Symptom: any trap handler that reads ctx.pc to find the faulting instruction sees CIA+4 instead of CIA. The existing tracing::warn! compensates with .wrapping_sub(4), confirming the asymmetry. On real hardware, SRR0 = CIA (trapping instruction address). Current risk LOW (no handler inspects pc), but HIGH if any SEH/exception-delivery path is added (critical for the C++ throw investigation).
Fix: save CIA before incrementing, restore it when firing the trap:
```
let trap_pc = ctx.pc;
ctx.pc += 4;
if fired { ctx.pc = trap_pc; return StepResult::Trap; }
```
Alternatively store CIA in a separate ctx.srr0-equivalent field and leave ctx.pc at NIA.
Note: sc correctly leaves ctx.pc at NIA (the return address) — that is a different and correct design choice. The inconsistency between sc and trap is the bug.

PPCBUG-064 — `sc` ignores `LEV` field; `sc 2` (HVcall) silently misdispatched

Severity: MEDIUM
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:915-918
Symptom: sc 2 (Xbox 360 hypervisor call) returns StepResult::SystemCall identically to sc 0. Canary dispatches LEV=0 to syscall_handler and LEV=2 to f.function() (the HVcall path). For pure game-title code (LEV=0 only) this is invisible; XDK kernel-mode components and some HV-aware titles may use sc 2.
Fix: decode the 7-bit LEV field (bits 20-26 of SC-form encoding), add a HypervisorCall variant to StepResult, and dispatch accordingly.
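
A sketch of the LEV decode, assuming the raw 32-bit instruction word is available. ScKind is a hypothetical stand-in for the StepResult variants, and the Reserved arm is an assumption about how other LEV values might be surfaced:

```rust
#[derive(Debug, PartialEq)]
enum ScKind {
    SystemCall,
    HypervisorCall,
    Reserved(u8),
}

/// LEV occupies PPC bits 20..=26 (MSB-0), so shift right by 31 - 26 = 5.
fn sc_lev(raw: u32) -> u8 {
    ((raw >> 5) & 0x7F) as u8
}

fn classify_sc(raw: u32) -> ScKind {
    match sc_lev(raw) {
        0 => ScKind::SystemCall,
        2 => ScKind::HypervisorCall,
        other => ScKind::Reserved(other),
    }
}
```

Plain sc encodes as 0x44000002 and sc 2 as 0x44000042, so the two are distinguishable from the raw word alone.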

PPCBUG-065 — `twi 31, r0, IMM` typed-trap not handled; SIMM type code discarded

Severity: MEDIUM
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:1532-1551 (trap arm)
Symptom: twi 31, r0, IMM (TO=31=unconditional, RA=r0) is used by the Xbox 360 CRT/kernel to encode typed C++ exceptions — the 16-bit SIMM carries the exception type discriminator. xenia-rs fires the trap correctly but discards SIMM. The caller sees a generic StepResult::Trap with no type information, preventing correct C++ SEH dispatch.
Canary reference: ppc_emit_control.cc:611-616 special-cases RA==0 && TO==31 and calls f.Trap(type) with the SIMM as the type code.
Fix: add a trap_type: Option<u16> payload to StepResult::Trap. Detect twi with to()==31 and ra()==0 and populate it with instr.simm16() as u16.
Note: directly relevant to the Sylpheed std::runtime_error throw investigation (project_xenia_rs_sylpheed_throw_2026_04_28.md) — the typed-trap SIMM carries the CRT exception class that the kernel uses to route to the correct handler.
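
The detection itself is a few field extractions. A sketch working on the raw word, with typed_trap_code as a hypothetical helper (the real fix would use the decoder's to(), ra() and simm16() accessors):

```rust
/// Returns the type code if `raw` is `twi 31, r0, SIMM`, else None.
fn typed_trap_code(raw: u32) -> Option<u16> {
    let opcode = raw >> 26; // primary opcode; twi is 3
    let to = (raw >> 21) & 0x1F; // TO field, PPC bits 6..=10
    let ra = (raw >> 16) & 0x1F; // RA field, PPC bits 11..=15
    if opcode == 3 && to == 31 && ra == 0 {
        Some(raw as u16) // SIMM, PPC bits 16..=31
    } else {
        None
    }
}
```

Keeping the payload as u16 preserves negative SIMM values bit-for-bit; the consumer decides whether to treat the code as signed.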

PPCBUG-066 — Stale frozen snapshots in ppc-manual for td/tdi/tw/twi

Severity: LOW
Status: applied (P7 manual regen, 2026-05-02)
Location: ppc-manual/branch/td.md, tdi.md, tw.md, twi.md
Symptom: all four show the old unconditional-trap stub (// For now, just trace and continue) instead of the current TO-field-evaluating implementation.
Fix: regenerate after PPCBUG-063 and PPCBUG-065 are resolved.

PPCBUG-067 — Test gaps for trap and sc

Severity: LOW
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs #[cfg(test)] mod tests
Missing coverage: sc smoke test (fires SystemCall, advances PC); td vs tw on 64-bit-clean operands (width discrimination); tdi/td signed/unsigned LT/GT conditions; tw 31, r0, r0 unconditional trap encoding; twi 31, r0, N typed-trap; negative simm16 in twi.

Batch 3 — SPR / MSR / TB / FPSCR / VSCR moves (group 16)

Per-group report: audit-out/group-16-spr-msr.md.

Group 16 summary: the core paths are clean: mfcr, mtcrf, mfspr, mtspr, mftb, mffsx, mtfsfx, mtfsb0x, mtfsb1x, mtfsfix, mfvscr, mtvscr are all functionally ISA-correct. The spr() decoder accessor correctly inverts the PPC XFX half-swap encoding. The one MEDIUM finding is mtmsrd silently ignoring the L=1 partial-MSR-write semantics. Two LOW findings cover mtspr logging (PPCBUG-079) and mfvscr upper-bit zeroing (PPCBUG-080), and five LOW test-gap findings (PPCBUG-081..085) record the near-total absence of unit tests for this group.

PPCBUG-078 — `mtmsrd` L=1 partial-MSR-write not modelled

Severity: MEDIUM
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:1458-1461
Symptom: xenia-rs merges mtmsr and mtmsrd into a single body that unconditionally writes ctx.msr = ctx.gpr[instr.rs()]. PowerISA specifies that mtmsrd with instruction bit 15 (L) = 1 performs a partial update: only MSR[EE] (PPC bit 48, u64 bit 15) and MSR[RI] (PPC bit 62, u64 bit 1) are modified; all other MSR bits are preserved. Kernel code using mtmsrd L=1 to re-enable external interrupts silently corrupts the entire MSR in xenia-rs. Canary acknowledges the same TODO.

Fix:

PpcOpcode::mtmsrd => {
    let l = (instr.raw >> (31 - 15)) & 1;
    if l == 1 {
        // EE is u64 bit 15, RI is u64 bit 1 (u64 bit 0 is MSR[LE]).
        let mask: u64 = (1u64 << 15) | (1u64 << 1);
        let rs = ctx.gpr[instr.rs()];
        ctx.msr = (ctx.msr & !mask) | (rs & mask);
    } else {
        ctx.msr = ctx.gpr[instr.rs()];
    }
    ctx.pc += 4;
}

Test gap: zero tests for mtmsr or mtmsrd.

PPCBUG-079 — `mtspr` silent drop of unknown-SPR writes without value logging

Severity: LOW
Status: open
Location: interpreter.rs:1430-1433
Symptom: Unknown SPR writes are discarded with only a tracing::warn!() that omits the value being written. This reduces debuggability; there is no correctness impact for known Xbox 360 titles.
Fix (optional): tracing::warn!("mtspr: unimplemented SPR {} <= 0x{:016x}", spr, val).

PPCBUG-080 — `mfvscr` does not zero the upper 96 bits of VD per ISA

Severity: LOW
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:2198-2201
Symptom: ISA requires mfvscr VD to place VSCR in the rightmost word of VD and zero bytes 0-11. xenia-rs copies the full 128-bit ctx.vscr into ctx.vr[VD], leaving stale data in bytes 0-11 if ctx.vscr was populated from a non-zeroed vector. Canary explicitly zero-extends.

Fix:

PpcOpcode::mfvscr => {
    let vscr_word = ctx.vscr.as_u32x4()[3];
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array([0, 0, 0, vscr_word]);
    ctx.pc += 4;
}

PPCBUG-081 — Zero unit tests for `mfcr` / `mtcrf`

Severity: LOW
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:1436-1453
Recommended additions: full mfcr round-trip; mtcrf 0xFF; mtcrf 0x80 (CR0 only); mtcrf 0x38 (ABI CR2|CR3|CR4 restore).
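
The expected values for these tests follow from the CRM-to-mask expansion. A sketch, assuming a 32-bit CR with CR0 in the most significant nibble; crm_to_mask and mtcrf are hypothetical free functions, not the interpreter arm:

```rust
/// Expands the 8-bit CRM field into a 32-bit CR mask. CRM bit 0x80 selects
/// CR0 (the most significant nibble), 0x01 selects CR7.
fn crm_to_mask(crm: u8) -> u32 {
    let mut mask = 0u32;
    for field in 0..8 {
        if crm & (0x80 >> field) != 0 {
            mask |= 0xF000_0000 >> (field * 4);
        }
    }
    mask
}

/// mtcrf semantics: replace only the selected CR fields with bits from rS.
fn mtcrf(cr: u32, rs: u32, crm: u8) -> u32 {
    let mask = crm_to_mask(crm);
    (cr & !mask) | (rs & mask)
}
```

The 0x38 case is the interesting one because it touches three adjacent fields and must leave CR0, CR1 and CR5..CR7 intact.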

PPCBUG-082 — Minimal unit tests for `mfspr` / `mtspr`

Severity: LOW
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:1376-1435
Note: only DEC and TBL_WRITE covered; add LR, CTR, XER, TBL/TBU, VRSAVE.

PPCBUG-083 — Zero unit tests for `mftb`

Severity: LOW
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:1462-1470

PPCBUG-084 — Zero interpreter-level round-trip tests for FPSCR move instructions

Severity: LOW
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:2678-2720
Note: fpscr.rs helper-level tests exist; interpreter dispatch (mffsx, mtfsfx, mtfsb0x, mtfsb1x, mtfsfix) is untested end-to-end.

PPCBUG-085 — Zero unit tests for `mfvscr` / `mtvscr`

Severity: LOW
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:2198-2205

IDs PPCBUG-086 and PPCBUG-087 are unallocated — reserved for group 16 follow-up findings.

Batch 3 — cache + sync (group 17)

Per-group report: audit-out/group-17-cache-sync.md.

Group 17 summary: the cleanest group audited so far. Both dcbz and dcbz128 have correct EA computation (ra=0 special case, 64-bit→u32 truncation, alignment masks & !31 / & !127, byte counts 32/128). The nine no-op opcodes (dcbf, dcbi, dcbst, dcbt, dcbtst, icbi, sync, eieio, isync) are all listed in one arm and complete. The Xbox 360-specific dcbz128 opcode (the RT=1 bit distinguishes it from dcbz) dispatches correctly. 0 HIGH, 0 MEDIUM, 2 LOW findings.

PPCBUG-088 — sync disasm ignores L field; `lwsync` (L=1) shows as "sync"

Severity: LOW
Status: open
Location: xenia-rs/crates/xenia-cpu/src/disasm.rs:364
Symptom: The PpcOpcode::sync disasm arm outputs "sync" unconditionally regardless of the L field (PPC bit 10). When L=1 (word 0x7C2004AC), the instruction should disassemble as "lwsync". The extended_mnemonics.json golden already accepts "sync" as output for the lwsync case, meaning the test currently passes with the wrong string.
Impact: Disassembly output for lwsync (very common in Xbox 360 acquire-barrier idioms) shows as sync. No interpreter impact; both L=0 and L=1 are correctly treated as no-op PC advance.

Fix:

PpcOpcode::sync => {
    // L field at PPC bit 10
    if extract_bits(instr.raw, 10, 10) == 1 {
        base("lwsync", String::new(), 0)
    } else {
        base("sync", String::new(), 0)
    }
}

Update extended_mnemonics.json golden to add "ext_mnemonic": "lwsync" for that entry.

PPCBUG-089 — Zero interpreter execution tests for group 17

Severity: LOW
Status: applied (P8 4029041, 2026-05-02)
Location: xenia-rs/crates/xenia-cpu/src/interpreter.rs (test module)
Symptom: No #[test] covers dcbz, dcbz128, or any no-op (sync/isync/eieio/dcbf/icbi). A regression in dcbz byte count or alignment would go undetected.
Recommended additions: dcbz with misaligned address (verifies 32-byte aligned zero), dcbz128 with misaligned address (verifies 128-byte aligned zero), both ra=0 and ra!=0 cases, sync/isync/dcbf no-op PC-advance smoke tests.
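
A sketch of what the misaligned-address tests would check, using a flat byte slice in place of guest memory. zero_cache_line and x_form_ea are hypothetical helpers that mirror the EA and alignment rules listed in the group summary:

```rust
/// Zeroes the `line`-byte block containing `ea`. `line` is 32 for dcbz and
/// 128 for dcbz128, and must be a power of two.
fn zero_cache_line(mem: &mut [u8], ea: u32, line: u32) {
    let base = (ea & !(line - 1)) as usize;
    mem[base..base + line as usize].fill(0);
}

/// EA for X-form cache ops: (rA|0) + rB, truncated to 32 bits.
fn x_form_ea(gpr: &[u64; 32], ra: usize, rb: usize) -> u32 {
    let a = if ra == 0 { 0 } else { gpr[ra] as u32 };
    a.wrapping_add(gpr[rb] as u32)
}
```

Filling memory with a non-zero pattern first makes both the zeroed range and its untouched neighbours observable.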

Batch 3 — CR logical + CR moves (group 15)

Per-group report: audit-out/group-15-cr-logical.md.

Group 15 summary: cleanest group audited to date. All 8 CR logical ops (crand, crandc, creqv, crnand, crnor, cror, crorc, crxor), mcrf, and mcrxr are ISA-correct. The cr_logical helper's use of fn(bool, bool) -> bool prevents the !u64 bit-pollution class (PPCBUG-028–031 in group 7). CR bit indexing in get_cr_bit/set_cr_bit is correct (bit/4 = field, bit%4 = within-field sub-index matching PPC MSB-0 numbering, with sub {0=LT, 1=GT, 2=EQ, 3=SO}). mcrxr correctly maps XER{SO,OV,CA} to CR{LT,GT,EQ} with SO=false and unconditionally clears the XER bits. mcrfs nibble extraction, field shift formula (28 - crfs*4), and CLEARABLE_MASK (all 14 ISA-clearable exception bits, no FEX/VX) are all correct. One MEDIUM ISA violation: mcrfs omits VX summary recomputation. Two LOW findings: a misleading test comment and zero coverage for all 8 CR logical ops + mcrf.

PPCBUG-068 — `mcrfs` does not recompute VX summary bit after clearing VX* exception bits

Severity: MEDIUM
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:4250 (ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK))
Symptom: When mcrfs clears VX* exception bits (VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI) from any source field, the VX summary bit (FPSCR[2], fpscr::VX = 1<<29) is left stale. If those VX* bits were the only contributors to VX, it should become 0 but remains 1. A subsequent mcrfs cr0, 0 will then report VX=1 in CR0.EQ, misleading the caller into thinking an invalid-operation exception is still active.

Fix:

// After ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK); add:
if (ctx.fpscr & fpscr::VX_ALL) != 0 {
    ctx.fpscr |= fpscr::VX;
} else {
    ctx.fpscr &= !fpscr::VX;
}
// FEX recomputation omitted — xenia doesn't model enabled-exception dispatch.

Test gap: existing test only covers crfS=0 (FX+OX) — no VX* bits involved. Add a test that sets only VXSNAN, runs mcrfs cr0, 1, then verifies VX is now 0.
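
A self-contained sketch of the fixed behaviour and the suggested test. The constants follow the PowerISA FPSCR bit layout, but their names and the mcrfs function are illustrative and are not the crate's fpscr module:

```rust
const FX: u32 = 1 << 31; // FPSCR bit 0 (MSB-0)
const VX: u32 = 1 << 29; // FPSCR bit 2, summary of all VX* bits
const OX: u32 = 1 << 28; // FPSCR bit 3
const VXSNAN: u32 = 1 << 24; // FPSCR bit 7
const VXISI: u32 = 1 << 23; // FPSCR bit 8
// FPSCR bits 7..=12 and 21..=23: the nine VX* exception bits.
const VX_ALL: u32 = 0x01F8_0700;
// FX, OX, UX, ZX, XX plus the nine VX* bits (FEX and VX excluded).
const CLEARABLE_MASK: u32 = FX | OX | 0x0E00_0000 | VX_ALL;

/// Returns the 4-bit field copied to the CR and updates `fpscr` in place.
fn mcrfs(fpscr: &mut u32, crfs: u32) -> u8 {
    let shift = 28 - crfs * 4;
    let nibble = ((*fpscr >> shift) & 0xF) as u8;
    let nibble_mask = 0xFu32 << shift;
    *fpscr &= !(nibble_mask & CLEARABLE_MASK);
    // Recompute the VX summary from whatever VX* bits remain.
    if (*fpscr & VX_ALL) != 0 {
        *fpscr |= VX;
    } else {
        *fpscr &= !VX;
    }
    nibble
}
```

The third case in a test suite should leave one VX* bit set in a different field, to confirm the summary is recomputed rather than blindly cleared.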

PPCBUG-069 — `mcrfs` test comment claims OX(so)=0 but OX is set in the test

Severity: LOW (cosmetic; the assert is correct, only the comment is wrong)
Status: open
Location: interpreter.rs:5402
Symptom: The comment reads "FX(lt)=1 and OX(so)=0", but FPSCR was set to (1<<31)|(1<<28), which sets both FX and OX. The nibble is therefore 0b1001 and the SO bit is 1. The assert cr[2].as_u8() == 0b1001 is correct; only the comment is wrong.
Fix: // FX(lt)=1, FEX(gt)=0, VX(eq)=0, OX(so)=1 → 0b1001 = 9

PPCBUG-070 — Zero execution tests for all 8 CR logical ops and `mcrf`

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Locations: interpreter.rs:1473–1484
Missing minimum: crclr idiom (crxor BT,BT,BT, BT=1 → 0), crset idiom (creqv BT,BT,BT, BT=0 → 1), crmove idiom (cror BT,BA,BA), crnot idiom (crnor BT,BA,BA, BA=1 → 0), cross-field crand/crandc, and a full mcrf cr0, cr3 field-copy + source-field-intact test.
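
The idioms above can be checked against a small model of the helper. This sketch assumes a packed 32-bit CR; the real cr_logical, get_cr_bit and set_cr_bit operate on the interpreter's own CR representation, so the signatures here are hypothetical:

```rust
/// CR bit `bit` (0..=31, MSB-0): field = bit / 4, sub-index = bit % 4.
fn get_cr_bit(cr: u32, bit: u32) -> bool {
    ((cr >> (31 - bit)) & 1) != 0
}

fn set_cr_bit(cr: u32, bit: u32, value: bool) -> u32 {
    let m = 1u32 << (31 - bit);
    if value { cr | m } else { cr & !m }
}

/// Applies a boolean op to CR bits BA and BB and writes the result to BT.
/// Working on bool keeps stray upper bits out of the result by construction.
fn cr_logical(cr: u32, bt: u32, ba: u32, bb: u32, op: fn(bool, bool) -> bool) -> u32 {
    set_cr_bit(cr, bt, op(get_cr_bit(cr, ba), get_cr_bit(cr, bb)))
}
```

Each extended mnemonic then becomes one call, for example crclr 6 is cr_logical(cr, 6, 6, 6, |a, b| a ^ b).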

Pre-pass hints REFUTED by audit

These were flagged by the orchestrator's regex scan but the subagents found them to be safe:

divwux writeback (interpreter.rs:390) — both operands cast to u32 before division, as u64 zero-extends correctly. Clean.
mulhwx intermediate cast (interpreter.rs:349) — ((result >> 32) as i32 as i64 as u64) & 0xFFFF_FFFF is redundant but the trailing mask saves correctness. Cosmetic only.
mulhwux writeback (interpreter.rs:359) — (result >> 32) & 0xFFFF_FFFF clean unsigned. Clean.
CR0 stale pre-pass claim: the pre-pass document quoted the CR0 update as result as i32 as i64, but the live code uses result as i64. The pre-pass was right that the live compare is 64-bit, yet wrong to imply that an i32 form was already in place. PPCBUG-020 is the real finding.

Batch 4 — load float (group 23)

Per-group report: audit-out/group-23-load-float.md.

Group 23 summary: the double-precision load family (lfd, lfdu, lfdux, lfdx) is fully ISA-correct: EA computation, endianness, update-form writeback, and bit-pattern fidelity are all clean. The single-precision family (lfs, lfsu, lfsux, lfsx) has one HIGH bug: Rust's as f64 float cast compiles to x86 CVTSS2SD, which sets the IEEE quiet bit whenever the input is a NaN, silently converting f32 SNaN loads to f64 QNaN. The ISA requires the SNaN to pass through unchanged. FPSCR.NI does not apply to loads (correct by omission). One LOW test-gap finding. 2 IDs used (PPCBUG-128, PPCBUG-129). 8 IDs unallocated (PPCBUG-130..137).

PPCBUG-128 — lfs/lfsu/lfsx/lfsux silently quieten SNaN via `as f64` Rust float cast

Severity: HIGH
Status: open
Locations: interpreter.rs:1064 (lfs), 1070 (lfsx), 1087 (lfsu), 1093 (lfsux)
Symptom: All four single-precision load arms use mem.read_f32(ea) as f64 where read_f32 = f32::from_bits(read_u32(ea)). The as f64 Rust float cast compiles to x86 CVTSS2SD, which unconditionally sets bit 51 of the f64 mantissa (the IEEE quiet/signalling discriminator bit) for any NaN input. An f32 SNaN (e.g. 0x7F800001) is loaded and written to the FPR as the f64 QNaN 0x7FF8000020000000 instead of the SNaN 0x7FF0000020000000 (the f32 mantissa shifts left by 29 bits, so bit 0 lands on bit 29).

ISA requirement: "A signalling NaN passes through unchanged into the FPR — it will signal at the next FP arithmetic instruction." (lfs.md Special Cases). The FPR must hold the SNaN; VXSNAN fires at the consuming arithmetic op, not at the load.

Impact: (a) Game code storing f32 SNaN sentinels (physics engines mark unset float slots with SNaN) and then loading+inspecting them: fpscr::is_snan(ctx.fpr[rd]) returns false after the load, breaking sentinel detection. (b) Arithmetic ops consuming the loaded value see a QNaN rather than SNaN, so VXSNAN is never set; games relying on VXSNAN to detect uninitialized-read bugs get false negatives.
Canary parity: Canary's JIT also uses CVTSS2SD via f.Convert(). Both emulators share this deviation. The bug is a structural consequence of using semantic float widening rather than a bit-pattern-preserving widening routine.

Fix: replace the float cast with a bit-manipulation widening that preserves the SNaN bit:

fn widen_f32_bits_to_f64(raw32: u32) -> u64 {
    let sign = ((raw32 >> 31) as u64) << 63;
    let exp32 = ((raw32 >> 23) & 0xFF) as i32;
    let mant32 = (raw32 & 0x007F_FFFF) as u64;
    if exp32 == 0xFF {
        // NaN or Infinity: propagate mantissa left-shifted by 29 bits.
        // SNaN (bit22=0) stays SNaN (bit51=0); QNaN (bit22=1) stays QNaN (bit51=1).
        sign | (0x7FFu64 << 52) | (mant32 << 29)
    } else if exp32 == 0 {
        // ±Zero or subnormal f32.
        if mant32 == 0 { return sign; } // ±zero
        // Subnormal: `shift` counts the leading zeros inside the 23-bit field.
        let shift = mant32.leading_zeros() - (64 - 23);
        // The leading 1 sits at bit (22 - shift), so the value is
        // 1.x * 2^(-127 - shift); rebias that exponent for f64.
        let exp64 = (1023u64 - 127) - shift as u64;
        let mant64 = (mant32 << (shift + 1 + 29)) & 0x000F_FFFF_FFFF_FFFF;
        sign | (exp64 << 52) | mant64
    } else {
        // Normal f32 → normal f64. Rebias in signed arithmetic first so that
        // small exponents (exp32 < 127) cannot underflow an unsigned subtract.
        let exp64 = (exp32 + (1023 - 127)) as u64;
        sign | (exp64 << 52) | (mant32 << 29)
    }
}
// In each lfs* arm:
ctx.fpr[instr.rd()] = f64::from_bits(widen_f32_bits_to_f64(mem.read_u32(ea)));

This function also correctly handles subnormal f32 → normal f64 widening (which the as f64 cast already gets right numerically, but now goes through a consistent code path).

Test gap: add a test loading an f32 SNaN (0x7F800001) via lfs and asserting fpscr::is_snan(ctx.fpr[rd]) is true and bit 51 of ctx.fpr[rd].to_bits() is 0.
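
The assertion in that test reduces to a bit-level SNaN check. A sketch of the check and of the NaN branch of the widening, with is_snan_bits and widen_nan_bits as hypothetical names (the crate's fpscr::is_snan takes an f64 and may be implemented differently):

```rust
/// True if `bits` encodes an f64 signalling NaN: exponent all ones,
/// non-zero mantissa, quiet bit (bit 51) clear.
fn is_snan_bits(bits: u64) -> bool {
    let exp_all_ones = ((bits >> 52) & 0x7FF) == 0x7FF;
    let mantissa = bits & 0x000F_FFFF_FFFF_FFFF;
    exp_all_ones && mantissa != 0 && ((bits >> 51) & 1) == 0
}

/// NaN/Infinity-only widening of raw f32 bits: sign, all-ones exponent,
/// mantissa shifted left by 29 so f32 bit 22 lands on f64 bit 51.
fn widen_nan_bits(raw32: u32) -> u64 {
    debug_assert!(((raw32 >> 23) & 0xFF) == 0xFF);
    (((raw32 >> 31) as u64) << 63)
        | (0x7FFu64 << 52)
        | (((raw32 & 0x007F_FFFF) as u64) << 29)
}
```

Comparing raw u64 bit patterns rather than f64 values avoids any dependence on how the host propagates NaN payloads.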

PPCBUG-129 — Zero interpreter execution tests for all 8 float-load opcodes

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Locations: interpreter.rs test module; tests/disasm_goldens.rs:249-250 (disasm-only)
Symptom: No #[test]-decorated function exercises any float-load interpreter arm. A regression in EA computation, endianness, f32→f64 widening, or update-form writeback would go undetected. The SNaN bug (PPCBUG-128) was undetected partly due to this gap.
Recommended minimum:
1. lfs normal: 0x3F800000 (1.0f32) → assert fpr[rd] == 1.0f64 exact.
2. lfs negative displacement: base minus 4.
3. lfs ra=0 path (absolute addressing).
4. lfd normal: store PI bits, assert exact bit equality via .to_bits().
5. lfd SNaN: store 0x7FF0_0000_0000_0001u64, assert exact bit equality after load.
6. lfsu / lfsux / lfdu / lfdux: verify loaded FPR value AND rA update address.
7. After PPCBUG-128 fix: lfs SNaN round-trip test.

IDs PPCBUG-130 through PPCBUG-137 are unallocated — no further bugs found in group 23.

Files modified by the audit

xenia-rs/audit-prepass-findings.md — Phase A pre-pass red flags (orchestrator regex output).
xenia-rs/audit-out/group-01-add-imm.md — Group 1 report (Sonnet subagent).
xenia-rs/audit-out/group-02-add-reg.md — Group 2 report.
xenia-rs/audit-out/group-03-sub-reg.md — Group 3 report.
xenia-rs/audit-out/group-04-multiply.md — Group 4 report.
xenia-rs/audit-out/group-05-divide.md — Group 5 report.
xenia-rs/audit-out/group-06-logic-imm.md — Group 6 report.
xenia-rs/audit-out/group-09-word-rotate.md — Group 9 report.
xenia-rs/audit-out/group-13-branch.md — Group 13 report.
xenia-rs/audit-out/group-14-trap-sc.md — Group 14 report.
xenia-rs/audit-out/group-15-cr-logical.md — Group 15 report.
xenia-rs/audit-out/group-16-spr-msr.md — Group 16 report.
xenia-rs/audit-out/group-17-cache-sync.md — Group 17 report.
xenia-rs/audit-out/group-18-load-byte.md — Group 18 report.
xenia-rs/audit-out/group-19-load-halfword.md — Group 19 report.
xenia-rs/audit-out/group-21-load-doubleword.md — Group 21 report.
xenia-rs/audit-out/group-22-load-mlsr.md — Group 22 report.
xenia-rs/audit-out/group-23-load-float.md — Group 23 report.
xenia-rs/audit-out/group-24-store-byte-half.md — Group 24 report.
xenia-rs/audit-out/group-26-store-doubleword.md — Group 26 report.
xenia-rs/audit-findings.md — this consolidated tracker.

No source code under xenia-rs/crates/ has been modified.

Batch 4 — load byte (group 18)

Per-group report: audit-out/group-18-load-byte.md.

Group 18 summary: cleanest group audited to date — zero HIGH or MEDIUM bugs. All four opcodes (lbz, lbzu, lbzx, lbzux) are ISA-correct: EA computation (rA=0 special case, D-field sign-extension, 32-bit EA truncation), zero-extension of the byte result to 64 bits, and update-form writeback all match the ISA spec and Canary cross-reference. Two LOW findings only.

PPCBUG-090 — lbzu/lbzux: rD==rA "invalid form" silently misloads rD

Severity: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
Status: open
Location: interpreter.rs:951-956 (lbzu), 963-968 (lbzux)
Symptom: When rD == rA (invalid form, UISA undefined), the byte load into gpr[rD] at line 953/965 is immediately overwritten by the EA writeback at line 954/966. Net result: gpr[rD] holds the EA, not the loaded byte. Canary has the same behaviour. No practical impact under normal compiler output.
Recommendation: add debug_assert!(instr.rd() != instr.ra()) in debug builds.

PPCBUG-091 — Zero interpreter execution tests for all four lbz* opcodes

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module; disasm_goldens.rs:247 (disasm-only, no execution)
Symptom: No #[test] exercises lines 945-968. A regression in EA computation, zero-extension, or the update writeback would go undetected.
Recommended minimum: lbz with ra=0 + negative displacement; lbzu normal case (verify both byte result and rA update); lbzx with ra=0; lbzux normal case. Each test should assert gpr[rD] <= 0xFF to catch any future accidental sign-extension.
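
A sketch of the EA rule and the update-form ordering these tests would exercise, using a flat byte slice as memory. d_form_ea and lbzu are hypothetical free functions that mirror the behaviour described for this group:

```rust
/// D-form effective address: (rA|0) + sign-extended 16-bit displacement,
/// truncated to 32 bits.
fn d_form_ea(gpr: &[u64; 32], ra: usize, d: i16) -> u32 {
    let base = if ra == 0 { 0 } else { gpr[ra] as u32 };
    base.wrapping_add(d as i32 as u32)
}

/// lbzu-style step: load the byte, zero-extend it, then write the EA to rA.
/// When rd == ra the second write wins, which is the PPCBUG-090 behaviour.
fn lbzu(gpr: &mut [u64; 32], mem: &[u8], rd: usize, ra: usize, d: i16) {
    let ea = d_form_ea(gpr, ra, d);
    gpr[rd] = mem[ea as usize] as u64;
    gpr[ra] = ea as u64;
}
```

Loading 0xFF is the useful probe value: any accidental sign-extension would show up as a result above 0xFF.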

IDs PPCBUG-092, PPCBUG-093, PPCBUG-094 are unallocated — no further bugs found in group 18.

Batch 4 — load halfword (group 19)

Per-group report: audit-out/group-19-load-halfword.md.

Group 19 summary: 4 HIGH bugs confirmed — all pre-pass flags validated. The four lha* opcodes (lha, lhax, lhau, lhaux) all use as i16 as i64 as u64, sign-extending a negative halfword to 64 bits in violation of the 32-bit ABI. Every negative halfword load (common for int16_t PCM samples, packed vertex deltas, short[] arrays) actively poisons the upper 32 bits of the destination GPR — identical shape to the addis bug. The four lhz* opcodes and lhbrx are all clean (as u64 zero-extension; swap_bytes() as u64 byte-reversal; correct endian handling; correct EA computation and update writebacks). Two LOW findings: rD==rA invalid-form in update variants, and zero unit tests for all nine opcodes.

PPCBUG-095 — `lha`: GPR writeback sign-extends to 64 bits

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:990
Symptom: mem.read_u16(ea) as i16 as i64 as u64 — memory 0x8000 writes 0xFFFFFFFF_FFFF8000 instead of 0x00000000_FFFF8000. Active GPR poisoning for every negative halfword. Common trigger: int16_t struct fields, PCM samples, packed vertex deltas.

Fix:

ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64;

Test gap: zero unit tests. Add: memory 0x8000 → gpr[rD] == 0x00000000_FFFF8000; memory 0x7FFF → gpr[rD] == 0x00000000_00007FFF.
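
The difference between the two cast chains can be checked in isolation. The function names below are illustrative only:

```rust
/// Pre-fix writeback: sign-extends the halfword through all 64 bits.
fn lha_sign_extend_64(half: u16) -> u64 {
    half as i16 as i64 as u64
}

/// Post-fix writeback: sign-extends to 32 bits, then zero-extends to 64.
fn lha_sign_extend_32(half: u16) -> u64 {
    half as i16 as i32 as u32 as u64
}
```

The two agree for non-negative halfwords and differ only in the upper 32 bits for negative ones.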

PPCBUG-096 — `lhax`: GPR writeback sign-extends to 64 bits

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:996
Symptom: identical to PPCBUG-095. Indexed form emitted for array access with GPR index.
Fix: mem.read_u16(ea) as i16 as i32 as u32 as u64
Test gap: zero unit tests.

PPCBUG-097 — `lhau`: GPR writeback sign-extends to 64 bits

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:1007
Symptom: identical to PPCBUG-095. Update form emitted for auto-incrementing short[] loops; poison accumulates across all iterations.
Fix: mem.read_u16(ea) as i16 as i32 as u32 as u64
Test gap: zero unit tests. Add: verify both gpr[rD] (upper-32 = 0) and gpr[rA] (EA update).

PPCBUG-098 — `lhaux`: GPR writeback sign-extends to 64 bits

Severity: HIGH
Status: applied (P4 d945aea, 2026-05-02)
Location: interpreter.rs:1013
Symptom: identical to PPCBUG-095, update+indexed form.
Fix: mem.read_u16(ea) as i16 as i32 as u32 as u64
Test gap: zero unit tests.
Note: PPCBUG-095..098 are the same one-line fix at four sites. Fix session sweep: rg -n 'as i16 as i64 as u64' interpreter.rs finds exactly these four lines.

PPCBUG-099 — `lhau`/`lhaux`: rD==rA invalid-form silently destroys load result

Severity: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
Status: open
Location: interpreter.rs:1005-1016
Symptom: same as PPCBUG-090 (lbzu/lbzux) — EA writeback overwrites gpr[rD] when rD == rA. Net: gpr[rD] holds EA, not the loaded value.
Recommendation: debug_assert!(instr.rd() != instr.ra()) in both arms.

PPCBUG-100 — Zero execution tests for all nine halfword-load opcodes

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Symptom: No #[test] exercises any of the 9 opcodes. The HIGH sign-extension bug (PPCBUG-095..098) would have been caught by any lha* test that loads a negative halfword and checks gpr[rD] <= 0x0000_0000_FFFF_FFFF.
Recommended minimum: lha with negative halfword (assert upper 32 zero), lhz same, lhau verify both rD and rA, lhzux verify both rD and rA, lhbrx verify byte-swap.

IDs PPCBUG-101, PPCBUG-102, PPCBUG-103, PPCBUG-104 are unallocated — no further bugs found in group 19.

Batch 4 — load word (group 20)

Per-group report: audit-out/group-20-load-word.md.

Group 20 summary: 1 HIGH bug (reservation invalidation never called), 1 MEDIUM (cross-thread reservation isolation), 1 MEDIUM (lwa 64-bit sign-extension hazard), 1 LOW informational note (PPCBUG-106), and 3 LOW test gaps. The zero-extending family (lwz/lwzu/lwzx/lwzux) is entirely correct: mem.read_u32(ea) as u64 cleanly zero-extends; EA computation, update writebacks, and RA0 handling all match ISA and Canary. lwbrx is correct: the double-swap (from_be_bytes then swap_bytes()) correctly produces a little-endian word read, zero-extended. The sign-extending family (lwa/lwax/lwaux) is ISA-correct for 64-bit mode but a 32-bit-ABI hazard, classified MEDIUM because lwa is a 64-bit-mode instruction unlikely to appear in Xbox 360 32-bit-ABI binaries. The HIGH finding is that ReservationTable::invalidate_for_write is defined and unit-tested but never called from any store instruction, breaking multi-threaded lwarx/stwcx. atomicity under --parallel.

PPCBUG-105 — lwa / lwax / lwaux sign-extend to 64 bits; 32-bit-ABI hazard

Severity: MEDIUM
Status: applied (P4 d945aea, 2026-05-02)
Locations: interpreter.rs:1032 (lwa), 1038 (lwax), 1043 (lwaux)
Symptom: mem.read_u32(ea) as i32 as i64 as u64 — a word with high bit set (e.g. 0x8000_0000) writes 0xFFFF_FFFF_8000_0000 to rD. ISA-correct for 64-bit-mode lwa. In 32-bit ABI, the poisoned upper 32 bits produce wrong CA / CR results in downstream 64-bit unsigned compares — same shape as the addis bug.
Likelihood: LOW on real Xbox 360 32-bit-ABI binaries (compilers use lwz for word loads; lwa is a 64-bit-mode instruction). Risk elevated if the binary contains 64-bit-mode kernel code.
Note: Canary also uses SignExtend(..., INT64_TYPE) — both are ISA-correct. Pre-pass flagged HIGH; audit downgrades to MEDIUM because lwa is unlikely in 32-bit-ABI Xbox 360 code.
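
To illustrate the downstream hazard, the sketch below compares a sign-extended load result against a clean value at both widths. The helper names are hypothetical, and the 64-bit variant stands in for any compare that consumes the full register:

```rust
/// Unsigned less-than over all 64 bits, as a polluted register would be seen
/// by a helper that does not truncate its inputs.
fn lt_unsigned_64(a: u64, b: u64) -> bool {
    a < b
}

/// Unsigned less-than restricted to the low 32 bits, as cmplw specifies.
fn lt_unsigned_32(a: u64, b: u64) -> bool {
    (a as u32) < (b as u32)
}
```

With a = 0xFFFF_FFFF_8000_0000 (the lwa result for 0x8000_0000) and b = 0x9000_0000, the 32-bit compare reports a < b while the 64-bit compare reports the opposite.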

PPCBUG-106 — lwa no-update-form undocumented (LOW / informational)

Severity: LOW
Status: open
Location: interpreter.rs:1029-1034
Symptom: lwa arm has no RA writeback. Correct per ISA (no lwau in PowerISA). Undocumented.
Fix: add comment // No lwau in PowerISA; lwa is DS-form non-update only.

PPCBUG-107 — `invalidate_for_write` never called from stores; lwarx/stwcx. atomicity broken under `--parallel` (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Locations: reservation.rs:234 (definition, never called from interpreter); interpreter.rs:1182-1278 (all store arms, none call it)
Symptom: ReservationTable::invalidate_for_write(addr) is defined and correctly unit-tested but no interpreter store arm calls it. Under M3 --parallel with the table enabled, a plain stw by thread B to a cache line reserved by thread A does NOT clear thread A's table slot. Thread A's subsequent stwcx. calls t.try_commit(), which succeeds — spurious success, violating store-conditional atomicity. All lock-free sync primitives (spin_lock, CompareExchange, atomic counters) built on lwarx/stwcx. are broken in multi-threaded mode.
Concrete scenario: thread A: lwarx r3, 0, r4 (reserves line). Thread B: stw r5, 0(r4) (same address; should invalidate). Thread A: stwcx. r6, 0, r4 → should fail (CR0.EQ=0) but succeeds (CR0.EQ=1). Thread A's store silently overwrites thread B's store.
Fix: in every store arm, before mem.write_*, add:
```
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
has_active_reservers() is a single Relaxed atomic load — negligible cost for non-atomic code (common case returns false immediately). Alternative: inject the table into the memory layer so write_u32/write_u64 call it automatically.
Test gap: add interpreter-level test: lwarx reserve a line, intervening stw to the same line, stwcx. must fail (CR0.EQ=0).
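
The required behaviour can be modelled with a toy table. Everything below is a hypothetical stand-in, not the real ReservationTable API; in particular the Mutex-based storage and the 128-byte reservation granule are assumptions made for the sketch:

```rust
use std::sync::Mutex;

/// Toy reservation table: one optional reserved line per guest thread.
struct ToyReservations {
    slots: Mutex<Vec<Option<u32>>>,
}

impl ToyReservations {
    fn new(threads: usize) -> Self {
        Self { slots: Mutex::new(vec![None; threads]) }
    }

    /// Assumed reservation granule: the 128-byte line containing `ea`.
    fn line(ea: u32) -> u32 {
        ea & !127
    }

    /// lwarx: record the reserved line for `thread`.
    fn reserve(&self, thread: usize, ea: u32) {
        self.slots.lock().unwrap()[thread] = Some(Self::line(ea));
    }

    /// Plain store: drop every reservation covering the written line.
    fn invalidate_for_write(&self, ea: u32) {
        let mut slots = self.slots.lock().unwrap();
        for slot in slots.iter_mut() {
            if *slot == Some(Self::line(ea)) {
                *slot = None;
            }
        }
    }

    /// stwcx.: succeeds only if the reservation is still held; always clears it.
    fn try_commit(&self, thread: usize, ea: u32) -> bool {
        let mut slots = self.slots.lock().unwrap();
        let ok = slots[thread] == Some(Self::line(ea));
        slots[thread] = None;
        ok
    }
}
```

The scenario from the finding maps to reserve(0, ea), then invalidate_for_write(ea) on behalf of thread B, then try_commit(0, ea), which must return false.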

PPCBUG-108 — Legacy per-ctx reservation path: cross-thread invalidation impossible (MEDIUM)

Severity: MEDIUM
Status: applied (ca5b90b, 2026-05-01)
Location: interpreter.rs:1148-1153 (stwcx legacy path)
Symptom: When table is None/disabled, reservation state lives in per-thread PpcContext fields. A store by thread B cannot clear ctx_A.has_reservation. Safe in strict lockstep (one host thread). Broken under real parallelism with the table inadvertently disabled.
Fix: add a debug_assert! in lwarx/stwcx. that table is enabled when multiple host threads are active. The M3 scheduler should always enable the table before spawning a second host thread.

PPCBUG-109 — Zero unit tests for lwa / lwax / lwaux

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Recommended minimum:
- lwa with 0x8000_0000 → gpr[rD] == 0xFFFF_FFFF_8000_0000.
- lwa with 0x7FFF_FFFF → gpr[rD] == 0x0000_0000_7FFF_FFFF.
- lwax with ra=0.
- lwaux: verify loaded value and rA update.
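
The two expected values in the list follow directly from sign-extending the loaded word. A small sketch of that arithmetic (the helper names are hypothetical, not interpreter functions):

```rust
/// Sign-extend a loaded 32-bit word to 64 bits, as the lwa test cases expect.
fn sign_extend_word(word: u32) -> u64 {
    word as i32 as i64 as u64
}

/// Zero-extension, shown for contrast with the sign-extending load.
fn zero_extend_word(word: u32) -> u64 {
    word as u64
}
```

A test that only uses `0x7FFF_FFFF` cannot tell the two apart, which is why the list pairs it with `0x8000_0000`.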

PPCBUG-110 — Zero unit tests for lwbrx

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Recommended minimum: memory [0x11, 0x22, 0x33, 0x44] at EA → gpr[rD] == 0x4433_2211; ra=0; assert gpr[rD] <= 0xFFFF_FFFF.

PPCBUG-111 — lwarx / stwcx test suite missing key cases

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:5167-5207 (two tests exist)
Missing: lwarx ra=0; stwcx. without prior lwarx → CR0.EQ=0; second lwarx displaces first; post-PPCBUG-107-fix store-invalidation test; lwarx zero-extension assertion.

IDs PPCBUG-112, PPCBUG-113, PPCBUG-114 are unallocated — reserved for group 20 follow-up.

Batch 4 — load doubleword (group 21)

Per-group report: audit-out/group-21-load-doubleword.md.

Group 21 summary: cleanest load group audited — zero HIGH bugs. All six instructions (ld, ldu, ldux, ldx, ldbrx, ldarx) are ISA-correct: 64-bit load, big-endian byte order, EA computation (RA=0, DS-form, u32 truncation), update-form writebacks, and reservation tracking all pass scrutiny against Canary and the ISA spec. ldbrx's double-swap pattern was investigated and confirmed correct (PPCBUG-115 informational). One MEDIUM documentation finding, two LOW findings.

PPCBUG-115 — `ldbrx` byte-swap confirmed correct (informational)

Severity: LOW (confirmed clean, informational only)
Status: wontfix
Location: interpreter.rs:4157-4159
Analysis: mem.read_u64 uses u64::from_be_bytes internally (confirmed in heap.rs:404 and interpreter's TestMem), so it returns the BE-decoded value. Calling .swap_bytes() re-reverses to give the LE interpretation, which is exactly what ldbrx specifies. Canary achieves the same result by skipping ByteSwap at the HIR level. Both approaches are correct. See per-group report for full byte-level worked example.
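
The double-swap argument can be checked in isolation with standard-library calls only. The function names below are hypothetical; the sketch assumes nothing about `mem.read_u64` beyond the big-endian decode stated above.

```rust
/// Models the path described above: decode big-endian, then swap bytes.
fn ldbrx_via_double_swap(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes).swap_bytes()
}

/// Direct little-endian decode of the same eight bytes.
fn ldbrx_direct(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}
```

Asymmetric input such as `[0x11, 0x22, ..., 0x88]` is needed to distinguish the result from a plain big-endian load.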

PPCBUG-116 — `ld`/`ldx`/`ldu`/`ldux` as 32-bit-ABI poison sources (documentation)

Severity: MEDIUM (awareness/documentation; no change to load instructions themselves)
Status: open
Location: interpreter.rs:1017-1058
Symptom: These instructions correctly write full 64-bit values to the destination GPR. Xbox 360 32-bit-ABI binaries legitimately emit them for TOC loads, vtable loads, and kernel structure accesses — all of which may have non-zero upper 32 bits. Until PPCBUG-001..089 arithmetic truncation fixes land, such values can flow into 64-bit compares and corrupt CA bits and CR fields — the inverse of the addis bug (pollution from memory side vs. sign-ext).
Key guard already in place: PPCBUG-007's subfcx CA fix truncates operands to u32 before the compare, correctly handling ld-originated 64-bit values. This is the most critical downstream consumer and the fix is already specified.

PPCBUG-117 — Stale frozen snapshot in `ppc-manual/memory/ldarx.md`

Severity: LOW
Status: applied (P7 manual regen, 2026-05-02)
Location: ppc-manual/memory/ldarx.md (frozen snapshot section)
Symptom: Snapshot uses old field name ctx.reserved_addr; live code uses ctx.reserved_line = ea & !RESERVATION_MASK (M3 refactor). Cosmetic only.
Fix: Regenerate snapshot after M3 field names settle.

PPCBUG-118 — Zero functional tests for `ld`, `ldx`, `ldu`, `ldux`, `ldbrx`

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Symptom: test_ldarx_stdcx_pair covers ldarx/stdcx only. Five doubleword load variants are untested. Recommended minimum: ld with positive DS, negative DS, and RA=0; ldx basic; ldu with RA writeback check; ldux with RA writeback check; ldbrx with asymmetric data to distinguish output from plain ldx.

IDs PPCBUG-119 through PPCBUG-122 are unallocated — reserved for group 21 follow-up.

Batch 4 — load multiple/string (group 22)

Per-group report: audit-out/group-22-load-mlsr.md.

Group 22 summary: one structural HIGH bug (lswx is always a no-op due to missing XER TBC field), one MEDIUM coupling bug (the write path discards TBC on mtspr XER), one MEDIUM ISA-form deviation (lmw does not skip RA-in-range stores unlike Canary), and two LOW findings. The lswi body itself is correct; lmw core logic (loop bound, zero-extension, byte-packing, register wraparound) is clean. Zero unit tests across all three opcodes.

PPCBUG-123 — `lswx` XER TBC field not modeled; always loads 0 bytes

Severity: HIGH
Status: applied (P6 112202c, 2026-05-02)
Location: context.rs:235-237 (xer() method) + interpreter.rs:4172
Symptom: ctx.xer() assembles only SO[31], OV[30], CA[29] — bits 0–28 are always zero. lswx reads ctx.xer() & 0x7F expecting the XER TBC byte-count field at bits 0–6, but always gets 0. The while bytes_left > 0 loop never executes; lswx is permanently a no-op — no bytes are loaded, no destination registers are written. The companion stswx at interpreter.rs:4191 has the identical pattern and is equally broken.
Root cause: PpcContext has no xer_tbc field. Neither xer() nor set_xer() models XER[25:31] (IBM bit numbering; the same 7-bit field as the LSB-numbered bits 0–6 above). Any mtspr XER, rN that sets a non-zero byte count silently discards it (PPCBUG-124).
Cross-reference: Canary marks lswx as XEINSTRNOTIMPLEMENTED() — xenia-rs implemented the body but left the XER infrastructure incomplete.
Fix:
1. Add pub xer_tbc: u8 to PpcContext.
2. In xer(): | (self.xer_tbc as u32) for bits 0–6.
3. In set_xer(): self.xer_tbc = (val & 0x7F) as u8. The lswx body is then correct as-is.
Test gap: zero unit tests. After fix: mtspr XER, r3 (r3=4) then lswx r5, 0, r4 should write exactly 4 bytes into r5 (high byte = first byte at EA).
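
The three fix steps can be sketched on a standalone struct. `ToyXer` and its `so`/`ov`/`ca` field names are assumptions for illustration; only `xer_tbc`, `xer()`, and `set_xer()` come from the fix text, and the real `PpcContext` layout may differ.

```rust
/// Toy XER model; every name except `xer_tbc` is an assumption.
#[derive(Default)]
struct ToyXer {
    so: bool,
    ov: bool,
    ca: bool,
    /// 7-bit byte count consumed by lswx/stswx.
    xer_tbc: u8,
}

impl ToyXer {
    /// Assemble the 32-bit image: SO = bit 31, OV = bit 30, CA = bit 29,
    /// TBC = bits 0..=6 (LSB numbering).
    fn xer(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | (self.xer_tbc as u32 & 0x7F)
    }

    /// Split a 32-bit image back into fields, keeping the byte count.
    fn set_xer(&mut self, val: u32) {
        self.so = val & (1 << 31) != 0;
        self.ov = val & (1 << 30) != 0;
        self.ca = val & (1 << 29) != 0;
        self.xer_tbc = (val & 0x7F) as u8;
    }
}
```

After `set_xer(4)`, the expression `xer() & 0x7F` yields 4, so a byte loop driven by that value runs four times instead of zero.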

PPCBUG-124 — `set_xer()` discards TBC on `mtspr XER` (structural coupling to PPCBUG-123)

Severity: MEDIUM (must land with PPCBUG-123)
Status: applied (P6 112202c, 2026-05-02)
Location: context.rs:239-244
Symptom: set_xer() writes only SO/OV/CA from the 32-bit value, silently discarding bits 0–28 (including the 7-bit TBC field). Any guest mtspr XER, rN with a non-zero byte count loses that count; subsequent lswx/stswx see TBC=0. Fix is the same three-line change as PPCBUG-123.

PPCBUG-125 — `lmw` missing RA-in-destination-range skip

Severity: MEDIUM
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:1515
Symptom: PowerISA declares lmw rT, D(rA) invalid when rA is in [rT..31]. Canary skips the register write to rA in that case (if (i.D.RT + j == i.D.RA) continue). xenia-rs pre-computes EA before the loop (so EA values remain correct), but overwrites rA with the loaded word instead of preserving it. The result therefore differs from Canary for this invalid encoding, and any program that relies on rA surviving a nominally invalid lmw will see the wrong value.

Fix:

```
for r in instr.rd()..32 {
    if r == instr.ra() { ea = ea.wrapping_add(4); continue; }
    ctx.gpr[r] = mem.read_u32(ea as u32) as u64;
    ea = ea.wrapping_add(4);
}
```

Test gap: zero tests. Add: lmw r28, 0(r28) (RA=RT=28) — after fix, gpr[28] unchanged.
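
A self-contained version of the skip logic, useful as a reference for the suggested test. `toy_lmw` and its flat byte-slice memory are hypothetical; the sketch takes the effective address as an explicit argument rather than deriving it from rA.

```rust
/// Toy lmw: loads big-endian words from `mem` starting at `ea` into
/// gpr[rt..32], leaving gpr[ra] untouched when it lies in that range.
fn toy_lmw(gpr: &mut [u64; 32], mem: &[u8], rt: usize, ra: usize, mut ea: usize) {
    for r in rt..32 {
        if r != ra {
            let word = u32::from_be_bytes([mem[ea], mem[ea + 1], mem[ea + 2], mem[ea + 3]]);
            gpr[r] = word as u64; // zero-extend into the 64-bit register
        }
        ea += 4; // the address advances even for the skipped register
    }
}
```

With rT = rA = 28, registers 29 to 31 receive the second to fourth words and r28 keeps its prior value.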

PPCBUG-126 — `lswi` uses `instr.rb()` instead of `instr.nb()` for the NB field

Severity: LOW (maintenance hazard, not a correctness bug)
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:1340
Symptom: instr.rb() and instr.nb() both extract bits 16–20 and return identical values. Using rb() misrepresents the operand as a register reference rather than a 5-bit immediate count. The companion stswi at line 1359 has the same pattern. A future rb() type-system refactor could break lswi/stswi silently.
Fix: instr.nb() at both sites.

PPCBUG-127 — Zero execution tests for lmw, lswi, lswx

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Symptom: No #[test] exists for any of the three opcodes. A regression in loop bounds, byte-packing, EA computation, or the NB=0 special case would go undetected.
Recommended minimum: lmw r30, 0(r1) (2-word load); lswi r3, r4, 8 (2-word byte pack); lswi r31, r4, 8 (register wraparound → r31 and r0); lswi r3, r4, 0 (NB=0→32 special case); post-PPCBUG-123 fix: lswx with XER TBC=4 (1-word load), TBC=0 (no-op), TBC=5 (partial word).
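
The expected register contents for the lswi cases can be derived from a reference packer. This is a sketch under stated assumptions: `toy_lswi` is a hypothetical helper over a flat byte slice, written from the behaviors the list names (four bytes per register, high byte first, r31 wrapping to r0, NB=0 meaning 32 bytes), not copied from the interpreter.

```rust
/// Toy lswi: packs `nb` bytes (0 means 32) from `mem[ea..]` into successive
/// registers starting at `rt`, four bytes per register, high byte first,
/// wrapping from r31 to r0. Unfilled low bytes of the last register are zero.
fn toy_lswi(gpr: &mut [u64; 32], mem: &[u8], rt: usize, nb: u32, ea: usize) {
    let count = if nb == 0 { 32 } else { nb as usize };
    let mut r = rt;
    for (i, &byte) in mem[ea..ea + count].iter().enumerate() {
        if i % 4 == 0 {
            if i != 0 {
                r = (r + 1) % 32; // next register, wrapping r31 -> r0
            }
            gpr[r] = 0; // start each register from a clean word
        }
        let shift = 24 - 8 * (i % 4) as u32;
        gpr[r] |= (byte as u64) << shift;
    }
}
```

Pre-filling the register file with all-ones before each call makes the partial-word case meaningful, since it shows the trailing bytes are cleared rather than left over.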

Batch 5 — store byte/halfword (group 24)

Per-group report: audit-out/group-24-store-byte-half.md.

Group 24 summary: 3 findings: 1 HIGH (cross-cutting reservation invalidation), 1 LOW/informational (update-form zero-extension correct but undocumented), 1 LOW (zero test coverage). EA computation, value truncation (as u8, as u16), RA=0 special cases, update-form writeback zero-extension, big-endian mem.write_u16 path, and sthbrx byte-reverse logic are all ISA-correct. The single HIGH finding is the systemic absence of invalidate_for_write calls — same class as PPCBUG-107, now documented for all 9 byte/halfword store opcodes.

PPCBUG-130 — All 9 store-byte/halfword opcodes missing `invalidate_for_write` (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Locations: interpreter.rs:1207 (stb), 1213 (stbu), 1219 (stbx), 1225 (stbux), 1231 (sth), 1237 (sthu), 1243 (sthx), 1249 (sthux), 1337 (sthbrx)
Class: same root cause as PPCBUG-107 (stw/stdcx family — invalidate_for_write never called from any store arm).
Symptom: Under --parallel, a stb, sth, or sthbrx (or any variant in this group) to a cache line reserved by another thread via lwarx/ldarx does NOT clear the table slot. The reserving thread's subsequent stwcx./stdcx. spuriously succeeds even though an intervening sub-word store has modified the line — violating store-conditional atomicity. Affects any lock-free protocol that uses byte or halfword stores adjacent to or inside a lwarx/stwcx. loop (e.g. byte-level spinlocks, tagged-pointer updates, audio ring-buffer flags).

Fix (per PPCBUG-107 pattern): before each mem.write_u8/u16, add:

```
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```

Note: PPCBUG-107 is the canonical parent finding. PPCBUG-130 documents that the byte/halfword group must be included in the same fix sweep.

PPCBUG-131 — Update-form rA zero-extension correct but undocumented (LOW / informational)

Severity: LOW (informational — behavior is correct)
Status: open (documentation gap)
Locations: interpreter.rs:1216 (stbu), 1228 (stbux), 1240 (sthu), 1252 (sthux)
Symptom: Each update-form arm writes ctx.gpr[instr.ra()] = ea as u64 where ea: u32. This zero-extends to 64 bits — correct in the 32-bit ABI (addresses are 32-bit; upper half must be zero). No bug, but there is no comment explaining the deliberate zero-extension. A maintainer who computes EA as u64 throughout and drops the as u32 intermediate would silently sign-extend negative displacements into rA, mirroring the addis bug shape.
Fix: add comment // EA is u32; zero-extend into rA (32-bit ABI: upper 32 bits must be 0). at each update-form writeback line.
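
The hazard the comment guards against is easy to demonstrate with plain integer arithmetic. Both functions below are hypothetical illustrations, not interpreter code: the first mirrors the `ea: u32` then `as u64` shape described above, the second is the all-64-bit variant a refactor might introduce.

```rust
/// Update-form EA as described above: 32-bit wraparound, then zero-extend.
fn ea_u32_then_extend(base: u64, disp: i16) -> u64 {
    let ea: u32 = (base as u32).wrapping_add(disp as i32 as u32);
    ea as u64
}

/// The hazardous variant: the same arithmetic carried out in 64 bits throughout.
fn ea_u64_throughout(base: u64, disp: i16) -> u64 {
    base.wrapping_add(disp as i64 as u64)
}
```

The two agree whenever the sum stays non-negative and diverge once a negative displacement crosses zero, at which point the 64-bit variant leaves set bits in the upper half of rA.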

PPCBUG-132 — Zero unit tests for all 9 store-byte/halfword opcodes (LOW)

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Symptom: No test_stb* or test_sth* functions exist. Any regression in EA computation, value truncation, update-form writeback order, or sthbrx byte-swap logic would be invisible.
Recommended minimum: stb basic + ra=0; stbu/stbux with rA writeback check; stbx ra=0; sth big-endian byte check (0xDEAD → [0xDE, 0xAD]); sthu/sthux writeback; sthbrx byte-reversed check (0xDEAD → [0xAD, 0xDE]); post-PPCBUG-130 fix: lwarx + stb to same line + stwcx. → CR0.EQ=0.

IDs PPCBUG-133 through PPCBUG-139 are unallocated — reserved for group 24 follow-up.

Batch 5 — store word (group 25)

Per-group report: audit-out/group-25-store-word.md.

Group 25 summary: 8 findings: 5 HIGH (reservation invalidation, one per plain-store opcode), 0 MEDIUM, 3 LOW. Core arithmetic and semantics are entirely clean for all 6 opcodes. EA computation (RA=0 guards, simm16 sign-extend, u32 truncation), value truncation (as u32), update-form writebacks (ea as u64 zero-extension), big-endian mem.write_u32, stwbrx byte-reversal, and stwcx conditional-store logic (cache-line reservation check, CAS, CR0 update, reservation always cleared) all match the ISA and Canary exactly. The stwcx manual snapshot is stale (it uses the old reserved_addr field name; live code uses reserved_line at cache-line granularity, which is more correct than the snapshot). The dominant finding is the same systemic miss as PPCBUG-107 and PPCBUG-130: invalidate_for_write is never called from any plain store arm.

PPCBUG-140 — stw: missing `invalidate_for_write` call (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Location: interpreter.rs:1183-1188
Systemic root cause: PPCBUG-107
Symptom: Under --parallel with the ReservationTable enabled, a plain stw by thread B to a cache line reserved by thread A does not clear thread A's table slot. Thread A's subsequent stwcx. spuriously succeeds (CR0.EQ=1) even though thread B has written the line. All lock-free sync primitives (spin_lock, CompareExchange, atomic counters) built on lwarx/stwcx. are broken in multi-threaded mode. stw is the most common store instruction — every stack write, pointer store, and integer field write is affected.

Fix: Before mem.write_u32(ea, ...):

```
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```

has_active_reservers() is a single Relaxed load — zero cost in the common non-atomic case.

PPCBUG-141 — stwu: missing `invalidate_for_write` call (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Location: interpreter.rs:1189-1194
Systemic root cause: PPCBUG-107
Symptom: Same class as PPCBUG-140. stwu r1, -N(r1) is the canonical function-prologue stack-allocation idiom emitted by every compiled function. A thread holding a reservation on the stack region would see spurious stwcx. success after any prologue store.
Fix: Same pattern as PPCBUG-140, inserted before mem.write_u32.

PPCBUG-142 — stwx: missing `invalidate_for_write` call (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Location: interpreter.rs:1195-1200
Systemic root cause: PPCBUG-107
Symptom: Same class as PPCBUG-140. stwx is the indexed store used for array writes and indirect dereferences — common in loops that may run concurrently with reservation holders.
Fix: Same pattern as PPCBUG-140.

PPCBUG-143 — stwux: missing `invalidate_for_write` call (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Location: interpreter.rs:1201-1206
Systemic root cause: PPCBUG-107
Symptom: Same class as PPCBUG-140. Less common than stw/stwu but still a plain store that must participate in reservation invalidation.
Fix: Same pattern as PPCBUG-140.

PPCBUG-144 — stwbrx: missing `invalidate_for_write` call (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Location: interpreter.rs:1568-1573
Systemic root cause: PPCBUG-107
Symptom: Same class as PPCBUG-140. Byte-reversed stores (used for LE-payload GPU command buffers, file format fields) are still plain stores with respect to the reservation protocol.
Fix: Same pattern as PPCBUG-140. ea is already a u32 at this point (line 1570).

PPCBUG-145 — stwcx: stale manual snapshot uses `reserved_addr` (LOW)

Severity: LOW (documentation only; live code is correct)
Status: applied (P7 manual regen, 2026-05-02)
Location: ppc-manual/memory/stwcx.md (frozen snapshot section)
Symptom: The frozen snapshot shows ctx.reserved_addr == ea (exact-word comparison). The live code at interpreter.rs:1137-1153 uses ctx.reserved_line == line where line = ea & !RESERVATION_MASK (cache-line comparison). The live code is MORE correct per ISA (PowerISA 2.07B defines reservation at cache-line granularity). Snapshot reflects an earlier implementation before M3 introduced RESERVATION_MASK and reserved_line. Tests confirm live behavior is correct (stwcx_succeeds_within_same_cache_line).
Fix: Regenerate the stwcx.md snapshot to show current field names and add a note on the ISA cache-line granule.

PPCBUG-146 — Zero unit tests for stwu / stwx / stwux / stwbrx (LOW)

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Symptom: Four of the six group-25 opcodes have zero dedicated unit tests.
Recommended minimum:
- stwu r3, -8(r1): verify memory at r1-8 and gpr[1] updated to old_r1 - 8.
- stwx ra=0: store at gpr[rb], verify memory and no RA writeback.
- stwux: indexed update — verify store and RA writeback.
- stwbrx 0x11223344: bytes at EA should be [0x44, 0x33, 0x22, 0x11].

PPCBUG-147 — stwcx test suite missing key cases (LOW)

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:5167-5208 (two existing tests)
Missing:
- stwcx. without prior lwarx → CR0.EQ=0, memory not written.
- Post-PPCBUG-140-fix: lwarx then stw to same line then stwcx. → CR0.EQ=0.
- RA=0 form: stwcx. rS, 0, rB.
- Explicit memory check on failure path (assert memory unchanged).

IDs PPCBUG-148 and PPCBUG-149 are unallocated — reserved for group 25 follow-up.

Batch 5 (continued) — store multiple/string (group 27)

Per-group report: audit-out/group-27-store-mlsr.md.

Group 27 summary: 4 findings: 2 HIGH, 1 MEDIUM, 1 LOW. stswx is a permanent no-op (same root cause as PPCBUG-123 for lswx: the XER TBC field is not modeled; fixed as a side effect of PPCBUG-123/124). stmw, stswi, and stswx all omit invalidate_for_write, which is more serious than for single-word stores because a single stmw can dirty multiple cache lines. stswi uses instr.rb() instead of instr.nb() (maintenance hazard, same shape as PPCBUG-126 for lswi). Zero unit tests across all three opcodes.

PPCBUG-160 — stmw, stswi, stswx missing `invalidate_for_write`; multi-line atomicity exposure (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Locations: interpreter.rs:1521 (stmw), interpreter.rs:1357 (stswi), interpreter.rs:4189 (stswx)
Extends: PPCBUG-107. The prior stated range 1182-1278 does not cover these three arms. Multi-word instructions (stmw up to 128 bytes = 2 lines; stswx up to 127 bytes = ~2 lines) make the probability of missing a reservation invalidation much higher than single-word stores.
Symptom: thread B's stmw saves 18+ non-volatile registers across two cache lines. Thread A's lwarx reservation on the second line is not cleared. Thread A's stwcx. spuriously succeeds. Because stmw is the ABI-standard non-volatile register save, this is triggered constantly in function prologues — any lock-free primitive inside a prologue/epilogue window is at risk.
Fix (same pattern as PPCBUG-107): before each mem.write_u32/mem.write_u8 call, add the invalidate_for_write guard. See group-27 report for per-opcode code snippets.
Test gap: lwarx reserve a line, stmw across that line, stwcx. must return CR0.EQ=0.
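
For multi-byte stores the guard has to cover every line the store touches, not just the one containing the first byte. A hedged sketch of that range computation follows; `LINE_BYTES` and `lines_touched` are hypothetical, with the 128-byte value standing in for whatever `RESERVATION_MASK` implies in the live code.

```rust
/// Hypothetical line size; an assumption for this sketch.
const LINE_BYTES: u32 = 128;

/// Returns the base address of every cache line touched by a store of
/// `len` bytes starting at `ea`. `len` must be non-zero.
fn lines_touched(ea: u32, len: u32) -> Vec<u32> {
    let first = ea & !(LINE_BYTES - 1);
    let last = ea.wrapping_add(len - 1) & !(LINE_BYTES - 1);
    let mut lines = vec![first];
    let mut line = first;
    while line != last {
        line = line.wrapping_add(LINE_BYTES);
        lines.push(line);
    }
    lines
}
```

An 18-register save (72 bytes) starting at `0x1060` touches both `0x1000` and `0x1080`, so a per-store guard keyed only on the starting address would miss a reservation on the second line. Invalidating before each individual `mem.write_u32`, as the fix describes, covers both.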

PPCBUG-161 — `stswx` is a permanent no-op: XER TBC not modeled (HIGH)

Severity: HIGH
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:4189 (stswx arm) + context.rs:235-243 (xer()/set_xer())
Companion: PPCBUG-123 (lswx), PPCBUG-124 (mtspr XER). This finding covers the store side.
Symptom: ctx.xer() & 0x7F always returns 0 (no xer_tbc field). stswx unconditionally stores zero bytes. The byte-loop body is otherwise correct and requires no further changes.
Fix: same three-line fix as PPCBUG-123 (add xer_tbc: u8 to PpcContext; update xer() and set_xer()). The stswx body is correct once TBC is live.
Test gap: mtspr XER (TBC=5) + stswx r3, 0, r4 → 5 bytes written big-endian.

PPCBUG-162 — `stswi` uses `instr.rb()` instead of `instr.nb()` for NB field (MEDIUM)

Severity: MEDIUM (maintenance hazard; not a runtime correctness bug today)
Status: applied (P6 112202c, 2026-05-02)
Location: interpreter.rs:1359
Companion: PPCBUG-126 (lswi identical pattern at line 1340).
Symptom: instr.rb() and instr.nb() extract the same bits 16-20, so values are equal now. If rb() is ever given a newtype wrapper (e.g. RegIdx) to enforce register semantics, the cast instr.rb() as u32 will either fail or yield wrong semantics — silently treating a register index as a byte count.
Fix: let nb = if instr.nb() == 0 { 32 } else { instr.nb() };
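
The maintenance argument is easier to see with the two accessors side by side. In this sketch the `RegIdx` newtype is the hypothetical wrapper the finding anticipates, and the free functions `rb`, `nb`, and `byte_count` are illustrative stand-ins for the decoder methods, operating on a raw instruction word.

```rust
/// Hypothetical newtype for register operands.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RegIdx(u8);

/// Bits 16-20 (IBM numbering) viewed as a register operand.
fn rb(instr: u32) -> RegIdx {
    RegIdx(((instr >> 11) & 0x1F) as u8)
}

/// The same bits viewed as the immediate byte count of lswi/stswi.
fn nb(instr: u32) -> u32 {
    (instr >> 11) & 0x1F
}

/// Effective byte count: an encoded NB of 0 means 32 bytes.
fn byte_count(instr: u32) -> u32 {
    if nb(instr) == 0 { 32 } else { nb(instr) }
}
```

Once `rb()` returns a `RegIdx`, code that treats it as a count no longer type-checks without an explicit unwrap, while `nb()` keeps expressing the intent directly.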

PPCBUG-163 — Zero unit tests for stmw, stswi, stswx (LOW)

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Symptom: No #[test] exists for any of the three opcodes. Regressions in loop bounds, byte order, EA computation, NB=0 handling, or register wraparound are invisible.
Recommended minimum: stmw 2-word and 32-word cases; stswi 4-byte / NB=0→32 / register wraparound / partial word; stswx (post PPCBUG-123 fix) TBC=4, TBC=0, TBC=5. See group-27 report for full list.

ID PPCBUG-164 is unallocated — reserved for group 27 follow-up.

Batch 5 (continued) — store doubleword (group 26)

Per-group report: audit-out/group-26-store-doubleword.md.

Group 26 summary: 0 HIGH, 2 MEDIUM, 2 LOW. The core semantics of all six opcodes are ISA-correct: ds() decoder extracts the DS-form displacement correctly; mem.write_u64 handles big-endian byte ordering; update-form writebacks are zero-extended and in the right order; stdcx. CR0 encoding, reservation check, and table-path interaction all match the ISA. stdbrx correctly applies swap_bytes(). No 32-bit writeback truncation issues (these are store ops, not ALU ops). Two MEDIUM findings: (1) PPCBUG-150 extends PPCBUG-107 to the doubleword stores (same gap — invalidate_for_write never called); (2) PPCBUG-151 identifies that stwcx. and stdcx. share the same reservation slot without a width discriminator, allowing a lwarx+stdcx. or ldarx+stwcx. cross-pair to succeed when it should fail. Four IDs used (PPCBUG-150..153).

PPCBUG-150 — `std`/`stdu`/`stdx`/`stdux`/`stdbrx` do not call `invalidate_for_write` (scope extension of PPCBUG-107)

Severity: MEDIUM (same classification as PPCBUG-107)
Status: applied (ca5b90b, 2026-05-01)
Locations:
- interpreter.rs:1258 (std)
- interpreter.rs:1264 (stdx)
- interpreter.rs:1269 (stdu)
- interpreter.rs:1275 (stdux)
- interpreter.rs:4163 (stdbrx)
Symptom: When --parallel is active and the ReservationTable is enabled, any of these five stores to an address another HW thread has reserved via ldarx will NOT invalidate that thread's reservation. The ldarx-holding thread's stdcx. can subsequently succeed even though the memory was overwritten, silently losing the intervening store: exactly the conflict LL/SC is supposed to detect. Fix session for PPCBUG-107 must include these five sites.

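Fix: in each arm, after mem.write_u64(ea, ...), add the guard below. PPCBUG-107 specifies the same guard before the write, with an additional is_enabled() filter.

```
if let Some(t) = &ctx.reservation_table {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```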

PPCBUG-151 — `stdcx.`/`stwcx.` reservation width not discriminated: cross-width pair silently succeeds

Severity: MEDIUM
Status: applied (ca5b90b, 2026-05-01)
Location: interpreter.rs:4119-4155 (stdcx) vs interpreter.rs:1134-1180 (stwcx)
Symptom: Both stwcx. and stdcx. match reservations using only (has_reservation, reserved_line). A lwarx reservation can be spuriously committed by stdcx., or a ldarx reservation by stwcx., as long as the cache line matches. The ISA requires pairing — lwarx must be committed by stwcx., and ldarx by stdcx.. Cross-width commit reads the wrong width from memory and writes back the wrong width, with no failure indication (CR0.EQ=1).
Fix: add a reservation_width: u8 field (4 or 8) to PpcContext. stwcx. requires reservation_width==4; stdcx. requires reservation_width==8. In the table path, pack the 1-bit width flag into one of the spare bits of the 64-bit slot (bits 39–32 are always zero for line addresses in the 32-bit guest address space).
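
A minimal sketch of the per-context half of this fix, assuming a standalone struct rather than the real `PpcContext`. `ToyReservation`, `reserve`, `commit`, and the 128-byte `LINE_MASK` are hypothetical; `has_reservation`, `reserved_line`, and `reservation_width` follow the names used in the text. The table-path bit packing is not modeled here.

```rust
/// Hypothetical line mask (128-byte lines); an assumption for this sketch.
const LINE_MASK: u32 = 127;

/// Toy per-thread reservation state.
#[derive(Default)]
struct ToyReservation {
    has_reservation: bool,
    reserved_line: u32,
    /// 4 after lwarx, 8 after ldarx.
    reservation_width: u8,
}

impl ToyReservation {
    /// lwarx (width 4) or ldarx (width 8).
    fn reserve(&mut self, ea: u32, width: u8) {
        self.has_reservation = true;
        self.reserved_line = ea & !LINE_MASK;
        self.reservation_width = width;
    }

    /// stwcx. (width 4) or stdcx. (width 8). Succeeds only when the line
    /// and the width both match; the reservation is cleared either way.
    fn commit(&mut self, ea: u32, width: u8) -> bool {
        let ok = self.has_reservation
            && self.reserved_line == (ea & !LINE_MASK)
            && self.reservation_width == width;
        self.has_reservation = false;
        ok
    }
}
```

With the width check in place, a `reserve(ea, 4)` followed by `commit(ea, 8)` returns false, where a line-only comparison would return true.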

PPCBUG-152 — `stdu`/`stdux` no invalid-form guard for RS==RA (LOW)

Severity: LOW (ISA-undefined; no Xbox 360 compiler emits this)
Status: open
Location: interpreter.rs:1267-1278
Symptom: When RA==RS, the store writes the original RS value, then RA (==RS) is overwritten with EA, destroying the source. ISA marks this invalid-form. Leaving it unguarded is consistent with the policy for the other update-form arms in groups 18-22.
Fix: debug_assert!(instr.ra() != 0 && instr.ra() != instr.rs()) in debug builds.

PPCBUG-153 — Zero unit tests for std/stdu/stdx/stdux/stdbrx; stdcx. happy-path only (LOW)

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module (only test_ldarx_stdcx_pair at line 4629)
Missing coverage: std with negative DS; std with RA=0; stdu update writeback; stdx with RA=0; stdux indexed update; stdbrx byte-reversed output; stdcx. failure path (no prior reservation or EA mismatch); stdcx. has_reservation cleared on failure.
Recommended minimum: 6 tests — see per-group report for encodings.

IDs PPCBUG-154 through PPCBUG-159 are unallocated — reserved for group 26 follow-up.

Batch 5 (continued) — store float (group 28)

Per-group report: audit-out/group-28-store-float.md.

Group 28 summary: 7 findings: 3 HIGH, 1 MEDIUM, 3 LOW. EA computation, endianness, update-form writebacks, and stfiwx integer-word extraction are all correct. Critical bugs: (1) stfs* never raises FPSCR exception bits (VXSNAN, XX, OX, UX) required by PowerISA for double→single narrowing; (2) stfs* ignores FPSCR.RN rounding mode, always using round-to-nearest-even; (3) all 9 FP store arms omit invalidate_for_write (same class as PPCBUG-107). The stfd* family and stfiwx are clean (bit-pattern stores with no conversion). Zero unit tests across all 9 opcodes. 7 IDs used (PPCBUG-165..171). 3 IDs unallocated (PPCBUG-172..174).

PPCBUG-165 — stfs* does not raise FPSCR exception bits (VXSNAN, XX, OX, UX)

Severity: HIGH
Status: open
Locations: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux)
Symptom: PowerISA requires that stfs double→single narrowing raises FPSCR[VXSNAN] for SNaN input, FPSCR[OX] on overflow to ±∞, FPSCR[UX] on underflow to ±0/denormal, and FPSCR[XX] when the result is inexact. None of these bits are ever set. The narrowing is done via ctx.fpr[instr.rs()] as f32 (x86 CVTSD2SS); no FPSCR inspection or update follows. Games that poll FPSCR[OX] to detect overflow (physics engines clamping large velocities), or FPSCR[VXSNAN] after sentinel SNaN writes, get false negatives.
Canary parity: Canary also omits these FPSCR updates for stfs*. Both share the deviation.
Fix: after the narrowing, check fpscr::is_snan(src) → set VXSNAN; compare source vs. f64 round-trip of narrowed value for inexact; compare src.is_finite() && f32.is_infinite() for overflow. See group-28 report for illustrative code sketch.
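
The three checks named in the fix can be prototyped outside the interpreter. This is an illustrative sketch only: `NarrowFlags`, `is_snan`, and `narrow_with_flags` are hypothetical names (the real helper is referred to above as `fpscr::is_snan`, whose signature is not shown here), and underflow detection is left out.

```rust
/// Flags a narrowing store could report; names mirror the FPSCR bits in the text.
#[derive(Debug, Default, PartialEq)]
struct NarrowFlags {
    vxsnan: bool,
    ox: bool,
    xx: bool,
}

/// True for an f64 signalling NaN: a NaN whose quiet bit (bit 51) is clear.
fn is_snan(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & (1u64 << 51)) == 0
}

/// Narrows `src` with `as f32` and derives the flags from source and result.
fn narrow_with_flags(src: f64) -> (f32, NarrowFlags) {
    let narrowed = src as f32;
    let mut flags = NarrowFlags::default();
    if src.is_nan() {
        flags.vxsnan = is_snan(src);
        return (narrowed, flags);
    }
    // Overflow: a finite source became an infinity.
    flags.ox = src.is_finite() && narrowed.is_infinite();
    // Inexact: widening the result back does not reproduce the source.
    flags.xx = (narrowed as f64) != src;
    (narrowed, flags)
}
```

For example, `1.0` produces no flags, `0.1` produces only `xx`, and `1e300` produces both `ox` and `xx`.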

PPCBUG-166 — stfs* ignores FPSCR.RN; always uses round-to-nearest-even

Severity: HIGH
Status: open
Locations: interpreter.rs:1284, 1289, 1296, 1301
Symptom: ctx.fpr[instr.rs()] as f32 uses the host MXCSR rounding mode, never consulting ctx.fpscr & fpscr::RN_MASK. Any game that configures FPSCR.RN to truncate/ceil/floor and then stores via stfs gets the wrong f32 in memory (wrong by at most 1 ULP). The stfs.md spec explicitly acknowledges this gap.
Canary parity: Canary also ignores FPSCR.RN for stfs. Both share the deviation.
Fix: read ctx.fpscr & fpscr::RN_MASK and set host MXCSR before narrowing, then restore. Minimum viable: debug_assert_eq!(ctx.fpscr & fpscr::RN_MASK, 0) for debug-build visibility.
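
As an alternative to switching the host rounding mode, a directed mode can be emulated in software on top of the default conversion. The sketch below covers round-toward-zero only and is a swapped-in technique for illustration, not the fix proposed above; `narrow_toward_zero` is a hypothetical name.

```rust
/// Narrows f64 to f32 rounding toward zero, built on the default `as f32`
/// conversion (round to nearest, ties to even) plus a one-step correction.
fn narrow_toward_zero(src: f64) -> f32 {
    let nearest = src as f32;
    if src.is_nan() {
        return nearest;
    }
    // If nearest rounding moved away from zero, step back one representable value.
    if (nearest as f64).abs() > src.abs() {
        // Decrementing the raw bits shrinks the magnitude for either sign
        // and turns an infinity into the largest finite value of that sign.
        f32::from_bits(nearest.to_bits() - 1)
    } else {
        nearest
    }
}
```

An input just above the midpoint between `1.0` and the next f32 shows the difference: nearest rounding gives bit pattern `0x3F80_0001`, truncation gives `0x3F80_0000`.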

PPCBUG-167 — All 9 FP store arms missing `invalidate_for_write` (PPCBUG-107 class)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Locations: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux), 1308 (stfd), 1313 (stfdu), 1320 (stfdx), 1325 (stfdux), 1333 (stfiwx)
Symptom: Same class as PPCBUG-107. Under M3 --parallel, a FP store by thread B to a cache line reserved by thread A via lwarx does not clear thread A's reservation table slot. Thread A's subsequent stwcx. spuriously succeeds. Rendering workers using FP stores to shared transform/particle buffers co-located with spinlock sites are at risk.
Fix: before each mem.write_f32/write_f64/write_u32 in every FP store arm:
```
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
    if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
Recommend a single sweep of all store groups (PPCBUG-107, 130, 160, 167) to avoid further drift.

PPCBUG-168 — stfs* SNaN narrowing: `as f32` quietens SNaN without raising FPSCR.VXSNAN

Severity: MEDIUM
Status: open
Locations: interpreter.rs:1284, 1289, 1296, 1301
Symptom: When FRS holds an f64 SNaN (bit 51 = 0), CVTSD2SS sets the f32 quiet bit (bit 22), producing a QNaN in memory, without raising FPSCR[VXSNAN]. The stored memory bytes are correct per IEEE-754 (narrowing an SNaN produces a QNaN). The bug is the missing FPSCR signal, a subset of PPCBUG-165. Contrast with PPCBUG-128 (lfs stores wrong FPR bits — HIGH severity; here memory bytes are right, only the flag is missing).
Note: fixed as a side effect of the PPCBUG-165 fix. No independent code change needed.

PPCBUG-169 — stfd* bit-pattern store: confirmed correct (informational)

Severity: LOW (confirmed clean, informational)
Status: wontfix
Locations: interpreter.rs:1305, 1311, 1317, 1323
Analysis: write_f64(ea, fpr) → write_u64(ea, fpr.to_bits()) → val.to_be_bytes(). Pure bit-pattern, correct big-endian. SNaN preserved. EA computation and update-form writebacks all correct. Canary parity confirmed. No bugs.

PPCBUG-170 — stfiwx: confirmed correct (informational)

Severity: LOW (confirmed clean, informational)
Status: wontfix
Location: interpreter.rs:1329-1335
Analysis: write_u32(ea, fpr.to_bits() as u32) correctly extracts the low 32 bits of the 64-bit FPR as a raw bit pattern (the integer word produced by fctiw/fctiwz) and stores big-endian. RA=0 handled correctly. No FPSCR effects required. Canary parity confirmed. No bugs.

PPCBUG-171 — Zero unit tests for all 9 store-float opcodes

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Symptom: No #[test] covers any of the 9 FP store arms. Regressions in EA computation, endianness, update-form writeback order, or double→single narrowing are invisible.
Recommended minimum (10 tests): stfd normal + SNaN bit-exact; stfdu update writeback; stfs round-trip (1.0); stfs overflow (→ ±∞); stfsx ra=0; stfsux update; stfiwx integer word extract; post-PPCBUG-165 fix: SNaN → FPSCR.VXSNAN set; post-PPCBUG-166 fix: RN=truncate.

IDs PPCBUG-172 through PPCBUG-174 are unallocated — reserved for group 28 follow-up.

Batch 6 — FPU single-precision (group 29)

Per-group report: audit-out/group-29-fpu-single.md.

Context: The live implementation is substantially more capable than the frozen ppc-manual snapshots indicated. to_single() correctly dispatches on FPSCR.RN; check_invalid_* helpers correctly set VXSNAN, VXISI, VXIMZ, VXZDZ, VXIDI, ZX; update_after_op sets OX, UX, and FPRF. The remaining bugs are: (1) XX/FI/FR (inexact) never set anywhere; (2) fmadd/fmsub *sx variants missing the VXISI check for the add-phase infinity collision (the double-precision sibling fmaddx already performs it; see PPCBUG-181); (3) fnmadd/fnmsub NaN sign bit incorrectly flipped by Rust -; (4) fresx produces a full IEEE 1/b instead of the ~12-bit hardware estimate; (5) FPSCR.NI flush-to-zero not modelled; (6) SNaN→QNaN propagation relies on host SSE behavior rather than the ISA-canonical derivation.

8 IDs used (PPCBUG-180..187). 12 IDs unallocated (PPCBUG-188..199).

PPCBUG-180 — XX / FI / FR bits never set across all FPU *sx opcodes (and double siblings)

Severity: MEDIUM
Status: open
Locations: fpscr.rs:184-194 (update_after_op); affects interpreter.rs:2252-2494
Symptom: FPSCR[XX] (inexact) should be set whenever the mathematical result of an FP operation cannot be represented exactly in the destination format (single or double) and a rounding step occurs. FPSCR[FI] (fraction inexact) and FPSCR[FR] (fraction rounded) encode the direction. update_after_op sets OX (overflow to ±∞) and UX (subnormal result) but has no inexact-detection logic. Since most *sx operations on arbitrary inputs require rounding to single precision, XX is almost always left at 0 when it should be 1. Games that poll FPSCR to check exactness receive false "exact" results.
Canary parity: Canary's UpdateFPSCR also does not set XX/FI/FR. Both share this gap.
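Fix: In update_after_op (or a post-to_single helper), compare the pre-round f64 result with the post-round f64 result. If they differ, the operation was inexact: set FI and the sticky XX, and set FR when rounding increased the magnitude of the result (|rounded| > |exact|). If they are equal, clear FI and FR (both are non-sticky and describe only the last operation).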
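
The comparison described in the fix can be sketched in isolation. This is a minimal illustration, not the xenia-rs helper: `RoundFlags` and `narrowing_flags` are hypothetical names, the narrowing uses the host's round-to-nearest-even rather than FPSCR.RN, and NaN/infinity/overflow handling is assumed to stay with the existing VX/OX paths.

```rust
/// Inexact-rounding flags for one f64 -> f32 narrowing.
/// `RoundFlags` and `narrowing_flags` are illustrative names only.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct RoundFlags {
    /// The rounded result differs from the exact value (FI; also ORed into sticky XX).
    pub fi: bool,
    /// Rounding increased the magnitude of the result (FR).
    pub fr: bool,
}

pub fn narrowing_flags(exact: f64) -> RoundFlags {
    // NaN and infinity are never "inexact"; they are covered by other FPSCR paths.
    if exact.is_nan() || exact.is_infinite() {
        return RoundFlags { fi: false, fr: false };
    }
    // Host narrowing rounds to nearest-even; a real helper would honor FPSCR.RN.
    let rounded = (exact as f32) as f64;
    let fi = rounded != exact;
    let fr = fi && rounded.abs() > exact.abs();
    RoundFlags { fi, fr }
}
```

A caller would OR `fi` into the sticky XX bit and overwrite FI/FR with the returned values after every rounding step.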

PPCBUG-181 — fmaddsx / fnmaddsx missing VXISI check for add-phase ±∞ collision

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Locations: interpreter.rs:2339-2348 (fmaddsx), 2383-2392 (fnmaddsx)
Symptom: When FRA × FRC = +∞ and FRB = -∞ (or vice versa), PowerISA §4.3.4 requires FPSCR[VXISI] to be set and the result to be a QNaN. The double-precision sibling fmaddx (line 2327) correctly calls fpscr::check_invalid_add(ctx, a * c, b, false) after the multiply-check. fmaddsx omits this call entirely — only check_invalid_mul runs. Games using fused-madd in dot-product accumulators that might overflow to ±∞ (e.g. lighting accumulators with very large normals) lose the VXISI signal.

Fix:

// inside fmaddsx arm, after check_invalid_mul:
fpscr::check_invalid_add(ctx, a * c, b, false);

Same for fnmaddsx (same operand pair, same false sense for the add).

PPCBUG-182 — fmsubsx / fnmsubsx missing VXISI check for subtract-phase ±∞ collision

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Locations: interpreter.rs:2361-2370 (fmsubsx), 2405-2414 (fnmsubsx)
Symptom: When FRA × FRC = ±∞ and FRB = ±∞ with the same sign, (±∞) − (±∞) should fire FPSCR[VXISI]. Neither fmsubsx nor fnmsubsx calls check_invalid_add.
Fix:
```
// inside fmsubsx arm, after check_invalid_mul:
fpscr::check_invalid_add(ctx, a * c, -b, false);
```
Same for fnmsubsx. The negated b turns the subtract into the add-form so that check_invalid_add(..., false) uses the correct infinity-sign comparison.

PPCBUG-183 — fnmaddsx / fnmsubsx NaN sign bit incorrectly flipped by Rust unary `-`

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Locations: interpreter.rs:2388 (fnmaddsx), 2410 (fnmsubsx)
Symptom: to_single(ctx, -(a.mul_add(c, b))) — Rust's unary -f64 always flips the IEEE sign bit, including when the value is NaN. PowerISA §4.3.2 specifies that the final negation in fnmadd/fnmsub is NOT applied to a QNaN result: if the fused computation yields a NaN (due to SNaN input, VXIMZ, or VXISI), the negation is skipped and the NaN is propagated with its canonical sign unchanged. xenia-rs flips the sign bit of any NaN result, producing a QNaN with the wrong sign. Observable by storing via stfd and inspecting bits. Games using sign-bit NaN tagging (e.g. 0xFFC00000 vs 0x7FC00000 as distinct sentinels) are affected.

Fix:

// fnmaddsx arm:
let inner = a.mul_add(c, b);
let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
// fnmsubsx arm:
let inner = a.mul_add(c, -b);
let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
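
The sign flip is easy to reproduce outside the interpreter. The sketch below isolates the guard used in the fix; `negate_unless_nan` is a hypothetical helper name, not something in interpreter.rs.

```rust
/// Negate a fused result the way the fix does: NaNs pass through untouched.
/// `negate_unless_nan` is an illustrative name only.
pub fn negate_unless_nan(x: f64) -> f64 {
    if x.is_nan() { x } else { -x }
}

/// Bit pattern of the default positive quiet NaN, used for the comparison.
pub const POSITIVE_QNAN_BITS: u64 = 0x7FF8_0000_0000_0000;
```

Comparing `to_bits()` rather than the float values is what makes the difference visible, since NaN never compares equal to anything.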

PPCBUG-184 — fresx produces full-precision IEEE 1/b instead of ~12-bit hardware estimate

Severity: HIGH
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:2481-2494
Symptom: fres on Xenon hardware produces a reciprocal approximation via a 256-entry LUT with linear interpolation, accurate to roughly 1/4096 relative error (~12 mantissa bits). xenia-rs computes to_single(1.0 / b) — the fully IEEE-754 correctly-rounded single-precision reciprocal. The result is up to ~4096× more accurate than hardware. Newton-Raphson refinement code x = fres(d); x = x*(2 - d*x) is not broken by this (NR converges even from an accurate seed), but code that checks the seed's error magnitude for convergence termination, or that relies on fres(d)*d ≠ 1.0 to decide whether to refine, may take the wrong branch. Also, fres(d)*d on xenia is much closer to 1.0 than on hardware, so a "was the estimate good enough?" check based on the residual will give wrong answers.
Canary parity: Canary uses f.Recip(f.Convert(frB, FLOAT32_TYPE)) — approximates by first converting to f32 (quantizing the input), then applying the host reciprocal. Still produces a fully-accurate IEEE single reciprocal rather than the 12-bit table estimate. Both emulators share the deviation. Canary's conversion-first approach is slightly closer to hardware (the input is quantized before the reciprocal), so if a future fix is desired, Canary's approach is the better reference.
Fix (minimal viable): Pre-convert input to f32 to match Canary's quantization: let b32 = b as f32; to_single(ctx, 1.0_f64 / b32 as f64). This matches Canary but still does not emulate the 12-bit LUT. Full fix requires an fres LUT matching Xenon's hardware table (documented in Xbox 360 SDK / GamePPCLisa docs).
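
To see why the seed precision matters for residual checks, here is a rough sketch. `recip_estimate_12bit` is a hypothetical stand-in that simply truncates the correctly rounded reciprocal to 12 fraction bits; it is NOT the Xenon lookup table and only mimics the precision level discussed above.

```rust
/// Crude low-precision reciprocal: the correctly rounded f32 reciprocal with
/// its fraction truncated to 12 bits. Illustrative only, not the hardware LUT.
pub fn recip_estimate_12bit(d: f32) -> f32 {
    let full = 1.0_f32 / d;
    // f32 has 23 fraction bits; clearing the low 11 leaves 12.
    f32::from_bits(full.to_bits() & !0x7FF)
}

/// One Newton-Raphson refinement step: x' = x * (2 - d * x).
pub fn nr_step(d: f32, x: f32) -> f32 {
    x * (2.0 - d * x)
}

/// Residual |d * x - 1| that a "was the estimate good enough?" check would inspect.
pub fn residual(d: f32, x: f32) -> f32 {
    (d * x - 1.0).abs()
}
```

With the truncated seed the residual is nonzero and shrinks after one step; with the full-precision reciprocal the residual can already be zero, so a guest check of the form "refine only if fres(d)*d != 1.0" may take the other branch.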

PPCBUG-185 — FPSCR.NI flush-to-zero not modelled; subnormal results propagate through *sx

Severity: MEDIUM
Status: open
Location: All *sx arms in interpreter.rs; fpscr.rs has NI not defined as a constant
Symptom: Xenon firmware sets FPSCR.NI = 1 at boot. With NI=1, the Xenon FPU flushes subnormal inputs and results to the appropriate signed zero before and after every floating-point operation. xenia-rs inherits the host x86 IEEE-754 default (NI=0), which propagates subnormals. Subnormal differences: (a) subnormal FPR inputs are used as-is by xenia vs. treated as ±0 by hardware; (b) subnormal results are stored by xenia vs. flushed to ±0 by hardware. update_after_op sets UX when the result is subnormal, but does NOT flush it. Games with NI-dependent behavior — most Xbox 360 titles compiled with default Xenon ABI settings — may see different float results in subnormal-touching paths.
Canary parity: Canary also inherits host IEEE NI=0 semantics. Both share this gap.
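Fix: Define an NI constant in fpscr.rs (none exists yet). After to_single (or the double-precision result), check ctx.fpscr & fpscr::NI_BIT and, if set, flush subnormals to a zero of the same sign: if result.is_subnormal() { result = result.signum() * 0.0 }. If *sx results are held as f64, the subnormal test must be made in single-precision terms, because a value that is subnormal as f32 is still a normal f64. Apply to inputs as well for strict correctness.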
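
A minimal sketch of the single-precision flush, assuming results are carried as f64 in the FPR. `flush_single_subnormal` is a hypothetical helper; checking FPSCR.NI before calling it is left to the caller.

```rust
/// Flush a value that would be subnormal as an f32 to a zero of the same sign.
/// `flush_single_subnormal` is an illustrative name, not part of fpscr.rs.
pub fn flush_single_subnormal(x: f64) -> f64 {
    let magnitude = x.abs();
    // Nonzero and below the smallest normal f32: subnormal in single precision,
    // even though the same value is a perfectly normal f64.
    if magnitude > 0.0 && magnitude < f32::MIN_POSITIVE as f64 {
        0.0_f64.copysign(x)
    } else {
        x // zeros, normals, infinities and NaNs pass through unchanged
    }
}
```

Using `copysign` keeps the sign of the flushed zero without the multiply in `signum() * 0.0`.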

PPCBUG-186 — SNaN → QNaN propagation relies on host SSE; not ISA-canonical for all *sx

Severity: MEDIUM
Status: open
Locations: interpreter.rs:2252-2414 (all arithmetic *sx arms without explicit SNaN guard)
Symptom: When an SNaN input reaches faddsx/fsubsx/fmulsx/fdivsx, the code calls check_invalid_add/mul/div (correctly sets VXSNAN) but then performs the operation on the raw SNaN value: a + b, a * c, etc. On x86-64 SSE2, the hardware ADDSD/MULSD ops produce a QNaN from the first SNaN operand (bit 51 set, other mantissa bits preserved). This matches ISA §4.3.2.2 for the common case. However, for mul_add (VFMADD231SD on AVX), the SNaN propagation priority may differ: the ISA specifies FRA takes priority over FRB, but hardware FMA may use a different priority for the three-operand form. The fsqrtsx and fresx arms handle SNaN explicitly (via is_snan check) but do not synthesize the correct QNaN result — they rely on b.sqrt() / 1.0/b to produce a NaN, which the host does. This is a latent risk; active wrong-result cases require bit-level NaN inspection.
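
One way to remove the host dependency is to pick and quiet the NaN on raw bit patterns before any host arithmetic runs. The sketch below assumes the caller passes operands in whatever priority order the ISA prescribes for the opcode; all names are hypothetical.

```rust
const EXP_MASK: u64 = 0x7FF0_0000_0000_0000;
const FRAC_MASK: u64 = 0x000F_FFFF_FFFF_FFFF;
const QUIET_BIT: u64 = 1 << 51;

/// True for any NaN bit pattern (all-ones exponent, nonzero fraction).
pub fn is_nan_bits(bits: u64) -> bool {
    (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0
}

/// True for a signaling NaN (a NaN with the quiet bit clear).
pub fn is_snan_bits(bits: u64) -> bool {
    is_nan_bits(bits) && (bits & QUIET_BIT) == 0
}

/// Quiet a NaN by setting bit 51, leaving sign and payload untouched.
pub fn quiet_bits(bits: u64) -> u64 {
    if is_snan_bits(bits) { bits | QUIET_BIT } else { bits }
}

/// Return the quieted first NaN among the operands, or None if there is no NaN.
/// The slice order is the propagation priority chosen by the caller.
pub fn propagate_nan(operands_in_priority_order: &[u64]) -> Option<u64> {
    operands_in_priority_order
        .iter()
        .copied()
        .find(|&bits| is_nan_bits(bits))
        .map(quiet_bits)
}
```

An arm would call `propagate_nan` first and only fall through to `mul_add` when it returns `None`.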

PPCBUG-187 — Zero interpreter execution tests for all 10 group-29 opcodes

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module (no #[test] covers any *sx or fresx)
Symptom: Regressions in rounding, FPSCR side effects, or operand-field decoding are invisible to CI. The existing fpscr unit tests cover helper functions in isolation; no test exercises the full step() path for any single-precision FPU opcode.
Recommended minimum (12 tests — see group-29 report for encodings): fadds exact; fadds VXISI; fsubs VXISI; fmuls 0×∞; fdivs ZX; fmadds VXISI regression (PPCBUG-181); fmsubs VXISI regression (PPCBUG-182); fnmadds NaN-sign (PPCBUG-183); fnmsubs NaN-sign (PPCBUG-183); fsqrts negative input VXSQRT; fsqrts round-trip; fres basic reciprocal.

IDs PPCBUG-188 through PPCBUG-199 are unallocated — reserved for group 29 follow-up.

Batch 6 (continued) — FPU arithmetic double (group 30)

Per-group report: audit-out/group-30-fpu-double.md.

Group 30 summary: 9 findings (PPCBUG-200..208). 2 MEDIUM cross-cutting, 3 MEDIUM opcode-specific, 4 LOW. Result arithmetic is correct for all 10 opcodes. FPSCR infrastructure is partially wired: VXSNAN, OX, UX, ZX, VXISI (add/sub), VXIMZ, VXZDZ, VXIDI, VXSQRT all set correctly for the opcodes that need them. Critical gaps: (1) XX/FR/FI bits never set by any opcode — same gap as PPCBUG-180 but now confirmed on the double-precision path; (2) FPSCR.RN not honored for double arithmetic — single-precision has round_to_single but double has no equivalent; (3) fmsubx/fnmaddx/fnmsubx omit the VXISI check for ∞-collision in the add step; (4) fnmaddx/fnmsubx flip NaN sign bit via Rust - operator but ISA requires NaN sign preserved. frsqrtex uses full-precision 1/sqrt(b) instead of the hardware estimate — acceptable. All FMA forms use f64::mul_add for correct single-rounding semantics. 9 IDs used (PPCBUG-200..208). 11 IDs unallocated (PPCBUG-209..219).

PPCBUG-200 — All group-30 opcodes: XX, FR, FI bits never set

Severity: MEDIUM
Status: open
Location: fpscr.rs:184-194 (update_after_op); interpreter.rs:2248,2268,2289,2310,2335,2357,2379,2401,2463,2510
Symptom: Same gap as PPCBUG-180 but confirmed for the double-precision path. update_after_op only tracks OX (overflow to infinity) and UX (subnormal). FPSCR[XX] (inexact sticky), FPSCR[FR] (round direction), and FPSCR[FI] (inexact for current op) are never updated by any group-30 opcode. Every double-precision arithmetic operation that rounds a non-representable result silently omits these bits.
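Fix: Same goal as PPCBUG-180, different mechanism. For double there is no to_single step to compare against, so inexactness must come either from the host MXCSR exception flags read after each f64 operation or from a post-op bit-level comparison of inputs vs. result. The MXCSR precision flag (PE) only reports that rounding occurred, which covers FI and XX; FR (rounding direction) still needs the comparison.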
Test gap: Zero tests verify XX set after any inexact double-precision operation.
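
For the add/sub case, the "post-op comparison of inputs vs. result" can be done with an error-free transformation instead of MXCSR. The sketch below uses Knuth's TwoSum; it assumes host round-to-nearest, finite operands and no overflow, and the function names are hypothetical.

```rust
/// Knuth's TwoSum: returns (s, e) with s = fl(a + b) and s + e == a + b exactly.
/// Valid under round-to-nearest when the sum does not overflow.
pub fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let bb = s - a;
    let e = (a - (s - bb)) + (b - bb);
    (s, e)
}

/// (FI, FR) for a finite double-precision add, derived from the rounding error.
pub fn add_inexact_flags(a: f64, b: f64) -> (bool, bool) {
    let (s, e) = two_sum(a, b);
    let fi = e != 0.0;
    // The exact sum is s + e, so |s| exceeds it when e points back toward zero.
    let fr = fi && (e < 0.0) == (s > 0.0);
    (fi, fr)
}
```

Multiply, divide and square root need different residual tricks (for example an FMA-based remainder), so this covers only faddx/fsubx.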

PPCBUG-201 — All group-30 opcodes: FPSCR.RN not honored for double arithmetic

Severity: MEDIUM
Status: open
Location: interpreter.rs:2242-2512 (all 10 arms)
Symptom: Host f64 operators always use nearest-even (host MXCSR default). fpscr.rs has a complete rounding_mode(ctx) helper and directed rounding helpers for single-precision (round_to_single), but no equivalent for double arithmetic. Guest mtfsfi RN changes have no effect on faddx/fsubx/fdivx/fsqrtx etc.
Fix: Wrap each double-precision arithmetic arm with an MXCSR round-mode set/restore when ctx.fpscr & fpscr::RN_MASK != 0. Fast path (RN=0) stays zero-cost.
Test gap: No test changes RN and verifies directed rounding on any double arithmetic opcode.

PPCBUG-202 — fmaddx: non-FMA `a * c` used in check_invalid_add can spuriously raise/miss VXISI

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:2332
Symptom: check_invalid_add(ctx, a * c, b, false) uses a separate two-rounding multiply to approximate the FMA intermediate product. When the true FMA intermediate is finite but the standalone product overflows to ±∞, VXISI fires spuriously. When the true intermediate is ±∞ but the standalone product is finite (extreme cancellation), VXISI is missed.
Fix: Derive VXISI from input-value properties directly: if (a.is_infinite() || c.is_infinite()) (product is mathematically infinite) and b.is_infinite() with opposing sign → VXISI.
Test gap: No test covers the large-value cancellation case in fmaddx.
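
The input-derived check from the fix might look like the following. `fma_add_is_vxisi` is a hypothetical name; the sketch treats ∞ × 0 as a VXIMZ case handled elsewhere, and the subtract forms would pass `-b`.

```rust
/// True when a*c + b is an (infinity + opposite infinity) collision, decided
/// from the operands alone so that a finite product which merely overflows
/// in a standalone f64 multiply cannot trigger it.
/// `fma_add_is_vxisi` is an illustrative name only.
pub fn fma_add_is_vxisi(a: f64, c: f64, b: f64) -> bool {
    if a.is_nan() || c.is_nan() || b.is_nan() {
        return false; // NaN operands belong to the VXSNAN / propagation path
    }
    let product_infinite = a.is_infinite() || c.is_infinite();
    if !product_infinite || a == 0.0 || c == 0.0 || !b.is_infinite() {
        return false; // finite product, or infinity * 0 (VXIMZ), or finite addend
    }
    let product_negative = a.is_sign_negative() != c.is_sign_negative();
    product_negative != b.is_sign_negative()
}
```

The second assertion in a test for this would be the spurious case: `1e200 * 1e200` overflows to +∞ as a standalone product, yet the inputs are finite and no VXISI should fire.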

PPCBUG-203 — fmsubx, fnmaddx, fnmsubx: VXISI never raised for ∞-collision in add/sub step

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Locations: interpreter.rs:2354 (fmsubx), 2376 (fnmaddx), 2398 (fnmsubx)
Symptom: Same pattern as PPCBUG-181/182 for the double-precision variants. These three arms call only check_invalid_mul and omit check_invalid_add. Per ISA, all four FMA variants must raise VXISI when the add step computes (±∞) + (∓∞). Example for fmsub: A×C = +∞, B = +∞ → (+∞) − (+∞) → VXISI. Currently the result NaN propagates silently with no FPSCR update. The fnmsub pattern is the canonical Newton-Raphson step, the most common FPU path in Xbox 360 graphics code.
Fix: Add fpscr::check_invalid_add(ctx, a * c, b, true) for fmsubx/fnmsubx and fpscr::check_invalid_add(ctx, a * c, b, false) for fnmaddx (apply PPCBUG-202 sign-fix simultaneously).
Test gap: Zero tests for VXISI on any of the three opcodes.

PPCBUG-204 — fmaddx check_invalid_add sub-issue (sign logic reliant on imprecise product)

Severity: LOW (sub-issue of PPCBUG-202)
Status: open
Location: interpreter.rs:2332
Symptom: VXISI logic is internally consistent with the passed a * c value, but that value can have the wrong sign in extreme overflow/underflow cases. Resolve as part of PPCBUG-202.

PPCBUG-205 — fnmaddx / fnmsubx: Rust `-` flips NaN sign bit; ISA requires NaN sign preserved

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Locations: interpreter.rs:2377 (fnmaddx), interpreter.rs:2399 (fnmsubx)
Symptom: Same pattern as PPCBUG-183 for the double-precision variants. Rust's unary - applied to a NaN result flips the IEEE-754 sign bit. PowerISA Book I §4.3.4 states the negation is not applied to NaN results. Title code using NaN sentinels (audio middleware, debug fills) receives sign-flipped NaN payloads.

Fix:

let fma = a.mul_add(c, b);   // fnmaddx
let result = if fma.is_nan() { fma } else { -fma };
// and analogously for fnmsubx

Test gap: No test exercises fnmaddx/fnmsubx with NaN-producing inputs to check sign of result NaN.

PPCBUG-206 — frsqrtex edge cases correct; no code change needed (informational)

Severity: LOW (confirmed clean, informational)
Status: wontfix
Location: interpreter.rs:2496-2512
Analysis: ZX fires for ±0. VXSQRT guard correctly excludes -0.0. frsqrte(+∞)=+0 correct. Full-precision is acceptable over-precision.
Fix: Add comment: // Full-precision: hardware gives ~12-14 bit estimate. NR converges identically.
Test gap: Zero frsqrtex unit tests — add 4 (±0 inputs, negative input+VXSQRT, SNaN, +∞).

PPCBUG-207 — FMA opcode OX logic correct, OX edge cases untested (informational)

Severity: LOW (confirmed clean, informational)
Status: wontfix
Location: interpreter.rs:2335,2357,2379,2401
Analysis: inputs_were_finite correctly suppresses OX when an input is already infinite. OX fires when all inputs are finite but the FMA result overflows — ISA-correct.
Test gap: Zero tests for OX scenario in any FMA opcode.

PPCBUG-208 — Zero tests for fsubx, fdivx, fmsubx, fnmaddx, fnmsubx, fsqrtx, frsqrtex

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module
Symptom: 7 of 10 group-30 opcodes have zero tests. faddx has 1 happy-path test; fmulx has 1; fmaddx has 1. None have FPSCR/Rc=1/edge-case coverage.
Recommended minimum (12 tests): fsubx normal; fsubx VXISI; fdivx normal; fdivx ZX; fdivx VXZDZ; fmsubx normal; fnmaddx normal; fnmsubx normal; fnmaddx NaN-sign regression (PPCBUG-205); fsqrtx normal; fsqrtx negative+VXSQRT; frsqrtex positive.

IDs PPCBUG-209 through PPCBUG-219 are unallocated — reserved for group 30 follow-up.

Pending batches

Batch 2: groups 6-11 — logical immediate, logical register, sign-extend/CLZ, word rotate, doubleword rotate, shift.
Batch 3: groups 12-17 — compare, branch, trap+sc, CR logical, SPR/MSR, cache+sync.
Batch 4: groups 18-23 — loads (byte, halfword, word, doubleword, multiple/string, float).
Batch 5 (partial): groups 24, 26, 27, 28 done; group 25 (store word) pending.
Batch 6 (partial): groups 29, 30 done; group 31 (FPU convert/compare) pending.
Batch 7: groups 32-34 — VMX integer (add/sub, compare/min/max, logical/shift).
Batch 8: groups 35-38 — VMX permute/pack, VMX float, VMX multiply-sum, VMX load/store.
Phase C: decoder field extractors, decoder opcode-lookup, disassembler formatter parity.
Phase D: this file gets re-sorted by severity and finalized.

Batch 6 (continued) — FPU sign/move/compare/convert/round (group 31)

Per-group report: audit-out/group-31-fpu-misc.md.

Group 31 summary: 9 findings (PPCBUG-221..231; IDs 220/222/226 retracted after analysis). 1 HIGH, 3 MEDIUM, 5 LOW. The sign-bit manipulation family (fabsx, fnegx, fnabsx, fmrx) and fselx are all ISA-correct — Rust arithmetic maps to bit-level operations that preserve SNaN payloads. fcmpu is correct (FPRF and VXSNAN set; no spurious VXVC). The conversion group is mostly correct for result values and overflow sentinels; the main gaps are FPSCR inexact/FR/FI tracking (shared with groups 29/30) and one subtle NearestEven tie-breaking defect in round_to_i64 that affects fctidx. fcmpo silently omits VXSNAN/VXVC despite having a comment acknowledging the gap.

9 IDs used (PPCBUG-221, 223, 224, 225, 227, 228, 229, 230, 231). IDs 220/222/226 retracted. IDs PPCBUG-232..239 unallocated.

PPCBUG-221 — `fctidx` / `round_to_i64` NearestEven tie-breaking uses f64::EPSILON; broken for |v| > 2^52

Severity: HIGH
Status: applied (P5 d39d0ba, 2026-05-02)
Location: fpscr.rs:220–238 (round_to_i64, NearestEven case)
Symptom: The tie-breaking code computes diff = (v - v.trunc()).abs() and tests (diff - 0.5).abs() < f64::EPSILON to detect a half-integer. At and above |v| = 2^52, v.trunc() == v for all representable f64 values (all are exact integers), so diff == 0.0 and the tie-breaking branch is never taken; the code falls through to v.round() as i64, which rounds half away from zero instead of half to even. No half-integer is representable at that magnitude (a value such as (2^52 + 1).5 cannot occur as an f64), so the fall-through itself is harmless there, but it shows that the absolute f64::EPSILON tolerance does not track the scale of v. The tolerance also misfires near zero: 0.5 + 2^-53 is not a tie, yet it lies within f64::EPSILON of one and is rounded to even as if it were. In practice such inputs are rare but can appear in audio sample-count arithmetic and physics fixed-point pipelines.

Fix: replace the NearestEven arm with a fractional-part-only tie check that is exact for |v| <= 2^52 and degenerates correctly to truncation above 2^52:

RoundingMode::NearestEven => {
    let t = v.trunc();
    let frac = v - t; // exact for |v| <= 2^52; ==0 above (already integer)
    let fa = frac.abs();
    if fa > 0.5 { t as i64 + if v >= 0.0 { 1 } else { -1 } }
    else if fa < 0.5 { t as i64 }
    else {
        // Exact 0.5 tie — round to even.
        let fi = t as i64;
        if fi & 1 == 0 { fi } else { fi + if v >= 0.0 { 1 } else { -1 } }
    }
}

Test gap: add round_to_i64 tests in fpscr.rs:tests: 0.5→0, 1.5→2, 2.5→2, 3.5→4, -0.5→0, -1.5→-2. Existing tests cover 2.5→2 and 3.5→4 (currently accidentally correct).
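
The replacement arm can be exercised as a free function. This sketch mirrors the fix above outside the `RoundingMode` match; `round_half_even_i64` is a hypothetical name, and the input is assumed finite and within i64 range (saturation is handled elsewhere).

```rust
/// Round-half-to-even conversion using an exact fractional-part comparison.
/// `round_half_even_i64` is an illustrative name only.
pub fn round_half_even_i64(v: f64) -> i64 {
    let t = v.trunc();
    let frac = (v - t).abs(); // exact below 2^52; zero at and above it
    let away = if v >= 0.0 { 1 } else { -1 };
    let ti = t as i64;
    if frac > 0.5 {
        ti + away
    } else if frac < 0.5 {
        ti
    } else if ti & 1 == 0 {
        ti // exact tie, truncated value already even
    } else {
        ti + away // exact tie, step away from zero to reach the even neighbour
    }
}
```

Because the comparison against 0.5 is exact, a near-tie such as 0.5 + 2^-53 is classified as "above half" rather than as a tie.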

PPCBUG-223 — `fcmpo` omits FPSCR[VXSNAN] and FPSCR[VXVC] on NaN operands

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:2645–2675
Symptom: fcmpo body is identical to fcmpu — it sets FPRF and the CR field correctly but calls no fpscr::set_exception. PowerISA requires: QNaN → FPSCR[VXVC, VX, FX]; SNaN → additionally FPSCR[VXSNAN]. fcmpu correctly sets VXSNAN for SNaN; fcmpo does not. A comment in the source acknowledges "not modeled yet."
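Impact: fcmpo has no record form, so the loss shows up in later FPSCR reads: a following record-form FP instruction that copies FPSCR[FX] into CR1, or an mcrfs, will see FX=0 instead of FX=1 after a NaN compare. mffsx after fcmpo will not reflect VXVC. Xbox 360 CRT comparison primitives (islessgreater, ordered relational operators) use fcmpo.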

Fix:

if fra.is_nan() || frb.is_nan() {
    ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: false, so: true };
    if fpscr::is_snan(fra) || fpscr::is_snan(frb) {
        fpscr::set_exception(ctx, fpscr::VXSNAN | fpscr::VXVC);
    } else {
        fpscr::set_exception(ctx, fpscr::VXVC);
    }
}

PPCBUG-224 — `fcfidx` does not set FPSCR[XX/FX] for inexact i64→f64 conversion

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:2528–2536
Symptom: Only FPRF is updated. Per ISA, fcfid sets FPSCR[XX, FX] (and FR/FI) when the i64 value has more than 53 significant bits and precision is lost. This can only happen for |v| > 2^53, and only when the bits that do not fit are nonzero (2^53 + 1 is inexact, 2^53 + 2 is exact). Common trigger: large frame/sample counters, address values.
Fix: after the conversion, compare (result as i64) != (bits as i64) and call fpscr::set_exception(ctx, fpscr::XX) if inexact.
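
A sketch of the round-trip comparison. It widens to i128 for the comparison because Rust's float-to-int `as` cast saturates: `i64::MAX as f64` is 2^63, and casting that back to i64 yields `i64::MAX` again, which would hide the inexact conversion. `i64_to_f64_inexact` is a hypothetical name.

```rust
/// True when converting `v` to f64 loses information.
/// `i64_to_f64_inexact` is an illustrative name only.
pub fn i64_to_f64_inexact(v: i64) -> bool {
    // Every f64 produced from an i64 fits in i128, so this cast never saturates.
    (v as f64) as i128 != v as i128
}
```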

PPCBUG-225 — `frspx` does not set FPSCR[XX/FX/FR/FI] on inexact rounding

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:2516–2527
Symptom: update_after_op sets OX/UX only. The ISA requires FR/FI/XX/FX on any f64→f32 rounding that is not exact. frsp is the canonical double→single-precision narrowing idiom in compiler output — virtually every call is inexact.
Fix: after to_single, compare result vs b; if different and both finite, call fpscr::set_exception(ctx, fpscr::XX | fpscr::FI | ...) with FR set if magnitude increased.

PPCBUG-227 — `fctiwx` rounding: `round_to_i32` inherits NearestEven defect via `round_to_i64`

Severity: LOW
Status: applied (P5 d39d0ba, 2026-05-02)
Location: fpscr.rs:241–243
Symptom: round_to_i32 calls round_to_i64 then clamps. The PPCBUG-221 defect in round_to_i64 does not manifest for i32-range values (the epsilon check accidentally works at this scale), but the structural fragility is inherited. Fixing PPCBUG-221 cures this.
Recommendation: add unit tests round_to_i32(0.5)==0, round_to_i32(1.5)==2, round_to_i32(2.5)==2 to verify correct round-to-even behavior.

PPCBUG-228 — Zero interpreter execution tests for fabsx/fnegx/fnabsx/fmrx/fselx/fcmpo/fcfidx/fctidx/fctidzx/frspx

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs #[cfg(test)] module
Symptom: 10 of the 13 group-31 opcodes have zero dedicated tests. test_fcmpu covers only the ordered comparison 5.0 > 3.0. test_fctiwzx covers one positive truncation. test_fadd/test_fmul are group-30 tests, not group-31.
Recommended minimum: SNaN-preservation test for fabsx/fnegx/fnabsx; fselx with NaN/−0/−1; fcmpo QNaN→VXVC (after PPCBUG-223 fix); fcfidx exact and inexact; fctidx tie cases; frspx inexact → XX set (after PPCBUG-225 fix); fctiwx nearest-even tie; fctiwzx NaN sentinel.

PPCBUG-229 — `fctidx` / `fctidzx` do not set FPSCR[XX/FX] for inexact inputs

Severity: LOW
Status: applied (P5 d39d0ba, 2026-05-02)
Locations: interpreter.rs:2537–2574
Symptom: Per ISA, float-to-integer conversions set FPSCR[XX, FX] when the source value is not an integer (the fractional part is discarded). Neither opcode sets XX. Shared root cause with PPCBUG-224/225.

PPCBUG-230 — `fctiwx` / `fctiwzx` do not set FPSCR[XX/FX] for inexact inputs

Severity: LOW
Status: applied (P5 d39d0ba, 2026-05-02)
Locations: interpreter.rs:2575–2612
Symptom: Same omission as PPCBUG-229 for the word-width conversion pair.

PPCBUG-231 — `frspx` SNaN input result written as QNaN (host platform dependency)

Severity: LOW
Status: open
Location: interpreter.rs:2519–2524
Symptom: Rust's as f32 (CVTSD2SS) can set the quiet bit on SNaN input, producing a QNaN in the FPR. Per ISA, frsp on SNaN should quieten it — so the QNaN result is correct in kind. The risk is that the exact QNaN bit-pattern may differ from PPC's canonical quietening (which ORs bit 22 into the f32 mantissa). Game code inspecting the NaN payload after frsp may see a different payload. Same structural root cause as PPCBUG-128 (lfs SNaN quietening), but lower severity because frsp IS arithmetic.

IDs PPCBUG-232 through PPCBUG-239 are unallocated — no further bugs found in group 31.

Batch 7 — VMX integer add/sub (group 32)

Per-group report: audit-out/group-32-vmx-int-addsub.md.

Scope: vaddubm, vaddubs, vadduhm, vadduhs, vadduwm, vadduws, vaddsbs, vaddshs, vaddsws, vaddcuw, vsububm, vsububs, vsubuhm, vsubuhs, vsubuwm, vsubuws, vsubsbs, vsubshs, vsubsws, vsubcuw.

Overall verdict: All 20 opcodes are arithmetically correct. No HIGH-severity bugs found. Lane indexing (big-endian, PPC element 0 = Vec128::bytes[0]), saturation arithmetic, VSCR.SAT sticky-set, and vaddcuw/vsubcuw carry/borrow semantics are all implemented correctly. 4 LOW-severity findings (2 test gaps, 1 code organization, 1 API hazard).

PPCBUG-240 — 18 of 20 group-32 opcodes have zero interpreter-level tests

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs #[cfg(test)] module
Symptom: Only test_vaddubs_saturates_and_sets_vscr_sat covers any group-32 opcode. vaddubm, vsububm, vadduhm, vsubuhm, vadduwm, vsubuwm, vaddsbs, vsubsbs, vadduhs, vsubuhs, vaddshs, vsubshs, vadduws, vsubuws, vaddsws, vsubsws, vaddcuw, vsubcuw — all 18 have no tests. No high risk today but no regression guard.
Recommended minimum: wrap-around test (byte, halfword, word); sat-at-max and sat-at-min tests; VSCR.SAT sticky-set across two successive saturating instructions; vaddcuw carry lane; vsubcuw no-borrow lane.

PPCBUG-241 — `vadduwm` / `vsubuwm` stranded in a separate section from the rest of group-32

Severity: LOW (maintenance hazard)
Status: open
Location: interpreter.rs:2090–2104 (stranded) vs. interpreter.rs:2784 (§4a group-32 section)
Symptom: The two word-modulo opcodes are matched 700 lines above the rest of the group, with only a comment at line 2819 as a cross-reference. A future sweep of §4a for group-32 changes would miss them.
Fix: Move both arms into §4a and remove the comment at line 2819.

PPCBUG-242 — `set_vscr_sat(false)` can non-stickily clear SAT from arithmetic handlers

Severity: LOW (API hazard)
Status: open
Location: context.rs:252–259
Symptom: set_vscr_sat(bool) accepts false, which would clear the sticky SAT bit. All current arithmetic callers pass true only (inside if sat { ... } guards), so no mis-clear occurs today. But the API is misleading — a future saturating handler that writes set_vscr_sat(lane_sat) with lane_sat = false would silently clear a previously-set bit.
Fix: Rename to sticky_set_vscr_sat() (no bool argument, always ORs). Retain force_vscr_sat(bool) for mtvscr.
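
The split API could be shaped roughly as below. `Vscr` and `add_u8_saturating` are hypothetical stand-ins for the real context type and handler; only the two method roles (sticky set without an argument, forced write for mtvscr) come from the fix above.

```rust
/// Minimal stand-in for the VSCR state; the real field lives in context.rs.
#[derive(Default)]
pub struct Vscr {
    sat: bool,
}

impl Vscr {
    /// Arithmetic handlers call this: it can only set the bit, never clear it.
    pub fn sticky_set_sat(&mut self) {
        self.sat = true;
    }

    /// Reserved for mtvscr-style writes that replace the whole register.
    pub fn force_sat(&mut self, value: bool) {
        self.sat = value;
    }

    pub fn sat(&self) -> bool {
        self.sat
    }
}

/// Per-byte unsigned saturating add that records saturation stickily.
pub fn add_u8_saturating(vscr: &mut Vscr, a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        match a[i].checked_add(b[i]) {
            Some(sum) => out[i] = sum,
            None => {
                out[i] = u8::MAX;
                vscr.sticky_set_sat();
            }
        }
    }
    out
}
```

With no bool parameter on the sticky setter, a handler has no way to pass a per-lane `false` and clear an earlier saturation.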

PPCBUG-243 — `vmx.rs` saturation helpers: u16/i16/u32/i32 variants have zero unit tests

Severity: LOW (test gap)
Status: open
Location: crates/xenia-cpu/src/vmx.rs:705–799
Symptom: vmx.rs tests cover 5 cases of sat_add/sub_i8/u8. The 8 helpers for wider types (sat_add_u16, sat_sub_u16, sat_add_i16, sat_sub_i16, sat_add_u32, sat_sub_u32, sat_add_i32, sat_sub_i32) are mathematically correct but unguarded by any test. Recommended additions listed in the per-group report.

IDs PPCBUG-244 through PPCBUG-274 are unallocated — no further bugs found in group 32.

Batch 7 — VMX integer compare / min / max / avg (group 33)

Per-group report: audit-out/group-33-vmx-int-compare.md.

PPCBUG-275 — All VC-form vector compare dot forms: `rc_bit()` reads wrong bit; CR6 never updated

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Affected opcodes: vcmpequb., vcmpequh., vcmpgtsb., vcmpgtsh., vcmpgtub., vcmpgtuh.
Location: decoder.rs:75 + interpreter.rs:3318, 3331, 3344, 3357, 3370, 3383
Symptom: rc_bit() is implemented as self.raw & 1 != 0 (reads LSB = bit 0 of the word). For VC-form instructions the Rc flag is at PPC bit 21 = LSB bit 10, not bit 0. Bit 0 is the LSB of the 10-bit XO field. All integer compare XO values are even (XO=6, 70, 518, 774, 582, 838), so their bit 0 is always 0. The CR6 update block is therefore dead code regardless of whether the programmer wrote the dot form. vcmpequb. vMask, vData, vNeedle followed by a conditional branch on CR6 is the canonical AltiVec memchr idiom (CR bit 24 = CR6.LT = all lanes matched; CR bit 26 = CR6.EQ = no lane matched, as tested by bc 12,26); with CR6 never written, the branch tests whatever CR6 held before the compare.

Fix:

// decoder.rs — add:
/// Rc bit for VC-form vector compare instructions (PPC bit 21 = LSB bit 10).
#[inline] pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }

Replace instr.rc_bit() with instr.vc_rc_bit() at interpreter.rs:3318, 3331, 3344, 3357, 3370, 3383.
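
The two extractors can be compared on a hand-built encoding. In this sketch `encode_vc` is a hypothetical test helper that assembles a VC-form word (primary opcode 4, Rc at LSB bit 10, 10-bit XO); the free functions stand in for the decoder methods.

```rust
/// Assemble a VC-form instruction word. Illustrative test helper only.
pub fn encode_vc(vd: u32, va: u32, vb: u32, rc: bool, xo: u32) -> u32 {
    (4 << 26) | (vd << 21) | (va << 16) | (vb << 11) | ((rc as u32) << 10) | (xo & 0x3FF)
}

/// The generic extractor: reads LSB bit 0, which is part of XO in VC-form.
pub fn generic_rc_bit(raw: u32) -> bool {
    raw & 1 != 0
}

/// The VC-form extractor: PPC bit 21 = LSB bit 10.
pub fn vc_rc_bit(raw: u32) -> bool {
    (raw >> 10) & 1 != 0
}
```

For `vcmpequb.` (XO = 6) the generic extractor reports false even though the dot form was encoded, which is exactly the dead-CR6 symptom.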

PPCBUG-276 — `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`: same VC-form Rc bug

Severity: MEDIUM
Status: applied (52b05b1, 2026-05-01)
Affected opcodes: vcmpequw., vcmpequw128., vcmpgtuw., vcmpgtsw.
Location: interpreter.rs:2237, 3396, 3406
Symptom: Same root cause as PPCBUG-275. XO for vcmpequw=134, vcmpgtuw=646, vcmpgtsw=902 — all even, bit 0 always 0. Word-compare dot forms never update CR6. vcmpequw128 uses the VMX128_R Rc encoding which also likely reads the wrong bit.
Fix: Use instr.vc_rc_bit() at interpreter.rs:2237, 3396, 3406. Separately verify VMX128_R Rc bit position for vcmpequw128 (may require its own extractor).

PPCBUG-277 — Zero tests for all `vcmp*` dot forms and CR6 correctness

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs #[cfg(test)] module
Symptom: No test exercises any of the 10 integer vector compare opcodes. Critical missing: vcmpequb. all-true → CR6.LT=1; vcmpequb. all-false → CR6.EQ=1; vcmpgtsb signed boundary (0x80 vs 0x7F must yield false, not true); vcmpgtsh at 0x8000 vs 0x7FFF.

PPCBUG-278 — Zero tests for all 12 `vmax` / `vmin` opcodes

Severity: LOW (test gap)
Status: open
Location: interpreter.rs #[cfg(test)] module
Symptom: None of vmaxub/uh/uw/sb/sh/sw, vminub/uh/uw/sb/sh/sw are tested. Critical missing: vmaxsb(0x80, 0x7F) = 0x7F (signed max of -128 and +127); vminsb(0x80, 0x7F) = 0x80. Without these, signed vs unsigned confusion in min/max would not be caught.
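
The boundary cases named above (and the vcmpgtsb case from PPCBUG-277) reduce to reinterpreting the lane before comparing. A per-lane sketch with hypothetical helper names:

```rust
/// Signed byte max: reinterpret both lanes as i8 before comparing.
pub fn max_sb(a: u8, b: u8) -> u8 {
    (a as i8).max(b as i8) as u8
}

/// Signed byte min.
pub fn min_sb(a: u8, b: u8) -> u8 {
    (a as i8).min(b as i8) as u8
}

/// Unsigned byte max, for contrast.
pub fn max_ub(a: u8, b: u8) -> u8 {
    a.max(b)
}

/// Signed byte greater-than, as a per-lane compare would evaluate it.
pub fn gt_sb(a: u8, b: u8) -> bool {
    (a as i8) > (b as i8)
}
```

The pair (0x80, 0x7F) is the smallest input that gives different answers for the signed and unsigned variants, which is why it makes a good regression case.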

PPCBUG-279 — Zero tests for all 6 `vavg*` opcodes; no signed-boundary or rounding coverage

Severity: LOW (test gap)
Status: open
Location: interpreter.rs #[cfg(test)] module; vmx.rs test module
Symptom: avg_u8 through avg_i32 helpers have no unit tests. Key rounding case: avg_u8(0, 1) must be 1 (round up), not 0 (truncation). avg_i32(i32::MIN, i32::MIN) must be i32::MIN without overflow.
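
A widening implementation makes both key cases fall out naturally. This is a sketch of how such helpers could be written, not a copy of vmx.rs; the names match the helpers mentioned above only for readability.

```rust
/// Rounded average of two unsigned bytes: (a + b + 1) >> 1 in a wider type.
pub fn avg_u8(a: u8, b: u8) -> u8 {
    ((a as u16 + b as u16 + 1) >> 1) as u8
}

/// Rounded average of two signed words, computed in i64 so the sum cannot overflow.
/// The arithmetic right shift floors, so the +1 rounds halves toward +infinity.
pub fn avg_i32(a: i32, b: i32) -> i32 {
    ((a as i64 + b as i64 + 1) >> 1) as i32
}
```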

IDs PPCBUG-280 through PPCBUG-314 are unallocated — no further bugs found in group 33.

Batch 7 — VMX integer logical / shift / rotate / select (group 34)

Per-group report: audit-out/group-34-vmx-logic-shift.md.

Group 34 summary: the bitwise logical ops (vand/vandc/vor/vxor/vnor and their 128 variants) are all ISA-correct — Vec128 is [u8; 16] with no padding bits, so !(u32) flips exactly 32 bits per lane with no upper-bit pollution (the PPCBUG-029/030/031 class does not apply to VMX register files). The per-lane shifts (vslb/vsrb/vsrab, vslh/vsrh/vsrah, vslw/vsrw/vsraw and their 128 variants) all correctly mask the shift count to the lane width before shifting; vsraw uses i32 arithmetic right shift which is correctly defined in Rust for shift-by-31. The per-lane rotates (vrlb/vrlh/vrlw and 128 variants) are correct. The whole-register bit shifts (vsl/vsr) and whole-register byte shifts (vslo/vsro and 128 variants) correctly extract the shift count from VB.b[15] with the proper bit masks. vsel and vsel128 are correct including the read-before-write ordering on vsel128's vc=vd aliasing.

One HIGH bug found: vrlimi128 extracts both the rotate-amount (z) field and the blend-mask (IMM) field from the wrong bit positions of the instruction word.

0 MEDIUM bugs with code change needed. 1 HIGH. 10 LOW (test gaps and informational).

PPCBUG-315 — vrlimi128 z and IMM fields extracted from wrong bit positions

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Location: interpreter.rs:3551–3552
Symptom: shift = ((instr.raw >> 16) & 0x3) reads integer bits 16–17 — the low 2 bits of the 5-bit IMM (blend-mask) field — instead of the 2-bit z (rotate) field at integer bits 6–7. mask = (instr.raw >> 2) & 0xF reads integer bits 2–5 — VD128h extension bits and a reserved field — instead of the low 4 bits of IMM at integer bits 16–19. Every vrlimi128 executes with a wrong rotate amount and a wrong per-word select mask. The only benign case is the degenerate encoding where z == IMM[1:0] and the garbage mask happens to equal the intended mask — unlikely in real code.
VX128_4 field layout (LSB-0 integer bit numbering after PPC big-endian byte-swap to host):
- VD128l : 5 at integer bits 21–25 (PPC bits 6–10)
- IMM : 5 at integer bits 16–20 (PPC bits 11–15) — blend mask, 4 bits used
- VB128l : 5 at integer bits 11–15 (PPC bits 16–20)
- z : 2 at integer bits 6–7 (PPC bits 24–25) — rotate amount 0..3
- VD128h : 2 at integer bits 2–3 (PPC bits 28–29)

Fix:

let shift = ((instr.raw >> 6) & 0x3) as usize;  // z field: integer bits 6-7
let mask  = (instr.raw >> 16) & 0xF;             // IMM low 4 bits: integer bits 16-19

Canary reference: ppc_decode_data.h:585–608 FormatVX128_4; ppc_emit_altivec.cc:1318,1324.
Note: the rotate logic (b[(shift + i) % 4]) and mask-select logic ((mask >> (3-i)) & 1) in the interpreter body are ISA-correct — only the field extraction is wrong.
Test gap: no interpreter execution test for vrlimi128 (PPCBUG-325).
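The corrected extraction can be exercised end to end with a small sketch. The names `vrlimi128_words` and `encode_z_imm`, and the `[u32; 4]` lane representation (word 0 most significant), are illustrative assumptions rather than the interpreter's actual code. The sketch also assumes that lanes not selected by the mask keep their previous VD word.

```rust
/// Hypothetical helper: decode the VX128_4 `z` and `IMM` fields with the
/// corrected shifts, then rotate VB by `z` words and blend into VD.
fn vrlimi128_words(raw: u32, vd: [u32; 4], vb: [u32; 4]) -> [u32; 4] {
    let shift = ((raw >> 6) & 0x3) as usize; // z field: integer bits 6-7
    let mask = (raw >> 16) & 0xF; // IMM low 4 bits: integer bits 16-19
    let mut out = vd;
    for i in 0..4 {
        // Mask bit 3 controls word 0, mask bit 0 controls word 3.
        if (mask >> (3 - i)) & 1 != 0 {
            out[i] = vb[(shift + i) % 4];
        }
    }
    out
}

/// Hypothetical encoder that sets only the z and IMM fields (other bits zero).
fn encode_z_imm(z: u32, imm: u32) -> u32 {
    ((z & 0x3) << 6) | ((imm & 0x1F) << 16)
}
```

With z = 1 and IMM = 0b1010, words 0 and 2 come from the rotated VB and words 1 and 3 are kept from VD. The pre-fix shifts applied to the same word would yield rotate 2 and mask 0.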

PPCBUG-316 — Zero interpreter execution tests for vslb/vsrb/vsrab (LOW)

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:3440–3463

PPCBUG-317 — Zero interpreter execution tests for vslh/vsrh/vsrah (LOW)

Severity: LOW (test gap)
Status: open
Location: interpreter.rs:3472–3503

PPCBUG-318 — vslo/vsro byte-shift count max is 15 (correct; informational)

Severity: LOW (informational / wontfix)
Status: wontfix
N is a 4-bit field; max shift is 15 bytes = 120 bits (not 128). VD retains the 8 LSBs of VA in position [127:120] at N=15. ISA-correct.

PPCBUG-319 — vsel128 vc=vd read-before-write ordering (correct; informational)

Severity: LOW (informational / wontfix)
Status: wontfix
c = ctx.vr[vc] is read before ctx.vr[vd] = result. Correctly sequenced.

PPCBUG-320 — Zero interpreter execution tests for vslw/vsrw/vsraw/vrlw (+128 variants)

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:2108–2155

PPCBUG-321 — Zero interpreter execution tests for vsl/vsr

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:3508–3521

PPCBUG-322 — Zero interpreter execution tests for vslo/vsro (+128 variants)

Severity: LOW (test gap)
Status: open
Location: interpreter.rs:3523–3541

PPCBUG-323 — Zero interpreter execution tests for vand/vandc/vor/vxor/vnor (+128 variants)

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:1900–1944

PPCBUG-324 — Zero interpreter execution tests for vsel/vsel128

Severity: LOW (test gap)
Status: open
Location: interpreter.rs:1945–1967

PPCBUG-325 — Zero interpreter execution tests for vrlb/vrlh/vrlw/vrlimi128 (+128 variants)

Severity: LOW (test gap; fix PPCBUG-315 before writing vrlimi128 tests)
Status: open
Location: interpreter.rs:3464–3503, 2144–2155, 3550–3565

IDs PPCBUG-326 through PPCBUG-354 are unallocated — no further bugs found in group 34.

Batch 8 — VMX permute / merge / splat / pack / unpack (group 35)

Per-group report: audit-out/group-35-vmx-permute.md.

Summary: 5 HIGH, 3 MEDIUM, 9 LOW. Four VX128_* field-extraction bugs; one missing post-pack permutation logic; VSCR.SAT and pack saturation semantics are all correct. Zero interpreter tests for any group-35 opcode.

PPCBUG-360 — vperm128: VC register read from wrong field (vd128() instead of VX128_2 VC bits 23-25)

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Location: interpreter.rs:1979
Symptom: vperm128 uses the VX128_2 instruction form. The permute-control register VC is a 3-bit field at PPC bits 23-25 (LSB integer bits 6-8). The code does vc = instr.vd128(), which reads PPC bits 6-10 plus 28-29 (VD128l and VD128h), a completely different set of bits. Every vperm128 therefore permutes with a control vector read from the wrong register, producing garbage output. vperm128 is one of the most-used VMX128 ops in Xbox 360 graphics code (texture/vertex data layout).

Fix:

// decoder.rs — add accessor:
#[inline] pub fn vc128_2(&self) -> usize { ((self.raw >> 6) & 0x7) as usize }
// interpreter.rs:1979 — replace:
vc = instr.vc128_2(); // VX128_2 VC field at PPC bits 23-25

ISA ref: ppc-manual/vmx/vperm.md, VX128_2 encoding; ppc_decode_data.h:541-561; ppc_emit_altivec.cc:1203-1204 (VX128_2_VC).

PPCBUG-361 — vsldoi128: SH field MSB reads bit 4 (reserved) instead of bit 9

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Location: interpreter.rs:2012
Symptom: VX128_5 SH is a 4-bit field at LSB integer bits 6-9. Code does ((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3). This reads bit 4 (a reserved field, always 0 in valid encodings) as the MSB of SH instead of bit 9. Shifts of 8-15 bytes silently resolve as shifts of 0-7 bytes. vsldoi128 with SH >= 8 (common in vector rotation patterns) always produces the wrong result.

Fix:

let sh = ((instr.raw >> 6) & 0xF) as usize; // SH field: integer bits 6-9

ISA ref: ppc-manual/vmx/vsldoi.md, VX128_5 encoding; ppc_decode_data.h:609-634; canary VX128_5_SH.
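A short sketch of why shifts of 8-15 collapse to 0-7 under the pre-fix extraction. The function names are hypothetical, and `sldoi` models only the byte selection from VA || VB on plain `[u8; 16]` arrays.

```rust
/// Pre-fix extraction quoted above: takes the SH MSB from reserved bit 4.
fn sh_buggy(raw: u32) -> usize {
    (((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)) as usize
}

/// Fixed extraction: all four SH bits come from integer bits 6-9.
fn sh_fixed(raw: u32) -> usize {
    ((raw >> 6) & 0xF) as usize
}

/// vsldoi-style byte shift: 16 bytes starting at offset `sh` of VA || VB.
fn sldoi(va: [u8; 16], vb: [u8; 16], sh: usize) -> [u8; 16] {
    let mut cat = [0u8; 32];
    cat[..16].copy_from_slice(&va);
    cat[16..].copy_from_slice(&vb);
    let mut out = [0u8; 16];
    out.copy_from_slice(&cat[sh..sh + 16]);
    out
}
```

For a valid encoding (bit 4 clear), `sh_buggy` returns `sh & 7`, so an intended 12-byte shift selects bytes 4..20 of the concatenation instead of 12..28.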

PPCBUG-362 — vpermwi128: PERMh (high 3 bits of 8-bit PERM immediate) read from VD128l bits instead of bits 6-8

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Location: interpreter.rs:4089
Symptom: VX128_P PERM = PERMl[4:0] | (PERMh[2:0] << 5) where PERMl is at integer bits 16-20 and PERMh is at integer bits 6-8. Code does (raw >> 16) & 0xFF which reads bits 16-23. Bits 21-23 are VD128l[2:0] (VD128l occupies integer bits 21-25), not PERMh. The top 3 bits of the 8-bit PERM immediate are wrong; output word lane selections for lanes 0 and 1 are controlled by garbage bits. Same pattern as PPCBUG-315.

Fix:

let imm = ((instr.raw >> 16) & 0x1F) | (((instr.raw >> 6) & 0x7) << 5); // VX128_P PERM

ISA ref: ppc_decode_data.h:664-686; ppc_emit_altivec.cc:1214.

PPCBUG-363 — vpkd3d128: post-pack permutation (pack + z fields) entirely absent; output always placed in wrong lane when pack != 0

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Location: interpreter.rs:3783-3808
Symptom: Canary's vpkd3d128 does three things: (1) pack VB by type, (2) permute the result with the existing VD register using a control determined by pack (IMM[1:0]) and shift (z field at integer bits 6-7), (3) store. Xenia-rs does only (1) and (3), skipping the entire lane-placement permutation. When pack != 0, the packed value must be merged into a specific 32-bit or 64-bit slot of VD — this merge never happens. pack=0 is the only safe case. Most D3D vertex pack sequences use pack=1 (32-bit slot) with varying shift.
Fix: Extract pack = uimm & 3 and shift = (instr.raw >> 6) & 3 (z field), read existing ctx.vr[vd], apply the permutation table from ppc_emit_altivec.cc:2125-2188, write back.
ISA ref: ppc_emit_altivec.cc:2088-2191.

PPCBUG-364 — vsldoi (non-128): correct; PPCBUG-365 — vsplt*: correct; informational

Severity: LOW (wontfix)
Status: wontfix
vsldoi correctly extracts SH as (raw >> 6) & 0xF. vspltb/vsplth/vspltw correctly read UIMM from the VA position (integer bits 16-20, masked to lane width). No bugs.

PPCBUG-366 — vspltisb / vspltish: sign-extension idiom is correct but non-obvious; future regression risk

Severity: MEDIUM
Status: open (clarity fix recommended)
Location: interpreter.rs:2059-2060, 2064-2066
Symptom: simm | !0x1F where simm is typed i8/i16 is functionally correct (Rust narrows !0x1F to the target type), but the pattern is fragile under refactoring. Recommend:
```
let simm = (((instr.raw >> 16) & 0x1F) as i32).wrapping_shl(27).wrapping_shr(27) as i8;
```
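The recommended expression can be checked in isolation. In this sketch `simm5` and `vspltisb_bytes` are hypothetical names; only the shift expression itself comes from the recommendation above.

```rust
/// Sign-extend the 5-bit SIMM field (integer bits 16-20) to i8.
/// Shifting left by 27 moves field bit 4 into the i32 sign bit; the
/// arithmetic right shift by 27 then replicates it downward.
fn simm5(raw: u32) -> i8 {
    (((raw >> 16) & 0x1F) as i32).wrapping_shl(27).wrapping_shr(27) as i8
}

/// Splat the sign-extended immediate across 16 byte lanes.
fn vspltisb_bytes(raw: u32) -> [u8; 16] {
    [simm5(raw) as u8; 16]
}
```

The field value 0x1F decodes to -1 and 0x10 to -16, while 0x0F stays 15.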

PPCBUG-367 — vupkhpx / vupklpx: channel replication vs zero-extend divergence; canary is unimplemented

Severity: MEDIUM
Status: open
Location: vmx.rs:318-330
Symptom: unpack_pixel_555 replicates 5-bit RGB channels (r << 3 | r >> 2) to fill 8 bits. ISA specifies zero-extension into bits 7:3, leaving bits 2:0 as zero. The replicate approach produces slightly different values (and slightly higher values), diverging from hardware.
Fix: let r8 = r << 3; (drop the | r >> 2 replication term).

PPCBUG-368 — vpkpx: pack_pixel_555 channel assignment unverified against hardware; canary comparison inconclusive

Severity: MEDIUM
Status: open (needs hardware trace or more detailed canary analysis)
Location: vmx.rs:310-316
Symptom: The xenia-rs layout comment says R=bits 8-15, G=16-23, B=24-31. Canary's vkpkx_in_low uses different shift amounts (>> 9 for R, >> 6 for G, >> 3 for B), suggesting either a different input layout assumption or the channels are swapped. Without a hardware reference, cannot determine which is authoritative.

PPCBUG-369 — vpkd3d128 z-field not extracted (sub-issue of PPCBUG-363)

Severity: LOW (tracked under PPCBUG-363)
Status: applied (52b05b1, 2026-05-01)
Location: interpreter.rs:3785
The z field (VX128_4, integer bits 6-7) is never extracted. Correct extraction: (instr.raw >> 6) & 0x3.

PPCBUG-370 — Zero interpreter tests for vperm / vperm128 (test gap)

Severity: LOW
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:1970-1995

PPCBUG-371 — Zero interpreter tests for vsldoi / vsldoi128 (test gap)

Severity: LOW
Status: open
Location: interpreter.rs:1997-2020

PPCBUG-372 — Zero interpreter tests for vpermwi128 (test gap)

Severity: LOW
Status: open
Location: interpreter.rs:4087-4099

PPCBUG-373 — Zero interpreter tests for vmrghb / vmrglb / vmrghh / vmrglh (test gap)

Severity: LOW
Status: open
Location: interpreter.rs:3570-3600

PPCBUG-374 — Zero interpreter tests for vspltb / vsplth / vspltw / vspltw128 (test gap)

Severity: LOW
Status: open
Location: interpreter.rs:2022-2048

PPCBUG-375 — Zero interpreter tests for vspltisb / vspltish / vspltisw / vspltisw128 (test gap)

Severity: LOW
Status: open
Location: interpreter.rs:2050-2068

PPCBUG-376 — Zero interpreter tests for all vpk* (16 ops) + VSCR.SAT coverage (test gap)

Severity: LOW
Status: open
Location: interpreter.rs:3607-3718

PPCBUG-377 — Zero interpreter tests for vupkhsb / vupklsb / vupkhsh / vupklsh (test gap)

Severity: LOW
Status: open
Location: interpreter.rs:3722-3754

PPCBUG-378 — Zero interpreter tests for vpkd3d128 / vupkd3d128 (test gap; blocked on PPCBUG-363)

Severity: LOW
Status: open
Location: interpreter.rs:3783-3835

IDs PPCBUG-379 through PPCBUG-419 are unallocated — no further bugs found in group 35.

Batch 9 — VMX float arithmetic / compare / convert / estimate (group 36)

Per-group report: audit-out/group-36-vmx-float.md.

Group 36 summary: 21 findings (PPCBUG-420..440). 6 HIGH, 8 MEDIUM, 7 LOW. The most critical bugs are: (1) four VMX float compare VC-form opcodes use rc_bit() (bit 0) instead of the correct VC-form Rc bit (bit 10) — CR6 is never updated, same root cause as PPCBUG-275; (2) vmaddfp128 and vmaddcfp128 have their multiplicand and accumulator operands swapped — every matrix multiply / Newton-Raphson step using these opcodes produces the wrong result; (3) VMX128_R dot-form compares (vcmpeqfp128. etc.) decode as Invalid due to missing key4 entries in decode_op6.

6 HIGH, 8 MEDIUM, 7 LOW. 21 IDs used (PPCBUG-420..440). 39 IDs unallocated (PPCBUG-441..479).

PPCBUG-420 — vcmpeqfp / vcmpgefp / vcmpgtfp: `rc_bit()` reads wrong bit; CR6 never updated

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Affected opcodes: vcmpeqfp., vcmpgefp., vcmpgtfp.
Location: interpreter.rs:1875, 1885, 1895
Symptom: rc_bit() = self.raw & 1 reads LSB bit 0. For VC-form the Rc flag is at PPC bit 21 = LSB bit 10. All XO values (vcmpeqfp=198, vcmpgefp=454, vcmpgtfp=710) have bit 0 = 0, so CR6 is never updated for any float compare dot form. vcmpeqfp. + bc 12,24 (branch all-equal) always falls through.
Cross-reference: PPCBUG-275 (identical root cause for integer vcmp). Canary reads i.VXR.Rc (ppc_emit_altivec.cc:625, 633, 641).
Fix: Add pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 } to decoder.rs and replace instr.rc_bit() at interpreter.rs:1875, 1885, 1895.
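A minimal sketch of the gate difference, assuming a stripped-down `DecodedInstr` that holds only `raw`. The `encode_vc` helper is hypothetical and exists only to build a VC-form word for the demonstration.

```rust
/// Reduced stand-in for the decoder's instruction wrapper (assumption).
struct DecodedInstr {
    raw: u32,
}

impl DecodedInstr {
    /// Generic Rc: PPC bit 31 = LSB bit 0.
    fn rc_bit(&self) -> bool {
        self.raw & 1 != 0
    }
    /// VC-form Rc: PPC bit 21 = LSB bit 10.
    fn vc_rc_bit(&self) -> bool {
        (self.raw >> 10) & 1 != 0
    }
}

/// Hypothetical VC-form encoder: opcode 4, VD, VA, VB, Rc, 10-bit XO.
fn encode_vc(vd: u32, va: u32, vb: u32, rc: bool, xo: u32) -> u32 {
    (4 << 26) | (vd << 21) | (va << 16) | (vb << 11) | ((rc as u32) << 10) | (xo & 0x3FF)
}
```

Because XO = 198 is even, `rc_bit()` is false for both `vcmpeqfp` and `vcmpeqfp.`, while `vc_rc_bit()` distinguishes them.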

PPCBUG-421 — vcmpbfp: `rc_bit()` reads wrong bit (VC-form); Rc gate permanently dead

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Location: interpreter.rs:3428
Symptom: Same root cause as PPCBUG-420. XO=966, bit 0 = 0; CR6 update never fires for vcmpbfp.. The CR6 value logic (eq = !any_out) is correct; only the gate is wrong.
Fix: Use instr.vc_rc_bit() at interpreter.rs:3428.

PPCBUG-422 — vcmpeqfp128 / vcmpgefp128 / vcmpgtfp128 / vcmpbfp128: `rc_bit()` reads wrong bit (VX128_R-form)

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Location: interpreter.rs:1875, 1885, 1895, 3428 (shared arms with non-128 forms)
Symptom: For VX128_R-form, Rc is at PPC bit 27 = LSB bit 4 (confirmed from canary's VX128_R bitfield: uint32_t Rc : 1 at bit 4 from LSB). rc_bit() reads bit 0. Fix PPCBUG-423 first (dot forms decode as Invalid before this even matters).
Fix: Add pub fn vx128r_rc_bit(&self) -> bool { (self.raw >> 4) & 1 != 0 } and use it in the VX128_R compare arms.

PPCBUG-423 — vcmpeqfp128. / vcmpgefp128. / vcmpgtfp128. / vcmpbfp128.: dot forms decode as `Invalid`

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Location: decoder.rs:640-648 (decode_op6 VMX128 compare key4 table)
Symptom: decode_op6 extracts key4 = (bits22-24 << 3) | bit27. When Rc=1, PPC bit 27 is set, making key4 = non-dot value + 1. Dot-form key4 values (1, 9, 17, 25, 33) are all absent from the match table. Decoder returns PpcOpcode::Invalid. Any game shader using a VMX128-form float compare dot form traps with unimplemented opcode.

Fix: Add dot-form entries to the key4 match table mapping to the same opcodes (the interpreter arm uses instr.vx128r_rc_bit() to conditionally update CR6):

0b000001 => return PpcOpcode::vcmpeqfp128,
0b001001 => return PpcOpcode::vcmpgefp128,
0b010001 => return PpcOpcode::vcmpgtfp128,
0b011001 => return PpcOpcode::vcmpbfp128,
0b100001 => return PpcOpcode::vcmpequw128,

PPCBUG-424 — vmaddfp128: operand swap — computes VA×VB+VD instead of VA×VD+VB

Severity: HIGH
Status: applied (52ece4b, 2026-05-02)
Location: interpreter.rs:1771 (r[i] = ai.mul_add(bi, di))
Symptom: Canary (ppc_emit_altivec.cc:806-809) documents (VD) <- (VA × VD) + VB and routes as MulAdd(VA, VD, VB). Xenia-rs reads VA, VB, VD then computes ai.mul_add(bi, di) = VA × VB + VD — VB and VD roles swapped. Every shader using vmaddfp128 for matrix multiply or Newton-Raphson accumulation accumulates the wrong value. The existing denorm-flush test aliases vA=vD=v2, making the swap invisible.
Fix: r[i] = ai.mul_add(di, bi);
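A regression test for this finding needs three distinct operand values. The sketch below uses hypothetical lane-array functions rather than the interpreter's register file.

```rust
/// Fixed operand order: VD <- VA * VD + VB, per lane.
fn vmaddfp128_lanes(va: [f32; 4], vb: [f32; 4], vd: [f32; 4]) -> [f32; 4] {
    let mut r = [0.0f32; 4];
    for i in 0..4 {
        r[i] = va[i].mul_add(vd[i], vb[i]);
    }
    r
}

/// Pre-fix operand order: VA * VB + VD, per lane.
fn vmaddfp128_lanes_swapped(va: [f32; 4], vb: [f32; 4], vd: [f32; 4]) -> [f32; 4] {
    let mut r = [0.0f32; 4];
    for i in 0..4 {
        r[i] = va[i].mul_add(vb[i], vd[i]);
    }
    r
}
```

With VA = 2, VB = 3, VD = 5 in every lane, the fixed form gives 13 and the swapped form gives 11, so the two orders are distinguishable.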

PPCBUG-425 — vmaddcfp128: operand swap — computes VD×VB+VA instead of VA×VD+VB

Severity: HIGH
Status: applied (52ece4b, 2026-05-02)
Location: interpreter.rs:4065 (r[i] = di.mul_add(bi, ai))
Symptom: Canary (ppc_emit_altivec.cc:819) documents (VD) <- (VA × VD) + VB. Xenia-rs computes VD × VB + VA, so VA and VB have exchanged roles: VB is multiplied by VD where VA should be, and VA is added where VB should be.
Fix: r[i] = ai.mul_add(di, bi);
Test gap: zero tests for vmaddcfp128. Add test with distinct VA, VB, VD registers.

PPCBUG-426 — vnmsubfp: two rounding steps instead of fused FMA; NaN sign may be flipped

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:1786 (r[i] = bi - ai * ci)
Symptom: vmaddfp uses single-rounded ai.mul_add(ci, bi), but vnmsubfp uses bi - ai * ci (two operations, two rounding steps). ISA specifies a single fused operation. Canary acknowledges the same limitation (ppc_emit_altivec.cc:1136). Additionally, the implicit negation in subtraction may flip the sign bit of a NaN result (see PPCBUG-183).
Fix: r[i] = -ai.mul_add(ci, -bi); — single FMA rounding: -(ai*ci + (-bi)) = bi - ai*ci.
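The rounding difference is observable with ordinary f32 values. This sketch uses hypothetical scalar helpers. The inputs are chosen so that the exact product sits halfway between two representable f32 values and therefore rounds in the two-step form.

```rust
/// Pre-fix form: the product is rounded, then the subtraction is rounded.
fn nmsub_two_step(a: f32, c: f32, b: f32) -> f32 {
    b - a * c
}

/// Fixed form: one fused multiply-add (single rounding), then negate.
fn nmsub_fused(a: f32, c: f32, b: f32) -> f32 {
    -a.mul_add(c, -b)
}

/// Inputs where the two forms disagree:
/// a = 1 + 2^-12, so a*a = 1 + 2^-11 + 2^-24 exactly, and b = 1 + 2^-11.
fn example_inputs() -> (f32, f32) {
    let a = 1.0f32 + 1.0 / 4096.0;
    let b = 1.0f32 + 1.0 / 2048.0;
    (a, b)
}
```

In the two-step form the 2^-24 term is lost when the product is rounded to f32, so the result is 0.0. The fused form keeps it and returns -2^-24.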

PPCBUG-427 — vnmsubfp128: same two-rounding form as vnmsubfp

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:1803 (r[i] = di - ai * bi)
Symptom: Same class as PPCBUG-426 for the VMX128 form.
Fix: r[i] = -ai.mul_add(bi, -di);

PPCBUG-428 — vrefp / vrefp128: full-precision 1/x instead of ~12-bit hardware estimate

Severity: MEDIUM
Status: open
Location: interpreter.rs:1853 (r[i] = 1.0 / b[i])
Symptom: Same class as PPCBUG-184 (fresx). Xenon vrefp provides ~12-bit accuracy; xenia-rs computes full IEEE-754 division. Canary also uses full precision in practice.

PPCBUG-429 — vrsqrtefp / vrsqrtefp128: full-precision 1/sqrt(x) instead of ~12-bit estimate

Severity: MEDIUM
Status: open
Location: interpreter.rs:1862 (r[i] = 1.0 / b[i].sqrt())
Symptom: Same class as PPCBUG-428 for reciprocal square root.

PPCBUG-430 — vexptefp / vexptefp128: full-precision exp2(x) instead of ~12-bit estimate

Severity: MEDIUM
Status: open
Location: interpreter.rs:3934 (r[i] = b[i].exp2())
Symptom: Same class as PPCBUG-428. NaN/Inf edge cases may diverge.

PPCBUG-431 — vlogefp / vlogefp128: full-precision log2(x) instead of ~12-bit estimate

Severity: MEDIUM
Status: open
Location: interpreter.rs:3944 (r[i] = b[i].log2())
Symptom: Same class as PPCBUG-428.

PPCBUG-432 — vrfin / vrfin128: Rust `round()` is round-half-away-from-zero; ISA requires round-to-nearest-even

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:2172 (r[i] = b[i].round())
Symptom: vrfin(0.5) → ISA = 0.0; Rust = 1.0. vrfin(2.5) → ISA = 2.0; Rust = 3.0. Canary uses SSE2 ROUNDPS which is round-to-nearest-even.
Fix: Use f32::round_ties_even() (stable since Rust 1.77).
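The two rounding modes differ only on exact ties. The sketch below uses hypothetical lane-array helpers to put them side by side.

```rust
/// Round-to-nearest-even per lane, as the fix recommends.
fn vrfin_lanes(b: [f32; 4]) -> [f32; 4] {
    b.map(|x| x.round_ties_even())
}

/// Pre-fix behaviour: f32::round rounds ties away from zero.
fn vrfin_lanes_away(b: [f32; 4]) -> [f32; 4] {
    b.map(|x| x.round())
}
```

For the lanes [0.5, 1.5, 2.5, -2.5], ties-to-even gives [0, 2, 2, -2] while ties-away gives [1, 2, 3, -3].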

PPCBUG-433 — vctsxs / vcfpsxws128: NaN input returns 0 instead of saturating to INT_MIN (0x80000000)

Severity: MEDIUM
Status: applied (P5 d39d0ba, 2026-05-02)
Location: vmx.rs:217 (if x.is_nan() { return (0, true); })
Symptom: AltiVec ISA: NaN in vctsxs saturates to INT_MIN (0x80000000). Xenia-rs returns 0.
Fix: if x.is_nan() { return (i32::MIN, true); }

PPCBUG-434 — vctuxs NaN → 0 is correct; informational

Severity: LOW (wontfix)
Status: wontfix
Location: vmx.rs:225
Note: Unsigned NaN saturates to 0 per ISA. Xenia-rs is correct. Add a comment.

PPCBUG-435 — vaddfp / vsubfp / vmulfp128: subnormal inputs not flushed when VSCR.NJ=1

Severity: MEDIUM (latent — Xbox 360 always boots with NJ=1)
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:1713, 1729, 1812
Symptom: VSCR.NJ=1 requires flush-to-zero for subnormal inputs. vmaddfp family correctly calls vmx::flush_denorm(); plain add/sub/mul do not check VSCR.

PPCBUG-436 — vmsum3fp128 / vmsum4fp128: per-product intermediates not individually flushed

Severity: MEDIUM (latent)
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:4076, 4083
Symptom: flush_denorm on final sum only. Per-lane products can be subnormal and accumulate before the final flush.

PPCBUG-437 — vmaddfp / vmaddfp128 / vmaddcfp128 / vnmsubfp128: subnormal output not flushed

Severity: MEDIUM (latent)
Status: applied (P5 d39d0ba, 2026-05-02)
Location: interpreter.rs:1752–1754, 1771–1773, 4064–4067, 1803–1805
Symptom: VSCR.NJ=1 requires flushing subnormal results. Inputs flushed; outputs are not.

PPCBUG-438 — Zero tests for vcmpeqfp / vcmpgefp / vcmpgtfp / vcmpbfp and dot forms

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs test module

PPCBUG-439 — Zero tests for vrfiz / vrfin / vrfip / vrfim and 128-bit variants

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:2158–2192

PPCBUG-440 — Zero tests for vctsxs / vctuxs / vcfsx / vcfux and 128-bit variants

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs:3842–3923

IDs PPCBUG-441 through PPCBUG-479 are unallocated — no further bugs found in group 36.

Batch 8 — VMX integer multiply-sum / multiply-half / sums / special (group 37)

Per-group report: audit-out/group-37-vmx-mulsum.md.

Note: All opcodes in this group are XEINSTRNOTIMPLEMENTED() stubs in xenia-canary; correctness is derived from the IBM ISA and ppc-manual/vmx/ snapshots. vrlimi128 is already tracked as PPCBUG-315.

PPCBUG-482 — `vmhaddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)

Severity: WITHDRAWN
Status: no bug
Note: Draft analysis suggested >>16; the spec snapshot ppc-manual/vmx/vmhaddshs.md explicitly shows prod = (VA[i]*VB[i]) >> 15 and the pathological-case example confirms 0x8000*0x8000 >> 15 = 32768. Xenia-rs matches the spec exactly. No code change.

PPCBUG-483 — `vmhraddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)

Severity: WITHDRAWN
Status: no bug
Note: ppc-manual/vmx/vmhraddshs.md explicitly shows (product + 0x4000) >> 15. Xenia-rs matches. No code change needed.

PPCBUG-487 — vsumsws/vsum2sws/vsum4sbs/vsum4ubs/vsum4shs: VB operand mis-named as "c"/"VC"

Severity: MEDIUM
Status: open
Location: interpreter.rs:3249-3307
Symptom: All five vsum* handlers use a VX-form instruction (two operands: VA and VB). The code names the VB source c and the comment references "vC" — implying a non-existent third register operand. Only instr.ra() and instr.rb() are valid for VX form; there is no rc(). The arithmetic is correct (rb() is called), but the naming misleads maintainers into thinking there is a VA-form three-operand encoding.
Fix: Rename c → b and update comments to say VB instead of vC in all five handler bodies.

PPCBUG-490 — Zero tests for all six vmsum* opcodes

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: interpreter.rs #[cfg(test)] section
Symptom: No unit test for vmsumubm, vmsummbm, vmsumuhm, vmsumuhs, vmsumshm, vmsumshs. Critical missing: saturation + VSCR.SAT for vmsumuhs/vmsumshs; mixed-sign byte product for vmsummbm; modulo wrap for vmsumshm.

PPCBUG-491 — Zero tests for `vmhaddshs` and `vmhraddshs`

Severity: LOW (test gap)
Status: open
Location: interpreter.rs #[cfg(test)] section
Symptom: No test for either multiply-high-add instruction. Key cases: VA = 0x8000, VB = 0x8000 (minus-one-times-minus-one saturating case); VA = VB = 0x7FFF, VC = 0x7FFF (add post-shift result to max accumulator). Verify VSCR.SAT is set on saturation and clear on non-saturating inputs.

PPCBUG-492 — Zero tests for `vmladduhm`

Severity: LOW (test gap)
Status: open
Location: interpreter.rs #[cfg(test)] section

PPCBUG-493 — Zero tests for all eight `vmule` / `vmulo` opcodes

Severity: LOW (test gap)
Status: open
Location: interpreter.rs #[cfg(test)] section
Symptom: No test for vmuleub, vmuloub, vmulesb, vmulosb, vmuleuh, vmulouh, vmulesh, vmulosh. Key: even vs odd lane distinction (vmulesh vs vmulosh) is untested.

PPCBUG-494 — Zero tests for all five vsum* opcodes

Severity: LOW (test gap)
Status: open
Location: interpreter.rs #[cfg(test)] section
Symptom: No test for vsumsws, vsum2sws, vsum4sbs, vsum4ubs, vsum4shs. Missing: zero-output-lanes verification for vsumsws (w[0], w[1], w[2] must be 0) and vsum2sws (w[0], w[2] must be 0); VSCR.SAT on overflow for all signed/unsigned variants.

PPCBUG-495 — `vsumsws` comment says "vC[3]" should say "VB[3]"

Severity: LOW (cosmetic)
Status: open
Location: interpreter.rs:3248

IDs PPCBUG-480, PPCBUG-481, PPCBUG-482 (withdrawn), PPCBUG-483 (withdrawn), PPCBUG-484, PPCBUG-485, PPCBUG-486, PPCBUG-488, PPCBUG-489, PPCBUG-496, PPCBUG-497, PPCBUG-498 are either withdrawn (no bug found after re-examination), informational, or references to existing IDs. IDs PPCBUG-499 through PPCBUG-509 are unallocated — no further bugs found in group 37.

Batch 8 — VMX load/store (group 38)

Per-group report: audit-out/group-38-vmx-loadstore.md.

Opcodes: lvebx, lvehx, lvewx, lvewx128, lvlx, lvlx128, lvlxl, lvlxl128, lvrx, lvrx128, lvrxl, lvrxl128, lvsl, lvsl128, lvsr, lvsr128, lvx, lvx128, lvxl, lvxl128, stvebx, stvehx, stvewx, stvewx128, stvlx, stvlx128, stvlxl, stvlxl128, stvrx, stvrx128, stvrxl, stvrxl128, stvx, stvx128, stvxl, stvxl128.

Group 38 summary: The load family (lvx, lvxl, lvlx, lvrx, lvsl, lvsr, lvebx, lvehx, lvewx, lvewx128 and all 128/LRU-hint variants) is arithmetically correct. EA computation, alignment masking, big-endian byte ordering, RA=0 special cases, and lane indexing all match the ISA and the ea_indexed helper. 5 HIGH bugs found — the systemic invalidate_for_write gap (PPCBUG-107 family) applies to ALL 16 VMX store opcodes, and stvewx128 has an additional severe memory-corruption bug (writes 16 bytes instead of 1 word). 1 MEDIUM (behavioral divergence between lvebx/lvehx/lvewx and canary's full-line simplification — xenia-rs is architecturally more correct). 1 MEDIUM (lvsr sh=0 edge-case correctness, documentation gap). 3 LOW test-coverage gaps.

PPCBUG-510 — `stvewx128` stores all 16 bytes instead of one word; 12-byte memory corruption (HIGH)

Severity: HIGH
Status: applied (cedee3c, 2026-05-02)
Location: interpreter.rs:2776-2781
Symptom: Uses & !0xF (16-byte alignment) then stores all 16 bytes of the vector. ISA semantics: word-align EA, extract the word lane (EA & 0xF) >> 2, store 4 bytes only. The non-128 stvewx (interpreter.rs:1675-1687) is correct — stvewx128 was not updated to match. Corrupts 12 adjacent bytes on every execution.
Canary reference: InstrEmit_stvewx_ (cc:170-185) — ea & ~3, extract lane, ByteSwap, store 4 bytes only. stvewx128 routes through the same helper as stvewx.
Fix: mirror the stvewx body with instr.vs128() substituted for instr.rs().
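The word-store semantics described above can be sketched against a flat byte buffer. `store_vector_word`, the `&mut [u8]` memory model, and the assumption that the vector is held as 16 big-endian bytes (so no byte swap is needed) are illustrative, not the interpreter's actual memory API.

```rust
/// Store exactly one 32-bit lane of `v` at the word-aligned address.
/// `v` is assumed to be in guest (big-endian) byte order already.
fn store_vector_word(mem: &mut [u8], ea: usize, v: &[u8; 16]) {
    let aligned = ea & !0x3; // word-align EA
    let lane = (ea & 0xF) >> 2; // word lane 0..=3 within the 16-byte block
    mem[aligned..aligned + 4].copy_from_slice(&v[lane * 4..lane * 4 + 4]);
}
```

With EA = 0x16 the store lands at 0x14..0x18 and carries lane 1. The 12 neighbouring bytes of the 16-byte block are left untouched, which is the property the pre-fix stvewx128 violated.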

PPCBUG-511 — `stvx`, `stvx128`, `stvxl`, `stvxl128` missing `invalidate_for_write` (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Locations: interpreter.rs:1598-1603 (stvx), 1605-1610 (stvx128), 1699-1705 (stvxl/stvxl128)
Root cause: PPCBUG-107 (systemic)
Symptom: Under --parallel, a 16-byte stvx to a reserved line does not clear the reservation table slot. The reserving thread's stwcx. spuriously succeeds.
Fix: per PPCBUG-107 pattern — add invalidate_for_write(ea) guard before the byte loop.

PPCBUG-512 — `stvebx`, `stvehx`, `stvewx`, `stvewx128` missing `invalidate_for_write` (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Locations: interpreter.rs:1655 (stvebx), 1664 (stvehx), 1675 (stvewx), 2776 (stvewx128)
Root cause: PPCBUG-107 (systemic)
Note: stvewx128 must also fix PPCBUG-510 before adding the invalidation call (or the invalidation covers the wrong, over-wide address range).

PPCBUG-513 — `stvlx`, `stvlx128`, `stvlxl`, `stvlxl128` missing `invalidate_for_write` (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Locations: interpreter.rs:2746-2749 (stvlx/stvlxl), 2751-2754 (stvlx128/stvlxl128)
Root cause: PPCBUG-107 (systemic)
Note: partial stores can span a 128-byte line boundary when ea & 0xF != 0 and n = 16 - shift crosses the line; two invalidate_for_write calls may be needed.

PPCBUG-514 — `stvrx`, `stvrx128`, `stvrxl`, `stvrxl128` missing `invalidate_for_write` (HIGH)

Severity: HIGH
Status: applied (ca5b90b, 2026-05-01)
Locations: interpreter.rs:2756-2759 (stvrx/stvrxl), 2761-2764 (stvrx128/stvrxl128)
Root cause: PPCBUG-107 (systemic)
Note: stvrx at shift=0 is a no-op (no bytes written); guard can skip the call in that case. Otherwise invalidate ea & !0xF (the preceding aligned block).

PPCBUG-515 — `lvebx`, `lvehx`, `lvewx` implement element semantics; canary uses full-line load (MEDIUM)

Severity: MEDIUM
Status: open
Locations: interpreter.rs:1613-1653
Symptom: xenia-rs places the loaded byte/halfword/word into the correct lane and preserves other lanes from VD (ISA-correct for the "undefined" lanes). Canary does a full aligned 16-byte lvx-style load that overwrites all lanes. Both are valid under the ISA's "undefined" specification, but game code compiled against canary may observe the canary behavior. The divergence is documented and no code change is required unless canary compatibility becomes an explicit goal.

PPCBUG-516 — `lvsr` sh=0 produces {16,17,...,31}; correct per ISA but undocumented (MEDIUM)

Severity: MEDIUM (documentation gap — computation is correct)
Status: open
Location: interpreter.rs:2218-2226
Symptom: When EA is 16-byte aligned (sh = 0), lvsr produces byte values all >= 16 (the "select entirely from VB" identity for vperm). The formula (16 - sh) + i cannot overflow or underflow u8: sh is in 0..=15 and i is in 0..=15, so 16 - sh is in 1..=16 and the result is in 1..=31. There is no computation bug, but no comment explains why values > 15 are correct. Add a comment and a debug_assert!(sh <= 15).
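A sketch of the documented form, with the comment and assertion in place. `lvsr_control` is a hypothetical free function; the interpreter arm computes the same values inline.

```rust
/// Build the lvsr permute-control vector for effective address `ea`.
fn lvsr_control(ea: u32) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    debug_assert!(sh <= 15);
    let mut out = [0u8; 16];
    for (i, byte) in out.iter_mut().enumerate() {
        // Range is 1..=31. Values >= 16 index into the second vperm source
        // (VB), so sh = 0 yields 16..=31, i.e. "select entirely from VB".
        *byte = (16 - sh) + i as u8;
    }
    out
}
```

An aligned EA gives 16 through 31, and sh = 15 gives 1 through 16.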

PPCBUG-517 — Zero test coverage for lvlx/lvrx/stvlx/stvrx boundary edge cases (LOW)

Severity: LOW (test gap)
Status: applied (P8 4029041, 2026-05-02)
Location: vmx.rs tests (lines 756-792); interpreter.rs test module
Missing: shift=15 for lvlx (1 byte loaded), shift=1 for lvrx (15 bytes), stvlx/stvrx round-trip, stvrx at shift=0 confirmed no-op, full lvlx+lvrx+vor unaligned memcpy idiom verified byte-exact.

PPCBUG-518 — Zero interpreter-level execution tests for all 36 VMX load/store opcodes (LOW)

Severity: LOW (test gap)
Status: open
Location: interpreter.rs test module
Missing: lvx alignment masking, stvx byte-order verification, lvebx lane placement, lvsl/lvsr permute index values, stvewx128 after PPCBUG-510 fix. 17 recommended minimum tests enumerated in per-group report.

PPCBUG-519 — `stvrx` aligned no-op is silent; no debug trace (LOW)

Severity: LOW
Status: open
Location: vmx.rs:284-292 (store_vector_right)
Symptom: shift=0 returns immediately with no trace event. Confusing during memory-visibility debugging. Add tracing::trace! in debug builds.

IDs PPCBUG-520 through PPCBUG-559 are unallocated — no further bugs found in group 38.

Phase C1 — Decoder field extractors

Per-group report: audit-out/phase-c1-decoder-fields.md.

Comprehensive audit of all DecodedInstr field accessors in decoder.rs lines 21-165, cross-checked against ISA form specs, Canary FormatXxx structs, and the interpreter's inline re-extraction. Phase B already found PPCBUG-040/046/275/315/360-363/420-422. Phase C1 adds 8 new findings (PPCBUG-560..567).

Confirmed-clean (no new finding): op, rd/rs/rt, ra, rb, rc, simm16, uimm16, d, ds, li, bd, bo, bi, aa, lk, oe, to, mb/me (M-form only), sh, spr, crm, crfd/crfs, l, crbd/crba/crbb, nb, va128/vb128/vd128/vs128, extract_vx128_uimm5.

PPCBUG-560 — sh64() test helper wrong bit order; masks PPCBUG-040 from unit tests (HIGH)

Severity: HIGH
Status: applied (52b05b1, 2026-05-01)
Location: xenia-rs/crates/xenia-cpu/tests/disasm_goldens.rs:160-176 (function rldicl)
Symptom: The rldicl test helper encodes sh[5:1] at PPC bits 16-20 and sh[0] at PPC bit 30. The ISA encodes sh[4:0] at PPC bits 16-20 and sh[5] at PPC bit 30. The wrong sh64() formula (sh_lo << 1) | sh_hi exactly inverts the helper's wrong encoding, so the test passes while sh64() still mis-decodes real binary code.

Counterexamples (ISA-encoded input → sh64() output):

True shift	sh64() result	Error
1	2	+1
16	32	+16
32	1	-31
33	3	-30
63	63	0 (coincidence)

Only sh=0 and sh=63 decode correctly. All other shifts (1-62) are wrong against real code.

Fix for sh64() (per PPCBUG-040):

pub fn sh64(&self) -> u32 {
    (extract_bits(self.raw, 30, 30) << 5) | extract_bits(self.raw, 16, 20)
}

Fix for test helper (must be in same commit):

// Correct: sh_lo = sh & 0x1F → PPC bits 16-20; sh_hi = sh >> 5 → PPC bit 30
(30 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11)
    | (mb_lo << 6) | (mb_hi << 5) | (0 << 2) | ((sh >> 5) << 1) | rc

Cross-reference: PPCBUG-040 (primary finding). PPCBUG-560 is the test-infrastructure companion.
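The counterexamples can be reproduced mechanically. In this sketch `extract_bits` is a hypothetical stand-in for the decoder helper, assumed to take an inclusive range in PPC (MSB-0) bit numbering and return the field right-justified. `encode_sh` sets only the two shift fields the way the ISA does.

```rust
/// Assumed semantics of the decoder's `extract_bits`: PPC bit 0 is the MSB,
/// the range is inclusive, and the result is right-justified.
/// Only valid for fields narrower than 32 bits.
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & ((1u32 << width) - 1)
}

/// Fixed accessor: sh[5] from PPC bit 30, sh[4:0] from PPC bits 16-20.
fn sh64(raw: u32) -> u32 {
    (extract_bits(raw, 30, 30) << 5) | extract_bits(raw, 16, 20)
}

/// Pre-fix formula: (sh_lo << 1) | sh_hi.
fn sh64_buggy(raw: u32) -> u32 {
    (extract_bits(raw, 16, 20) << 1) | extract_bits(raw, 30, 30)
}

/// ISA encoding of the shift amount only (all other fields zero).
fn encode_sh(sh: u32) -> u32 {
    ((sh & 0x1F) << 11) | ((sh >> 5) << 1)
}
```

Round-tripping every shift 0..=63 through `encode_sh` and `sh64` returns the input, whereas `sh64_buggy` agrees only for 0 and 63.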

PPCBUG-561 — Missing `mb_md()` accessor on `DecodedInstr`; interpreter inlines wrong formula at 6 sites (MEDIUM)

Severity: MEDIUM
Status: applied (52b05b1, 2026-05-01)
Location: decoder.rs — accessor absent; disasm.rs:1256 has correct local helper; interpreter.rs lines 696, 706, 716, 726, 736, 746 each inline the wrong formula
Symptom: Interpreter uses (instr.mb() << 1) | ((instr.raw >> 1) & 1) which: (a) reads SH5 (PPC bit 30, host bit 1) instead of MB5 (PPC bit 26, host bit 5) as the high bit; (b) places the high bit at position 0 instead of position 5. disasm.rs has the correct version already — expose it as DecodedInstr::mb_md().
Cross-reference: PPCBUG-046 (primary finding).

Fix:

// Add to decoder.rs:
#[inline] pub fn mb_md(&self) -> u32 {
    extract_bits(self.raw, 21, 25) | (extract_bits(self.raw, 26, 26) << 5)
}

Replace all 6 inline sites in interpreter.rs with instr.mb_md().

PPCBUG-562 — Missing `vc_rc_bit()` and `vx128r_rc_bit()` per-form Rc accessors (MEDIUM)

Severity: MEDIUM
Status: applied (52b05b1, 2026-05-01)
Location: decoder.rs — no per-form Rc accessors; interpreter.rs uses generic rc_bit() (bit 31) for both VC and VX128_R forms
Symptom: Generic rc_bit() reads PPC bit 31 (LSB). VC-form Rc is at PPC bit 21 = (raw >> 10) & 1. VX128_R-form Rc is at PPC bit 27 = (raw >> 4) & 1. Using bit 31 for these forms means the CR6 update gate is permanently disabled for all dot-form VMX vector compares — root cause of PPCBUG-275/420/421/422.

Fix:

/// Rc for VC-form vector compare (vcmpeqfp, vcmpgefp, vcmpgtfp, vcmpbfp, etc.) — PPC bit 21.
#[inline] pub fn vc_rc_bit(&self) -> bool { extract_bits(self.raw, 21, 21) != 0 }
/// Rc for VX128_R-form compare (vcmpeqfp128, vcmpgefp128, etc.) — PPC bit 27.
#[inline] pub fn vx128r_rc_bit(&self) -> bool { extract_bits(self.raw, 27, 27) != 0 }

Cross-reference: PPCBUG-275 / PPCBUG-420 / PPCBUG-421 / PPCBUG-422.

PPCBUG-563 — Missing `vx128_4_z()` and `vx128_4_imm()` for VX128_4 form (MEDIUM)

Severity: MEDIUM
Status: applied (52b05b1, 2026-05-01)
Location: decoder.rs — accessors absent; interpreter.rs:3551-3552 (vrlimi128) reads wrong bit positions
Symptom: VX128_4 form has IMM (5-bit) at PPC bits 11-15 (host bits 16-20) and z (2-bit) at PPC bits 24-25 (host bits 6-7). Interpreter vrlimi128 uses (raw >> 16) & 0x3 for shift (reads VB128l partial) and (raw >> 2) & 0xF for mask (reads VD128h region).

Fix:

#[inline] pub fn vx128_4_imm(&self) -> u32 { extract_bits(self.raw, 11, 15) }
#[inline] pub fn vx128_4_z(&self) -> u32 { extract_bits(self.raw, 24, 25) }

Cross-reference: PPCBUG-315.

PPCBUG-564 — Missing `vx128_p_perm()` for VX128_P form; PERMh reads XO bits (MEDIUM)

Severity: MEDIUM
Status: applied (52b05b1, 2026-05-01)
Location: decoder.rs — accessor absent; interpreter.rs:4089 (vpermwi128) uses (raw >> 16) & 0xFF which reads PERMl (correct) but uses XO/reserved bits 21-23 for PERMh instead of PPC bits 23-25
Symptom: Top 3 bits of the 8-bit PERM selector are wrong for every vpermwi128 instruction. Lane selections for words 0 and 1 are garbage.

Fix:

#[inline] pub fn vx128_p_perm(&self) -> u32 {
    extract_bits(self.raw, 11, 15) | (extract_bits(self.raw, 23, 25) << 5)
}

Cross-reference: PPCBUG-362.

PPCBUG-565 — Missing `vx128_5_sh()` for VX128_5 form; vsldoi128 MSB reads reserved bit (MEDIUM)

Severity: MEDIUM
Status: applied (52b05b1, 2026-05-01)
Location: decoder.rs — accessor absent; interpreter.rs:2012 (vsldoi128) uses (raw >> 4) & 0x1 for the shift MSB (reads PPC bit 27 = reserved) instead of PPC bit 22 = host bit 9 = (raw >> 9) & 1
Symptom: vsldoi128 shift amounts ≥ 8 (where the 4th bit matters) use a garbage bit. The correct 4-bit SH is at PPC bits 22-25 (host bits 6-9) = (raw >> 6) & 0xF.

Fix:

#[inline] pub fn vx128_5_sh(&self) -> u32 { extract_bits(self.raw, 22, 25) }

Cross-reference: PPCBUG-361.

PPCBUG-566 — Missing XER TBC field accessor documentation for lswx/stswx (LOW)

Severity: LOW
Status: applied (P6 112202c, 2026-05-02)
Location: decoder.rs — XER[25:31] (7-bit transfer byte count) is runtime state, not an instruction field; no accessor exists and no documentation notes the gap
Symptom: lswx/stswx use XER[25:31] as their byte count. The interpreter has no way to read this via the normal accessor pattern. Not a bit-position error, but a structural gap.
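Recommendation: add ctx.xer_tbc() -> u8 to PpcContext returning (ctx.xer() & 0x7F) as u8. XER[25:31] are the seven least-significant bits in PPC (MSB-first) numbering, so no shift is needed. Document that these are the only instructions that read XER as a count operand.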
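
A minimal sketch of such an accessor, assuming `xer()` returns the 32-bit XER image with SO in the most significant bit. In PPC (MSB-first) numbering, bits 25-31 are then the seven least-significant bits, so the sketch masks without shifting. The `PpcContext` below is a hypothetical stand-in that models only XER.

```rust
/// Hypothetical stand-in for the real PpcContext; only XER is modeled.
struct PpcContext {
    xer: u32,
}

impl PpcContext {
    fn xer(&self) -> u32 {
        self.xer
    }

    /// Transfer byte count used by lswx/stswx: XER[25:31], the low 7 bits.
    fn xer_tbc(&self) -> u8 {
        (self.xer() & 0x7F) as u8
    }
}

fn main() {
    // SO set (MSB) and a byte count of 13 in the low bits.
    let ctx = PpcContext { xer: 0x8000_0000 | 13 };
    assert_eq!(ctx.xer_tbc(), 13);
}
```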

PPCBUG-567 — Zero unit tests pin any scalar field accessor (LOW)

Severity: LOW (test gap)
Status: open
Location: decoder.rs unit tests; tests/disasm_goldens.rs
Symptom: Phase 4 tests pin va128/vb128/vd128/vs128 only. No test verifies: sh64() against ISA-encoded instructions (existing test validates wrong round-trip — PPCBUG-560), mb_md() (absent), vc_rc_bit()/vx128r_rc_bit() (absent), ds() for negative displacement, spr() for LR/CTR/XER beyond DEC.
Recommended additions: decoder-level unit tests using ISA-correct encodings for sh64, mb_md, the two new Rc accessors, ds negative, spr for LR=8 and CTR=9. See phase-c1-decoder-fields.md for concrete encoding examples.

IDs PPCBUG-568 through PPCBUG-599 are unallocated — no further bugs found in Phase C1 scope.

Phase C2 — Decoder opcode-lookup tables

Per-group report: audit-out/phase-c2-decoder-lookup.md.

Methodology: complete line-by-line comparison of all decode_opNN functions in xenia-rs/crates/xenia-cpu/src/decoder.rs against xenia-canary/src/xenia/cpu/ppc/ppc_opcode_lookup_gen.cc, plus cross-reference of ppc-manual/forms/ for VC, VX128_R, VX128_5, VA, VX128_3, VX128_4 forms.

Overall verdict: the decoder is structurally sound and entry-by-entry matches Canary for all real Xbox 360 instructions, with one pre-known exception (PPCBUG-600 = PPCBUG-423). Zero new wrong-entry bugs. One medium-severity cross-reference bug (dot-form gap, PPCBUG-600), one medium maintainability risk (key-ordering dependency, PPCBUG-601), and four LOWs (reserved-encoding misidentification, undocumented Invalid mapping for primary opcode 9, test gaps, undocumented fast-path; PPCBUG-602 through PPCBUG-605).

PPCBUG-600 — `decode_op6` key4: VMX128 compare dot-forms decode as Invalid (MEDIUM)

Severity: MEDIUM (cross-reference for PPCBUG-423; same root cause, Phase C2 ID)
Status: applied (52b05b1, 2026-05-01) (dup-of:423 for the fix; this ID is for Phase C2 tracking)
Location: decoder.rs:640-648 (decode_op6, key4 match table)
Symptom: The VX128_R form places Rc at PPC bit 27. The key4 formula is (bits 22-24 << 3) | bit27. When Rc=1 (dot-form), bit27=1 and key4 is odd. Only even key4 values are in the table. Five dot-form encodings fall through to PpcOpcode::Invalid:
- vcmpeqfp128. → key4=0b000001 (1), decodes as Invalid
- vcmpgefp128. → key4=0b001001 (9), decodes as Invalid
- vcmpgtfp128. → key4=0b010001 (17), decodes as Invalid
- vcmpbfp128. → key4=0b011001 (25), decodes as Invalid
- vcmpequw128. → key4=0b100001 (33), decodes as Invalid
Contrast: standard VMX VC-form compares (op=4 key3) are correct because their Rc bit (bit21) is outside the key3 window (bits 22-31). The VX128_R form places Rc at bit27, which is inside the key4 window.

Fix: Add 5 dot-form entries to key4 in decode_op6:

0b000001 => return PpcOpcode::vcmpeqfp128,
0b001001 => return PpcOpcode::vcmpgefp128,
0b010001 => return PpcOpcode::vcmpgtfp128,
0b011001 => return PpcOpcode::vcmpbfp128,
0b100001 => return PpcOpcode::vcmpequw128,

The interpreter's existing instr.rc_bit() check already handles CR6 update for dot-forms — decoder just needs to emit the right opcode.

See also: PPCBUG-423 (Phase B original finding) for impact assessment and full context.
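
The odd-key effect can be reproduced with the key4 formula exactly as stated in the Symptom line. This is a sketch under that assumption; `extract_bits` and `key4` are illustrative free functions, not the decoder's code.

```rust
/// Extract PPC bits `start..=end` of a 32-bit word (bit 0 is the MSB).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (u32::MAX >> (32 - width))
}

/// key4 as described in this entry: (bits 22-24 << 3) | bit 27.
fn key4(raw: u32) -> u32 {
    (extract_bits(raw, 22, 24) << 3) | extract_bits(raw, 27, 27)
}

fn main() {
    let plain = 0u32;             // bits 22-24 = 0, bit 27 = 0
    let dot = plain | (1 << 4);   // same word with PPC bit 27 set
    assert_eq!(key4(plain), 0b000000); // even key, present in the table
    assert_eq!(key4(dot), 0b000001);   // odd key, missing before the fix
}
```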

PPCBUG-601 — `decode_op6` key ordering creates undocumented correctness dependency (MEDIUM)

Severity: MEDIUM (maintainability risk; no current wrong-decode for real code)
Status: open
Location: decoder.rs:603-637 (decode_op6, key1/key2/key3 dispatch)
Symptom: key1 (bits 21-22 << 5 | bits 26-27), key2 (bits 21-23 << 4 | bits 26-27), and key3 (bits 21-27) all overlap. Correctness depends on an implicit invariant: vpkd3d128 and vrlimi128 (matched by key2) always have bits 26-27 = 01, while all 15 key3 unary entries always have bits 26-27 = 11. If a future instruction were added to key2 with bits 26-27 = 11, it would shadow a key3 entry. No comment in the source documents this constraint.

Fix: Add a comment block above the key2/key3 dispatches explaining the invariant:

// key2 matches bits 26-27 == 01 only (vpkd3d128, vrlimi128).
// key3 entries all have bits 26-27 == 11. No overlap is possible
// for any currently-defined Xbox 360 instruction.

PPCBUG-602 — `decode_op4` vsldoi128 fallback: over-broad single-bit catch-all (LOW)

Severity: LOW (only fires for reserved/undefined encodings in practice)
Status: open
Location: decoder.rs:558-561
Symptom: The VX128_5 form for vsldoi128 is identified by op=4, bit27=1. The dispatch uses a bare if extract_bits(code, 27, 27) == 1 after the other tables, rather than an exact VX128_5-form check. Reserved VA extended opcodes that happen to have their key4 bit4 (= word bit27) set decode as vsldoi128 instead of Invalid. Example: VA XO=0b100011 (35, reserved gap between vmladduhm=34 and vmsumubm=36) — key4 misses, bit27=1 fires → decoded as vsldoi128. ISA specifies reserved encodings should trap; this silently assigns a meaning.

Fix (optional): Strengthen to an exact match:

// VX128_5 form: SH@22-25, VA128h@26, XO=bit27. Bits 28-31 carry VD128h/VB128h.
// Only vsldoi128 uses this form. Verify the XO bit and absence of load/store marker.
if extract_bits(code, 27, 27) == 1 && extract_bits(code, 30, 31) != 0b11 {
    return PpcOpcode::vsldoi128;
}

Alternatively, accept current behavior and add a comment.

PPCBUG-603 — Primary opcode 9 maps to Invalid; correct but undocumented (LOW)

Severity: LOW (test gap / documentation only)
Status: open
Location: decoder.rs:369 (the _ => PpcOpcode::Invalid arm of lookup_opcode)
Symptom: Primary opcode 9 (dozi in original POWER ISA) is undefined on Xenon/750CL and correctly decodes as Invalid. Canary also returns PPC_DECODER_MISS. No comment documents this intentional absence.
Fix: Add // 9 = dozi (POWER-only, not present on Xenon) comment near the match, or explicitly add 9 => PpcOpcode::Invalid with a comment.

PPCBUG-604 — Zero decoder unit tests for decode_op5, decode_op6, decode_op30, decode_op63 (LOW)

Severity: LOW (test gap)
Status: open
Location: decoder.rs:897-1107 (test module)
Symptom: The 10 existing decoder tests cover addi, lwz, branch, stw, ori, and cache mechanics. None exercise VMX128 (op=5, op=6), rotate-doubleword (op=30), or FPU (op=63) opcode paths. In particular, no test would have caught PPCBUG-600 (vcmpeqfp128 dot-form decodes as Invalid) before it caused a runtime trap.
Recommended minimum additions (8 tests):
1. vcmpeqfp128 (Rc=0) → decodes as vcmpeqfp128.
2. vcmpeqfp128. (Rc=1) → decodes as vcmpeqfp128 (tests PPCBUG-600 fix).
3. vcmpeqfp (op=4, Rc=0) → key3 check, bit21=0.
4. vcmpeqfp. (op=4, Rc=1) → key3 check, bit21=1, same decode.
5. vsldoi128 (op=4, bit27=1) → fallback fires.
6. rldicl (op=30) → decode_op30.
7. fadd (op=63, Rc=0) → arithmetic table.
8. fadd. (op=63, Rc=1) → same decode as fadd.

PPCBUG-605 — `decode_op31` sradix fast-path is correct but undocumented (LOW)

Severity: LOW (documentation gap only)
Status: open
Location: decoder.rs:702-705
Symptom: The sradix pre-check uses bits 21-29 (9 bits). The subsequent main table uses bits 21-30 (10 bits). Because no main-table entry has bits 21-29 = 0b110011101, the fast-path cannot shadow a legitimate main-table entry. However, this is not documented in the source. In the XS form bit 30 carries sh[5] and Rc sits at bit 31, so sradix and sradix. both present bits 21-30 = 0b1100111010 or 0b1100111011, and a reader might worry that either could conflict with a future main-table entry at those keys.
Fix: Add a comment:

// sradix: XS-form, XO=413 (bits 21-29=0b110011101).
// No main-table entry uses bits 21-30 starting with 0b110011101x.
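
The relationship between the 9-bit pre-check and the 10-bit main-table key can be sketched as follows. The function names are hypothetical; the only facts assumed are the XS-form layout (XO at PPC bits 21-29, sh[5] at bit 30) and XO=413 for sradi.

```rust
/// Extract PPC bits `start..=end` of a 32-bit word (bit 0 is the MSB).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (u32::MAX >> (32 - width))
}

/// 9-bit pre-check: XS-form XO field.
fn is_sradi(raw: u32) -> bool {
    extract_bits(raw, 21, 29) == 0b110011101
}

/// 10-bit key used by the main table.
fn main_table_key(raw: u32) -> u32 {
    extract_bits(raw, 21, 30)
}

fn main() {
    let sh5_clear = 413u32 << 2;        // XO placed at PPC 21-29
    let sh5_set = sh5_clear | (1 << 1); // PPC bit 30 (sh[5]) set
    assert!(is_sradi(sh5_clear) && is_sradi(sh5_set));
    assert_eq!(main_table_key(sh5_clear), 0b1100111010);
    assert_eq!(main_table_key(sh5_set), 0b1100111011);
}
```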

IDs PPCBUG-606 through PPCBUG-639 are unallocated — no further bugs found in Phase C2.

Phase C3 — Disassembler formatter parity

Per-group report: audit-out/phase-c3-disasm.md.

Methodology: Full line-by-line audit of disasm.rs:format() and all ~70 per-class helpers. Cross-referenced against xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm_gen.cc, tests/golden/extended_mnemonics.json, and tests/golden/base_mnemonics.json. Checked: mnemonic correctness (Rc/OE/LK/AA/L-field), operand formatting (signed vs unsigned, hex vs decimal), simplified-mnemonic priority, branch-condition extended forms, VMX register naming, VX128 field extraction, and golden test coverage.

Overall verdict: The formatter is structurally sound. All OE/Rc/LK/AA suffix handling, the simplified mnemonic priority order, VMX 5-bit and VMX128 7-bit register naming, SPR mnemonics, and CR-logical extended forms are correct. Two HIGH bugs found: the bdnz/bdz extended mnemonic appends a spurious condition suffix, and the pre-existing sync/lwsync bug (PPCBUG-088) is re-assessed as HIGH in disassembler scope. Three MEDIUM bugs: fmt_bcctr lacks the extended form that fmt_bclr has for CTR-decrement BO values, and both SIMM immediates and D-form displacements are printed in decimal instead of hex (diverging from Canary). Several LOW findings for golden fixture correctness and edge cases.

Key finding: the disassembler's VX128 field extraction (vperm128 VC, vsldoi128 SH, vpermwi128 PERM) is CORRECT in all three cases where the interpreter (PPCBUG-360/361/362) has the wrong extraction. The disassembler was written independently and got them right.

PPCBUG-640 — `fmt_bc`: pure `bdnz`/`bdz` emits `bdnzge`/`bdzge` (spurious condition suffix) (HIGH)

Severity: HIGH
Status: applied (d4f6ea7, 2026-05-02)
Location: disasm.rs:829-834
Symptom: For bcx with BO=16 (bdnz: decrement CTR, branch if CTR≠0, CR ignored):
- decr = (16 & 4) == 0 = true
- uncond = (16 & 16) != 0 = true
- Code falls into the if decr branch and computes cond_name_opt from (cr_bit=0, cond_true=false) → Some("ge")
- Emits: bdnzge — WRONG. ISA simplified form is bdnz.
For BO=18 (bdz): same path → bdzge — WRONG.

The bug is absent in fmt_bclr which has an explicit if decr && uncond guard at line 872 producing bdnzlr/bdzlr correctly. fmt_bc lacks this guard.

The golden fixture "bdnz 0x82000040" (PPCBUG-650 companion) pins the wrong output.

Fix: In fmt_bc, inside the if decr block, gate the condition string on !uncond:

if decr {
    let z = if bo & 0x02 != 0 { "z" } else { "nz" };
    let cond_str = if uncond { "" } else { cond_name_opt.unwrap_or("") };
    let ext_mnem = format!("bd{z}{cond_str}{a}{l}");
    let ext_ops = format!("{cr}0x{target:08X}");
    with_ext(&base_mnem, base_ops, 8, &ext_mnem, ext_ops, 8)
}

Also update the golden fixtures (PPCBUG-650).

Impact: All analysis-DB queries for bdnz loops (common in pixel-shader and vertex processing loops) return zero rows; they are stored as bdnzge. Developers inspecting loop structures see a misleading condition name on a CTR-only branch.
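
The BO decoding behind this fix can be exercised in isolation. The sketch below is a simplified stand-in for the `if decr` branch of fmt_bc: the function name is hypothetical, and the condition name is passed in rather than derived from BI.

```rust
/// Extended-mnemonic stem for a CTR-decrementing bc, or None when BO does
/// not decrement CTR. `cond_name` is what the CR test would be called.
fn bd_ext_stem(bo: u32, cond_name: &str) -> Option<String> {
    let decr = (bo & 0x04) == 0;   // BO bit 0x04 clear: decrement CTR
    let uncond = (bo & 0x10) != 0; // BO bit 0x10 set: CR is ignored
    if !decr {
        return None;
    }
    let z = if (bo & 0x02) != 0 { "z" } else { "nz" };
    // The condition suffix only applies when the CR bit is actually tested.
    let cond = if uncond { "" } else { cond_name };
    Some(format!("bd{z}{cond}"))
}

fn main() {
    assert_eq!(bd_ext_stem(16, "ge").as_deref(), Some("bdnz"));
    assert_eq!(bd_ext_stem(18, "ge").as_deref(), Some("bdz"));
    // BO=0 tests the CR bit, so the suffix is kept.
    assert_eq!(bd_ext_stem(0, "ge").as_deref(), Some("bdnzge"));
}
```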

PPCBUG-641 — `sync` emits `"sync"` for `lwsync` (L=1) — re-assessment of PPCBUG-088 (HIGH)

Severity: HIGH (disassembler scope; PPCBUG-088 was LOW for interpreter scope)
Status: open (see PPCBUG-088 for fix)
Location: disasm.rs:364
Symptom: PpcOpcode::sync always emits "sync". The L-field at PPC bit 10 selects lwsync (L=1, encoding 0x7C2004AC). lwsync is the acquire barrier in every Xbox 360 spinlock. Every lwsync in the disassembly DB is stored as mnemonic='sync'. SELECT * WHERE mnemonic='lwsync' returns zero rows regardless of binary content.
Note: the golden fixture for lwsync (PPCBUG-649) currently pins the wrong output.

PPCBUG-642 — `fmt_bcctr` missing extended form for CTR-decrement/ignore-CR BO values (MEDIUM)

Severity: MEDIUM
Status: open
Location: disasm.rs:880-902
Symptom: bcctrx with BO=16 (decrement CTR, ignore CR) falls through to base() with no extended form. fmt_bclr (the equivalent for bclrx) correctly handles the same case with an explicit decr && uncond check at line 872, producing bdnzlr. Note: bcctr with CTR-decrement is undefined by PowerISA; this encoding should never appear in valid compiled code. The inconsistency is a maintenance concern rather than a runtime bug.
Fix: Add a decr && uncond check before the cond_branch_ext call in fmt_bcctr, mirroring lines 872-876 in fmt_bclr. Or add a comment explaining the ISA undefined status.

PPCBUG-643 — SIMM immediate display: decimal diverges from Canary and real disassemblers (MEDIUM)

Severity: MEDIUM
Status: open
Location: disasm.rs:946 (addi), 976 (addic), 989 (subfic), 990 (mulli), 1003 (cmpi), 1048-1061 (fmt_ld/fmt_st), and all similar SIMM sites
Symptom: SIMM immediates are formatted via Rust's {imm} (decimal). Canary uses "-0x{:X}" / "0x{:X}" (signed hex) for every SIMM field, as does IDA Pro. The inconsistency is also internal to xenia-rs: addis/oris/xoris use hex (0x{imm_u:X}), but addi/addic/mulli use decimal. This misleads analysis-DB queries that mix instructions (e.g. addi r3, r1, -4 vs addis r3, r0, 0x8000).
Impact: Medium. The output is not wrong (the value is correctly computed), but cross-referencing with Canary output requires manual conversion.

PPCBUG-644 — D-form load/store displacement uses decimal instead of hex (MEDIUM)

Severity: MEDIUM
Status: open
Location: disasm.rs:1053 (fmt_ld), 1061 (fmt_st), 1069 (fmt_ds)
Symptom: format!("{rn}, {d}({})", gpr(ra)) outputs decimal for the displacement. Canary outputs "-0x8(r1)" not "-8(r1)". Affects 25+ D-form and DS-form opcodes. Negative displacements (-8, -16, etc.) are especially hard to match against Canary output when reading stack frame accesses.

Fix:

// unsigned_abs() avoids the overflow that -d hits at the minimum value.
let d_str = if d < 0 { format!("-0x{:X}", d.unsigned_abs()) } else { format!("0x{:X}", d) };
base(mnem, format!("{rn}, {d_str}({})", gpr(ra)), 8)

Update all golden fixture rows with displacement values.
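
A self-contained sketch of the signed-hex formatting, assuming the displacement has already been sign-extended to `i32`. The helper name `signed_hex` is hypothetical. It uses `unsigned_abs()` so that the most negative value formats without overflowing on negation.

```rust
/// Format a signed displacement as "-0x.." or "0x.." (uppercase hex digits).
fn signed_hex(d: i32) -> String {
    if d < 0 {
        format!("-0x{:X}", d.unsigned_abs())
    } else {
        format!("0x{:X}", d)
    }
}

fn main() {
    assert_eq!(signed_hex(-8), "-0x8");
    assert_eq!(signed_hex(16), "0x10");
    println!("lwz r3, {}(r1)", signed_hex(-8));
}
```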

PPCBUG-645 — `cntlzdx` Rc suffix: moot for valid encodings, but WONTFIX (LOW)

Severity: LOW
Status: wontfix
Location: disasm.rs:286
Note: fmt_x_unary_rc would emit cntlzd. for Rc=1, but valid cntlzd encodings always have Rc=0. Canary emits cntlzd always. No impact for valid code.

PPCBUG-646 — `fmt_rlwimi` inslwi/insrwi priority overlap: confirmed correct (LOW)

Severity: LOW
Status: wontfix
Note: After careful analysis, the inslwi guard excludes insrwi overlap cases (sh != 31u32.wrapping_sub(me)). Priority is correct. Informational only.

PPCBUG-647 — `fmt_rlwinm` `extrwi` uses `wrapping_sub` which can give misleading results for invalid encodings (LOW)

Severity: LOW
Status: open
Location: disasm.rs:1137
Symptom: let b = sh.wrapping_sub(n) % 32; for invalid encodings with sh < n, wrapping_sub yields a large u32 and % 32 reduces it to a misleading b operand. All compiler-emitted encodings satisfy sh >= n.
Fix: Add && sh >= 32 - mb (that is, sh >= n) to the guard so that invalid encodings are emitted in base form instead of as extrwi.
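
The guard can be sketched as a small operand-recovery function, using the identity extrwi rA,rS,n,b = rlwinm rA,rS,b+n,32-n,31. The function name is hypothetical and the real guard in fmt_rlwinm may check further conditions (for example, priority against srwi).

```rust
/// Recover extrwi operands (n, b) from rlwinm fields, or None when the
/// encoding cannot be a valid extrwi.
fn extrwi_operands(sh: u32, mb: u32, me: u32) -> Option<(u32, u32)> {
    if me != 31 || mb == 0 || mb > 31 {
        return None;
    }
    let n = 32 - mb;
    if sh < n {
        return None; // the extra guard: sh >= 32 - mb
    }
    Some((n, sh - n))
}

fn main() {
    assert_eq!(extrwi_operands(16, 24, 31), Some((8, 8)));
    assert_eq!(extrwi_operands(4, 24, 31), None);
    // Without the guard, the same inputs would yield b = 28.
    assert_eq!(4u32.wrapping_sub(8) % 32, 28);
}
```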

PPCBUG-648 — `fmt_mftb` TBR=268: ext mnemonic identical to base mnemonic (LOW)

Severity: LOW
Status: open
Location: disasm.rs:1443
Symptom: 268 => with_ext("mftb", base_ops, 8, "mftb", gpr(rd), 8) — base is mftb, extended is also mftb. display() picks the extended form (omitting the 268 operand), making it ambiguous vs. mftbu. Consider: either emit base-only (mftb r3, 268) or rename the base to mftb.raw for disambiguation.

PPCBUG-649 — Golden fixture for `lwsync` pins wrong output (no ext_mnemonic) (LOW)

Severity: LOW (test coverage gap)
Status: applied (2be25bd, 2026-05-02)
Location: tests/golden/extended_mnemonics.json, entry "lwsync"
Symptom: Fixture has mnemonic: "sync" and no ext_mnemonic. After PPCBUG-088/641 fix, expected output is mnemonic: "sync", ext_mnemonic: "lwsync". Current fixture defeats regression detection — the test passes with wrong output.

PPCBUG-650 — Golden fixtures for `bdnz`/`bdz` pin wrong extended mnemonic (LOW)

Severity: LOW (companion to PPCBUG-640)
Status: applied (d4f6ea7, 2026-05-02)
Location: tests/golden/extended_mnemonics.json, rows "bdnz 0x82000040" and "bdz 0x82000040"
Symptom: The rows have ext_mnemonic: "bdnzge" and ext_mnemonic: "bdzge" respectively. After the PPCBUG-640 fix, the correct values are "bdnz" and "bdz".

PPCBUG-651 — `fmt_vmx128_pack_d3d` shared by `vpkd3d128` and `vrlimi128`: confirmed correct (LOW)

Severity: LOW
Status: wontfix
Note: Both opcodes use VX128_4 form. Shared formatter outputs identical operand lists (vd, vb, imm, z) which is correct for both. Informational only.

PPCBUG-652 — Zero golden fixtures for any VMX128 opcode disassembly (LOW)

Severity: LOW (test coverage gap)
Status: open
Location: tests/golden/ — all three JSON files
Symptom: No fixture pins the formatted output of any VMX128 instruction. Regressions in VMX128 field extraction (e.g. a re-introduction of PPCBUG-360/361/362 in the disassembler) would be invisible. Recommend adding at minimum: vaddfp128, vperm128, vsldoi128, vpkd3d128, vcmpeqfp128., vmaddfp128.

PPCBUG-653 — `fmt_trap_imm` unconditional trap extended form: confirmed not-a-bug (LOW)

Severity: LOW
Status: wontfix
Note: twi 31, rA, IMM (to=31) has no ISA simplified mnemonic unless RA=0 and IMM=0 (which matches tw 31, r0, r0 = trap). The fmt_trap_imm correctly emits base-only for twi 31, rA, N. Informational.

PPCBUG-654 — `fmt_rldimi` `insrdi` guard excludes valid `mb=0` (b=0) case (LOW)

Severity: LOW
Status: open
Location: disasm.rs:1220
Symptom: Guard if mb > 0 excludes insrdi rA, rS, n, 0 (b=0 → mb=0). A valid compiler-emitted rldimi with sh+mb+n=64 and mb=0 falls through to base form instead of displaying the insrdi simplified mnemonic.
Fix: Remove the mb > 0 guard; the inner n > 0 guard is sufficient to avoid degenerate cases.

IDs PPCBUG-655 through PPCBUG-679 are unallocated — no further bugs found in Phase C3.

Phase C4 — Post-merge audit corrections (2026-05-02)

PPCBUG-700 — VMX128 register accessors disagreed with canary's bitfield layout (HIGH)

Severity: HIGH (silent mis-decoding of any VMX128 instruction with a register >= 32)
Status: applied
Locations: decoder.rs:138-160 (va128/vb128/vd128), decoder.rs:80 (vx128r_rc_bit)
Discovery: independent reviewer of the P3 phase merge, comparing our Rust accessors against canary's FormatVX128/VX128_2/VX128_4/VX128_5/VX128_R bitfield struct in xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663.
Symptom: this entry contradicts the audit's own line 2958 ("confirmed-clean") assessment. The previous audit miscounted bit-field offsets — under x86_64 LSB-first C++ bitfield packing, the canary fields land at:
- VA128 = VA128l(5) | VA128h(1)<<5 | VA128H(1)<<6 = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6 (3 fields, 7 bits)
- VB128 = VB128l(5) | VB128h(2)<<5 = PPC[16-20] | PPC[30-31]<<5 (2 fields, 7 bits)
- VD128 = VD128l(5) | VD128h(2)<<5 = PPC[6-10] | PPC[28-29]<<5 (2 fields, 7 bits)
- Rc (VX128_R only) = PPC[25] (host bit 6), not PPC[27] as PPCBUG-422/562 prescribed.

The Rust code instead used va128: PPC[11-15] | PPC[29]<<5 (one bit, wrong position); vb128: PPC[16-20] | PPC[28]<<5 | PPC[30]<<6 (wrong positions); vd128: PPC[6-10] | PPC[21]<<5 | PPC[22]<<6 (wrong positions); vx128r_rc_bit at PPC[27].
Why it lurked: the buggy convention was internally consistent with hand-crafted test fixtures (which set bit 29 / 21 / 22 to encode "high" registers, matching the buggy accessor). Real Xbox 360 game code follows canary's convention, so any production encoding with VR >= 32 was silently mis-decoded — but no unit test exercised that path.
Fix: rewrite the four accessors to canary's bit positions; rewrite the vmx128_test_word helper and unit tests; re-encode the goldens for vmaddfp128/vmaddcfp128/vnmsubfp128/vperm128/vsrw128/vpermwi128/vrlimi128. Drop the speculative key4_dt dot-form dispatch in decode_op6 (canary has no separate dot-form opcodes for VX128_R compute ops; Rc is a runtime modifier). Update encode_vpkd3d128 test helper for canary's VD128h placement.
Cross-reference: invalidates the audit's confirmed-clean note at line 2958. Subsumes the partial fix-shape proposed in PPCBUG-422 (Rc-bit position).
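
The three register accessors can be sketched directly from the bit positions listed in this entry. This is an illustration, not the contents of decoder.rs; `extract_bits` is assumed to take inclusive PPC (MSB-first) positions, and the sample words are hand-placed bits rather than real instruction encodings.

```rust
/// Extract PPC bits `start..=end` of a 32-bit word (bit 0 is the MSB).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (u32::MAX >> (32 - width))
}

/// VA128 = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6.
fn va128(raw: u32) -> u32 {
    extract_bits(raw, 11, 15)
        | (extract_bits(raw, 26, 26) << 5)
        | (extract_bits(raw, 21, 21) << 6)
}

/// VB128 = PPC[16-20] | PPC[30-31]<<5.
fn vb128(raw: u32) -> u32 {
    extract_bits(raw, 16, 20) | (extract_bits(raw, 30, 31) << 5)
}

/// VD128 = PPC[6-10] | PPC[28-29]<<5.
fn vd128(raw: u32) -> u32 {
    extract_bits(raw, 6, 10) | (extract_bits(raw, 28, 29) << 5)
}

fn main() {
    let vd100 = (4u32 << 21) | (3 << 2); // VD128l = 4, VD128h = 3
    let va70 = (6u32 << 16) | (1 << 10); // VA128l = 6, VA128H = 1
    let vb45 = (13u32 << 11) | 1;        // VB128l = 13, VB128h = 1
    assert_eq!(vd128(vd100), 100);
    assert_eq!(va128(va70), 70);
    assert_eq!(vb128(vb45), 45);
}
```

Note that a check like this only confirms the sketch is consistent with the positions quoted above; it is not a substitute for the canary cross-check that found the bug.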

May 2026 Comprehensive Audit (extends prior PPCBUG namespace)

Started: 2026-05-02. Charter: audit-2026-05-charter.md. Severity: P0 blocker / P1 wrong-result / P2 spec drift / P3 cosmetic.

ORACBUG (M01 — oracles and goldens)

Per-milestone report: audit-out/m01-oracles.md.

ORACBUG-001 — base_mnemonics.json self-derived circular

Severity: P1
Status: open
Location: crates/xenia-cpu/tests/disasm_goldens.rs:70-88 (build_rows); fixture crates/xenia-cpu/tests/golden/base_mnemonics.json
Symptom: every "expected" mnemonic/operands/etc. is captured from xenia_cpu::disasm::format() at golden-creation time and frozen. The frozen JSON is asserted against future runs of the same function. Detects regression-from-snapshot, not absolute correctness. Human-readable label field is never asserted.
Recommendation: add canary-disasm differential (see M02) and POWERISA-derived parallel oracle for ~20 representative cases.

ORACBUG-002 — extended_mnemonics.json self-derived circular

Severity: P1
Status: open
Location: crates/xenia-cpu/tests/golden/extended_mnemonics.json (623 rows)
Symptom: same as ORACBUG-001, with extra risk: extended mnemonic emission is decision-tree output (li, lis, mr, not, slwi, srwi, clrldi, blr, bctr, beq/bne, lwsync, …). A bug in the canonicalization decision tree is not caught.

ORACBUG-003 — vmx128_registers.json self-derived + hand-coded raw bytes

Severity: P1
Status: open
Location: crates/xenia-cpu/tests/disasm_goldens.rs:421-527
Symptom: same circularity, plus 4-operand multiply-add cases (lines 513-519) bypass encoding helpers and use HARD-CODED u32 literals (0x146328F0, 0x14632930, 0x14632970). PPCBUG-700 demonstrated this risk: the prior buggy convention was internally self-consistent in fixtures and lurked until a manual canary cross-check.

ORACBUG-004 — sylpheed_n2m.json structurally insufficient

Severity: P0
Status: open
Location: crates/xenia-app/tests/golden/sylpheed_n2m.json
Symptom: at -n 2M instructions all rendering metrics are 0 (packets/draws/swaps/resolves/render-targets/textures). Sylpheed's first VdSwap fires at ~18M cycles. By construction, the golden cannot detect regressions in 11 of the 14 digest fields.
Risk: this is the only end-to-end Sylpheed regression catcher in the workspace. Future fixes optimized to pass this gate are optimized against a blind oracle.
Recommendation: add sylpheed_n50m.json (CI-feasible, captures VdSwap=1) and sylpheed_n4b.json (matches canonical reference invocation; commit-time gate).

ORACBUG-005 — db_schema_golden.rs synthetic PE missing direct-branch coverage

Severity: P3
Status: open
Location: crates/xenia-analysis/tests/db_schema_golden.rs:23-53
Symptom: the synthetic PE has 4 instructions (mflr/nop/blr/nop). Direct-branch path of the DB writer (target_hex column population) is never exercised; only the indirect-only path is. Schema columns are correctly locked but coverage is thin.

ORACBUG-006 — RunDigest missing high-leverage fields

Severity: P2
Status: open
Location: crates/xenia-app/src/main.rs:1267-1306 (RunDigest struct + capture)
Symptom: digest exposes 14 fields, missing several high-signal counters that already exist in the system: unique_pcs_executed, kernel_calls_per_export histogram, mmio_reads/writes, scheduler.deadlock_recoveries, scheduler.deadlock_halts, events_signaled, events_waited, events_with_zero_signals, lwarx_count, stwcx_success_count, stwcx_fail_count.
Risk: M11's run-matrix can only diff coarse counters. Several "is the renderer chain alive?" probes are not captured.

ORACBUG-007 — analysis-shim parity test inherits CIRCULAR provenance

Severity: P2
Status: open
Location: crates/xenia-analysis/tests/disasm_goldens.rs:50-89 (check_fixture)
Symptom: test does (a) shim-vs-cpu parity (good — catches drift) and (b) cpu-vs-fixture (inherits circularity from ORACBUG-001/002/003). The primary purpose (parity) is sound; only the secondary assertion is suspect.

ORACBUG-008 — encode_vx128 helper lacks canary citation

Severity: P3
Status: open
Location: crates/xenia-cpu/tests/disasm_goldens.rs:53-68
Symptom: the encode helper currently encodes per canary's VX128 layout (post-PPCBUG-700) but lacks a comment block citing canary's xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663. A future "simplification" without canary cross-check could silently regress to the prior buggy convention.

PPCBUG (M05 — scheduler + reservation + block_cache)

PPCBUG-701 — Reservation generation 24-bit ring: false-match risk under long-delay paths (P3, latent)

Severity: P3
Status: open
Location: crates/xenia-cpu/src/reservation.rs:67-83 (pack), :188-191 (next_gen mask)
Symptom: next_gen is masked to 24 bits when packed (& 0xFF_FFFF). After 16,777,216 reservations, the generation wraps. If thread A's lwarx and its paired stwcx. are separated by ≥16M peer reservations on the same bank slot, and the bank still holds A's (line, gen) at commit time, try_commit will incorrectly succeed.
Risk: very low under realistic workloads (reservation count between an lwarx-stwcx pair is typically <100, and same-bank displacement bumps gen regardless). Not observable on Sylpheed.
Recommendation: defer until empirical evidence shows wraparound. If pursued, widen gen to 32 bits by stealing the line-address-low bits (low 7 bits of line are always zero — recoverable via masking).
Canary: canary's bitmap model has the equivalent bit-aliasing risk at RESERVE_BLOCK_SHIFT granularity but no time-domain wrap.
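
The wraparound arithmetic can be shown with a few lines. This sketch models only the 24-bit mask quoted in the Symptom line; the helper name and the idea of advancing by a step count are illustrative assumptions, not the structure of reservation.rs.

```rust
/// Mask for the 24-bit generation field (& 0xFF_FFFF in the packed word).
const GEN_MASK: u64 = 0xFF_FFFF;

/// Generation value after `steps` further reservations.
fn gen_after(gen: u32, steps: u64) -> u32 {
    ((gen as u64).wrapping_add(steps) & GEN_MASK) as u32
}

fn main() {
    let start = 5;
    // One step short of a full lap: the generation still differs.
    assert_eq!(gen_after(start, (1 << 24) - 1), 4);
    // Exactly 16,777,216 reservations later the generation matches again.
    assert_eq!(gen_after(start, 1 << 24), start);
}
```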

PPCBUG-702 — `invalidate_for_write` doc says collisions invalidate; code says they don't (P3, doc drift)

Severity: P3
Status: open
Location: crates/xenia-cpu/src/reservation.rs:38-46 (doc) and :235-256 (code)
Symptom: the file-level doc invariant 2 says "any plain store to a reserved line invalidates it (slot CASed to zero). Hash-collision side-effect: a store to a different line that maps to the same bank also invalidates" — but the actual code at :248-256 explicitly returns early when bank_line != line, leaving the reservation alone. The code is more correct (fewer spurious failures), but the doc contradicts it.
Recommendation: update the file doc to describe the "tag-checked invalidation" actually implemented. No code change needed.
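
For the doc update, the "tag-checked invalidation" behaviour could be described along these lines. This is a simplified sketch with a hypothetical bank layout (one `AtomicU64` holding the reserved line address, 0 meaning empty); the real slot also packs a generation, which is omitted here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical bank slot: 0 means empty, otherwise the reserved line address.
struct Bank {
    slot: AtomicU64,
}

/// Clear the reservation only if the slot holds exactly `line`.
/// Returns true when a reservation was invalidated.
fn invalidate_for_write(bank: &Bank, line: u64) -> bool {
    let cur = bank.slot.load(Ordering::Acquire);
    if cur == 0 || cur != line {
        return false; // empty, or a different line hashed to this bank
    }
    bank.slot
        .compare_exchange(cur, 0, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

fn main() {
    let bank = Bank { slot: AtomicU64::new(0x8200_0080) };
    // A store to a colliding but different line leaves the reservation alone.
    assert!(!invalidate_for_write(&bank, 0x8300_0080));
    assert_eq!(bank.slot.load(Ordering::Acquire), 0x8200_0080);
    // A store to the reserved line clears it.
    assert!(invalidate_for_write(&bank, 0x8200_0080));
    assert_eq!(bank.slot.load(Ordering::Acquire), 0);
}
```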

PPCBUG-703 — `--parallel` is non-deterministic; `XENIA_SCHED_SEED` does not regulate it (P3, doc gap)

Severity: P3
Status: open
Location: crates/xenia-cpu/src/scheduler.rs:232-249, :710-734; crates/xenia-app/src/main.rs:2230-2415
Symptom: --parallel workers race for the kernel mutex within each round; observable interleavings depend on host OS scheduling, not on XENIA_SCHED_SEED. The seed regulates ONLY the per-round slot-list shuffle, which has no effect under --parallel since workers race for the lock independently. Same-seed-same-input runs under --parallel produce different observable schedules.
Risk: M11's bisection cannot reliably reproduce an observed regression under --parallel; lockstep must be used for bisection.
Recommendation: document the determinism boundary in CLI help text. If true determinism is needed under --parallel, the kernel-mutex acquisition order must be re-introduced as a coordinator-driven sequence (a regression of the M3 perf goal).

PPCBUG-704 — `icbi` is a no-op; correctness depends on `bump_page_version` from data-store path (P3, latent)

Severity: P3
Status: open
Location: crates/xenia-cpu/src/interpreter.rs:1697-1701; crates/xenia-cpu/src/block_cache.rs:142-178
Symptom: icbi (instruction cache block invalidate) is collapsed into the cache/sync no-op arm. Self-modifying code is currently caught only because every write_u8/16/32/64 in xenia-memory/src/heap.rs unconditionally calls bump_page_version. If a future optimization makes bump_page_version conditional (e.g., distinguish data vs code pages, or skip bumping for non-instruction-page writes), icbi will need to actively bump the cache line.
Risk: latent; no current SMC failure observed.
Recommendation: add a comment in the cache/sync arm pointing at the implicit invariant: "icbi is correct because every store bumps page_version; if that changes, icbi must bump explicitly". Cross-references M06 memory invariants.

PPCBUG-705 — Phaser `phase: AtomicU32` wrap at 4 B rounds (P3, latent)

Severity: P3
Status: open
Location: crates/xenia-cpu/src/phaser.rs:64, :128, :172
Symptom: phase is an AtomicU32 advanced with fetch_add(1, Release). After 4,294,967,296 rounds the counter wraps. If an arriver stalls for exactly 2^32 rounds, its wait-loop predicate phase != pre_phase is still false when it resumes, which appears as a missed wake.
Risk: at xenia-rs's actual round rate (~10^4 rounds/sec) this requires ~5 days of continuous runtime. Not realistic.
Recommendation: widen to AtomicU64 next time the phaser API is touched. No urgency.
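
The wrap and the effect of widening can be illustrated without running 2^32 rounds. The helper functions below are hypothetical and model the counter with plain wrapping arithmetic; the atomic in `main` only confirms that `fetch_add` wraps on overflow.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// 32-bit phase after `rounds` increments (wrapping, like fetch_add).
fn phase_after_u32(pre: u32, rounds: u64) -> u32 {
    pre.wrapping_add(rounds as u32)
}

/// 64-bit phase after `rounds` increments.
fn phase_after_u64(pre: u64, rounds: u64) -> u64 {
    pre.wrapping_add(rounds)
}

fn main() {
    // fetch_add wraps around on overflow instead of panicking.
    let phase = AtomicU32::new(u32::MAX);
    phase.fetch_add(1, Ordering::Release);
    assert_eq!(phase.load(Ordering::Acquire), 0);

    // After exactly 2^32 rounds the 32-bit phase equals pre_phase again,
    // so a `phase != pre_phase` predicate would not fire.
    let pre = 7;
    assert_eq!(phase_after_u32(pre, 1u64 << 32), pre);
    // A 64-bit counter still differs after the same number of rounds.
    assert_ne!(phase_after_u64(pre as u64, 1u64 << 32), pre as u64);
}
```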

PPCBUG (M02 — decoder/disasm)

PPCBUG-706 — Tracker drift; PPCBUG-088/641 (sync/lwsync) shown as open but disasm fix at crates/xenia-cpu/src/disasm.rs:364-372 is already applied. P3 (tracker hygiene). Recommendation: flip both to applied. See audit-out/m02-decoder-disasm.md.
PPCBUG-707 — Disasm column-pad width inconsistent across opcode families (8/9/10/11/12/14) and divergent from canary's single kNamePad=11 (xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm.h:22). P3 cosmetic; ~150 call sites in disasm.rs. Affects every textual diff with canary. See audit-out/m02-decoder-disasm.md.
PPCBUG-708 — fmt_bc/fmt_bclr/fmt_bcctr base form uses CR-bit names (crb()) for BI; canary emits raw BI integer (ppc_opcode_disasm_gen.cc:158-186). Extended forms unaffected. P3 cosmetic; 3 lines to change. See audit-out/m02-decoder-disasm.md.
PPCBUG-709 — mfspr/mtspr/mftb base form emits symbolic SPR name (LR/CTR); canary emits raw SPR integer (ppc_opcode_disasm_gen.cc:1601-1602). Extended forms (mflr/mtctr/etc.) unaffected. P3 cosmetic. See audit-out/m02-decoder-disasm.md.
PPCBUG-710 — decoder.rs:79 has a stale doc-comment claiming vx128r_rc_bit reads PPC bit 27 (host bit 4); the immediately following lines 80-82 correctly say PPC bit 25 (host bit 6). Code is correct; the two comments contradict each other. P3 doc hazard. Recommendation: delete line 79.
PPCBUG-711 — decoder.rs:183-199 (extract_vx128_uimm5) has a 17-line doc comment narrating the pre-PPCBUG-700 buggy convention; references "First-Pixels M3" without citing the PPCBUG IDs. P3 cleanup. Recommendation: trim to 3-4 lines, move history to audit-findings.md.
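
The PPC-bit versus host-bit pairs quoted in PPCBUG-710 follow from PowerPC's MSB-0 bit numbering on a 32-bit instruction word. A minimal sketch of the conversion, using hypothetical helper names that are not taken from decoder.rs:

```rust
/// Convert a PowerPC (MSB-0) bit index in a 32-bit instruction word to the
/// host (LSB-0) bit index.
fn ppc_bit_to_host(ppc_bit: u32) -> u32 {
    assert!(ppc_bit < 32, "instruction words are 32 bits wide");
    31 - ppc_bit
}

/// Extract a single PPC-numbered bit from an instruction word.
fn extract_ppc_bit(instr: u32, ppc_bit: u32) -> bool {
    (instr >> ppc_bit_to_host(ppc_bit)) & 1 == 1
}
```

Under this convention PPC bit 25 is host bit 6 and PPC bit 27 is host bit 4, matching both halves of the contradictory comment.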

PPCBUG (M04 — FPSCR + VMX)

PPCBUG-712 — crates/xenia-cpu/src/overflow.rs:29-102: 64-bit overflow helpers (add_ov_64, sub_ov_64, adde_ov_64, sum_overflow_64, neg_ov_64) are dead code; interpreter inlines all 32-bit i128 overflow checks for the 32-bit ABI. P3 cosmetic. See audit-out/m04-fpscr-vmx.md.
PPCBUG-713 — crates/xenia-cpu/src/interpreter.rs:3848-3852 (vcmpbfp/vcmpbfp128): CR6.LT never set when all lanes are out-of-bounds. Canary's f.UpdateCR6(f.Or(gt, lt)) (ppc_emit_altivec.cc:579) sets LT = all-true(out-mask). xenia-rs hardcodes lt: false. P2; coupled with PPCBUG-421 (Rc-bit position) — both must land together. See audit-out/m04-fpscr-vmx.md.
PPCBUG-714 — crates/xenia-cpu/src/{fpscr.rs,interpreter.rs}: VXSOFT constant defined (fpscr.rs:51) but no setter anywhere. Software-triggered only via mtfsf paths, which were not verified to honour the bit. P3. See audit-out/m04-fpscr-vmx.md.
PPCBUG-715 — crates/xenia-cpu/src/interpreter.rs:2681,2694,2736,2750: fmsubx/fmsubsx/fnmsubx/fnmsubsx compute a.mul_add(c, -b). Rust's unary - flips the sign bit of a NaN b, corrupting NaN-payload propagation. Distinct from PPCBUG-205 which fixed the output negation; this is the input negation. P2; recommendation: replace -b with if b.is_nan() { b } else { -b }. See audit-out/m04-fpscr-vmx.md.
PPCBUG-716 — crates/xenia-cpu/src/fpscr.rs:320-325 (update_cr1): maps FPSCR[FX]→CR1.lt, [FEX]→CR1.gt, [VX]→CR1.eq, [OX]→CR1.so. Logic matches canary CopyFPSCRToCR1 (ppc_hir_builder.cc:491-501), but reuse of generic CrField field names without a comment block tying fx→lt invites future confusion. P3 docs. See audit-out/m04-fpscr-vmx.md.
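
The input-negation hazard in PPCBUG-715 can be illustrated in isolation. This is a sketch of the recommended guard only, with hypothetical function names; it does not reproduce the interpreter's fmsub arms.

```rust
/// Negate `b` for use as the addend of a fused multiply-subtract, leaving
/// NaN inputs untouched so their sign and payload propagate unchanged.
fn negate_preserving_nan(b: f64) -> f64 {
    if b.is_nan() { b } else { -b }
}

/// Returns true if plain unary minus changes the bit pattern of a NaN input.
fn plain_negation_alters_nan(b: f64) -> bool {
    b.is_nan() && (-b).to_bits() != b.to_bits()
}
```

A call such as `a.mul_add(c, negate_preserving_nan(b))` would then hand a NaN `b` to the fused operation with its original sign bit.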

PPCBUG (M03 — interpreter)

PPCBUG-720 — interpreter.rs:118 addi truncates result to 32 bits (as u32 as u64); canary ppc_emit_alu.cc:103-115 does full 64-bit add. Charter only documents addis truncation, not addi. P1. [REGRESSION-CANDIDATE] See audit-out/m03-interpreter.md.
PPCBUG-721 — interpreter.rs:138-152 addic/addicx operate on 32-bit narrowed operands; CA from result32 < ra32. Canary ppc_emit_alu.cc:117-135 is fully 64-bit via AddDidCarry. P1. [REGRESSION-CANDIDATE]
PPCBUG-722 — interpreter.rs:155-163 subfic 32-bit-only; canary ppc_emit_alu.cc:459-466 is 64-bit. P1. [REGRESSION-CANDIDATE]
PPCBUG-723 — interpreter.rs:165-172 mulli casts product as u32 discarding bits [32:63]; canary uses 64-bit signed multiply (low 64 of 128-bit product per ISA). P2.
PPCBUG-724 — interpreter.rs:1244,4594 stwcx/stdcx width-discriminator (reservation_width == 4/8) is stricter than canary (ppc_emit_memory.cc:868-908 no width check) and stricter than PowerISA. Reopen of PPCBUG-151. P0. [REGRESSION-CANDIDATE — STRONG] Bisect around a107ac9.
PPCBUG-725 — interpreter.rs:1665 mtmsrd L=1 mask is EE | RI (0x8001); canary ppc_emit_control.cc:828-837 uses EE only (0x8000). P2.
PPCBUG-726 — interpreter.rs:737-748 rlwimix zeroes RA[0:31] via as u32 ... as u64; canary ppc_emit_alu.cc:1010-1033 preserves RA[0:31] via 64-bit OR with MASK(MB+32, ME+32). P2.
PPCBUG-727 — interpreter.rs:2901,2922 fctidx/fctidzx overflow boundary val >= (i64::MAX as f64) mis-flags (2^63 - 1024, 2^63) as overflow due to f64 precision (i64::MAX rounds up to 2^63 in f64). P3.
PPCBUG-728 — interpreter.rs:1705-1724 dcbz/dcbz128 only call invalidate_for_write(ea) once. Confirmed sufficient (32B fits in 128B line; dcbz128 IS a 128B line). WONTFIX, informational guard for future widening.
PPCBUG-729 — interpreter.rs:1117,1124,1130 lwa/lwax/lwaux correctly sign-extend per hotfix f1166d0. CLEARED, verification only.
PPCBUG-730 — Reservation granule is 128 bytes (Xenon-correct) vs canary's byte-granular real_addr(EA). Documented, recommendation: append to charter §"Known Intentional Divergences from Canary". P3 informational.
PPCBUG-731 — interpreter.rs:908-938 bcx LR write timing in both AA paths. Confirmed equivalent to canary. P3 informational.
PPCBUG-732 — interpreter.rs:962-981 bcctrx correctly omits CTR decrement (CTR is target). Confirmed equivalent to canary. P3 informational.
PPCBUG-733 — interpreter.rs:1610 mtspr CTR truncates input to 32 bits (val as u32 as u64); mfspr CTR returns 64-bit. Canary ppc_emit_control.cc:792 stores full 64-bit. PowerISA: CTR is 64-bit SPR. P2.
PPCBUG-734 — interpreter.rs:2980-3040 fcmpu/fcmpo correctly distinguish ordered/unordered VXSNAN/VXVC. Canary ppc_emit_fpu.cc:329-367 has bug — bool ordered parameter never read. P3 (Rust is more correct); recommend appending to charter §"Known Intentional Divergences from Canary".
PPCBUG-735 — interpreter.rs:441,450,459,476,493,617,681,689,706,720,769,779,789,799,809,819 64-bit Rc-form ALU ops (mulld., mulhd., mulhdu., divd., divdu., cntlzd., sld., srd., srad., sradi., rldicl., rldicr., rldic., rldimi., rldcl., rldcr.) call update_cr_signed(0, x as i64) — full 64-bit signed view; canary ppc_hir_builder.cc:397-421 UpdateCR(n, v) does Truncate(v, INT32_TYPE) first — always 32-bit. CR0 disagrees with canary on values that change sign between i32 and i64 view. P1. [REGRESSION-CANDIDATE — STRONG]
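
The CR0 disagreement described in PPCBUG-735 shows up for any result whose sign differs between the i32 and i64 views. The sketch below models only the LT/GT/EQ bits; the type and function names are hypothetical and SO is omitted.

```rust
/// CR0 comparison bits, ignoring SO for brevity.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct Cr0 {
    lt: bool,
    gt: bool,
    eq: bool,
}

fn cr0_from_i64(v: i64) -> Cr0 {
    Cr0 { lt: v < 0, gt: v > 0, eq: v == 0 }
}

/// Full 64-bit signed view, as the finding describes for xenia-rs.
fn cr0_signed_64(result: u64) -> Cr0 {
    cr0_from_i64(result as i64)
}

/// Truncate-to-32-bits-first view, as the finding describes for canary.
fn cr0_signed_32(result: u64) -> Cr0 {
    cr0_from_i64(result as u32 as i32 as i64)
}
```

For example, 0x0000_0000_8000_0000 is positive as an i64 but negative as an i32, and 0x0000_0001_0000_0000 is positive as an i64 but zero as an i32.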

MEMBUG (M06 — memory subsystem)

Headline: write-visibility verdict = NOT broken at the memory layer (same-thread store/load is mechanically sound; BST paradox cause is upstream — see M03 candidates). 9 findings; 1 P1, 4 P2, 4 P3. See audit-out/m06-memory.md.

MEMBUG-001 — crates/xenia-memory/src/heap.rs:155-171 bump_page_version Release fence on page_versions[idx] correctly publishes the prior data store on x86_64 (TSO) and on weaker hosts via Release-store ordering. Doc-only risk: any future code that publishes via page_versions without first executing the data store and the Release-store inside bump_page_version would silently lose the visibility edge. P2 docs.
MEMBUG-002 — crates/xenia-memory/src/heap.rs:8 hardcodes PAGE_SIZE = 4096 for the entire 4 GB. Canary uses 4K/64K/16MB across 9 distinct heaps (memory.cc:222-242). Consequence: PageEntry::region_page_count is in 4K units rather than heap-native units — guest queries that walk region_page_count * page_size overshoot for 64K-heap-allocated regions. Latent. P2.
MEMBUG-003 — crates/xenia-memory/src/heap.rs:184-202 no physical-address aliasing across 0xA0000000/0xC0000000/0xE0000000. Canary maps all three onto the same physical-membase view (memory.cc:235-242). A guest CPU write to one alias is invisible at another. Risk: MmGetPhysicalAddress-shape round-trips and DMA-buffer aliasing return stale bytes. P1, latent.
MEMBUG-004 — crates/xenia-memory/src/heap.rs is_mapped accepts addresses in 0xFFD00000-0xFFFFFFFF; canary LookupHeap (memory.cc:434) returns null. Latent — corrupt high-byte pointers don't fault. P2.
MEMBUG-005 — crates/xenia-memory/src/platform.rs:31 always commits with PROT_READ | PROT_WRITE; xenia-rs cannot fault on writes to guest-read-only-protected pages. Matches canary's emit_inline_mmio_checks mode (no host-level protect enforcement). P3 informational.
MEMBUG-006 — crates/xenia-gpu/src/mmio_region.rs:62-67,108-115 unmapped GPU MMIO reads/writes log at tracing::trace!; should be warn (rate-limited per (reg_index, kind) pair) so renderer-divergence first-line observability doesn't require enabling trace globally. P2.
MEMBUG-007 — crates/xenia-memory/src/heap.rs:434-436,450-452,467-469 cross-page bump_page_version guard verified correct for all access widths. P3 informational.
MEMBUG-008 — icbi-correctness invariant (cross-references PPCBUG-704): every data store must bump_page_version. If any future perf optimization makes that conditional, icbi (currently no-op) must be made explicit. P3 documentation.
MEMBUG-009 — Static analysis: 29 distinct callers of sub_82454770 (intrusive-list-merge validator); only the BST registration through sub_82175E68 → sub_82175F10 trips the throw. Confirms the renderer-blocker is NOT a memory-layer issue — every list-merge operation would fail uniformly if it were. P3 informational, supports M06 verdict.
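
The ordering that MEMBUG-001 relies on (data store first, then a Release bump of the page version, paired with an Acquire load on the reader) can be sketched with standard atomics. `Page`, `publish` and `observe` are hypothetical stand-ins; the real heap stores through raw pointers rather than an AtomicU8.

```rust
use std::sync::atomic::{AtomicU32, AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for one guest page: a data byte plus its version.
struct Page {
    byte: AtomicU8,
    version: AtomicU32,
}

impl Page {
    fn new() -> Self {
        Page { byte: AtomicU8::new(0), version: AtomicU32::new(0) }
    }
}

/// Writer side: the data store must come before the Release bump.
fn publish(page: &Page, value: u8) {
    page.byte.store(value, Ordering::Relaxed);
    page.version.fetch_add(1, Ordering::Release);
}

/// Reader side: an Acquire load that sees a new version also sees the data.
fn observe(page: &Page, seen_version: u32) -> Option<u8> {
    if page.version.load(Ordering::Acquire) != seen_version {
        Some(page.byte.load(Ordering::Relaxed))
    } else {
        None
    }
}

/// Runs one writer thread and spins on the reader until the bump is visible.
fn publish_then_observe(value: u8) -> u8 {
    let page = Arc::new(Page::new());
    let writer = {
        let page = Arc::clone(&page);
        thread::spawn(move || publish(&page, value))
    };
    let seen = loop {
        if let Some(byte) = observe(&page, 0) {
            break byte;
        }
        thread::yield_now();
    };
    writer.join().unwrap();
    seen
}
```

Swapping the two lines in `publish` would remove the guarantee, which is the documentation risk the finding calls out.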

XAMBUG (M08 — XAM)

XAMBUG-001 — crates/xenia-kernel/src/xam.rs:204-208 xam_task_schedule allocates a handle and returns 0 without ever invoking the callback. Canary xam_task.cc:43-81 spawns an XThread that runs the callback (which typically signals XTASK_MESSAGE.event_handle). Sylpheed callsite confirmed at thunk 0x8284dafc ← sub_824a9710 (0x824a9a10). Likely cause of one or more parked-waiter handles in M10. P0 candidate.
XAMBUG-002 — crates/xenia-kernel/src/xam.rs async XAM exports (XamContentCreate, XamContentClose, XamContentDelete, XamContentCreateEnumerator, XamContentSetThumbnail, XamContentGetCreator, XamShowKeyboardUI, XamShowDeviceSelectorUI, XamShowMessageBoxUIEx, XamShowGamerCardUIForXUID, XamEnumerate, XMsgStartIORequest, XMsgStartIORequestEx) are all stub_success and never touch overlapped_ptr. Canary completes the overlapped via CompleteOverlappedImmediate / CompleteOverlappedDeferredEx and returns X_ERROR_IO_PENDING (xam_content.cc:418-422, xam_msg.cc:64-67, xam_ui.cc:382-389). Any wait on the overlapped event hangs forever. P0 candidate.
XAMBUG-003 — crates/xenia-kernel/src/xam.rs:45 XamUserGetSigninState is stub_return_zero (always 0 = "not signed in"). Canary xam_user.cc:90-104 returns signin_state (typically 1 = signed-in offline) when a profile exists. Sylpheed callsite confirmed at thunk 0x8284db3c ← sub_824a9c90. Boot guard bl XamUserGetSigninState; cmpwi r3,0; beq <bail> would force the bail branch. P1, possibly P0.
XAMBUG-004 — crates/xenia-kernel/src/xam.rs:232-239 xam_user_get_xuid returns 0 (success) with xuid=0. Canary xam_user.cc:30-67 returns X_E_NO_SUCH_USER when the user isn't signed in. P1.
XAMBUG-005 — crates/xenia-kernel/src/xam.rs:241-248 xam_user_get_name returns 0 (success) with empty buffer. Canary xam_user.cc:137-164 returns X_ERROR_NO_SUCH_USER when the user isn't signed in. P1.
XAMBUG-006 — crates/xenia-kernel/src/xam.rs:192-200 XamLoaderLaunchTitle/XamLoaderTerminateTitle return normally with gpr[3]=0. Canary xam_info.cc:380-432 explicitly does not return — calls kernel_state()->TerminateTitle(). Sylpheed has 2 callsites for XamLoaderTerminateTitle. Returning normally allows the title to keep executing past a fatal-exit path. P1.
XAMBUG-007 — crates/xenia-kernel/src/xam.rs:257-273 xam_get_execution_id heap-allocates a 24-byte struct on every call and writes hardcoded title_id=0x535107D4, media_id=0x2D2E2EEB, version=0, base_version=0, disc=1/1. Canary xam_info.cc:321-336 writes the guest pointer to the existing XEX EXECUTION_INFO opt-header. Hardcoded bytes diverge from real header for version/base_version; per-call leaks. P1.
XAMBUG-008 — crates/xenia-kernel/src/xam.rs:212-228 xam_alloc ignores flags. Canary xam_info.cc:434-455 notes 0x00100000 controls zero-fill; canary always uses SystemHeapAlloc which zero-fills. Severity depends on whether xenia-rs's state.heap_alloc zero-fills: P1 if not, P2 if yes.
XAMBUG-009 — crates/xenia-kernel/src/xam.rs:73-74 XamUserCreateAchievementEnumerator and XamUserCreateStatsEnumerator are stub_success and don't fill *handle_ptr. Canary xam_user.cc:580-647 and :1025-1059 create real XEnumerator objects. Game reads stale memory as the handle; subsequent XamEnumerate returns 0x12 only by happy coincidence. P2.
XAMBUG-010 — crates/xenia-kernel/src/xam.rs:77-82 UI dialog exports (XamShowSigninUI, XamShowKeyboardUI, XamShowDeviceSelectorUI, XamShowGamerCardUIForXUID, XamShowDirtyDiscErrorUI, XamShowMessageBoxUIEx) are all stub_success and never write result_ptr->ButtonPressed. Canary fills the result and completes overlapped (xam_ui.cc:322-419). Game reads stale ButtonPressed → may take wrong dialog branch. P2.
XAMBUG-011 — crates/xenia-kernel/src/xam.rs:305-307 XGetAVPack returns 0x16 (=22), outside canary's documented range 0..8. Canary xam_info.cc:35-46 defaults to 8 (HDMI). Comment in xam_info.cc:248-251 warns games may PAL-check against {3,4,6,8} — 0x16 matches none. Recommend changing to 8. P2.
XAMBUG-012 — crates/xenia-kernel/src/xam.rs:50 XamEnumerate returns 0x12 (ERROR_NO_MORE_FILES). Canary xam_enum.cc:25-32 returns X_ERROR_INVALID_HANDLE for unknown handle and WriteItems for valid ones. xenia-rs is "convenient happy path" only because XAMBUG-009 means no real handle exists. P2.
XAMBUG-013 — crates/xenia-kernel/src/xam.rs:275-277 XamGetSystemVersion returns 0x20000000. Canary xam_info.cc:229-237 returns 0 with explicit "pretend old" comment; both arbitrary, both kStub. Could affect symbol-loading branches in title code. P3.
XAMBUG-014 — crates/xenia-kernel/src/xam.rs:309-311 XGetGameRegion returns 0xFF (8-bit). Canary xam_info.cc:256-277 returns 16-bit values from a 109-entry country table (e.g. 0x0101 for Japan, 0xFFFF for "all"). Sylpheed J probably masks fine but the value is structurally wrong. P3.
XAMBUG-015 — crates/xenia-kernel/src/xam.rs:317-328 XGetVideoMode writes only 5 fields (20 bytes). Canary's X_VIDEO_MODE struct is larger; trailing fields left with stale stack data on the guest side. P3.
XAMBUG-016 — crates/xenia-kernel/src/xam.rs:142-162 (xam_input_get_state) only bumps state.input_packet_number when gamepad_key != last_input_bytes. Fake-pad steady state keys to 0; packet_number stays 0 forever. Games that detect "input never changed since startup" via packet_number monotonicity may misbehave. Canary likewise increments only on a real change, so xenia-rs matches it in spirit. Sylpheed unaffected at boot. P3.
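
The membership check behind XAMBUG-011 is small enough to state directly. The accepted set {3,4,6,8} is taken from the finding; the constant and function names are hypothetical.

```rust
/// AV pack values the finding lists as accepted by title-side checks.
const ACCEPTED_AV_PACKS: [u32; 4] = [3, 4, 6, 8];

/// True if a title that checks against the accepted set would take `value`.
fn av_pack_accepted(value: u32) -> bool {
    ACCEPTED_AV_PACKS.contains(&value)
}
```

The old return value 0x16 fails this check, while canary's default of 8 (HDMI) passes it.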

KRNBUG (M07 — kernel HLE)

Per-milestone consolidated report: audit-out/m07-kernel-hle.md. Sub-reports under audit-out/m07{a,b,c,d}-*.md retain local sub-prefixes; master IDs unified below.

Headline P0 / P1

KRNBUG-017 (P0 under --parallel) — Kf-spinlock no-op (KfAcquireSpinLock/Release, KeRaiseIrql, KeLowerIrql). Lockstep tolerates this; --parallel allows concurrent guest CS entry → state corruption invisible to existing tests. M07a, exports.rs.
KRNBUG-Vd-04 (P0) — VdSwap bypasses PM4 ring; canary writes Type-0 fetch-constant patch + PM4_XE_SWAP into reserved slot, ours fills NOPs and calls state.gpu.notify_xe_swap directly. Most plausible cause of swaps=2→swaps=1 regression. M07c, exports.rs vd_swap.
KRNBUG-008 (P1) — ExCreateThread ignores xapi_thread_startup parameter. Canary invokes the prologue callback before user entry; we skip it. M07b.
KRNBUG-011 (P1) — ExCreateThread ignores creation_flags bit 0x80 (guest_object return). M07b.
KRNBUG-013 (P1) — ExGetXConfigSetting stub_success writes nothing into output buffer; Sylpheed reads garbage stack memory during early boot. M07b.
KRNBUG-Mm cluster (P1) — MmAllocatePhysicalMemoryEx ignores all attribute bits (protect, page_size, range, alignment, WC/NoCache). Pool family entirely unregistered. M07c.
KRNBUG-D08 (P1 candidate) — VSYNC_INSTR_PERIOD = 150_000 calibrated for ~10 MIPS lockstep; under --parallel (~24× slower) drops to ~2.5 Hz wall. Plausible swap-regression contributor. M07d, interrupts.rs.
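
The KRNBUG-D08 rate estimate is simple arithmetic on the instruction-count proxy. A rough sketch, assuming the ~10 MIPS lockstep throughput and ~24× slowdown quoted above (the function name is hypothetical):

```rust
/// Effective v-sync rate in Hz when one v-sync is injected every
/// `instr_period` guest instructions at `instr_per_sec` throughput.
fn vsync_hz(instr_per_sec: f64, instr_period: u64) -> f64 {
    instr_per_sec / instr_period as f64
}
```

With a period of 150_000, 10 MIPS gives roughly 67 Hz, and a throughput 24× lower gives roughly 2.8 Hz, the same order as the ~2.5 Hz figure above.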

Other P1 / P2 / P3

77 KRNBUG IDs total filed across M07a/b/c/d. Severity distribution: 3 P0, 11 P1, 28 P2, 35 P3.

Full list and rationale in sub-reports. M07-lead consolidation at audit-out/m07-kernel-hle.md. Highlights:

Nt/Ke/Kf: KRNBUG-005 (NtAllocateVirtualMemory ignores flags), KRNBUG-008 sub-prefix-a (NtCreateFile drops desired_access/share/disposition), KRNBUG-014 (DPC family unimplemented).
Rtl/Ex: 35+ canary-table Rtl* ordinals unregistered (KRNBUG-001 sub-prefix-b; needs trace-handles audit to triage), CS stale-owner override (KRNBUG-004 sub-prefix-b).
Ob/Mm/Vd: ObReferenceObjectByName + ObOpenObjectByName + ObTranslateSymbolicLink unregistered, ExFreePool / MmFreePool entirely missing, VdGetCurrentDisplayInformation/VdQueryVideoFlags/VdInitializeScalerCommandBuffer/VdInitializeEngines all stubbed.
Xex/misc: XexCheckExecutablePrivilege always 0, XexGetProcedureAddress ignores string-name path, sprintf/_vsnprintf produce empty buffers (KRNBUG-D12).

XAMBUG (M08 — XAM)

Per-milestone report: audit-out/m08-kernel-xam.md. 16 XAMBUG IDs.

XAMBUG-001 (P0 candidate) — XamTaskSchedule never invokes callback

Location: crates/xenia-kernel/src/xam.rs:204-208. Returns 0 without spawning the task. Canary spawns an XThread to run the callback; the callback typically signals an XTASK_MESSAGE.event_handle. Strong candidate for one or several of the 4 parked-waiter handles (0x1004, 0x100c, 0x15e4, 0x42450b5c). Sylpheed callsite confirmed at sub_824a9710 / 0x824a9a10.

XAMBUG-002 (P0 candidate) — 13 async XAM exports never complete overlapped

Location: xam.rs Content*, ShowUI, XMsgStartIORequest, XamEnumerate. All stub_success and never call CompleteOverlappedImmediate / Deferred on overlapped_ptr. Any guest wait on the overlapped event hangs.

XAMBUG-003 (P1, possibly P0) — XamUserGetSigninState returns 0

Location: xam.rs. xenia-rs returns 0; canary returns 1 (signed-in offline by default). Sylpheed boot guard would force the bail branch.

Other 13 XAMBUG IDs

XAMBUG-004..016: XAMBUG-004..007 are P1, XAMBUG-008 is P1 or P2 depending on whether the heap zero-fills, and the rest are P2/P3. Highlight: XAMBUG-016 (P3) packet_number never increments in fake-pad steady state because key stays 0.

MEMBUG (M06 — memory subsystem)

Per-milestone report: audit-out/m06-memory.md. 9 MEMBUG IDs (1 P1, 4 P2, 4 P3).

Verdict: write-visibility NOT BROKEN

Same-thread store→load through crates/xenia-memory/src/heap.rs is mechanically sound. Both paths derive raw *mut u8/*const u8 pointers from the same membase mapping; no per-thread cache, no write-back buffer, no block-cache layer that returns stale data bytes (block cache only caches decoded instructions, never data). The bump_page_version Release-store comes after the byte store and is a cross-thread visibility primitive; same-thread program order trivially observes the just-written byte.

BST paradox at sub_82175E68 → sub_82175F10 is OPEN but not a memory bug. Both registrar and validator run on the same HW slot in the same scheduler round. Likely upstream causes: M03 PPCBUG-720..735 (interpreter 32/64-bit truncation bugs) corrupting the comparison feeding the validator, or constructor-side logic in sub_821766A0/sub_825ED268.

MEMBUG-003 (P1) — physical-address aliasing across cached/write-combine ranges not implemented

Location: crates/xenia-memory/src/heap.rs. The 0xA000_0000 (write-back), 0xC000_0000 (write-combine), 0xE000_0000 (uncached) virtual ranges are all distinct mappings in xenia-rs but should alias the same physical memory. Latent risk for any DMA-buffer round-trip; not currently observed to break Sylpheed but is a correctness gap.
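
One way to picture the missing aliasing is a function that folds the three virtual ranges onto a shared physical offset. This is a sketch under a simplifying assumption: each range is treated as a 512 MB window whose low 29 bits select the physical byte, and addresses at or above 0xFFD00000 are rejected as in MEMBUG-004. The function name is hypothetical, and canary's actual per-heap offsets are not modelled.

```rust
/// Map a guest virtual address in the 0xA/0xC/0xE physical ranges to an
/// offset into a single physical backing store. Returns None elsewhere.
/// Assumption: low 29 bits select the physical byte in all three ranges.
fn physical_offset(addr: u32) -> Option<u32> {
    match addr {
        0xA000_0000..=0xFFCF_FFFF => Some(addr & 0x1FFF_FFFF),
        _ => None,
    }
}
```

Under this model a store through 0xA0001000 and a load through 0xC0001000 or 0xE0001000 resolve to the same physical offset, which is the behaviour the finding says xenia-rs lacks.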

Other MEMBUG IDs

MEMBUG-001..009. Highlights: MEMBUG-002 P2 (MMIO aperture single-bit-mask fast-path doesn't validate against region table on hit), MEMBUG-005 P2 (no protection-fault path; reads of unmapped memory return 0), MEMBUG-007 P3 (Be serde missing round-trip test).

GPUBUG (M09 — GPU pipeline)

Per-milestone consolidated report: audit-out/m09-gpu.md. Sub-reports under audit-out/m09{a,b,c}-*.md. 33 IDs; severity: 6 P0, 12 P1, 8 P2, 7 P3.

Headline P0

GPUBUG-001 (P0) — VdSwap kernel-bypass: vd_swap zero-fills 64-dword reserved ring slot with NOPs and calls state.gpu.notify_xe_swap directly. Canary writes Type-0 fetch-constant patch + PM4_XE_SWAP into the slot and lets the CP consume it. PM4_XE_SWAP opcode handler at gpu_system.rs:1232 is dead code at runtime. Confirms KRNBUG-Vd-04. Most plausible cause of swaps=2→swaps=1 regression.
GPUBUG-100 / shader-005 (P0) — operand modifiers (swizzle/abs/neg) never read from word-1 in WGSL interpreter; every ALU instruction executes against unmodified operands.
GPUBUG-101 / shader-006 (P0) — c# constant-register selector bit masked off; every shader reads r[low7] (temp) instead of constants. WVP matrix etc. never read.
GPUBUG-102 / shader-007 (P0) — vertex fetch never applies GpuSwap endian; big-endian VBs decode as garbage on little-endian host.
GPUBUG-103/104/105 / draw-008/009/010 (P0) — 8 of 26 draw_state register addresses misdecoded: VGT_DRAW_INITIATOR, VGT_DMA_BASE, VGT_DMA_SIZE, PA_SC_WINDOW_SCISSOR_TL/BR (reading SCREEN_SCISSOR), RB_COLOR_INFO_1/2/3, PA_SU_VTX_CNTL, index_size from bit 8 instead of bit 11.
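
The endian modes named in the GPUBUG-102 fix (kNone/k8in16/k8in32/k16in32) are plain byte permutations of a 32-bit word. A minimal sketch, with a hypothetical enum and function name rather than the crate's own:

```rust
/// Endian swap modes applied to each 32-bit word of fetched vertex data.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GpuSwap {
    None,
    K8in16,
    K8in32,
    K16in32,
}

fn gpu_swap(value: u32, mode: GpuSwap) -> u32 {
    match mode {
        GpuSwap::None => value,
        // Swap the two bytes inside each 16-bit half.
        GpuSwap::K8in16 => ((value << 8) & 0xFF00_FF00) | ((value >> 8) & 0x00FF_00FF),
        // Reverse all four bytes.
        GpuSwap::K8in32 => value.swap_bytes(),
        // Swap the two 16-bit halves.
        GpuSwap::K16in32 => value.rotate_left(16),
    }
}
```

For the word 0x11223344 the three non-trivial modes yield 0x22114433, 0x44332211 and 0x33441122 respectively.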

Headline P1

GPUBUG-006 (P1) — sync_with_mmio Relaxed-load on WPTR; broken Release/Acquire pair; latent under --parallel.
GPUBUG-shader-002 (P1) — D3D9 legacy Inf*0=+0 not honored. Canary documents same divergence as causing white-screen in 4D5307E6.
GPUBUG-301 (P1) — read/write_sample_64bpp doubles pitch but surface_pitch_tiles() already pre-doubles for 64bpp → quadruple stride for 64bpp resolves. Tests bypass from_register_file so don't catch this.
GPUBUG-304 (P1) — bind_primary_texture hardcodes version_when_uploaded: 0 so guest writes never invalidate uploaded textures.
GPUBUG-305 (P1) — texture cache missing K1555, K24_8, K_8, K1010102, K10_11_11, _AS_* formats; bound to magenta stub.
Plus 7 more P1 in shader/draw_state region (GPUBUG-106..112).
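
The legacy multiply rule referenced by GPUBUG-shader-002 differs from IEEE 754 only when one operand is zero. A sketch of that rule as a standalone helper (the name is hypothetical, and this is not the WGSL interpreter's code):

```rust
/// D3D9-style legacy multiply: zero times anything, including infinity
/// or NaN, yields +0 instead of the IEEE 754 result.
fn mul_legacy(a: f32, b: f32) -> f32 {
    if a == 0.0 || b == 0.0 {
        0.0
    } else {
        a * b
    }
}
```

Plain `f32::INFINITY * 0.0` produces NaN, whereas `mul_legacy(f32::INFINITY, 0.0)` produces +0.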

Other P2/P3

15 IDs. Highlights: GPUBUG-002 (P2) PM4 type-3 coverage 35/47 not 47/47 as memory file claimed — missing COND_EXEC, WAIT_REG_EQ, WAIT_REG_GTE, EVENT_WRITE_CFL plausibly hit by Sylpheed; GPUBUG-302 (P2) RenderTargetKey::is_64bpp returns wrong format set; GPUBUG-303 (P2) CPU-side TextureCache::ensure_cached is dead code.

Verdict

Renderer-blocker explanation: The GPU pipeline is structurally wrong at multiple stages (shader operand decode + constant selector + vertex endian + 8 register addresses + VdSwap bypass). draws=0 and swap regression both fall out of this class of failure. Combined fix queue: GPUBUG-001 + GPUBUG-100..105 must land together — partial fixes likely won't unblock visible rendering.

XMODBUG (M10 — cross-module seams)

Per-milestone consolidated report: audit-out/m10-cross-module.md. Sub-reports under audit-out/m10-x{1..5}-*.md. 22 IDs; severity: 1 P0, 6 P1, 5 P2, 10 P3.

Headline P0

XMODBUG-013 (P0) — Missing fetch-constant patch in VdSwap. Re-confirms KRNBUG-Vd-04 / GPUBUG-001 from the seam perspective. Frontbuffer slot 0 retains stale texture descriptor; Sylpheed bloom/blur path reads garbage. Strongest single P0 cause of swap regression.

Headline P1

XMODBUG-001 (P1) — stwcx/stdcx data write happens AFTER try_commit clears the slot. Race window: another HW thread can lwarx the cleared slot, read pre-write data, and commit. Latent under --parallel.
XMODBUG-002 (P1) — GuestMemory::write_bulk (used by NtReadFile and XEX loader) skips both bump_page_version and reservation invalidation. Latent if any code-bearing memory is bulk-written.
XMODBUG-010 (P1) — CP_INT_STATUS never produced from GPU side; only synthetic vsync interrupts ever reach the kernel. Real CP-side events (EOP, RSC, IB-end) missing.
XMODBUG-011 (P1) — VSYNC_INSTR_PERIOD fragile proxy. Re-confirms KRNBUG-D08 from seam perspective.
XMODBUG-012 (P1) — notify_xe_swap synthetic interrupts displace real CP interrupts in 4-deep queue.
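
For XMODBUG-002, the set of pages a bulk write must bump follows from the write's address and length. A sketch of that range computation, assuming the 4096-byte page size noted in MEMBUG-002; the function name is hypothetical and the actual write_bulk fix may be structured differently.

```rust
use std::ops::RangeInclusive;

const PAGE_SIZE: u32 = 4096;

/// Inclusive range of page indices touched by a write of `len` bytes at
/// `addr`. Returns None for empty writes or writes that overflow 32 bits.
fn touched_pages(addr: u32, len: u32) -> Option<RangeInclusive<u32>> {
    if len == 0 {
        return None;
    }
    let last_byte = addr.checked_add(len - 1)?;
    Some((addr / PAGE_SIZE)..=(last_byte / PAGE_SIZE))
}
```

A bulk writer would then bump the version of every index in the returned range, so a two-byte write straddling a page boundary bumps both pages.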

Other P2/P3

15 IDs. Notable:

XMODBUG-005 (P2) — nt_close on a handle with parked waiters silently strands them.
XMODBUG-003 (P2) — no MemoryBarrier around reserved ops; latent on non-x86 hosts.
XMODBUG-021 (P2) — WaitAll partial-satisfaction false-wake (semantic gap, not a race).
XMODBUG-022 (P2) — force-wake path doesn't scrub waiter lists like timed-wake path does.

Verdict

The renderer plateau and swap regression are explained by a multi-causal failure at the GPU pipeline + kernel-↔-GPU seam. Combined fix queue: KRNBUG-Vd-04 / GPUBUG-001 / XMODBUG-013 (VdSwap rewrite to write real PM4 sequence) + GPUBUG-100..102 (shader operand decode + constant-register selector + vertex fetch endian) + GPUBUG-103..105 (8 register addresses) must land coherently to unblock visible rendering.

The 4 parked-waiter handles remain unexplained at this audit's depth. M11 follow-up should run the --trace-handles audit at -n 5B and pivot to PPC-level trace if no signal exists.

SWAPBUG (M11 — swap-regression bisection)

Per-milestone report: audit-out/m11-runs.md.

SWAPBUG-001 (P0) — PPCBUG-001 addi 32-bit truncation regresses swaps=2 → 1

Severity: P0 — direct cause of the headline swaps=2 → 1 regression that motivated this entire audit.
Status: open (audit-only; fix decision left to follow-up).
Location: crates/xenia-cpu/src/interpreter.rs:114-118 — the single as u32 as u64 cast at the end of the addi opcode arm.
Bisection trail:
- Phase-level: pre-P1/P1/P2/P3 → swaps=2; P4/d945aea → swaps=1.
- Internal P4 commits: 145a7a4 → swaps=2; bf8208e ("PPCBUG-001/002/003/004/005/007 4b immediate ALU truncation") → swaps=1.
- Hunk-level (revert each PPCBUG individually within bf8208e): only PPCBUG-001 revert restores swaps=2. PPCBUG-002/003/004/005/007 reverts leave swaps=1.
Mechanism: addi is the most common opcode (282k uses, 3.4% of all instructions in sylpheed.db). Adding as u32 as u64 to its writeback zeroes the upper 32 bits of the result. Sylpheed has at least one control-flow site that depends on the un-truncated 64-bit value.
Cross-references: confirms M03 PPCBUG-720 prediction ("addi/addic/subfic truncate to 32 bits without canary parity"). The fix is canary-divergent — canary does NOT truncate addi.
Recommendation: revert the addi truncation. Re-examine the test addi_li_neg_one_zero_extends_upper to assert canary semantics, not the over-truncated form. Independently re-examine the addis truncation (which IS deliberate per the addis fix memory file but may have its own broader implications).
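
The difference between the two addi writebacks is easiest to see with a negative immediate. The sketch below models only the arithmetic; the function names are hypothetical and `ra` is the already-selected source value (0 for the `li` form).

```rust
/// addi with a full 64-bit writeback: the immediate is sign-extended and
/// added with wrap-around, as described for canary above.
fn addi_full(ra: u64, simm: i16) -> u64 {
    ra.wrapping_add(simm as i64 as u64)
}

/// addi with the `as u32 as u64` writeback, which clears bits 32..63.
fn addi_truncated(ra: u64, simm: i16) -> u64 {
    addi_full(ra, simm) as u32 as u64
}
```

For `li r3, -1` the full form yields 0xFFFF_FFFF_FFFF_FFFF while the truncated form yields 0x0000_0000_FFFF_FFFF, so any later 64-bit signed compare sees opposite signs.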

SWAPBUG-002 (P2) — PPCBUG-004 mulli truncation affects IRQ delivery anomalously

Severity: P2 — anomalous side effect, not blocking.
Status: open.
Location: crates/xenia-cpu/src/interpreter.rs mulli arm (changed in bf8208e).
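Symptom: reverting the mulli truncation alone (on top of bf8208e) drops interrupts_delivered from 629 to 101 at -n 100M lockstep, while swaps stays at 1. This is the opposite direction from SWAPBUG-001: there the revert restores the expected behaviour, here it reduces interrupt delivery.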
Mechanism (hypothesis): a mulli result is consumed by an instruction-count or frame-count computation that controls vsync injection target selection or some early-boot loop iteration count.
Recommendation: no immediate action; investigate as part of M07d KRNBUG-D08 / XMODBUG-011 vsync-timing audit.

ANLBUG (M11 — analysis crate)

ANLBUG-001 (P2) — `xenia-rs dis` does not create SQL views by default

Severity: P2 — feature mismatch between tests and CLI.
Status: open.
Location: crates/xenia-app/src/main.rs:3189 — w.create_sql_views() is gated on --analyze=Sql or --analyze=Both. Default is Rust, which skips view creation.
Symptom: regenerated sylpheed.db has none of the application views (v_branch_xrefs, v_call_graph, v_function_first_instruction, v_imports_called, v_reachability_from_entry). The schema-golden test creates them; the user-facing CLI does not.
Cross-reference: ORACBUG-005 (M01) — schema test uses synthetic 4-instr PE; doesn't catch this gap.
Recommendation: either always create views in --db mode, or document the requirement clearly in CLI help.

Fix session 2026-05-03 — outcome

Single-session fix sprint executed against this audit's recommended queue. 12 IDs closed across 11 commits + 9 merge commits on master. Branch lineage: each phase a topic branch, merged with --no-ff to preserve hunk-bisect lineage; all branches deleted post-merge.

Phase	Commit	IDs closed	Severity	Notes
A	`9ab986e`	SWAPBUG-001 / PPCBUG-001	P0	addi 32-bit truncation revert. swaps 1→2 confirmed.
B	`1f416aa`	ORACBUG-004 (partial: ORACBUG-006)	P0	sylpheed_n50m stable-digest golden + `--stable-digest` CLI flag. n4b deferred (canonical invocation pathologically slow per audit).
C	`82f3d61`	KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013	3× P0	VdSwap PM4 ring path (writes Type-0 fetch-constant patch + Type-3 PM4_XE_SWAP into ring memory at WPTR). Direct `notify_xe_swap` retained as idempotent safety net.
D1	`78ea81c`	GPUBUG-101	P0	ALU src1/2/3_sel temp-vs-constant selector decoded from word-0 bits 29-31.
D2	`c5c6713`	GPUBUG-100 (abs deferred)	P0	per-operand component-relative swizzle + negate decoded from word-1. abs flag (dual-meaning bit 7 / word-2) intentionally deferred.
D3	`ec2d955`	GPUBUG-102	P0	per-format `gpu_swap` endian byte-swap on vertex fetch (kNone/k8in16/k8in32/k16in32).
E	`8723d68`	GPUBUG-103, GPUBUG-104, GPUBUG-105	3× P0	8 register addresses re-validated against canary `register_table.inc`; index_size bit 8→11; PA_SU_VTX_CNTL 0x2083→0x2302.
F1	`e7d0fcf`	KRNBUG-017	P0-under-parallel	Kf*SpinLock + KeReleaseSpinLockFromRaisedIrql + KeTryToAcquireSpinLockAtRaisedIrql now write the lock value to guest memory.
G1	`8fc1b1d`	GPUBUG-006	P1	sync_with_mmio Acquire/Release pairs the producer-side Release at mmio_region.rs:78.
G2	`780e854`	XMODBUG-002	P1	GuestMemory::write_bulk now bumps page_versions for every page it touches.

Headline outcome

Metric	Pre-sprint	Post-sprint	Goal	Met?
`swaps` (-n 100M)	1	2	≥2	✅
`draws` (-n 100M)	0	0	>0	❌ (multi-causal — see below)
Tests passing	551	556	≥551	✅
Renderer plateau	locked	partially unblocked	unblocked	partial

The audit's central prediction — Phases C+D+E together unlock draws > 0 — was not met empirically at -n 100M lockstep. The plateau persists because:

shader_blobs_live stays at 0 after 100M. The game has not yet issued IM_LOAD; resource-loader worker threads are still parked.
The audit's parked-waiter analysis (project_xenia_rs_audit_2026_05_02.md, 4 handles 0x1004 / 0x100c / 0x15e4 / 0x42450b5c) remains unresolved. Phase F1 (Kf-spinlock) lands but doesn't unblock these handles; XAMBUG-001 was ruled out by M10-X2.

Phases attempted but deferred

F2 (XAMBUG-001 XamTaskSchedule callback spawn): per audit M10-X2, ruled out as the parked-waiter cause. Bug is real but doesn't move the renderer-plateau needle within this sprint. Implementing the XThread spawn for the callback is moderate complexity (~45 min); deferred to a follow-up session.
F3 (XAMBUG-002 overlapped completion helper): requires new infrastructure (KernelState::complete_overlapped) plus wiring 13 async XAM stubs. Substantial. Deferred.
G2 (KRNBUG-D08 / XMODBUG-011 VSYNC wall-clock): switching from instruction-count proxy to wall-clock would destabilize the lockstep digest's interrupts_delivered field (which the existing full-digest sylpheed_n2m oracle still tracks). Deferred to allow paired oracle-update.
G3 (PPCBUG-720/721/722 addic/addic./subfic revert): verified canary directly (xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc:117-136) — canary uses full 64-bit add with sign-extended immediate, not the "i32 → i64 → u64" path the Plan agent suggested. The current xenia-rs 32-bit ABI workaround is plausibly correct for Xbox 360 user mode (per the addis pattern). The "PPCBUG" label may itself be wrong; defer until canary semantics are re-confirmed against a known-good Sylpheed code-path trace.
KRNBUG-Mm cluster (P1 sweep): substantial implementation work (proper protect/page_size/range honoring in MmAllocatePhysicalMemoryEx; per-heap offsets in MmGetPhysicalAddress; real Mm tracking for MmFreePhysicalMemory). Deferred.

Sprint acceptance criteria

#	Criterion	Met?
1	Phase A: SWAPBUG-001 reverted, swaps=2 confirmed	✅
2	Phase B: sylpheed_n50m + n4b goldens	✅ partial — n50m landed; n4b deferred (perf)
3	Phases C+D+E: 100M lockstep produces `draws > 0`	❌ multi-causal
4	Phase F: ≥1 of 4 parked-waiter handles signals	❌ — F1 alone insufficient
5	Phase G: ≥3 P1 groups landed	❌ partial — 2 landed (G1, G2-XMODBUG-002)
6	`cargo test --workspace --release` ≥557	❌ — 556 (off by 1; new sylpheed_oracles is ignore-gated)
7	audit-findings.md marked applied	✅ this section
8	Memory file updated	✅ (separate file)
9	Workspace clean; no skipped/ignored tests added	⚠ — sylpheed_n50m is `#[ignore]` per design (3-min run)
10	All work merged to master	✅ — no dangling branches

Recommended next session

Investigate parked-waiter handles directly at -n ≥4B with --trace-handles. The audit's hypothesis is that one of the 4 handles' producer never fires; pinpoint the producer code-path to identify the missing kernel-side signal.
Phase G2 + matching n2m oracle re-baseline: switch VSYNC to wall-clock and re-baseline interrupts_delivered together as a single commit pair.
F2/F3 if appetite is there for new XAM infrastructure; non-zero chance one of the unblocked completions is the missing producer for one of the 4 parked handles.
Resume KRNBUG-Mm cluster for proper memory-protect / range / per-heap honoring; required before canary-disambiguating the addic/subfic class (canary semantics are a 64-bit add against guest memory the Mm layer doesn't fully model yet).

Follow-up session 2026-05-03 — outcome

Three audit IDs closed across 3 commits, merged to master with --no-ff. HEAD: 8668550. Tests: 556 → 561 (+5 from new wall-clock + ghost-trail tests).

Audit IDs landed

ID	Commit	Description
GPUBUG-DRAIN-001	`7a1b6b3`	VdSwap PM4 fallback warning silenced under `--parallel`. New `drain_until_wptr(target, time_budget)` mirrors canary's `WorkerThreadMain` predicate; vd_swap skips PM4 ring injection (unreliable when ring backs up under --parallel) and uses direct `notify_xe_swap`. The slot-0 fetch-constant patch is deferred (GPUBUG-FETCH-PATCH-001). DrainFence handler publishes the digest mirror before reply (was racing the CPU's post-drain digest_snapshot read).
KRNBUG-AUDIT-001	`d1105aa`	Diagnostic instrumentation: `--trace-handles-focus=<LIST>` flag + per-handle DIAGNOSIS report. `record_signal` falls through to ghost-trail capture for focused handles even when no `record_create` exists. Producer-class classification (GuestExport / KernelInternal). Distinguishes "guest never tried" from "signal landed but missed waiter" in one run.
KRNBUG-D08	`27d3608`	V-sync wall-clock under `--parallel`. Lockstep stays on the deterministic instruction-count proxy (sylpheed goldens unchanged). `--parallel` switches to wall-clock via `tick_vsync_wallclock`, raising delivered v-syncs from ~2 → 17 at -n 30M. INTERRUPT_QUEUE_CAP=4 still bottlenecks burst delivery.
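The burst-delivery bottleneck noted for KRNBUG-D08 can be pictured with a bounded FIFO that drops on overflow. This is a minimal sketch: only the cap of 4 comes from the text; the struct, its fields, and the drop-on-full policy are assumptions for illustration.

```rust
use std::collections::VecDeque;

/// Cap taken from the text; everything else here is hypothetical.
pub const INTERRUPT_QUEUE_CAP: usize = 4;

#[derive(Default)]
pub struct InterruptQueue {
    pending: VecDeque<u32>,
    /// Number of interrupts rejected because the queue was full.
    pub dropped: u64,
}

impl InterruptQueue {
    /// Enqueue an interrupt source id. Returns false and counts a drop
    /// when the queue already holds INTERRUPT_QUEUE_CAP entries.
    pub fn push(&mut self, source: u32) -> bool {
        if self.pending.len() >= INTERRUPT_QUEUE_CAP {
            self.dropped += 1;
            return false;
        }
        self.pending.push_back(source);
        true
    }

    /// Deliver the oldest pending interrupt, if any.
    pub fn pop(&mut self) -> Option<u32> {
        self.pending.pop_front()
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

With a wall-clock producer that can run ahead of the consumer, any burst longer than the cap is lost, so raising delivered v-syncs further would require either a larger cap or faster draining.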

Parked-waiter producer-trace finding

Empirical run at -n 500M lockstep with the new --trace-handles-focus=0x1004,0x100c,0x15e4,0x42450b5c:

handle=0x00001004 kind=Event/Manual waiters=1 signaled=false
                  signal_attempts=0 (primary=0, ghost=0) waits=1 wakes=0
   created cycle=0 tid=1 lr=0x824a9f6c src=NtCreateEvent
   timeline: cycle=0 tid=10 lr=0x824ac578 src=do_wait_single[wait]
   GuestExport=0  KernelInternal=0  waits=1
   => producer is a missing kernel signal source (or BST-paradox upstream)

Same shape for 0x100c and 0x15e4. 0x42450b5c shows <UNCREATED> + <AUDIT_BLIND> (waiter parked via a non-do_wait_single path).
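The verdict line in the report above follows from a small decision over the per-handle counters. A hedged sketch of that decision, with hypothetical type and field names (the real DIAGNOSIS code may weigh more inputs, such as the GuestExport / KernelInternal split):

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Diagnosis {
    /// No thread ever waited on the handle.
    NeverWaited,
    /// Every wait was matched by a wake.
    Healthy,
    /// Waiters exist but nothing ever tried to signal ("guest never tried").
    ProducerMissing,
    /// Signals were attempted but the waiter stayed parked.
    SignalMissedWaiter,
}

pub struct HandleCounters {
    pub primary_signals: u32,
    pub ghost_signals: u32,
    pub waits: u32,
    pub wakes: u32,
}

pub fn diagnose(c: &HandleCounters) -> Diagnosis {
    let attempts = c.primary_signals + c.ghost_signals;
    if c.waits == 0 {
        Diagnosis::NeverWaited
    } else if c.wakes >= c.waits {
        Diagnosis::Healthy
    } else if attempts == 0 {
        Diagnosis::ProducerMissing
    } else {
        Diagnosis::SignalMissedWaiter
    }
}
```

Feeding in the counters printed for 0x1004 (primary=0, ghost=0, waits=1, wakes=0) lands in the ProducerMissing arm, matching the report's conclusion.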

Conclusion: hypothesis (A) confirmed for 3 of the 4 handles. The producer code path is genuinely missing — NO Nt/KeSetEvent / KePulseEvent / KeReleaseSemaphore call EVER targets these handles during 500M instructions of execution. The PPC-vs-Rust traversal paradox (BST-bug from project_xenia_rs_sylpheed_event_chain_2026_04_29) is NOT the cause for these specific handles. The 3 handles share the same creator (lr=0x824a9f6c, tid=1, all at cycle=0) and the same wait-call wrapper (lr=0x824ac578) — likely 3 sibling worker threads all waiting for "work to do" notifications that never come. Most likely producer-class candidates for next session:

File I/O completion (signal_io_completion_event) — currently a real implementation but possibly never reached; trace NtReadFile paths to see if completion events would target these handles.
XAM async task completion — F2/F3 deferred from prior sprint.
Audio buffer-complete — XAudioRegisterRenderDriverClient is a one-shot stub.
Timer DPCs — KeSetTimer real impl but APC delivery may be routing wrong.

Acceptance criteria

#	Criterion	Met?
1	Phase 1: zero "PM4_XE_SWAP not consumed" warnings under canonical invocation	✅
2	Phase 2: per-handle DIAGNOSIS for all four parked handles	✅
3	Phase 3: vsync rate restored under --parallel; n2m golden untouched	✅ partial — rate up but FIFO cap=4 still bottlenecks
4	cargo test ≥556	✅ 561
5	All work merged to master	✅
6	STRETCH ≥1 of 4 handles signals	❌ — but data-driven hypothesis fail-fast tells us why (producer missing, not wake-eligibility bug)
7	STRETCH draws > 0 at -n 100M lockstep	❌ — gating remains parked-waiter handles

Recommended next session

Producer hunt for the 3 Event/Manual handles. With the diagnostic baked in, a focused hunt: identify the guest function at lr=0x824ac578 (the shared wait-call wrapper), walk its callers, find what kernel signal source SHOULD be wired for each handle. Likely starting points: file I/O completion (signal_io_completion_event), XamTaskSchedule callback (F2), XAudio buffer-complete.
Raise INTERRUPT_QUEUE_CAP for --parallel workloads — the 3044 dropped vsyncs at -n 30M --parallel suggest the FIFO is the next bottleneck.
F2/F3 (XAM async completion) per the still-deferred list, especially if Phase 2 of next session pinpoints a missing XAM producer.
GPUBUG-FETCH-PATCH-001: re-enable the PM4_TYPE0 fetch-constant patch via a side-channel (GpuCommand variant) when draws actually start firing — relevant for bloom/blur N+1.

Producer-hunt session 2026-05-03

XAMBUG-PRODUCER-001 — XamTaskSchedule was a no-op stub

Status: fixed. Hypothesis falsified for the parked-waiter set.

Site: crates/xenia-kernel/src/xam.rs:204 (pre-fix). Canary parity: xenia-canary/src/xenia/kernel/xam/xam_task.cc:43-80.

The pre-fix stub allocated a handle, logged it, and returned STATUS_SUCCESS — it never spawned a thread. Replaced with a canary-faithful implementation: allocates a ThreadImage, allocates a KernelObject::Thread handle, and routes through Scheduler::spawn with entry=callback, start_context=message_ptr (canary's third positional XThread arg). Stack sized as max(0x4000, page-aligned 0x10_0000).
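The stack-sizing rule can be sketched as follows. The 0x4000 floor comes from the text; the 4 KiB page granularity and the function names are assumptions, so treat this as an illustration, not the code in xam.rs.

```rust
/// Assumed page granularity for stack allocations (not stated in the text).
const PAGE_SIZE: u32 = 0x1000;
/// Minimum stack size, from the text.
const MIN_STACK: u32 = 0x4000;

/// Round `value` up to a multiple of `align` (which must be a power of two).
pub fn align_up(value: u32, align: u32) -> u32 {
    value.saturating_add(align - 1) & !(align - 1)
}

/// Stack size for a scheduled task: page-align the request, then apply the floor.
pub fn task_stack_size(requested: u32) -> u32 {
    align_up(requested, PAGE_SIZE).max(MIN_STACK)
}
```

For the 0x10_0000 request mentioned above the result is unchanged, since that value is already page-aligned and above the floor.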

Verification:

Unit test xam::tests::xam_task_schedule_spawns_real_thread confirms the spawned thread's pc == callback and gpr[3] == message_ptr.
Workspace tests: 561 → 562 green.
--stable-digest -n 100M lockstep: instructions=100000002 unchanged from baseline (interpreter determinism preserved).
--trace-handles-focus=0x1004,0x100c,0x15e4 -n 500M: no kernel.calls{name=XamTaskSchedule} counter appears — the call site at 0x824a9a10 is never reached within 500M instructions. Boot stalls earlier on the parked handles.

Outcome: the 3 focus handles still show signal_attempts=0 (primary=0, ghost=0) after 500M instructions. The XAM-task hypothesis is therefore falsified for this run — XamTaskSchedule cannot be the missing producer for these specific handles, because Sylpheed's only call site to it isn't reached before the deadlock.

The fix lands regardless: the stub was a real correctness bug that will manifest the moment the call site is reached (post-deadlock-resolution).

Recommended next producer candidate

XAudioRegisterRenderDriverClient (currently a one-shot stub, called once according to the metric counter). Audio buffer-complete callbacks are a known signal source on Xbox 360 audio engines; the stub may be hiding the producer for one of the 3 handles. If that lead is also falsified, escalate to file I/O completion (signal_io_completion_event is already real but possibly mis-routed) or Timer DPC delivery.

APUBUG-PRODUCER-001 — XAudioRegisterRenderDriverClient was stub + no callback ticker

Status: fixed (registration + ticker + injection landed). Hypothesis falsified for handles 0x1004 / 0x100c / 0x15e4.

Site: crates/xenia-kernel/src/exports.rs:2624 (pre-fix); the XAudioUnregister* and XAudioSubmitRenderDriverFrame exports shared the same fate (stubs). New module: crates/xenia-kernel/src/xaudio.rs.

Canary parity:

xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_audio.cc:56-93 (the three exports — register reads callback_ptr[0..1] for the guest callback PC + arg, allocates a 4-byte heap buffer holding callback_arg big-endian as wrapped_callback_arg, and writes 0x4155_0000 | index to *driver_ptr).
xenia-canary/src/xenia/apu/audio_system.cc:202-237 (RegisterClient) and :100-159 (WorkerThreadMain: the host worker that waits on per-client semaphores and calls processor_->Execute(callback, args=[wrapped_callback_arg], 1), i.e. r3 = wrapped pointer).
xenia-canary/src/xenia/apu/xaudio2/xaudio2_audio_driver.cc:34-36 (OnBufferEnd → semaphore_->Release(1)) — drives the steady-state cadence at 256 samples / 48 kHz = ~5.33 ms.

Implementation:

XAudioRegisterRenderDriverClient: reads callback_ptr[0..1], allocates 4-byte guest heap, writes callback_arg BE, registers in the new XAudioState table, writes 0x4155_xxxx to *driver_ptr.
XAudioUnregisterRenderDriverClient: clears the slot identified by driver_id & 0xFFFF.
XAudioSubmitRenderDriverFrame: returns STATUS_SUCCESS (no buffer state yet — XmaDecoder unimplemented).
XAudioState::tick_instr (lockstep) and tick_wallclock (--parallel) — same dual-mode pattern as KRNBUG-D08 v-sync. XAUDIO_INSTR_PERIOD = 48_000 and XAUDIO_PERIOD = 5.333 ms approximate canary's frame rate.
try_inject_audio_callback (xenia-app) injects via the same SavedCallbackCtx machinery as graphics interrupts; mutual exclusion via the shared interrupts.saved slot. r3 is set to wrapped_callback_arg per canary processor_->Execute.
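The driver-id encoding, the big-endian wrapped argument, and the cadence arithmetic from the list above can be checked with a few helpers. This is a sketch with hypothetical names; the tag check in `decode_driver_index` is an added assumption (the text only says the slot is selected by driver_id & 0xFFFF).

```rust
/// Upper half written to *driver_ptr, from the text (0x4155_0000 | index).
pub const DRIVER_ID_TAG: u32 = 0x4155_0000;

pub fn encode_driver_id(index: u16) -> u32 {
    DRIVER_ID_TAG | index as u32
}

/// Recover the slot index; rejecting ids without the tag is an assumption.
pub fn decode_driver_index(driver_id: u32) -> Option<u16> {
    if (driver_id & 0xFFFF_0000) == DRIVER_ID_TAG {
        Some((driver_id & 0xFFFF) as u16)
    } else {
        None
    }
}

/// Bytes stored in the 4-byte guest buffer: callback_arg in big-endian order.
pub fn wrapped_arg_bytes(callback_arg: u32) -> [u8; 4] {
    callback_arg.to_be_bytes()
}

/// Frame period in whole microseconds for `samples` at `sample_rate_hz`.
pub fn frame_period_us(samples: u32, sample_rate_hz: u32) -> u32 {
    (samples as u64 * 1_000_000 / sample_rate_hz as u64) as u32
}
```

`frame_period_us(256, 48_000)` truncates to 5333 µs, which is the ~5.33 ms cadence quoted for canary's OnBufferEnd loop.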

Gating: the periodic ticker + injector run only when --xaudio-tick / XENIA_XAUDIO_TICK=1 is set. Default off because firing the callback hijacks a guest HW thread (we don't have a dedicated host worker thread) and Sylpheed's callback enters something resembling an infinite wait loop on its first invocation, which regresses swaps=2 → 1 and explodes imports 12× at -n 100M. Default-off preserves all existing lockstep goldens (sylpheed_n50m.json etc.).

Verification:

Workspace tests: 562 → 576 green (10 in xaudio.rs + 4 in exports.rs).
--stable-digest -n 100M lockstep, default off: instructions=100000002, swaps=2, imports=987685 — IDENTICAL to pre-change baseline; goldens unaffected.
--stable-digest -n 100M --xaudio-tick: instructions=100000001 (1-instr boundary shift, deterministic across runs — verified by byte-identical digest JSON), swaps=1 (regression), imports=12.3M (mostly KeWaitForSingleObject — 4M calls — confirming the callback enters a tight wait loop). 1 audio callback fires (xaudio.callback.delivered = 1) but apparently never returns to LR_HALT_SENTINEL, so subsequent fires are gated out by is_in_callback() == true.
--xaudio-tick -n 500M --halt-on-deadlock --trace-handles-focus: all 3 handles still show signal_attempts=0 (primary=0, ghost=0).

Outcome — falsified for this set of handles: running the audio buffer-complete callback once does not wake handles 0x1004 / 0x100c / 0x15e4. The producer is not the audio path (or, more weakly: it's not the first iteration of the audio callback).

Side effects worth noting for the next session:

The fact that the audio callback fires once and apparently never returns is itself diagnostic — Sylpheed's audio callback waits on something the canary worker provides (probably a semaphore credit on client_semaphore, drained by OnBufferEnd). Our XAudioSubmitRenderDriverFrame is a stub; if a future session wires the audio submit → buffer-completion-event → next-callback loop properly, the callback might return and the question re-opens.
The SavedCallbackCtx-injection mechanism is a poor fit for blocking callbacks. Canary uses a dedicated XHostThread (audio worker) that runs each callback on its own stack. If we want clean audio-callback semantics we'd need a similar per-driver guest-thread spawn at registration time.

Recommended next producer candidate (post-APUBUG-PRODUCER-001)

Per the producer-hunt charter the remaining strong candidates are Timer DPC delivery (KeSetTimer / KeInsertQueueDpc — exports.rs has stubs/partials) and file I/O completion event routing. Timer DPC is the next-strongest because the parked handles are explicit Event/Manuals with no current waker, and Xbox 360 timer-driven DPCs are a common signal source.

KRNBUG-AUDIT-002 — multi-frame stack capture at handle creation

Status: landed (diagnostic only; no behaviour change). Walker verified end-to-end against the analysed call graph for every captured frame.

Site: crates/xenia-kernel/src/audit.rs (new record_create_with_stack, new created_stack: Vec<(u32,u32)> on HandleAuditTrail); crates/xenia-kernel/src/state.rs (new audit_create_with_ctx helper + free function walk_guest_back_chain(sp, lr, mem, max)); nt_create_event / nt_create_semaphore / nt_create_timer / xam_task_schedule now route through the new helper. Dump in crates/xenia-app/src/main.rs prints created stack (N frames) under the per-handle FOCUS report.

Why it exists: KRNBUG-AUDIT-001 told us the producer is missing for handles 0x1004 / 0x100c / 0x15e4 (later corrected to 0x15e0 — see below) but couldn't tell us which subsystem owns each handle. The wrapper at lr=0x824a9f6c is the same silph::Event ctor for 83 unique callers, so the immediate LR is useless for subsystem identification. The new walker captures up to 6 stack frames at create time, gated on the focus set so the cost is one HashSet::contains on the unfocused hot path.

Walker correctness: PPC EABI back-chain ([r1] = prev_sp, saved-LR-of-prev-frame at [prev_sp - 8]). Frame 0 is the live (ctx.gpr[1], ctx.lr) since the wrapper hasn't spilled its own LR yet. Sentinels: 0, 0xFFFFFFFF, self-loop. Read-only via MemoryAccess::read_u32 — guest memory and CPU state are not mutated, so lockstep determinism is unaffected.
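A minimal sketch of the back-chain walk described above, run against a stand-in word-addressed memory. The `WordMem` type, the `(sp, lr)` tuple order, and the function name are assumptions; the real walker reads through MemoryAccess::read_u32.

```rust
use std::collections::HashMap;

/// Stand-in for guest memory: 32-bit words keyed by address, unmapped = 0.
pub struct WordMem(pub HashMap<u32, u32>);

impl WordMem {
    pub fn read_u32(&self, addr: u32) -> u32 {
        self.0.get(&addr).copied().unwrap_or(0)
    }
}

/// Walk the stack back-chain starting from the live (sp, lr) pair.
/// Returns up to `max` frames as (sp, lr) tuples; never writes to memory.
pub fn walk_back_chain(sp: u32, lr: u32, mem: &WordMem, max: usize) -> Vec<(u32, u32)> {
    let mut frames = Vec::new();
    if max == 0 {
        return frames;
    }
    // Frame 0 uses the live registers: the callee has not spilled LR yet.
    frames.push((sp, lr));
    let mut cur = sp;
    while frames.len() < max {
        let prev = mem.read_u32(cur); // [r1] = previous stack pointer
        // Sentinels: null, all-ones, or a frame that points at itself.
        if prev == 0 || prev == 0xFFFF_FFFF || prev == cur {
            break;
        }
        let saved_lr = mem.read_u32(prev.wrapping_sub(8));
        frames.push((prev, saved_lr));
        cur = prev;
    }
    frames
}
```

Because the walk only reads, it can be called from a create-time audit hook without perturbing guest state, which is the property the lockstep goldens rely on.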

Verification:

Workspace tests: 576 → 581 (+5: 2 new in audit.rs exercising the record_create_with_stack path including the disabled-no-op case; 3 new in state.rs exercising synthetic 3-level back-chain, self-loop sentinel, zero sentinel).
--stable-digest -n 50M lockstep oracle (sylpheed_n50m): bit-identical to checked-in golden (re-confirmed twice).
End-to-end: every captured frame's saved-LR is a return address, i.e. a bl instruction sits one instruction (4 bytes) before it in the named function (cross-checked against sylpheed.db's instructions table for all 18 captured PCs across handles 0x1004 / 0x100c / 0x15e0).
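The saved-LR cross-check can also be done directly on instruction words. The sketch below decodes the PowerPC I-form relative `bl` (primary opcode 18, AA=0, LK=1); the function names are hypothetical, and indirect calls such as bctrl would need extra cases that are not handled here.

```rust
/// If `word` at address `pc` is a relative `bl`, return its branch target.
pub fn decode_bl(word: u32, pc: u32) -> Option<u32> {
    let opcode = word >> 26;
    let aa = (word >> 1) & 1;
    let lk = word & 1;
    if opcode != 18 || aa != 0 || lk != 1 {
        return None;
    }
    // LI is a 24-bit field shifted left by 2; sign-extend from bit 25.
    let mut li = word & 0x03FF_FFFC;
    if (li & 0x0200_0000) != 0 {
        li |= 0xFC00_0000;
    }
    Some(pc.wrapping_add(li))
}

/// True when the word immediately before `saved_lr` is a relative `bl`.
pub fn saved_lr_follows_bl(saved_lr: u32, word_before: u32) -> bool {
    decode_bl(word_before, saved_lr.wrapping_sub(4)).is_some()
}
```

A frame whose saved-LR fails this check is either a mis-walked frame or a return from an indirect call, so it is a cheap sanity filter before trusting a captured stack.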

Producer-trace finding (KRNBUG-AUDIT-002 deliverable)

Run: exec sylpheed.iso --halt-on-deadlock --trace-handles-focus=0x1004,0x100c,0x15e0,0x42450b5c -n 500_000_000.

0x1004 (tid=10 waiter): static C++ ctor → 8-instance pool

[0] sub_824A9F18 +0x54   silph::Event ctor wrapper (83 callers)
[1] sub_821783D8 +0x120  per-instance subsystem-init (RtlInitializeCSAndSpinCount + Event ctor)
[2] sub_8217C850 +0x58   single per-pool-element bridge ctor
[3] (no func) +0x14      static ctor at 0x8280F810; calls sub_8217C850 EIGHT times
[4] sub_824ACB38 +0xb8   the CRT static-init driver (walks 0x82870010..0x828708d4)
[5] entry_point +0x60    the standard CRT entry stub

The 8-instance call from frame 3 is the smoking gun: 0x8280F810 is a single C++ static constructor that builds an 8-element array of objects, each of which gets its own Critical Section + Event + worker thread. This is a thread pool, constructed before main() runs.

0x100c (tid=2 waiter): runtime init in main() → singleton

[0] sub_824A9F18 +0x54   silph::Event ctor wrapper
[1] sub_82181750 +0x70   per-instance subsystem-init (same shape: CS + Event)
[2] sub_821800D8 +0x3c   single-call bridge ctor
[3] sub_82181C20 +0x38   subsystem driver
[4] sub_8216EA68 +0x3c   (top-level main; called from entry_point + 0x194 with r3=r4=r5=0)
[5] entry_point +0x198   right after `bl 0x8216EA68`

Different code cluster (0x82181xxx), single instance, constructed inside main() itself — not from C++ static init. This is a runtime-allocated singleton subsystem.

0x15e0 (tid=16 waiter): runtime init via a third distinct cluster

[0] sub_824A9F18 +0x54   silph::Event ctor wrapper
[1] sub_821701C8 +0x48   per-instance subsystem-init (CS + Event, callees mirror 0x100c's path)
[2] sub_8216F618 +0x44   bridge
[3] sub_821707C0 +0x38   driver
[4] (no func) +0x?       0x821C5418 — analyser missed this function entry
[5] sub_82172BA0 +0x1ec  upper-level subsystem driver

Third distinct C++ class in cluster 0x82170xxx. Same per-instance shape (CS + Event + worker thread); different call site than 0x100c.

Cross-check on the project memory list: the prior memory listed the third handle as 0x15e4; the actual handle on this run is 0x15e0 (off-by-4 in the prior session's transcription). The parked-waiter set as of HEAD 9d45efe is:

Handle	Tid	Waits via	Trail status	Note
0x1004	10	`do_wait_single`	primary	static-ctor pool (8 entries)
0x100c	2	`do_wait_single`	primary	runtime singleton
0x15e0	16	`do_wait_single`	primary	runtime singleton, distinct class
0x12f4	13,14	`do_wait_single`	primary	shared Semaphore — 2 waiters
0x15f8	18	`do_wait_multiple`	primary	Event/Auto
0x1038	4	`do_wait_multiple`	primary	Event/Auto, in WaitAny[0x1038, 0x103c]
0x10b0	5	`do_wait_multiple`	primary	Event/Auto, in WaitAny[0x10b0, 0x10b4]
0x42450b5c	6	(non-`do_wait_single`)	`<UNCREATED>` `<AUDIT_BLIND>`	guest-memory addr (heap range), bypasses Nt-side audit entirely

0x42450b5c is qualitatively different. An address >= 0x40000000 is in the guest user heap range; it is not a kernel handle-table value (those start at 0x1000 and increment by 4). tid=6 is parking on a guest pointer, almost certainly an embedded KEVENT reached via KeWaitForSingleObject(*PDISPATCHER_HEADER) rather than via a handle. Our audit didn't see the wait either (waits=0 while waiter_count=1), so the wait path itself bypasses do_wait_single. Treat as a separate bug class.
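The handle-versus-pointer distinction drawn above can be expressed as a small classifier. This is a sketch using only the two ranges named in the text (handles from 0x1000 in steps of 4, user heap from 0x40000000); the enum and function names are hypothetical, and values outside both ranges are simply reported as unknown.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum WaitTarget {
    /// Looks like a handle-table value.
    KernelHandle,
    /// Looks like a guest pointer (for example an embedded dispatcher header).
    GuestPointer,
    Unknown,
}

pub fn classify_wait_target(value: u32) -> WaitTarget {
    if value >= 0x4000_0000 {
        WaitTarget::GuestPointer
    } else if value >= 0x1000 && value % 4 == 0 {
        WaitTarget::KernelHandle
    } else {
        WaitTarget::Unknown
    }
}
```

A check like this at the wait entry points would let the audit tag pointer-based waits explicitly instead of reporting them as uncreated handles.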

Subsystem identification

All three Event/Manual creators (sub_821783D8, sub_82181750, sub_821701C8) follow the identical 4-callee pattern:

RtlInitializeCriticalSectionAndSpinCount (init the per-instance CS)
sub_824A9F18 (the silph::Event ctor wrapper → NtCreateEvent)
1-2 silph internal helpers (sub_82172370, sub_824AA3E0, sub_8217E948, sub_82274C70, etc.) which initialize a queue and spawn a worker thread

Each parked worker runs the same prologue: silph::Thread::SetProcessor(CURRENT, 5) (via sub_824AA658(r3=-2, r4=5)), then an lwarx/stwcx. CAS-spinlock, RtlEnterCriticalSection, and a check for queued work.

This is the canonical work-queue worker pattern: a producer posts a message to an instance's queue (under the CS) and signals the Event; the worker wakes, drains, parks again. The producer that should call Nt/KeSetEvent(handle) is never executed within 500M instructions for any of the 3 handles.
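For readers less familiar with the pattern, here is a host-side analogue in plain Rust. It is not guest code: the Mutex stands in for the per-instance critical section and the Condvar for the Event, and all names are hypothetical. A `None` message is used as a shutdown marker so the demo terminates.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

pub struct Dispatcher {
    queue: Mutex<VecDeque<Option<u32>>>, // critical section + job queue
    event: Condvar,                      // stands in for the Event
}

/// Post `jobs` to a worker and return them in the order the worker drained them.
pub fn run_demo(jobs: &[u32]) -> Vec<u32> {
    let d = Arc::new(Dispatcher {
        queue: Mutex::new(VecDeque::new()),
        event: Condvar::new(),
    });
    let worker = {
        let d = Arc::clone(&d);
        thread::spawn(move || {
            let mut done = Vec::new();
            loop {
                let mut q = d.queue.lock().unwrap();
                // Park until the producer signals that work is queued.
                while q.is_empty() {
                    q = d.event.wait(q).unwrap();
                }
                // Drain everything that is queued, then park again.
                while let Some(msg) = q.pop_front() {
                    match msg {
                        Some(job) => done.push(job),
                        None => return done,
                    }
                }
            }
        })
    };
    // Producer side: push under the lock, then signal.
    for &job in jobs {
        d.queue.lock().unwrap().push_back(Some(job));
        d.event.notify_one();
    }
    d.queue.lock().unwrap().push_back(None);
    d.event.notify_one();
    worker.join().unwrap()
}
```

If the producer loop never runs, the worker stays in `wait` forever, which is the host-side picture of the three parked guest threads.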

The PE's RTTI string table lists thread-related class names in the SilpheedSCS namespace: WorkHudThread2, WorkHudThreadTaskCaller, COLLISION_THREAD_PARAM, UPDATER_THREAD_PARAM, CRenderCommandQueue, CTaskUpdater. The 8-element static-init pool for 0x1004 most plausibly maps to one of the multi-instance worker classes in this list (likely WorkHudThread2 family — the only Thread-suffixed multi-instance candidate); the singletons 0x100c and 0x15e0 most plausibly map to two of CTaskUpdater / CRenderCommandQueue / similar singletons. Without a live debugger pass to read the vtable+RTTI block at the this pointer of each worker, the exact class assignment is heuristic.

Recommended next session — surgical producer hunt

The producer for each Event is the call site that owns the matching message-push code path. Steps:

For each Event handle (0x1004, 0x100c, 0x15e0), dump the first 12 bytes at the this pointer to read the vtable address (this is in r3 at the worker's ABI entry, captured in the wait diagnostic as the first arg). Then read the word immediately before the vtable (vtable[-1] as a word index, i.e. byte offset -4) to reach the MSVC Complete Object Locator and, through it, the RTTI Type Descriptor, which gives the class name. This pinpoints exactly which SilpheedSCS::* class each subsystem is.
Then xref the class name in the binary to find the producer-side method (Push*, Submit*, EnqueueMessage*, Notify*). That method's signal call (likely silph::Event::Set → NtSetEvent thunk) is what should fire.
Trace that producer's caller chain to find the upstream gate. Two failure modes are equally likely:
- (A) The producer DOES run but signals via KeSetEvent on an embedded KEVENT field (not the handle-table side), and our HLE KeSetEvent doesn't route to the handle-table waiter list. Same family as 0x42450b5c. Smoking gun: kernel.calls metric for KeSetEvent is non-zero but the audit shows zero signals.
- (B) The producer is gated by an upstream condition that doesn't trigger — e.g. UI-system message that never arrives, timer-DPC that never fires, vsync interrupt with the wrong APC routing. Smoking gun: kernel.calls{KeSetEvent} is zero for that handle.
0x42450b5c is a separate bug. Add a parallel audit_create_with_ctx hook to whichever wait path tid=6 takes (it's NOT do_wait_single); the function span at PC=0x824cd4f4 isn't even in the analyser's functions table. Likely the KeWaitForSingleObject(*PDISPATCHER_HEADER) wrapper. Once the wait path is audited, repeat the producer-trace.

The walker is reusable: any handle added to --trace-handles-focus will get a 6-frame stack at creation time. Add new candidates freely — cost on the unfocused hot path is one HashSet::contains.

KRNBUG-AUDIT-003 — vtable/RTTI class probe + dispatcher identification

Status: landed (diagnostic only; no behaviour change). Verified end-to-end against 5 unit tests + the producer-trace pass at -n 500M.

Site: crates/xenia-kernel/src/state.rs (new ClassReadout enum + read_class_at_this(this, mem) + probe_create_stack_classes(ctx, frames, mem) + private helpers is_likely_guest_heap_ptr, is_likely_image_ptr, read_ascii_cstring); crates/xenia-kernel/src/audit.rs (extended HandleAuditTrail with created_class_probes: Vec<String> + new record_create_with_stack_and_probes); crates/xenia-app/src/main.rs (dump_thread_diagnostic now takes &GuestMemory; FOCUS report prints WAIT-THREAD blocks with per-frame back-chain + saved register slots + class probes).

Why it exists: AUDIT-002 gave us back-chain frames at handle creation. AUDIT-003's promise was "recover the dispatcher's MSVC C++ class name via vtable[-4] → COL → TypeDescriptor" so the producer hunt could read "who should call Class::Submit but doesn't" instead of "who should signal handle X."

Probe correctness: MSVC RTTI traversal (vtable[-4] = COL, COL+0x0c = TypeDescriptor, TypeDescriptor+8 = NUL-terminated mangled name starting .?A). False-positive guard: at least the first two vtable slots must be image-range function pointers. This rejects the CRT static-init iterator pattern where r31 holds a pointer into the init-fn array and [r31] is a function PC, not a vtable.
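The traversal and its false-positive guard can be sketched over a single flat memory region. This is an illustration only: `GuestImage` and `read_class_name` are hypothetical names, the sketch treats one region as both object storage and image, and the real read_class_at_this returns a ClassReadout enum instead of an Option.

```rust
pub struct GuestImage {
    pub base: u32,
    pub bytes: Vec<u8>,
}

impl GuestImage {
    fn read_u32(&self, addr: u32) -> Option<u32> {
        let off = addr.checked_sub(self.base)? as usize;
        let b = self.bytes.get(off..off.checked_add(4)?)?;
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]])) // guest is big-endian
    }

    fn contains(&self, addr: u32) -> bool {
        addr >= self.base && ((addr - self.base) as usize) < self.bytes.len()
    }

    fn read_cstr(&self, addr: u32, max: usize) -> Option<String> {
        let off = addr.checked_sub(self.base)? as usize;
        let tail = self.bytes.get(off..)?;
        let len = tail.iter().take(max).position(|&b| b == 0)?;
        std::str::from_utf8(&tail[..len]).ok().map(str::to_string)
    }
}

/// vtable[-4] -> COL, COL+0x0c -> TypeDescriptor, TypeDescriptor+8 -> name.
pub fn read_class_name(this: u32, mem: &GuestImage) -> Option<String> {
    let vtable = mem.read_u32(this)?;
    // Guard: the first two vtable slots must look like image-range pointers.
    for slot in 0..2u32 {
        let f = mem.read_u32(vtable.checked_add(slot * 4)?)?;
        if !mem.contains(f) {
            return None;
        }
    }
    let col = mem.read_u32(vtable.checked_sub(4)?)?;
    let type_desc = mem.read_u32(col.checked_add(0x0c)?)?;
    let name = mem.read_cstr(type_desc.checked_add(8)?, 256)?;
    if name.starts_with(".?A") { Some(name) } else { None }
}
```

An object whose first word is not a plausible vtable pointer (for instance a sentinel such as 0xFFFFFFFF) falls out at the first read, so POD structs are rejected without any special casing.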

Verification:

Workspace tests: 581 → 586 (+5: 4 new in state.rs exercising RTTI-intact / RTTI-stripped / non-object / read_ascii_cstring termination + 1 integration test for probe_create_stack_classes).
--stable-digest -n 100M lockstep oracle: instructions=100000002 (unchanged).
sylpheed_n50m golden: passes.
End-to-end: 500M producer-trace run captured at audit-runs/audit-003/run-500m-v4.txt. RC=0.

KRNBUG-AUDIT-003 finding — dispatcher addresses + decisive xref audit

Run: exec sylpheed.iso --halt-on-deadlock --trace-handles-focus=0x1004,0x100c,0x15e0,0x42450b5c -n 500_000_000.

Handle 0x100c — dispatcher at 0x828F3D08:

Confirmed three ways:

Per-frame saved-r31 capture at handle creation:

frame=1 lr=0x821817c0 saved-r31=0x828f3d08      ← per-instance ctor
frame=2 lr=0x82180114 saved-r31=0x828f3d08      ← bridge ctor (same value)

Disassembly of sub_82181750 at +0x14: addis r11, r0, 0x828F; addi r31, r11, 15624 ⇒ r31 = 0x828F3D08 (the this for the per-instance ctor).
Field-level write tracking via xrefs.kind=write: pc=0x82181778 in sub_82181750 — stw r11, 0(r31) writes -1 to [this+0].
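The addis/addi pair in the disassembly above can be re-derived mechanically. A small sketch (hypothetical helper name) of the usual constant-materialization arithmetic under 32-bit writeback, where the addi immediate is a sign-extended 16-bit value:

```rust
/// Value produced by `addis rD, 0, hi` followed by `addi rD, rD, lo`,
/// truncated to 32 bits.
pub fn materialize(hi: u16, lo: i16) -> u32 {
    ((hi as u32) << 16).wrapping_add(lo as i32 as u32)
}
```

`materialize(0x828F, 15624)` gives 0x828F3D08, and adding 44 to that gives 0x828F3D34, the two values seen in the spilled r28/r29 slots. Note that a negative low half borrows from the high half, so the addis immediate is not always the top 16 bits of the final address.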

[this+0] = -1 is decisive: this is a hand-rolled POD job-queue struct, not a C++ polymorphic class. No vtable means no RTTI; "class name" doesn't exist in MSVC mangled form. The probe correctly rejected 0x828F3D08 as a class candidate.

Field layout (from sub_82181750 disasm):

[this+ 0] = -1                 ; sentinel (not a vtable)
[this+ 4..12] = 0
[this+20] = 0  (halfword)
[this+36] = 0
[this+40] = 7                  ; count or version
[this+44..(44+256)]            ; sub-region init by `bl 0x8284DCEC`
[this+72] = thread_handle      ; set after thread spawn
[this+76] = event_handle       ; = 0x100c, set after silph::Event ctor
[this+88..104] = 0

Worker is sub_82181830: receives r3=this, copies r28=this and r29=&this[44], does silph::Thread::SetProcessor(CURRENT, 5), then lwarx/stwcx. on &this[80]. Wait-side telemetry confirms: the parked thread's spilled r28-r31 area has 0x828F3D08 (=r28 base) and 0x828F3D34 (= base+44 = r29).

Handle 0x15e0 — dispatcher at 0x828F4070:

Confirmed via xref table. Same shape as 0x100c (POD job queue, not a C++ class). Constructed by sub_821701C8 + sub_8216F618.

Handle 0x1004 — 8-instance pool, member addresses still TBD.

The MSVC ctors for the per-instance and bridge functions did not preserve this in r31 across the call into silph::Event::Ctor, so the saved-r31 chain captured at create time shows stack-relative pointers (frames 1, 2, 5) and the CRT init-fn iterator pointer 0x82870180 (frames 3, 4) instead of the pool member's this. Recovering the 8 pool addresses requires hooking sub_8217C850's entry to capture r3 at each of its 8 calls from the static ctor at 0x8280F810.

Handle 0x42450b5c — separate bug class. Heap-allocated (0x4xxxxxxx is user-heap range), parks via non-do_wait_single path. AUDIT-003's image-rdata-focused probe doesn't apply. Track under a separate audit ID.

Decisive xref audit — producer is unreached:

0x828F3D08 (handle 0x100c) — 4 references in static analysis:
  pc=0x82180100  in sub_821800D8  (kind=ref)   — bridge ctor
  pc=0x8218176c  in sub_82181750  (kind=ref)   — per-instance ctor
  pc=0x82181778  in sub_82181750  (kind=write) — per-instance ctor init
  pc=0x8284caa4  in sub_8280C2C0  (kind=ref)   — CRT init driver

0x828F4070 (handle 0x15e0) — 5 references:
  pc=0x8216f650  in sub_8216F618  (kind=ref)   — bridge ctor
  pc=0x8216f674  in sub_8216F618  (kind=ref)   — bridge ctor
  pc=0x821701e4  in sub_821701C8  (kind=ref)   — per-instance ctor
  pc=0x82170330  in sub_821701C8  (kind=ref)   — per-instance ctor
  pc=0x8284c9a4  in sub_8280C2C0  (kind=ref)   — CRT init driver

Every xref is in a ctor or the CRT. No producer code references either dispatcher base. Confirms AUDIT-001/002's signal_attempts=0: the producer is unreached, not broken. The static analysis would miss producers that operate via a this register passed through a function arg (no constant-load), but the simple "load_const dispatcher_addr; call submit(this, work)" pattern is not present in the binary for 0x828F3D08 / 0x828F4070.

Recommendation for next session (no implementation here):

Investigate the call-chain main() → sub_82181C20 → sub_82181750. sub_82181C20 is a subsystem driver — it constructs the queue and should ALSO wire it into a feeder. If the feeder is itself a static-init that's never invoked, the trail leads back to the CRT init array driver (sub_824ACB38, walks 0x82870010..0x828708D4) and whatever scheduling subsystem is supposed to drive those.
Hook sub_8217C850 entry under --trace-handles-focus=0x1004 to capture r3 at each of its 8 calls — those are the pool member this addresses for handle 0x1004's 8-instance pool.
Treat 0x42450b5c independently. AUDIT-002's hook missed it because the parking site (PC=0x824cd4f4) isn't routed through do_wait_single. Open KRNBUG-AUDIT-004 for that wait path.

KRNBUG-AUDIT-004 — `--ctor-probe` PC hook + `--dump-addr` struct dump; producer-indirection layer identified; "8-instance pool" hypothesis falsified

Status: landed on master (no-ff merge of feature branch dispatcher-probe-audit/p0-ctor-probe-and-struct-dump). Diagnostic-only, read-only, lockstep-preserved (instructions=100000002 at -n 100M --stable-digest).

Tests: 586 → 588.

What landed (crates/xenia-kernel/src/state.rs):

pub ctor_probe_pcs: HashSet<u32> field on KernelState (default empty).
pub fire_ctor_probe_if_match(hw_id, mem) — fast-rejects when set is empty; on match prints a one-shot record CTOR-PROBE pc=... tid=... hw=... cycle=... sp=... r3=... lr=... plus an 8-frame back-chain with saved-r31/r30 per frame. Pure read.
pub dump_addrs: Vec<u32> field for end-of-run struct dumps.
2 unit tests: empty-set no-op, set-membership invariant.
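The fast-reject shape of the probe can be sketched as below. Assumptions: the struct and method names are hypothetical, and hits are recorded into a Vec here purely so the behaviour is testable, whereas the text describes the real probe as printing a record and performing no writes.

```rust
use std::collections::HashSet;

#[derive(Default)]
pub struct ProbeState {
    /// PCs to watch; empty by default so the hot path pays almost nothing.
    pub ctor_probe_pcs: HashSet<u32>,
    /// (pc, r3) pairs captured on match (stand-in for the printed record).
    pub fired: Vec<(u32, u32)>,
}

impl ProbeState {
    /// Returns true when `pc` is in the probe set and a record was captured.
    pub fn fire_if_match(&mut self, pc: u32, r3: u32) -> bool {
        // Fast reject: no probes configured.
        if self.ctor_probe_pcs.is_empty() {
            return false;
        }
        if !self.ctor_probe_pcs.contains(&pc) {
            return false;
        }
        self.fired.push((pc, r3));
        true
    }
}
```

Checking `is_empty()` first keeps the default configuration to a single branch per call, which matters because the hook sits in front of every dispatch.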

What landed (crates/xenia-app/src/main.rs):

--ctor-probe=0x8217C850,0x82181750,... CLI flag (and XENIA_CTOR_PROBE). Parsed into kernel.ctor_probe_pcs at cmd_exec_inner startup.
--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,... CLI flag (and XENIA_DUMP_ADDR). Each address gets a 128-byte hex+be32+ASCII dump at end-of-run, after the per-handle FOCUS report.
worker_prologue calls fire_ctor_probe_if_match after reading pc and before any thunk-dispatch / step-block branch. dump_thread_diagnostic consumes kernel.dump_addrs.

Decisive findings (corrects KRNBUG-AUDIT-002/003):

The "8-instance pool" hypothesis for handle 0x1004 is FALSE. Probe ran at -n 50M --halt-on-deadlock with PCs [0x821783D8, 0x82181750, 0x821701C8] (the per-instance ctors for handles 0x1004 / 0x100c / 0x15e0 respectively). Each fired EXACTLY ONCE:
```
CTOR-PROBE pc=0x821783d8 tid=1 hw=0 cycle=1401430 r3=0x828f3ec0  ← handle 0x1004
CTOR-PROBE pc=0x82181750 tid=1 hw=0 cycle=5363599 r3=0x828f3d08  ← handle 0x100c
CTOR-PROBE pc=0x821701c8 tid=1 hw=0 cycle=9203618 r3=0x828f4070  ← handle 0x15e0
```
Handle 0x1004 has a SINGLE dispatcher at 0x828F3EC0, not 8 pool members. The earlier "called 8 times" claim came from counting raw entries to the OUTER getter sub_8217C850, but sub_8217C850 is a Meyers-style singleton-getter — its inner bl 0x821783D8 (the per-instance ctor) is gated on a one-shot init flag at [0x828F48D8] bit 0. Subsequent sub_8217C850 calls just return the existing slot pointer.
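Why eight getter calls produce one constructor call can be modelled in a few lines. This is a simplified sketch of a flag-gated singleton getter; the struct, its fields, and the call counter are hypothetical and only mirror the behaviour described above (a one-shot init flag in bit 0, with later calls returning the existing slot pointer).

```rust
pub struct SingletonGetter {
    init_flag: u32,      // bit 0 set once the instance is constructed
    slot: u32,           // address of the constructed dispatcher
    pub ctor_calls: u32, // how many times the inner ctor ran
}

impl SingletonGetter {
    pub fn new() -> Self {
        SingletonGetter { init_flag: 0, slot: 0, ctor_calls: 0 }
    }

    /// Return the dispatcher pointer, constructing it on the first call only.
    pub fn get(&mut self, slot_addr: u32) -> u32 {
        if (self.init_flag & 1) == 0 {
            self.init_flag |= 1;
            self.ctor_calls += 1; // stands in for the call to the per-instance ctor
            self.slot = slot_addr;
        }
        self.slot
    }
}
```

Counting entries to the outer getter therefore overstates the number of instances; only a probe on the inner constructor (as in the run above) measures it.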

The producer indirection layer IS the singleton-getter itself. Static byte-scans of .rdata and .data show 0 hits for the dispatcher addresses 0x828F3D08 / 0x828F4070 — so no registry table holds them. But the xrefs table for the OUTER getters reveals:

sub_821800D8 (outer for 0x828F3D08, handle 0x100c): 6 callers
  0x821802d8 (sub_82180158+0x180)   ← non-create-chain
  0x821806e0 (sub_821805C8+0x118)   ← non-create-chain
  0x82180b28 (sub_82180A10+0x118)   ← non-create-chain
  0x82180ea0 (sub_82180D90+0x110)   ← non-create-chain
  0x82181254 (sub_821810E0+0x174)   ← non-create-chain
  0x82181c54 (sub_82181C28+0x2C)    ← create-chain ONLY

sub_8216F618 (outer for 0x828F4070, handle 0x15e0): 5 callers
  0x8216f9d4 (sub_8216F818+0x1BC)   ← non-create-chain
  0x8216fc08 (sub_8216F9F0+0x218)   ← non-create-chain
  0x821700b8 (sub_8216FF70+0x148)   ← non-create-chain
  0x821700f4 (sub_821700E0+0x14)    ← non-create-chain
  0x821707f4 (sub_821707C0+0x34)    ← create-chain ONLY

The non-create-chain consumers all share the canonical producer pattern:

bl outer_singleton_getter   ; r3 = dispatcher ptr
lwz r3, OFFSET(r3)          ; r3 = an event handle / queue field
bl 0x824AA1D8               ; signal/wake function

For 0x100c the offset is 80 (= 0x50); for 0x15e0 the offset is 36 (= 0x24).
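What the `lwz r3, OFFSET(r3)` step would hand to the signal function can be reproduced from a raw struct dump. A minimal sketch with hypothetical helper names; treating a zero field as "no handle to signal" is an interpretation, not something the guest code is known to check.

```rust
/// Read a big-endian u32 at `offset` from a raw dispatcher dump.
pub fn read_be32(dump: &[u8], offset: usize) -> Option<u32> {
    let b = dump.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Value the producer pattern would load into r3 before the signal call,
/// or None when the field is out of range or currently zero.
pub fn producer_handle(dump: &[u8], field_offset: usize) -> Option<u32> {
    match read_be32(dump, field_offset)? {
        0 => None,
        handle => Some(handle),
    }
}
```

Applied to the end-of-run dumps with offsets 0x50 (handle 0x100c) and 0x24 (handle 0x15e0), this shows directly whether a producer firing at that moment would pass a usable value or zero.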

So interpretation (2) of the audit charter is confirmed: producers reference the dispatchers via a function-call layer of indirection, not through direct address materialization. The xref-table audit in AUDIT-003 (which only catches direct constant-loads of the dispatcher base) was necessary but not sufficient — it correctly saw "no direct producer references" but missed the singleton-getter indirection.

Dispatcher struct layouts (128-byte dumps at -n 50M --halt-on-deadlock):

0x828F3D08 (handle 0x100c, per-instance ctor sub_82181750):
  +0x00 = 0xFFFFFFFF                  ; queue head/tail sentinel
  +0x28 = 0x00000007                  ; capacity = 7
  +0x2C = 0x01000000                  ; init flag
  +0x3C = 0xFFFFFFFF                  ; secondary sentinel
  +0x48 = 0x00001010                  ; thread_handle (worker thread)
  +0x4C = 0x0000100C                  ; event_handle (= self handle 0x100c)
  +0x50 = 0x00000000                  ; producer reads this — currently 0
  +0x70 = 0x00000001                  ; refcount?
  +0x74 = 0x828F3D08                  ; self-pointer

0x828F4070 (handle 0x15e0, per-instance ctor sub_821701C8):
  +0x00 = 0x01000000                  ; init flag
  +0x10 = 0xFFFFFFFF                  ; queue sentinel
  +0x1C = 0x000015E4                  ; sibling-handle (NOT in our parked
                                      ; set — possibly a thread handle)
  +0x20 = 0x000015E0                  ; event_handle (= self handle 0x15e0)
  +0x24 = 0x00000000                  ; producer reads this — currently 0
  +0x40 = 0xFFFFFFFF                  ; secondary sentinel

0x828F3EC0 (handle 0x1004, per-instance ctor sub_821783D8):
  +0x00 = 0x01000000                  ; init flag
  +0x10 = 0xFFFFFFFF                  ; queue sentinel
  +0x20 = 0x40541BC0                  ; heap pointer (sub-buffer #1)
  +0x30 = 0x00000014                  ; size 20
  +0x34 = 0x0000002F                  ; size 47
  +0x38 = 0x414F5F60                  ; heap-range payload (or two halfwords)
  +0x3C = 0x40211CA0                  ; heap pointer (sub-buffer #2)
  +0x44 = 0x405418C0                  ; heap pointer (sub-buffer #3)
  +0x50 = 0x40111840                  ; heap pointer (sub-buffer #4)
  +0x58 = 0xFFFFFFFF                  ; sentinel
  +0x5C = 0xFFFFFFFF                  ; sentinel
  +0x76 = 0x000012AC                  ; possibly thread id
  +0x78 = 0x00001004                  ; event_handle (= self handle 0x1004)

The 0x1004 dispatcher is noticeably different: it owns 4 guest-heap sub-buffers in 0x4xxxxxxx range, suggesting it manages a more complex resource than the other two (which are pure POD job queues). The +0x78 location of the event_handle differs from 0x100c's +0x4C and 0x15e0's +0x20, so each subsystem has its own struct layout (no shared base class).
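
A layout listing like the ones above can be derived mechanically from a raw dump. The sketch below assumes the dump is a plain byte slice in guest (big-endian) order; `nonzero_words` is a hypothetical helper, not the `--dump-addr` implementation.

```rust
/// List (offset, value) for every non-zero big-endian 32-bit word in a dump.
/// Trailing bytes that do not fill a whole word are ignored.
fn nonzero_words(dump: &[u8]) -> Vec<(usize, u32)> {
    dump.chunks_exact(4)
        .enumerate()
        .map(|(i, w)| (i * 4, u32::from_be_bytes([w[0], w[1], w[2], w[3]])))
        .filter(|&(_, value)| value != 0)
        .collect()
}
```

Comparing the output for two dispatchers side by side makes layout differences, such as the differing event_handle offsets, easy to spot.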

Reproduce:

cargo run --release -p xenia-app -- exec 'sylpheed.iso' \
  --halt-on-deadlock \
  --trace-handles-focus=0x1004,0x100c,0x15e0 \
  --ctor-probe=0x821783D8,0x82181750,0x821701C8 \
  --dump-addr=0x828F3D08,0x828F4070,0x828F3EC0 \
  -n 50000000

Trace files saved at:

audit-runs/audit-004/run-50m-probe.txt (outer-getter probes)
audit-runs/audit-004/run-50m-probe-v2.txt (inner-ctor probes — singleton hypothesis confirmed)

Recommendation for next session (do not implement a fix):

Hook the entry of each non-create-chain consumer site for handle 0x100c (5 sites: 0x821802d8, 0x821806e0, 0x82180b28, 0x82180ea0, 0x82181254) and for handle 0x15e0 (4 sites: 0x8216f9d4, 0x8216fc08, 0x821700b8, 0x821700f4) using --ctor-probe=.... If any fire, then the producer DOES execute and the failure mode is in the wake/signal chain (probably lwz r3, OFFSET(r3) reads zero — see dispatcher dump [+0x50] = 0 for 0x100c, [+0x24] = 0 for 0x15e0 — and the wake function 0x824AA1D8 is then called with handle=0). If none fire, the producer chain is gated upstream (likely a feature flag, init phase, or RPC handler that never fires). Either way, the next diagnostic narrows the bug surface dramatically.
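
The decision rule in this recommendation is small enough to write down. The sketch below is illustrative only; `Diagnosis` and `classify` are hypothetical names, and the input is assumed to be the per-site hit counts collected from the probe log.

```rust
#[derive(Debug, PartialEq)]
enum Diagnosis {
    /// At least one producer site executed: inspect the wake/signal chain.
    WakeChainSuspect,
    /// No producer site executed: the producer chain is gated upstream.
    GatedUpstream,
}

/// Classify the failure from the hit count of each probed producer site.
fn classify(hit_counts: &[u64]) -> Diagnosis {
    if hit_counts.iter().any(|&n| n > 0) {
        Diagnosis::WakeChainSuspect
    } else {
        Diagnosis::GatedUpstream
    }
}
```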

KRNBUG-AUDIT-005 — `--pc-probe` extended syntax + canary kernel-call diff; `XexCheckExecutablePrivilege` stub gates init flow

Status: landed on master (no-ff merge of feature branch canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade). Diagnostic-only, read-only, lockstep-preserved (run digest matches golden at -n 50M --stable-digest).

Tests: 588 → 588 (unchanged; existing ctor-probe tests cover the shared infrastructure).

What landed (crates/xenia-kernel/src/state.rs):

pub pc_probe_consumers: HashMap<u32, (u32, u32)> field on KernelState (default empty). Maps a probe PC to a (dispatcher_addr, offset) pair; on hit the helper additionally logs [disp+off] — what the producer's lwz r3, OFFSET(r3) is about to read after bl outer_getter returns the dispatcher in r3.
fire_ctor_probe_if_match extended to read+print the consumer field when present. Pure load — does not mutate guest state.

What landed (crates/xenia-app/src/main.rs):

--pc-probe clap alias on --ctor-probe (semantically clearer name; both share parser/storage).
Extended token syntax PC@DISPATCHER:OFFSET parsed via existing parse_hex_u32. Plain PC form still works (backward compatible).
XENIA_PC_PROBE env var as alias for XENIA_CTOR_PROBE.
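
For illustration, the extended token grammar can be parsed as below. This is a sketch under stated assumptions, not the code in main.rs: `ProbeSpec`, `parse_num`, `parse_probe` and `parse_probe_list` are hypothetical names, and the number rule (a `0x` prefix means hex, a bare token means decimal) is an assumption chosen so that `:80` reads as 0x50; the real `parse_hex_u32` may behave differently.

```rust
#[derive(Debug, PartialEq)]
struct ProbeSpec {
    pc: u32,
    /// Optional (dispatcher_addr, offset) pair from the PC@DISPATCHER:OFFSET form.
    consumer: Option<(u32, u32)>,
}

/// Assumed number syntax: "0x"-prefixed tokens are hex, bare tokens decimal.
fn parse_num(s: &str) -> Option<u32> {
    let s = s.trim();
    match s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        Some(hex) => u32::from_str_radix(hex, 16).ok(),
        None => s.parse().ok(),
    }
}

/// Parse one token: either plain `PC` or `PC@DISPATCHER:OFFSET`.
fn parse_probe(token: &str) -> Option<ProbeSpec> {
    match token.split_once('@') {
        None => Some(ProbeSpec { pc: parse_num(token)?, consumer: None }),
        Some((pc, rest)) => {
            let (dispatcher, offset) = rest.split_once(':')?;
            Some(ProbeSpec {
                pc: parse_num(pc)?,
                consumer: Some((parse_num(dispatcher)?, parse_num(offset)?)),
            })
        }
    }
}

/// Parse a comma-separated list; any malformed token rejects the whole list.
fn parse_probe_list(arg: &str) -> Option<Vec<ProbeSpec>> {
    arg.split(',').map(parse_probe).collect()
}
```

Rejecting the whole list on one bad token is a design choice for the sketch: a silently dropped probe would look the same in the log as a probe that never fired.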

What landed (audit-runs/audit-005/): one-shot diagnostic artifacts — not part of the repo build:

canary.log — copy of /home/fabi/xenia_canary_windows/xenia.log from a Lutris launch of Sylpheed; oracle for what should happen
ours.log — our trace at -n 500M with the 9-PC probe + probe_calls=trace filter (838 MB, 5.6 M lines)
diff.py — kernel-call sequence diff (set-diff + first-divergence window); deletable after the audit
probe-test-10m.log — initial smoke test confirming probe wiring

Reproduce:

cargo run --release -p xenia-app -- \
  --log-filter='probe_calls=trace,xenia=warn' \
  exec sylpheed.iso \
  --halt-on-deadlock \
  --trace-handles-focus=0x1004,0x100c,0x15e0 \
  --pc-probe=0x821802D8@0x828F3D08:80,0x821806E0@0x828F3D08:80,\
0x82180B28@0x828F3D08:80,0x82180EA0@0x828F3D08:80,\
0x82181254@0x828F3D08:80,0x8216F9D4@0x828F4070:36,\
0x8216FC08@0x828F4070:36,0x821700B8@0x828F4070:36,\
0x821700F4@0x828F4070:36 \
  -n 500_000_000 \
  2> audit-runs/audit-005/ours.log

python3 audit-runs/audit-005/diff.py --max 100 --window 30

Decisive findings:

Failure mode (α) for KRNBUG-AUDIT-004 confirmed. All 9 non-create-chain producer call sites for handles 0x100c (5 sites at 0x821802D8 / 0x821806E0 / 0x82180B28 / 0x82180EA0 / 0x82181254) and 0x15e0 (4 sites at 0x8216F9D4 / 0x8216FC08 / 0x821700B8 / 0x821700F4) fire 0× at -n 500M (grep -c CTOR-PROBE ours.log == 0). The producer code path is not reached. This rules out failure modes (β: lwz reads zero) and (γ: wake function called with stale handle). The bug is upstream, in the control flow that should lead the guest to those producer functions.

Upstream control-flow divergence located: XexCheckExecutablePrivilege stub returning 0. Set-diff of kernel-call sequences across our 500M-instruction run vs canary's full Sylpheed boot (canary.log, ~5.3K lines, post-swaps=2 boot loop reached) identifies 11 exports that canary calls and we don't:

ExTerminateThread (×2)
KeReleaseSemaphore (×268)        ← we use Nt* equivalents
KeResetEvent (×1)
NtDeviceIoControlFile (×2)
ObCreateSymbolicLink (×1)
XGetAVPack (×1)                  ← gated by priv-10 check
XamTaskCloseHandle (×1)
XamTaskSchedule (×1)             ← AUDIT-002 producer candidate
XamUserReadProfileSettings (×2)
XeCryptSha (×1)
XeKeysConsolePrivateKeySign (×1)

XGetAVPack has exactly one caller (xrefs table): site 0x824AB5A0 inside sub_824AB578. The 4 instructions immediately preceding it are:

824ab58c addi    r3, r0, 10            ; privilege bit 10
824ab590 addi    r31, r0, 0
824ab594 bl      0x8284DEFC            ; XexCheckExecutablePrivilege
824ab598 cmpli   0, r3, 0x0
824ab59c bc      12, eq, 0x824AB724    ; if r3==0, skip whole block
                                       ; (XGetAVPack + crypto + Nt writes)

Our impl crates/xenia-kernel/src/exports.rs:193:

state.register_export(Xboxkrnl, 0x0194, "XexCheckExecutablePrivilege",
                      stub_return_zero);

stub_return_zero returns r3=0 unconditionally → guest takes the bc 12, eq, 0x824AB724 branch and skips the entire AV/crypto/save-data init block.

The OTHER call site (sub_824A9710, 0x824A99A0) queries privilege bit 11:

824a999c addi    r3, r0, 11
824a99a0 bl      0x8284DEFC            ; XexCheckExecutablePrivilege(11)
824a99a4 cmpli   0, r3, 0
824a99a8 bc      4, eq, 0x824A9A60     ; bne — skip block if priv set

Different polarity (this one gates XamTaskSchedule etc. on the privilege-NOT-set path). With the stub returning 0 at both call sites, the guest always takes the privilege-not-set arm of every privilege-gated branch, regardless of what the title's XEX header actually grants.
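
The opposite polarity of the two gates can be captured in two predicates. This is a sketch derived only from the disassembly comments above (`bc 12, eq` branches away when r3 == 0, `bc 4, eq` branches away when r3 != 0); the function names are hypothetical.

```rust
/// Priv-10 site (0x824AB59C, `bc 12, eq`): the guarded block is skipped
/// when XexCheckExecutablePrivilege returned 0, so it runs when r3 != 0.
fn priv10_block_runs(r3: u32) -> bool {
    r3 != 0
}

/// Priv-11 site (0x824A99A8, `bc 4, eq`, i.e. bne): the guarded block is
/// skipped when the privilege is set, so it runs when r3 == 0.
fn priv11_block_runs(r3: u32) -> bool {
    r3 == 0
}
```

With a stub that always returns 0, the first predicate is always false and the second always true, independent of the title's real privileges.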

Cascade reaches the parked-waiter handles. Trace evidence: our probe_calls log shows lr=0x824A97E4 (a hit from the error path inside sub_824A9710 after sub_824ABA98 returned negative NTSTATUS). The canary log shows all 11 missing exports firing in a single contiguous boot phase between XexCheckExecutablePrivilege and the worker-thread spawn — i.e. the init phase that sets up the dispatcher data structures is exactly the phase we skip. This explains why the dispatcher fields read zero (AUDIT-004 dump: [0x828F3D08+0x50] = 0, [0x828F4070+0x24] = 0): the ctors run (we counted those), but the producers that would populate those fields with a non-zero handle never execute, because the upstream init flow that registers them is gated by the privilege checks.
Note on the diff: canary's log is filtered. Canary's config has log_high_frequency_kernel_calls = false, which suppresses most Rtl*, Mm*, Ke*-internal calls from the log. The "called in OURS but not canary" set (23 entries, headed by NtWaitForSingleObjectEx ×1.5M) is dominated by this filter difference — it is not a bug surface. The directionally meaningful side of the diff is "called in CANARY but not OURS" (above): canary's log includes every low-frequency call, so any absence on our side is a real divergence.

Stop conditions check:

Canary itself does NOT stall at swaps=2 — it reaches a steady frame loop with XamInputGetCapabilities polling, texture loads, KeReleaseSemaphore ticks. The diff was informative.
First divergence is dense early-CRT noise (~3 entries in), but the meaningful divergence anchored to a concrete export (XGetAVPack, deterministically gated by a one-line stub) was recoverable via set-diff. Did not need to narrow scope further.

Recommendation for next session (do not implement a fix here — this is the read-only audit deliverable):

Replace stub_return_zero for XexCheckExecutablePrivilege with a real implementation. The XEX header's privilege bitmask is parsed during XEX load (see crates/xenia-xex/); KernelState already holds the loaded image_base. Implementation outline:

Parse XEX_HEADER_EXECUTION_INFO / privilege bits at load time into KernelState (or surface via Vfs already-loaded XEX metadata).
xex_check_executable_privilege(priv_id) -> u32: return 1 if bit priv_id is set in the title's privilege bitmask, else 0. Match canary's encoding (privilege IDs are 0..7F; canary reads PrivilegeFlags[i/8] & (1 << (i%8)) from the XEX execution info).

Validation after the fix:

Re-run audit-runs/audit-005/diff.py — XGetAVPack, XamTaskSchedule, XeCryptSha, etc. should appear in our sequence and the divergence should advance several hundred calls past the priv-check.
Re-run with the 9-PC probe armed at -n 500M — at minimum, the ctor-probe firings change, and ideally one or more of the 9 producer sites starts firing.
If producer sites fire, dispatcher fields [0x828F3D08+0x50] / [0x828F4070+0x24] become non-zero (use --dump-addr).
Lockstep golden crates/xenia-app/tests/golden/sylpheed_n50m.json will likely change (imports count goes up, swaps may advance); regenerate the golden under --stable-digest and treat that as the new lockstep anchor.

If after the fix the producer is reached and dispatcher fields populate, the parked-waiter deadlock should resolve — or surface the next layer of bugs (e.g. signaling code reads non-zero handle but wake_eligible_waiters fails).

KRNBUG-XEX-001 — `XexCheckExecutablePrivilege` real impl (P0 fix landed)

Branch: xex-check-privilege/p0-real-impl (no-ff merged to master). Status: LANDED 2026-05-04. Closes the priv-stub side of KRNBUG-AUDIT-005.

Implementation. Replaced stub_return_zero at crates/xenia-kernel/src/exports.rs:193 with a real implementation that reads the XEX_HEADER_SYSTEM_FLAGS (key 0x00030000) bitmap from the XEX header. Mirrors canary's XexCheckExecutablePrivilege_entry (xboxkrnl_modules.cc:22-39): (flags >> priv) & 1 for priv < 32, else 0.

Plumbing:

xenia-xex/src/header.rs: added header_keys::SYSTEM_FLAGS = 0x00030000.
xenia-xex/src/loader.rs: added get_system_flags(&Xex2Header) -> u32.
xenia-kernel/src/state.rs: added pub xex_system_flags: u32 (init 0) and xex_priv_logged: HashSet<u32> (one-shot log gate per priv).
xenia-app/src/main.rs: wired kernel.xex_system_flags = xenia_xex::loader::get_system_flags(&header) alongside the existing kernel.image_base = base line in cmd_exec_inner.

Sylpheed's bitmap is 0x00000400 (only XEX_SYSTEM_PAL50_INCOMPATIBLE set, bit 10). So priv 10 → 1, priv 11 → 0. Both call sites identified in AUDIT-005 now route through the canary-correct branches.
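
The bit test described above, written out as a standalone function. This is a minimal sketch of the stated formula, (flags >> priv) & 1 for priv < 32 and 0 otherwise; the function name and signature are hypothetical and do not reproduce the export's real calling convention.

```rust
/// Return 1 if bit `privilege` is set in the XEX system-flags bitmap,
/// else 0. Privilege IDs of 32 and above always report 0.
fn check_executable_privilege(system_flags: u32, privilege: u32) -> u32 {
    if privilege < 32 {
        (system_flags >> privilege) & 1
    } else {
        0
    }
}
```

The explicit `privilege < 32` guard matters in Rust: shifting a `u32` by 32 or more panics in debug builds, so the range check is not just a mirror of canary's behaviour.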

Validation chain (Step 3 of the hand-off):

| step | outcome |
|---|---|
| (a) `cargo test --workspace --release` | 588 → 589 (new test `xex_check_executable_privilege_reads_system_flags_bitmap`); all prior green |
| (b) `--stable-digest -n 100M` lockstep | `instructions=100000013` (was `100000002`). The 11-instruction shift is the deterministic guest divergence into the canary-correct branch, verified with 2 identical re-runs. NOT nondeterminism. |
| (c) AUDIT-005 9-PC probe at -n 500M | All 9 producer probe sites still 0×. BUT `kernel.calls{XGetAVPack}` went from `0` → `1` (priv-10 gate flipped; XexCheckExecutablePrivilege itself only fires once, for priv 10, because the priv-11 site at `sub_824A9710` is downstream and not yet reached). |
| (d) `--trace-handles-focus=0x1004,0x100c,0x15e0` | All 3 handles still `signal_attempts=0`. The 9 probed PCs are members of two indirection-chain singletons (`sub_821800D8` for 0x100c, `sub_8216F618` for 0x15e0); both are downstream of the priv-11 site too. |
| (e) Canary kernel-call diff | 10 of the 11 missing exports remain absent. Only `XGetAVPack` was unlocked. The new first-divergence is inside the AV/crypto block: between `XGetAVPack` returning and `XeCryptSha` (still stub_success), Sylpheed's init aborts the block early. |
| (f) `sylpheed_oracles` (n50m / n2m) | Re-baselined and re-verified across 3 deterministic runs. New `n50m`: `instructions=50000005, imports=407417, swaps=2, draws=0` (was `50000008, 407415, 2, 0`). |

Decisive interpretation. The fix is correct but partial. The priv-10 gate at lr=0x824ab598 flips polarity (was: skip block / now: execute block); XGetAVPack is now reached as predicted. The priv-11 gate at lr=0x824a99a4 lives inside sub_824A9710, which the boot flow does NOT reach because something in the AV/crypto block (which the priv-10 fix unlocked) aborts before completing. So:

XGetAVPack: ✅ reached (was missing, now fires once)
XeCryptSha / XeKeysConsolePrivateKeySign / ObCreateSymbolicLink / XamUserReadProfileSettings: ❌ still missing → AV/crypto block aborts early
sub_824A9710 (priv-11 caller) and downstream XamTaskSchedule / XamTaskCloseHandle / ExTerminateThread / etc.: ❌ still unreached
Parked-handle producers (the 9 PCs): ❌ still 0× (they live in the init flow gated on priv-11 or post-priv-11 — same blast radius)

Next-frontier bug (the new gate identified by this fix). Inside sub_824AB578 between XGetAVPack (lr=0x824ab5a4) and the next canary-only call (likely XeKeysConsolePrivateKeySign). The candidates are:

XGetAVPack returns wrong value. Our impl returns 0x16 (crates/xenia-kernel/src/xam.rs:382-384). Canary returns cvars::avpack (default 8 = HDMI). Sylpheed comment in canary xam_info.cc:250-251: "if the result is not 3/4/6/8 they explode with errors". 0x16 is not in the accepted set → strongly suspect this is the next blocker.
XeCryptSha / XeKeysConsolePrivateKeySign are stub_success (exports.rs:188-189). Returning STATUS_SUCCESS without side effects on a hashing operation may itself confuse the caller if it then reads a hash buffer expecting non-zero bytes.

Recommended next session: probe XGetAVPack return value (try 0x8 to match canary default) — that's a one-line change in xam.rs:383. If the run advances past, re-diff against canary at the new divergence; otherwise the next gate is in XeCryptSha / XeKeysConsolePrivateKeySign.

Trace artifacts: audit-runs/post-priv-fix/ours.log (5.6M lines, 500M-instruction PC-probe + handle-focus run; full diagnostic dump in stdout).

KRNBUG-XAM-001 — `XGetAVPack` returned non-canary `0x16`; canary default is `8` (HDMI)

Status: LANDED 2026-05-04. Closes the first follow-up of KRNBUG-XEX-001 (the XGetAVPack arm flipped 0→1 by the priv-10 fix exposed this as the next gate).

One-line fix. crates/xenia-kernel/src/xam.rs:382-384:

fn xget_avpack(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
    ctx.gpr[3] = 8;
}

Was 0x16. Canary's XGetAVPack_entry returns cvars::avpack (xam_info.cc:252); the cvar is DEFINE_int32(avpack, 8, ...) (xam_info.cc:35). Canary's inline comment at xam_info.cc:250-251: "Games seem to use this as a PAL check — if the result is not 3/4/6/8 they explode with errors if not in PAL mode." 0x16 (=22) is not in {3, 4, 6, 8}, so Sylpheed's caller treated the response as invalid.
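
The accepted-set argument can be stated as a predicate. This is a sketch based solely on canary's comment; the guest's actual comparison sequence is not disassembled here, and `ACCEPTED_AV_PACKS` and `av_pack_accepted` are hypothetical names.

```rust
/// AV pack values that, per canary's comment, games accept without error.
const ACCEPTED_AV_PACKS: [u32; 4] = [3, 4, 6, 8];

/// True if a value returned by XGetAVPack falls in the accepted set.
fn av_pack_accepted(value: u32) -> bool {
    ACCEPTED_AV_PACKS.contains(&value)
}
```

Under this model the old return value 0x16 is rejected and the new value 8 is accepted.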

Tests. 589 → 590. New unit test xget_avpack_returns_hdmi asserts r3 == 8. Constant-return change; one assertion is enough.

Validation chain (Step 3 of the hand-off):

| step | outcome |
|---|---|
| (a) `cargo test --workspace --release` | 589 → 590; all green. |
| (b) `--stable-digest -n 100M` lockstep | `instructions=100000010, import_calls=987686, swaps=2`. 3-run identical (counter sets bit-equal). Was `100000013, 407417, 2`. The +2.4× import-call jump is the deterministic guest divergence into the canary-correct branch (the AV/crypto block now executes more thunks). NOT non-determinism. |
| (c) AUDIT-005 9-PC probe at -n 500M | All 9 producer probe sites still 0× (`grep -c CTOR-PROBE = 0`). |
| (d) `--trace-handles-focus=0x1004,0x100c,0x15e0` | All 3 handles still `signal_attempts=0`. The producers live deeper in the init flow than what `XGetAVPack` alone unlocks. |
| (e) Canary kernel-call diff (set-diff `audit-runs/post-fix/ours-500m.log` vs `audit-runs/audit-005/canary.log`) | 11 → 10 canary-only exports. The single match unlocked is `XGetAVPack` (canary=1, ours=1). Remaining absent: `ExTerminateThread ×2`, `KeReleaseSemaphore ×268`, `KeResetEvent ×1`, `NtDeviceIoControlFile ×2`, `ObCreateSymbolicLink ×1`, `XamTaskCloseHandle ×1`, `XamTaskSchedule ×1`, `XamUserReadProfileSettings ×2`, `XeCryptSha ×1`, `XeKeysConsolePrivateKeySign ×1`. |
| (f) `sylpheed_oracles` (n50m) | Re-baselined: `instructions=50000004, imports=407416, swaps=2, draws=0` (was `50000005, 407417, 2, 0`). 3 deterministic re-runs. Orphan golden `sylpheed_n2m.json` (no test refers to it) also re-baselined for hygiene. |

Decisive interpretation. The fix is correct but partial. The XGetAVPack return value is now in the canary-accepted set, and the call site at 0x824ab5a0 reaches it; the rest of the AV/crypto block in sub_824AB578, between XGetAVPack returning (lr=0x824ab5a4) and XeCryptSha, does not execute. The cascade walked exactly one step.

Telemetry signal: lr=0x824a97e4 post-fix. A new RtlNtStatusToDosError(r3=0xc0000011 ...) (0xC0000011 is STATUS_END_OF_FILE) fires from lr=0x824a97e4 immediately after XGetAVPack returns. That PC is inside sub_824A9710 (the priv-11 site), so the priv-11 caller IS being entered (probably via fall-through from a caller of sub_824AB578's post-AV block), but the priv-11 query itself never fires: a precondition between block entry and the priv-11 check fails. Almost certainly a downstream sub of the AV/crypto block (sub_824ABA98 or one of its siblings from AUDIT-005's disasm) returns a negative NTSTATUS, which routes here.

Next-frontier bug (the new gate identified by this fix). Between XGetAVPack (lr=0x824ab5a4) and XeCryptSha. Two candidates:

The 4 unreached sibling exports inside sub_824AB578: XeCryptSha, XeKeysConsolePrivateKeySign, NtDeviceIoControlFile ×2, ObCreateSymbolicLink. XeCryptSha, XeKeysConsolePrivateKeySign and ObCreateSymbolicLink are currently stubs (stub_success for the crypto); NtDeviceIoControlFile has a real implementation, but its caller may not be reached. Need to diff sub_824AB578 step-by-step from 0x824ab5a4 onward to find the failing precondition.
sub_824ABA98 returning negative NTSTATUS (the AUDIT-005 call site referenced from lr=0x824a97e4). If the AV/crypto block calls sub_824ABA98 and gets a negative return, control transfers to the error path that triggers the RtlNtStatusToDosError(c0000011) we observe. That PC is the tail signal — finding what's upstream of it is the next probe.
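
"Negative NTSTATUS" here means the high bit of the 32-bit status is set. The sketch below shows that test and the two-bit severity field; `nt_failed` and `nt_severity` are hypothetical helper names, not functions from the codebase.

```rust
/// True if the status is negative when viewed as a signed 32-bit value,
/// i.e. bit 31 is set (warning or error severity).
fn nt_failed(status: u32) -> bool {
    (status as i32) < 0
}

/// Severity field in bits 31..30:
/// 0 = success, 1 = informational, 2 = warning, 3 = error.
fn nt_severity(status: u32) -> u32 {
    status >> 30
}
```

For the observed value 0xC0000011, `nt_failed` is true and the severity is 3 (error), consistent with control reaching an error path.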

What did NOT change (per the AUDIT-004 diagnosis chain):

The 9 producer-callsite PCs for handles 0x100c (5 sites) and 0x15e0 (4 sites): still 0× hits.
The 3 parked-waiter handles 0x1004 / 0x100c / 0x15e0: still signal_attempts=0.
swaps=2 plateau, draws=0: unchanged.

Trace artifacts: audit-runs/post-fix/ours-500m.log (5.6M lines, 500M-instruction PC-probe + handle-focus run, post-AV-pack-fix). Same probe configuration as KRNBUG-AUDIT-005's audit-runs/audit-005/ours.log, re-runnable with the command in that finding's "Reproduce" block.

Reproduce the canary set-diff:

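python3 - <<'PY'
import re
from collections import Counter
from pathlib import Path

HERE = Path('audit-runs/audit-005')
# Canary log lines: "d>", a hex field, then "ExportName(".
CANARY_RX = re.compile(r'^d>\s+[0-9A-Fa-f]+\s+([A-Z][A-Za-z0-9_]+)\(')
# Our probe_calls trace lines carry "call=ExportName".
OURS_RX = re.compile(r'probe_calls.*?call=([A-Za-z_][A-Za-z0-9_]*)')

def extract(path, rx):
    """Count how often each export name captured by rx appears in path."""
    counts = Counter()
    with open(path, errors='replace') as f:
        for line in f:
            m = rx.search(line)
            if m:
                counts[m.group(1)] += 1
    return counts

canary = extract(HERE / 'canary.log', CANARY_RX)
ours = extract('audit-runs/post-fix/ours-500m.log', OURS_RX)
# Exports canary calls that never appear in our trace, with canary's count.
for name in sorted(set(canary) - set(ours)):
    print(f'  {canary[name]:>5}  {name}')
PY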


