xenia-rs/audit-findings.md
MechaCat02 609f586ed8 chore: backfill audit-findings.md with entries from audits 023-057
Accumulated diagnostic notes from prior sessions that had stayed in the
working tree without being committed. Spans 20 audit entries (KRNBUG-AUDIT-023
through KRNBUG-AUDIT-057) plus VERIFY-A and TRACK-1/TRACK-2 sub-audits, all
read-only investigations dated 2026-05-06 through 2026-05-10.

No code or schema changes. Pure documentation backfill so future sessions can
cross-reference the full chain without depending on the auto-memory directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:35:21 +02:00

8015 lines
462 KiB
# PPC Instruction Audit — Findings Tracker
**Started**: 2026-04-29 (single session, audit-only)
**Trigger**: `addis` 32-bit-ABI sign-extension fix surfaced a likely systemic class of bugs.
**Status**: in flight. Per-group reports live in `audit-out/`. This file is the consolidated, stable-ID index.
**Workflow**: audit only this session; fix session(s) reference these IDs.
## Conventions
- Every finding has an ID `PPCBUG-NNN` for cross-referencing.
- **Status**: `open` (audit found it, not yet fixed) | `applied` (fix landed) | `wontfix` (intentional) | `dup-of:NNN` (collapsed into another finding).
- **Severity**:
  - **HIGH** = wrong arithmetic / control flow on plausible Xbox 360 user code.
  - **MEDIUM** = wrong status flag / latent under broken upstream invariants / edge case.
  - **LOW** = test gap / cosmetic / dead-code-only.
- All file:line refs are `xenia-rs/crates/xenia-cpu/src/interpreter.rs` unless otherwise noted.
- Suggested fixes are written as one-line patches where possible; see the per-group report for full context.
## Cross-cutting recommendation
The single recurring root cause is **violating the 32-bit ABI invariant that all GPR writes truncate to 32 bits**. The cleanest fix is to systematically apply `as u32 as u64` at every GPR writeback in every integer ALU op. The existing CA/CR0/OE helpers will then be correct without further changes (because their inputs become guaranteed-clean). The audit reports list each fix individually; the fix session may choose to apply them as one sweep or one-at-a-time.
A defensive secondary recommendation: even after the writeback truncation, instructions whose CA computation does its own internal arithmetic on 64-bit operands (`subfcx`, `subfex`, `addic`, `addicx`, `subficx`) should additionally truncate their compare operands. This guards against any future regression that re-pollutes the GPR file.
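A minimal sketch of the writeback-truncation idea, for illustration only. The `Ctx` struct, the `write_gpr32` helper, and the free-standing `addi` function are hypothetical names, not taken from the xenia-rs source; the real interpreter writes `ctx.gpr[...]` inline.

```rust
/// Hypothetical stand-in for the interpreter context; only the GPR file is modeled.
struct Ctx {
    gpr: [u64; 32],
}

impl Ctx {
    /// Write a result to a GPR, keeping only the low 32 bits (zero-extended).
    fn write_gpr32(&mut self, rd: usize, value: u64) {
        self.gpr[rd] = value as u32 as u64;
    }
}

/// `addi rT, rA, simm` with the truncating writeback applied.
/// Per the ISA, rA = 0 means the literal value 0, not the contents of r0.
fn addi(ctx: &mut Ctx, rd: usize, ra: usize, simm: i16) {
    let base = if ra == 0 { 0 } else { ctx.gpr[ra] };
    let sum = base.wrapping_add(simm as i64 as u64);
    ctx.write_gpr32(rd, sum);
}
```

Routing every integer-ALU writeback through one helper like this is one way to make the invariant hard to break by accident; whether that fits the existing code layout is a judgment call for the fix session.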
---
## Batch 1 — integer ALU (groups 1-5)
Per-group reports: `audit-out/group-01-add-imm.md`, `group-02-add-reg.md`, `group-03-sub-reg.md`, `group-04-multiply.md`, `group-05-divide.md`.
### PPCBUG-001 — addi sign-extension, no truncation
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:114-118
- **Symptom**: `addi rT, r0, -1` (= `li rT, -1`) writes `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`. Identical shape to addis.
- **Fix**:
```rust
ctx.gpr[instr.rd()] = ra_val.wrapping_add(instr.simm16() as i64 as u64) as u32 as u64;
```
- **Test gap**: existing `test_addi` only covers positive simm16. Add a test for `li rT, -1` and verify the upper 32 bits are zero.
### PPCBUG-002 — addic untruncated writeback + 64-bit CA compare
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:133-140
- **Symptom**: (a) GPR writeback not truncated (same shape as addi). (b) CA computed via 64-bit `result < ra` — Canary's `AddDidCarry` explicitly truncates both operands to int32 first.
- **Fix**:
```rust
let ra32 = ra as u32;
let imm = instr.simm16() as i32 as u32;
let result32 = ra32.wrapping_add(imm);
ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
- **Test gap**: zero unit tests for addic.
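As a starting point for the missing tests, the two CA shapes can be compared in isolation. This is a sketch under assumptions: `addic32` and `addic64_buggy` are hypothetical free functions, and `overflowing_add` is used as an equivalent of the `result32 < ra32` compare in the fix above.

```rust
/// 32-bit `addic`: returns (zero-extended result, CA).
fn addic32(ra: u64, simm: i16) -> (u64, u8) {
    let ra32 = ra as u32;
    let imm32 = simm as i32 as u32;
    // The carry flag from overflowing_add is the unsigned carry out of bit 31.
    let (result32, carry) = ra32.overflowing_add(imm32);
    (result32 as u64, carry as u8)
}

/// The pre-fix 64-bit shape, kept only for comparison.
fn addic64_buggy(ra: u64, simm: i16) -> (u64, u8) {
    let result = ra.wrapping_add(simm as i64 as u64);
    (result, (result < ra) as u8)
}
```

With `ra = 0xFFFF_FFFF` and `simm = 1`, the 32-bit form wraps to zero and reports a carry, while the 64-bit form leaves bit 32 set in the GPR and reports no carry.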
### PPCBUG-003 — addicx untruncated writeback + 64-bit CA + CR0 regression
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:141-150
- **Symptom**: same as PPCBUG-002 plus a CR0 regression: live code uses `update_cr_signed(0, result as i64)` (64-bit signed). The frozen snapshot in `ppc-manual/alu/addicx.md` shows the previously-correct `result as i32 as i64` form. Live code has drifted.
- **Fix**: PPCBUG-002 fix plus `update_cr_signed(0, result32 as i32 as i64)`.
- **Test gap**: zero unit tests.
- **Note**: confirms the manual's frozen snapshots are useful drift detectors — see if other opcodes have similarly regressed.
### PPCBUG-004 — mulli untruncated 64-bit signed product
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:159-164
- **Symptom**: RA read as full `i64`, product stored as `u64` without truncation. Per ISA in 32-bit ABI, both factors should be i32 and product should fit in 32 bits (overflow silently wraps per ISA).
- **Fix**:
```rust
let ra = ctx.gpr[instr.ra()] as i32 as i64;
let imm = instr.simm16() as i64;
ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64;
```
- **Test gap**: zero unit tests.
### PPCBUG-005 — subficx untruncated writeback + 64-bit CA compare
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:151-158
- **Symptom**: (a) `imm.wrapping_sub(ra)` on 64-bit values writes poisoned upper bits; sign-extended `imm` for negative SIMM has bits 32-63 set. (b) CA `imm >= ra` is 64-bit unsigned compare; wrong relative to Canary's 32-bit form.
- **Fix**:
```rust
let ra32 = ra as u32;
let imm32 = instr.simm16() as i32 as u32;
let result32 = imm32.wrapping_sub(ra32);
ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
- **Test gap**: zero unit tests.
### PPCBUG-006 — negx active GPR poisoning + 64-bit OE overflow check
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:319-330
- **Symptom**: (a) `(!ra).wrapping_add(1)` unconditionally sets upper 32 bits to all-ones because `!ra` flips them. Even a clean `r3 = 5` produces `0xFFFFFFFF_FFFFFFFB` instead of `0x00000000_FFFFFFFB`. **This is active, not latent — every neg in 32-bit-ABI code poisons the GPR.** (b) `neg_ov_64` overflow predicate tests `ra == 0x8000_0000_0000_0000` (64-bit INT_MIN) instead of `ra == 0x0000_0000_8000_0000` (32-bit INT_MIN).
- **Fix**:
```rust
let result = (!(ra as u32)).wrapping_add(1);
ctx.gpr[instr.rd()] = result as u64;
if instr.oe() {
    overflow::apply(ctx, (ra as u32) == 0x8000_0000);
}
if instr.rc_bit() { ctx.update_cr_signed(0, result as i32 as i64); }
```
- **Test gap**: existing `nego_sets_ov_only_on_int_min` tests 64-bit INT_MIN — add a 32-bit INT_MIN case.
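The difference between the two shapes can be checked without the interpreter. The functions below are a hypothetical standalone sketch (names `neg32` and `neg64_buggy` are not from the source) that mirrors the expressions quoted in the symptom and fix.

```rust
/// 32-bit `neg`: returns (zero-extended result, overflow flag for the OE form).
fn neg32(ra: u64) -> (u64, bool) {
    let ra32 = ra as u32;
    let result = (!ra32).wrapping_add(1);
    // Only the 32-bit INT_MIN has no representable negation.
    (result as u64, ra32 == 0x8000_0000)
}

/// Pre-fix 64-bit shape, for comparison: `!ra` flips the upper 32 bits too.
fn neg64_buggy(ra: u64) -> u64 {
    (!ra).wrapping_add(1)
}
```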
### PPCBUG-007 — subfcx CA via 64-bit unsigned compare
- **Severity**: HIGH (defensive — same shape as the compare that broke addis)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:258
- **Symptom**: `if rb >= ra { 1 } else { 0 }` is the exact 64-bit unsigned compare that the addis bug exploited. Wrong CA when either operand has poisoned upper 32 bits. Apply defensively even if all upstream sources are cleaned, because a wrong CA bit is unrecoverable downstream.
- **Fix**:
```rust
let ra32 = ra as u32;
let rb32 = rb as u32;
let result32 = rb32.wrapping_sub(ra32);
ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
- **Test gap**: zero dedicated unit tests for subfcx — the most critical opcode in Group 3 had no coverage. Add 6+ tests including the exact 0x828F3F98 / 0x828F3F68 case from the addis incident.
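To make the failure mode concrete, here is a hedged sketch of the two CA predicates as standalone functions. The function names and the operand values are illustrative assumptions, not the values from the addis incident.

```rust
/// CA for `subfc` (rb - ra) computed on full 64-bit values: the pre-fix shape.
fn subfc_ca64(ra: u64, rb: u64) -> u8 {
    (rb >= ra) as u8
}

/// CA for `subfc` computed on the low 32 bits only.
fn subfc_ca32(ra: u64, rb: u64) -> u8 {
    ((rb as u32) >= (ra as u32)) as u8
}
```

For example, with `ra = 0xFFFF_FFFF_0000_0010` (low word 0x10, upper word poisoned) and `rb = 0x20`, the 64-bit compare yields CA=0 while the 32-bit compare yields the expected CA=1. With clean operands the two agree.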
### PPCBUG-008 — subfex CA via 64-bit unsigned compare + `!ra` poisons writeback
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:268-284
- **Symptom**: (a) CA `if rb > ra || (rb == ra && ca != 0)` is 64-bit; same shape as PPCBUG-007. (b) Writeback uses `(!ra).wrapping_add(rb).wrapping_add(ca)` — `!ra` always sets upper 32 bits, guaranteed GPR poison even with clean inputs (same shape as PPCBUG-006).
- **Fix**:
```rust
let ra32 = ra as u32;
let rb32 = rb as u32;
let ca = ctx.xer_ca as u32;
let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca);
ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
### PPCBUG-009 — mullwx untruncated 64-bit signed product
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:331-344
- **Symptom**: 32x32 multiply produces 64-bit signed `i64` product, written to GPR via `as u64` without truncation. When product overflows i32 (which `mullw_ov` correctly detects), upper 32 bits are non-zero and corrupt downstream 64-bit unsigned compares — same class as addis.
- **Fix** (one line; OE handler unchanged):
```rust
ctx.gpr[instr.rd()] = product as u32 as u64;
```
### PPCBUG-010 — divwx quotient sign-extended to 64 bits
- **Severity**: HIGH
- **Status**: open (must be applied in same commit as PPCBUG-011)
- **Location**: interpreter.rs:373
- **Symptom**: `(ra / rb) as i64 as u64` sign-extends a negative i32 quotient. `-10 / 3 = -3` writes `0xFFFFFFFF_FFFFFFFD` instead of `0x00000000_FFFFFFFD`. Canary's `InstrEmit_divwx` uses `f.ZeroExtend(v, INT64_TYPE)` — explicit zero-extension.
- **Fix**: `ctx.gpr[instr.rd()] = (ra / rb) as u32 as u64;`
### PPCBUG-011 — divwx CR0 update breaks after PPCBUG-010 fix
- **Severity**: MEDIUM (coupled to PPCBUG-010 — must land together)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:379
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.rd()] as i64)` accidentally works today because the sign-extended GPR has consistent sign in i64 view. After PPCBUG-010, GPR holds `0x00000000_FFFFFFFD` for `-3` and `as i64` reads positive — CR0.LT will be wrong for negative quotients.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64);`
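The coupling between PPCBUG-010 and PPCBUG-011 can be sketched with two hypothetical helpers (`divw` and `cr0_sign32` are illustrative names, not interpreter functions). `checked_div` is used here only to sidestep the divide-by-zero and `0x8000_0000 / -1` cases, where the ISA leaves the quotient undefined; how the interpreter handles those cases is not shown in this tracker.

```rust
/// `divw` quotient, zero-extended into the 64-bit GPR.
/// Returns None where the ISA leaves the result undefined.
fn divw(ra: u64, rb: u64) -> Option<u64> {
    let a = ra as u32 as i32;
    let b = rb as u32 as i32;
    a.checked_div(b).map(|q| q as u32 as u64)
}

/// Sign of a GPR value in the 32-bit view: -1, 0, or 1.
fn cr0_sign32(gpr: u64) -> i32 {
    (gpr as u32 as i32).signum()
}
```

Once the quotient is zero-extended, a plain `as i64` view of the GPR is positive for `-3`, which is why the CR0 cast has to change in the same commit.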
### PPCBUG-012 — addx writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:167-179
- **Symptom**: 64-bit `wrapping_add` result written to GPR untruncated. Latent: only triggers if upstream operands have poisoned upper 32 bits. With PPCBUG-001 etc. unfixed, that invariant is broken — addx amplifies the poison.
- **Fix**: `ctx.gpr[instr.rd()] = result as u32 as u64;`
### PPCBUG-013 — addcx writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:180-193
- **Fix**: same shape as PPCBUG-012.
### PPCBUG-014 — addex writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:194-209
- **Fix**: same shape as PPCBUG-012.
### PPCBUG-015 — addzex writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:210-224
- **Fix**: same shape as PPCBUG-012.
### PPCBUG-016 — addmex writeback not truncated (latent + edge case)
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:225-240
- **Symptom**: same writeback issue plus the `wrapping_sub(1)` produces all-ones upper 32 bits when low 32 bits underflow — guaranteed poison even if inputs are clean (same shape as PPCBUG-006/008).
- **Fix**: truncate operands and result to 32 bits.
### PPCBUG-017 — subfx writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:241-253
- **Fix**: same shape as PPCBUG-012.
### PPCBUG-018 — subfzex writeback not truncated + `!ra` poisons
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:285-302
- **Symptom**: `(!ra).wrapping_add(ca)` flips upper 32 bits — guaranteed poison.
- **Fix**: truncate ra to u32, do arithmetic on u32, write `as u64`.
### PPCBUG-019 — subfmex writeback poisoning + always-true CA edge
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:303-318
- **Symptom**: (a) writeback poisoned via `(!ra)`. (b) CA predicate `(!ra) != 0` is always true when ra has clean upper 32 bits (because `!ra` flips them) — so CA is always 1, even in the documented edge case where 32-bit `ra == 0xFFFFFFFF && ca == 0` should yield CA=0.
- **Fix**: operate on u32, then `xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 }`.
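A sketch of the 32-bit form, assuming the ISA definition `RT = ¬RA + CA + 0xFFFF_FFFF` and a carry-in that is always 0 or 1. The function name `subfme32` is hypothetical.

```rust
/// 32-bit `subfme`: returns (zero-extended result, carry-out).
fn subfme32(ra: u64, ca_in: u8) -> (u64, u8) {
    let not_ra = !(ra as u32);
    let ca = ca_in as u32;
    let result = not_ra.wrapping_add(ca).wrapping_add(0xFFFF_FFFF);
    // Adding 0xFFFF_FFFF carries out of bit 31 unless (!ra + ca) is exactly zero.
    let ca_out = (not_ra != 0 || ca != 0) as u8;
    (result as u64, ca_out)
}
```

The only input that clears CA is `ra == 0xFFFF_FFFF` with `ca_in == 0`, which is the edge case the 64-bit predicate can never reach.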
### PPCBUG-020 — CR0 update uses 64-bit signed compare in all sub-register ops
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:250, 264, 281, 299, 315, 327, 341, 379, 396, 410, 419, 428, 445, 462 (every Rc=1 path in groups 2-5)
- **Symptom**: `update_cr_signed(0, result as i64)` views result as 64-bit signed. In 32-bit ABI, bit 31 determines LT/GT, not bit 63. A result like `0x00000000_80000000` is negative in 32-bit but positive in 64-bit — CR0.LT inverted.
- **Fix (catch-all)**: change to `result as u32 as i32 as i64` everywhere. Once PPCBUG-001..-019 truncate writebacks, the upper 32 bits of `result` are zero and this distinction becomes moot — but applying both is cheap and provides defense in depth.
- **Note**: this is one logical fix duplicated across all rc paths; the fix session should grep `update_cr_signed(0, .* as i64)` to find them all.
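The two views can be compared directly. In this sketch `cr_bits` is a hypothetical stand-in for `update_cr_signed` that returns the (LT, GT, EQ) bits instead of writing them, and the SO bit is ignored.

```rust
/// (LT, GT, EQ) for a signed value; the SO bit is not modeled.
fn cr_bits(value: i64) -> (bool, bool, bool) {
    (value < 0, value > 0, value == 0)
}

/// Pre-fix shape: the GPR is viewed as a 64-bit signed value.
fn cr0_view64(result: u64) -> (bool, bool, bool) {
    cr_bits(result as i64)
}

/// Fixed shape: bit 31 is the sign bit.
fn cr0_view32(result: u64) -> (bool, bool, bool) {
    cr_bits(result as u32 as i32 as i64)
}
```

For a zero-extended result of `0x8000_0000`, the 64-bit view reports GT and the 32-bit view reports LT; for results with bit 31 clear the two agree.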
### PPCBUG-021 — OE overflow checks at bit 63 in all sub-register ops
- **Severity**: LOW
- **Status**: open
- **Locations**: throughout — `add_ov_64`, `sub_ov_64`, `sum_overflow_64`, `mullw_ov`, etc. (defined in `xenia-cpu/src/overflow.rs`)
- **Symptom**: signed-overflow check operates on 64-bit boundary. For 32-bit-ABI ops (`addo`, `subfo`, `subfco`, etc.), should check at bit 31. With PPCBUG-006 a tighter form was given for `negx`. The pattern probably needs systematic review across overflow.rs.
- **Fix**: open a follow-up audit of overflow.rs after batch B completes.
### PPCBUG-022 — mulld_ov missing INT_MIN * -1 edge case
- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-cpu/src/overflow.rs` (`mulld_ov` helper)
- **Symptom**: 64-bit signed multiply overflow check doesn't handle `i64::MIN * -1`.
- **Fix**: add the special case to the helper.
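One possible shape for the predicate, assuming the helper takes two `i64` operands (the actual `mulld_ov` signature is not shown in this tracker, so `mulld_overflows` is a hypothetical name). Rust's `checked_mul` already treats `i64::MIN * -1` as overflow, so no separate special case is needed in this form.

```rust
/// Signed 64-bit multiply overflow predicate, including `i64::MIN * -1`.
fn mulld_overflows(a: i64, b: i64) -> bool {
    a.checked_mul(b).is_none()
}
```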
### PPCBUG-023 — andisx CR0 update uses 64-bit signed compare; should use 32-bit
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:475
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` interprets the result as 64-bit signed. The `andisx` result is bounded by `0x0000_0000_FFFF_0000`, which is always non-negative in 64-bit view. In 32-bit ABI, bit 31 is the sign bit — results with bit 31 set (e.g. `andis. rA, rS, 0x8000` with rS=0x80000000 → result=0x80000000) should yield CR0.LT=1, but xenia-rs gives CR0.GT=1. The ppc-manual frozen snapshot for `andisx` shows the correct `as i32 as i64` form; the live code has drifted. Common trigger: `andis. rA, rS, 0x8000` to test the sign bit of a 32-bit word.
- **Fix**:
```rust
ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);
```
- **Test gap**: zero tests for `andisx`. Add at minimum: result with bit 31 set (expect LT=1), result with bits 0-30 set (expect GT=1), result=0 (expect EQ=1).
---
## Batch 2 — logical immediate (group 6)
Per-group report: `audit-out/group-06-logic-imm.md`.
Group 6 summary: only 1 new bug found. The `simm16` sign-extension pattern does not apply (all ops use `uimm16`). `ori`, `oris`, `xori`, `xoris`, and `andix` are ISA-correct; `andisx` has a CR0 interpretation bug (PPCBUG-023). All 6 opcodes have inadequate test coverage (LOW gaps for 5 of them, MEDIUM gap for `andisx` tied to the bug).
---
## Batch 3 — word rotate-and-mask (group 9)
Per-group report: `audit-out/group-09-word-rotate.md`.
Group 9 summary: core arithmetic is clean — `rlw_mask`, rotate logic, and result write are all ISA-correct. The single recurring defect is the Rc=1 CR0 path using `as i64` instead of `as u32 as i32 as i64` (instances of PPCBUG-020 specific to these three opcodes). `rlwimix` zeroes the upper 32 bits of RA instead of preserving them per ISA, but this is safe under 32-bit ABI invariant and classified LOW. Test coverage is poor: 1 partial test for `rlwinmx`, zero for the other two.
### PPCBUG-024 — rlwinmx CR0 update uses 64-bit signed compare; should use 32-bit
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:667
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` — result is a zero-extended u32, so bit 31 set yields +2147483648 in 64-bit signed view but -2147483648 in 32-bit ABI. CR0.LT/GT inverted for results with bit 31 set. `rlwinm.` is the most common dot-form instruction in compiler output (all `slwi.`, `srwi.`, `clrlwi.`, bitfield-test-and-branch idioms).
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: `test_rlwinm` exists but non-Rc only, result has bit 31 clear. Add Rc=1 tests with bit 31 set in result.
### PPCBUG-025 — rlwimix CR0 update uses 64-bit signed compare; should use 32-bit
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:679
- **Symptom**: same class as PPCBUG-024. `rlwimi.` is compiler-generated for struct bitfield writes; when the inserted value occupies or sets bit 31 of RA, CR0.LT is wrong.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: zero tests for `rlwimix`. Add basic insert (non-Rc) + Rc=1 with bit-31-set case.
### PPCBUG-026 — rlwnmx CR0 update uses 64-bit signed compare; should use 32-bit
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:690
- **Symptom**: same class as PPCBUG-024. `rlwnm.` is less frequent but used in variable-shift normalisation patterns.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: zero tests for `rlwnmx`.
### PPCBUG-027 — rlwimix zeroes upper 32 bits of RA instead of preserving them (ISA deviation, LOW)
- **Severity**: LOW
- **Status**: open (no fix action required for 32-bit ABI emulation)
- **Location**: interpreter.rs:677-678
- **Symptom**: `let ra = ctx.gpr[instr.ra()] as u32` discards upper 32 bits; result written as `as u64` zero-extends. Per ISA, `(RA) & ¬MASK(MB+32, ME+32)` preserves upper 32 bits of RA. Canary confirms: `f.And(f.LoadGPR(i.M.RA), f.LoadConstantUint64(~m))` with `~m` non-zero in upper half.
- **Impact**: under 32-bit ABI, if the 32-bit GPR invariant holds, upper 32 bits of RA are already zero before `rlwimix`, so both behaviours are identical. The deviation is only observable if an upstream bug (PPCBUG-001..023) has leaked non-zero upper bits into RA — in which case `rlwimix` would silently clean them (beneficial side-effect). No isolated fix needed; resolves automatically when upstream bugs are fixed.
- **Note**: if 64-bit mode support is ever added, this will become a HIGH bug.
---
## Batch 2 — logical register (group 7) [renumbered from collision]
Per-group report: `audit-out/group-07-logic-reg.md` (note: report uses original IDs PPCBUG-023..029 from the subagent's local numbering; tracker uses PPCBUG-028..033 here to avoid collision with groups 6 and 9).
The group 7 subagent also flagged a CR0 regression across all 8 opcodes — that is an extension of PPCBUG-020 (catch-all for CR0 64-bit-signed regressions). Adding andx, andcx, orx, orcx, xorx, norx, nandx, eqvx Rc=1 paths to PPCBUG-020's scope rather than creating a new ID.
### PPCBUG-028 — orcx active GPR poisoning
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:509-513
- **Symptom**: writes `rs | !rb`. Rust's `!` on `u64` flips all 64 bits — the upper 32 bits of `!rb` are unconditionally all-ones, OR'd into the result. With clean inputs `orc r5, r3, r4` writes `0xFFFFFFFF_xxxxxxxx`. Active poisoning, same shape as PPCBUG-006/008.
- **Fix**: operate on u32, write `as u64`:
```rust
let result = (ctx.gpr[instr.rs()] as u32) | !(ctx.gpr[instr.rb()] as u32);
ctx.gpr[instr.ra()] = result as u64;
```
- **Test gap**: zero tests.
### PPCBUG-029 — norx active GPR poisoning (the `not` simplified mnemonic)
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:519-523
- **Symptom**: writes `!(rs | rb)` — outer `!` flips upper 32 bits unconditionally. **`nor rA, rS, rS` is the canonical `not` simplified mnemonic** used pervasively in PPC code; every `not` in 32-bit-ABI Xbox 360 binaries actively poisons the GPR.
- **Fix**: u32 arithmetic, write `as u64`.
### PPCBUG-030 — nandx active GPR poisoning
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:524-528
- **Symptom**: writes `!(rs & rb)`, the same shape as norx. `nand rA, rS, rS` also computes the bitwise complement of rS (the same result as `nor rA, rS, rS`), so it poisons the GPR in the same way as the `not` idiom.
- **Fix**: u32 arithmetic.
### PPCBUG-031 — eqvx active GPR poisoning
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:529-533
- **Symptom**: writes `!(rs ^ rb)` — same shape. The idiom `eqv rA, rS, rS` "set rA to all-ones (i.e. -1 in 32-bit ABI)" produces `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`.
- **Fix**: u32 arithmetic.
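PPCBUG-028 through PPCBUG-031 share one fix shape: complement in `u32`, then zero-extend. A sketch with hypothetical free functions (the interpreter arms read and write `ctx.gpr` directly):

```rust
/// `orc`: rs | !rb on the low 32 bits.
fn orc32(rs: u64, rb: u64) -> u64 {
    let r = (rs as u32) | !(rb as u32);
    r as u64
}

/// `nor`: !(rs | rb) on the low 32 bits. `nor rA, rS, rS` is the `not` idiom.
fn nor32(rs: u64, rb: u64) -> u64 {
    let r = !((rs as u32) | (rb as u32));
    r as u64
}

/// `nand`: !(rs & rb) on the low 32 bits.
fn nand32(rs: u64, rb: u64) -> u64 {
    let r = !((rs as u32) & (rb as u32));
    r as u64
}

/// `eqv`: !(rs ^ rb) on the low 32 bits. `eqv rA, rS, rS` yields all-ones.
fn eqv32(rs: u64, rb: u64) -> u64 {
    let r = !((rs as u32) ^ (rb as u32));
    r as u64
}
```

Because the `!` is applied to a `u32`, the upper half of the written value is zero by construction rather than by a trailing mask.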
### PPCBUG-032 — andx / orx / xorx writeback not truncated (latent)
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:494-498 (andx), 504-508 (orx), 514-518 (xorx)
- **Symptom**: 64-bit bitwise on full GPR values. Latent — clean if both operands are clean; pollutes if either is poisoned upstream.
- **Fix**: `as u32 as u64` truncation at writeback. Once all upstream poison sources are fixed, these become unnecessary; until then, defensive truncation.
### PPCBUG-033 — andcx active poisoning via `!rb` sub-expression
- **Severity**: MEDIUM (the `!rb` always poisons; outer `&` masks it away when rs is clean — fully active when rs is poisoned)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:499-503
- **Symptom**: writes `rs & !rb`. With a clean rb, `!rb` always has all-ones upper bits; if rs has clean (zero) upper bits, the `&` masks them away and the result is clean. If rs is poisoned upstream, every poisoned upper bit of rs passes through the `&` unchanged. This is closer to active than latent.
- **Fix**: `(rs as u32) & !(rb as u32)` then `as u64`.
## Batch 2 — sign-extend / count-leading-zeros (group 8) [renumbered]
Per-group report: `audit-out/group-08-extend-clz.md` (report uses local IDs PPCBUG-023..030; tracker uses PPCBUG-034..039).
### PPCBUG-034 — extsbx writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:537
- **Symptom**: `as i8 as i64 as u64` — a byte with high bit set (0x80) writes `0xFFFFFFFF_FFFFFF80` instead of `0x00000000_FFFFFF80`. Active poisoning on every negative byte. `extsb` is emitted by compilers to canonicalize signed-byte arguments — common code path.
- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i8 as i32 as u32 as u64;`
- **Test gap**: zero unit tests.
- **Note**: Canary's JIT does the same sign-extension but is rescued by x86's 32-bit-write zeroing the upper 32 of host registers. Pure interpreter has no such escape.
### PPCBUG-035 — extshx writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:542
- **Symptom**: `as i16 as i64 as u64` — same shape as PPCBUG-034 for halfwords.
- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i16 as i32 as u32 as u64;`
### PPCBUG-036 — extsbx CR0 coupling
- **Severity**: MEDIUM (must land in same commit as PPCBUG-034)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:538
- **Symptom**: `update_cr_signed(0, ra as i64)` — currently latent because the unfixed sign-extended value's i64 sign matches bit 7 of the byte. After PPCBUG-034 lands, the truncated value's i64 view becomes always non-negative — CR0.LT will never fire for negative byte results.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);` — must land with PPCBUG-034.
### PPCBUG-037 — extshx CR0 coupling
- **Severity**: MEDIUM (must land with PPCBUG-035)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:543
- **Symptom**: same coupling shape as PPCBUG-036 for halfwords.
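The cast chains from PPCBUG-034/035 and the CR0 coupling from PPCBUG-036/037 can be exercised together. `extsb32` and `extsh32` are hypothetical standalone versions of the fixed writebacks.

```rust
/// `extsb` under the 32-bit ABI: sign-extend the low byte to 32 bits, then zero-extend.
fn extsb32(rs: u64) -> u64 {
    rs as i8 as i32 as u32 as u64
}

/// `extsh` under the 32-bit ABI: sign-extend the low halfword to 32 bits, then zero-extend.
fn extsh32(rs: u64) -> u64 {
    rs as i16 as i32 as u32 as u64
}
```

After the fix, `extsb32(0x80)` is positive when read with `as i64` but negative when read with `as u32 as i32`, which is the reason the CR0 cast has to change alongside the writeback.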
### PPCBUG-038 — extswx ISA-correct, document asymmetry
- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- **Location**: interpreter.rs:547
- **Symptom**: `as i32 as i64 as u64` produces full 64-bit sign-extension. This IS the documented purpose of extsw — argument-register canonicalization in 64-bit mode. Behavior is intentional. After PPCBUG-034/035 land, document the asymmetry with extsb/extsh in a comment.
### PPCBUG-039 — cntlzdx counts upper 32 always-zero bits in 32-bit ABI
- **Severity**: LOW
- **Status**: open (probably dead code in Xbox 360 binaries)
- **Location**: interpreter.rs:556-562
- **Symptom**: counts leading zeros in full 64. If a 32-bit-ABI binary emits cntlzd, the result is `32 + cntlzw(low32)` not `cntlzw(low32)`. ISA-correct for 64-bit mode; only matters if the binary actually emits it.
- **Test gap**: zero tests.
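A small sketch of the relationship described above, using hypothetical standalone functions. For any value whose upper 32 bits are zero, the doubleword count is the word count plus 32.

```rust
/// `cntlzw`: leading zeros of the low 32 bits (0..=32).
fn cntlzw(rs: u64) -> u64 {
    (rs as u32).leading_zeros() as u64
}

/// `cntlzd`: leading zeros of the full 64 bits (0..=64).
fn cntlzd(rs: u64) -> u64 {
    rs.leading_zeros() as u64
}
```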
#### Clean opcodes from group 8
- `cntlzwx` (interpreter.rs:551-555) — `(rs as u32).leading_zeros()` reads only low 32 bits, result range 0..=32, upper 32 zero. CR0 path benign because result is small. **Test gap only**, LOW.
- `extswx` CR0 path is correct per ISA (PPCBUG-038 wontfix).
## Batch 2 — shift (group 11) [renumbered]
Per-group report: `audit-out/group-11-shift.md` (uses local IDs PPCBUG-050..055; tracker uses PPCBUG-040..045).
### PPCBUG-040 — DECODER BUG: `sh64()` wrong bit order for sradi (HIGH)
- **Severity**: HIGH (this is a decoder-level bug, file:line is in `decoder.rs` not `interpreter.rs`)
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `xenia-rs/crates/xenia-cpu/src/decoder.rs:91-93` (the `sh64()` accessor method on `DecodedInstr`)
- **Symptom**: the XS-form `sradix` (sradi) shift amount is assembled as `SH[4:0] << 1 | SH[5]` instead of the correct `SH[5] << 5 | SH[4:0]`. **Every `sradi rA, rS, N` instruction where N is not 0 or 63 executes with a completely wrong shift count.** Example: `sradi rA, rS, 32` shifts by 1 instead. This is a silent, structural mis-decoding — none of the interpreter changes can paper over it.
- **Cross-reference**: Canary's `(i.XS.SH5 << 5) | i.XS.SH` pattern is the correct ISA encoding.
- **Fix**: in `decoder.rs:sh64()` body, swap the bit order:
```rust
pub fn sh64(&self) -> u32 {
    // SH5 is at bit 30 of the encoded word; SH[4:0] is at bits 16-20.
    let sh_lo = extract_bits(self.raw, 16, 20);
    let sh_hi = extract_bits(self.raw, 30, 30);
    (sh_hi << 5) | sh_lo
}
```
- **Impact**: `sradi` is used by compilers for arithmetic right shifts on 64-bit values. In Xbox 360 32-bit-ABI binaries it should not be common, but it's emitted by some compilers for sign-magnitude conversions and 64-bit fixed-point arithmetic. **This is the kind of silent decoder bug the user explicitly wanted the audit to catch.**
- **Test gap**: no decoder unit test pins `sh64()` for non-trivial SH values. Add fixture cases in `disasm_goldens.rs` for `sradi rA, rS, 1`, `sradi rA, rS, 32`, `sradi rA, rS, 63`.
- **Note**: any other instruction that uses the same XS-form SH split-encoding is suspect. Phase C decoder audit must verify `sradi` and `sradix` are the only consumers of `sh64()`.
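A decoder-level test could look roughly like the sketch below. It works on raw `u32` words with plain shifts instead of the `extract_bits` helper, and `encode_sh` is a hypothetical test-only encoder; PPC bits 16-20 map to a right shift of 11 and PPC bit 30 to a right shift of 1.

```rust
/// Correct 6-bit shift amount from an XS-form word.
fn sh64(raw: u32) -> u32 {
    let sh_lo = (raw >> 11) & 0x1F; // PPC bits 16-20
    let sh_hi = (raw >> 1) & 1;     // PPC bit 30
    (sh_hi << 5) | sh_lo
}

/// The swapped assembly described in the symptom, kept for comparison.
fn sh64_buggy(raw: u32) -> u32 {
    let sh_lo = (raw >> 11) & 0x1F;
    let sh_hi = (raw >> 1) & 1;
    (sh_lo << 1) | sh_hi
}

/// Hypothetical test helper: place a shift amount into the two SH fields
/// of an otherwise-zero instruction word.
fn encode_sh(n: u32) -> u32 {
    ((n & 0x1F) << 11) | (((n >> 5) & 1) << 1)
}
```

Looping over all 64 shift amounts confirms the claim in the symptom: the swapped form agrees with the correct one only for 0 and 63.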
### PPCBUG-041 — srawx writeback sign-extends to 64 bits
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:583, 588 (two writeback paths for the count<32 and count>=32 branches)
- **Symptom**: `result as i64 as u64` violates the 32-bit-ABI zero-extension convention. A negative shifted value writes `0xFFFFFFFF_xxxxxxxx` instead of `0x00000000_xxxxxxxx`.
- **Fix**: `result as u32 as u64` in both writeback paths.
- **Note**: subagent verified the CA computation is **independently correct** — uses `(rs as u32) << (32 - sh) != 0` which is the canonical ISA shifted-out-bits test on 32-bit operands. **Do not change CA logic.**
### PPCBUG-042 — srawix writeback sign-extends to 64 bits
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:600, 605 (same shape as PPCBUG-041 for srawi)
- **Fix**: `result as u32 as u64`.
### PPCBUG-043 — srawx / srawix CR0 coupling
- **Severity**: MEDIUM (must land with PPCBUG-041 and PPCBUG-042)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:593, 607
- **Symptom**: currently masked by the sign-extended writeback (sign-extension makes the 64-bit and 32-bit sign agree). After truncating the writeback, `as i64` will misread the sign for negative results.
- **Fix**: `as u32 as i32 as i64` in both Rc=1 paths, applied with PPCBUG-041/042.
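A standalone sketch of `srawi` for shift counts below 32, combining the truncated writeback with a CA test. The function name is hypothetical, and the CA predicate here uses a mask of the shifted-out bits, which should be equivalent to the shift-based expression quoted in PPCBUG-041 for nonzero counts; it is not a proposal to change the existing CA logic.

```rust
/// 32-bit `srawi` for shift counts 0..=31: returns (zero-extended result, CA).
fn srawi32(rs: u64, sh: u32) -> (u64, u8) {
    assert!(sh < 32);
    let v = rs as u32 as i32;
    let result = v >> sh; // arithmetic shift: `>>` on i32 replicates the sign bit
    // CA is set only when the source is negative and a 1 bit was shifted out.
    let lost = (v as u32) & ((1u32 << sh) - 1);
    let ca = (v < 0 && lost != 0) as u8;
    (result as u32 as u64, ca)
}
```

The CR0 path would then read the written value through `as u32 as i32 as i64`, as described in PPCBUG-043.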
### PPCBUG-044 — slwx / srwx CR0 misclassifies negative 32-bit results
- **Severity**: LOW (zero-extended results have bit 31 set in low 32, but always positive in i64 view → CR0.LT never fires for slw/srw with bit-31-set results)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:568, 576
- **Fix**: `as u32 as i32 as i64`.
### PPCBUG-045 — Zero unit tests for any shift opcode
- **Severity**: LOW (test gap only)
- **Status**: open
- **Locations**: interpreter.rs:563-658 (entire shift group: slwx, srwx, srawx, srawix, sldx, srdx, sradx, sradix)
- **Recommendation**: add at least one functional test per opcode. Especially: `srawix r3, r3, 1` with rs=0xFFFFFFFE (CA should be 0), `srawix r3, r3, 1` with rs=0x80000001 (CA should be 1, result=0xC0000000); `sradix r3, r3, 32` (currently wrong per PPCBUG-040).
#### Clean opcodes from group 11
- `slwx` writeback at line 568 (zero-ext 32-bit result via `(rs as u32 << count) as u64`) — clean.
- `srwx` writeback at line 576 — clean.
- `sldx`, `srdx`, `sradx` — 64-bit ops, ISA-correct (probably dead in 32-bit-ABI binaries).
- `sradix` body logic is structurally correct; failure is solely from PPCBUG-040 giving it a wrong shift count.
## Batch 2 — doubleword rotate (group 10) [renumbered]
Per-group report: `audit-out/group-10-dword-rotate.md` (uses local IDs PPCBUG-027/028; tracker uses PPCBUG-046/047).
### PPCBUG-046 — DECODER BUG: wrong bit position for MB[5] in all 6 doubleword-rotate opcodes (HIGH)
- **Severity**: HIGH (decoder-level; impacts the canonical zero-extend-to-32 idiom)
- **Status**: applied (52b05b1, 2026-05-01)
- **Locations**: interpreter.rs — every arm of `rldiclx`, `rldicrx`, `rldicx`, `rldimix`, `rldclx`, `rldcrx` (lines 693-754)
- **Symptom**: each arm computes `let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1)`. The bit at `(instr.raw >> 1) & 1` is **PPC bit 30**, which in MD form is the split-off high bit of the shift amount (`SH[5]` in the notation of PPCBUG-040), NOT `mb[5]`. The high bit of the 6-bit MB field lives at PPC bit 26 = `(instr.raw >> 5) & 1`.
As written, the code computes `(mb[4:0] << 1) | SH[5]`. Ironically `disasm.rs:1256` (the `mb_md()` helper) has the correct formula. The interpreter was written independently with the wrong bit position, probably a copy error from `sh64()`, where bit 30 really is the split bit.
- **Concrete impact**:
- `clrldi r3, r4, 32` is the canonical "zero-extend low 32 bits" idiom emitted constantly in 32-bit-ABI PPC code. Encoded as `rldicl r3, r4, 0, mb=32`. With mb=32, `mb[5]=1, mb[4:0]=0`. The interpreter decodes mb=0 → mask is all-ones → instruction becomes a no-op. Any downstream 64-bit compare (subfcx CA, cmpld) on that register sees a polluted 64-bit value instead of a clean 32-bit zero-extended one. **This is the same class of bug that caused the addis/BST incident.**
- For `rldcr` (MDS form), PPC bit 30 is the LSB of the 4-bit XO field and is always 1 for `rldcr` (XO=9), independent of Rc. With the formula above the decoded value becomes `(me[4:0] << 1) | 1` on every invocation, so every `me` is wrong, not only those with `me >= 32`.
- **Fix** (one line per opcode):
```rust
// Replace in all 6 arms:
let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1);
// With:
let mb = instr.mb() | (((instr.raw >> 5) & 1) << 5);
```
Or, cleaner: expose `mb_md()` (currently in disasm.rs:1256) as a method on `DecodedInstr` in `decoder.rs` and have the interpreter call `instr.mb_md()` — single source of truth for MD-form mb extraction.
- **Test gap**: zero execution tests for any of the 6 opcodes; only disasm-golden string-output tests.
- **Note**: this is the second decoder bug found by the audit (PPCBUG-040 / `sh64()` for `sradi` is the first). Phase C decoder audit must verify whether other MD/MDS/XS form accessors have similar bit-position errors.
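The `clrldi r3, r4, 32` failure can be reproduced outside the interpreter. The sketch below assumes `instr.mb()` returns the 5-bit field at PPC bits 21..=25 (consistent with the fix above, but not verified against `decoder.rs`); all function names are hypothetical. The word `0x7883_0020` is the encoding of `rldicl r3, r4, 0, 32`.

```rust
/// Assumed behaviour of the decoder's 5-bit `mb()` accessor: PPC bits 21..=25.
fn mb_low5(raw: u32) -> u32 {
    (raw >> 6) & 0x1F
}

/// Buggy reconstruction: shifts the low five bits up and ORs in PPC bit 30.
fn mb_buggy(raw: u32) -> u32 {
    (mb_low5(raw) << 1) | ((raw >> 1) & 1)
}

/// MD-form reconstruction: the high bit of MB is PPC bit 26 (raw bit 5).
fn mb_md(raw: u32) -> u32 {
    mb_low5(raw) | (((raw >> 5) & 1) << 5)
}

/// `rldicl` mask: 1-bits from PPC bit `mb` through bit 63.
fn mask_from_mb(mb: u32) -> u64 {
    u64::MAX >> mb
}

/// `rldicl` with sh = 0, which is all `clrldi` needs.
fn clrldi(rs: u64, mb: u32) -> u64 {
    rs & mask_from_mb(mb)
}
```

With the buggy accessor the mask is all-ones and the register passes through unchanged; with `mb_md` the upper 32 bits are cleared.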
### PPCBUG-047 — Zero execution tests for any doubleword-rotate opcode
- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: interpreter.rs:693-754 (all 6 opcodes)
- **Recommendation**: at minimum, a `clrldi r3, r4, 32` test verifying the result is exactly the low 32 bits of r4. After PPCBUG-046 lands, this test would have caught the MB-reconstruction bug.
#### What's correct in group 10
- `sh64()` accessor — correctly reconstructs 6-bit shift from MD split encoding (cross-check: `disasm.rs` agrees).
- `rld_mask_left()` / `rld_mask_right()` mask helpers — verified against Canary's XEMASK.
- `rldicx`/`rldimix` mask formulas (`63 - sh` for right edge) — correct.
- `rldimix` read-modify-write merge — correct 64-bit mask-insert.
- CR0 `as i64` — correct here because these ARE genuine 64-bit ops (unlike word rotate).
- `rldcl`/`rldcr` register-shift extraction (`gpr[rb] & 0x3F`) — correct.
- No 32-bit writeback truncation needed: these are intentionally 64-bit; 32-bit-ABI compilers only emit them with masks that yield 32-bit-clean results.
## Batch 3 — branch (group 13)
Per-group report: `audit-out/group-13-branch.md`.
Group 13 summary: the branch implementation is substantively correct. All BO/BI bit masks,
CTR decrement-before-test ordering, AA absolute vs relative dispatch, LK unconditional write
(including not-taken path in `bcx`), LR-read-before-LR-write atomicity in `bclrx`, and
`get_cr_bit()` field indexing are all ISA-correct and match Canary. The only execution bugs
are a latent 64-bit CTR zero-test (PPCBUG-053/054, active under current GPR-pollution environment)
and severely thin test coverage (PPCBUG-055).
### PPCBUG-053 — CTR zero-test uses 64-bit compare; should use 32-bit in `bcx`/`bclrx`
- **Severity**: MEDIUM (effectively HIGH given unfixed PPCBUG-001..031 GPR pollution)
- **Status**: applied (3d8e2ce, 2026-05-02)
- **Locations**: `interpreter.rs:849` (`bcx` `ctr_ok`), `interpreter.rs:879` (`bclrx` `ctr_ok`)
- **Symptom**: `ctx.ctr != 0` compares all 64 bits. In 32-bit ABI the CTR is logically 32-bit.
Canary explicitly truncates to 32 bits: `ctr = f.Truncate(ctr, INT32_TYPE)`. When CTR upper
32 bits are non-zero (due to upstream GPR pollution flowing through `mtspr CTR, rN`), the
64-bit test disagrees with the 32-bit ISA semantic. Most dangerous with `neg; mtctr; bdnz`:
`negx` (PPCBUG-006) always sets upper 32 bits, so the 32-bit CTR counter can reach zero
while the 64-bit CTR is still non-zero → infinite loop.
- **Fix**:
```rust
// Replace in both bcx and bclrx:
let ctr_ok = (bo & 0b00100) != 0
|| (((ctx.ctr as u32) != 0) ^ ((bo & 0b00010) != 0));
```
Or, alternatively, truncate at decrement:
```rust
if bo & 0b00100 == 0 {
    ctx.ctr = ctx.ctr.wrapping_sub(1) as u32 as u64;
}
```
- **Test gap**: zero tests for CTR-decrement branches (bdnz, bdz, bdnzt, bdnzf, bdzt, bdzf).
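The difference between the 64-bit and 32-bit zero-test can be shown with a standalone model. This is a sketch under the BO semantics stated above; `ctr_step` and its `truncate` flag are hypothetical and exist only to compare the two behaviours side by side.

```rust
/// Sketch of BO-driven CTR handling for `bc`-family branches.
/// Returns (new CTR, ctr_ok). `truncate` selects the 32-bit zero-test.
fn ctr_step(ctr: u64, bo: u32, truncate: bool) -> (u64, bool) {
    if (bo & 0b00100) != 0 {
        // BO[2] set: CTR is neither decremented nor tested.
        return (ctr, true);
    }
    let new_ctr = ctr.wrapping_sub(1);
    let nonzero = if truncate {
        (new_ctr as u32) != 0
    } else {
        new_ctr != 0
    };
    // BO[3] set means "branch if CTR == 0"; clear means "branch if CTR != 0".
    let ctr_ok = nonzero ^ ((bo & 0b00010) != 0);
    (new_ctr, ctr_ok)
}
```

For `bdnz` (BO = `0b10000`) with a polluted CTR of `0xFFFF_FFFF_0000_0001`, the 64-bit test keeps looping after the low word reaches zero, while the 32-bit test falls through.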
### PPCBUG-054 — `mtspr CTR` writeback not truncated to 32 bits
- **Severity**: MEDIUM
- **Status**: applied (3d8e2ce, 2026-05-02)
- **Location**: `interpreter.rs:1411`
- **Symptom**: `crate::context::spr::CTR => ctx.ctr = val` writes the full 64-bit GPR to CTR.
Acts as a firewall gap: any upstream 64-bit GPR pollution flows directly into CTR, where it
will be tested by PPCBUG-053's 64-bit comparison. Defensive fix prevents CTR from ever
acquiring non-zero upper 32 bits independently of the GPR-pollution fix.
- **Note**: the `bcctrx` branch-target read (`(ctx.ctr as u32) & !3`) already truncates
correctly; the bug is confined to the `ctr != 0` zero-test in `bcx`/`bclrx`.
- **Fix**: `crate::context::spr::CTR => ctx.ctr = val as u32 as u64,`
- **Cross-reference**: Group 16 (SPR/MSR) subagent should verify this write-point.
### PPCBUG-055 — Severely inadequate test coverage for all four branch opcodes
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Locations**: `interpreter.rs` test module (lines 4455-4491)
- **Current coverage**: `bx` forward (1 test), `bl` LR update (1 test), `bcx` taken beq (1 test via `test_cmp_and_bc`). Zero tests for: `bclrx`, `bcctrx`, any CTR-decrement variant, not-taken path, backward branch, AA=1 absolute, `bcl` LR-write-on-not-taken.
- **Recommended minimum**: blr, bctr, bdnz (taken and not-taken at boundary CTR=1), bclrl old-LR-as-target, bcl LK-write-on-not-taken. See per-group report for concrete encoding patterns.
---
## Batch 3 — trap + system call (group 14)
Per-group report: `audit-out/group-14-trap-sc.md`.
Group 14 summary: the core trap evaluation (`trap.rs`) is correct — TO bit constants, signed/unsigned
comparison dispatch, and word-vs-doubleword width handling are all ISA-conformant. The live interpreter
arm properly evaluates the TO field (replacing the old unconditional-trap stub). Three MEDIUM issues
found: PC ordering on trap return, missing LEV dispatch for `sc`, and the Xbox 360 typed-trap
convention (`twi 31, r0, IMM`) not handled. Two LOW findings for stale manual snapshots and test gaps.
### PPCBUG-063 — `ctx.pc` already at CIA+4 when `StepResult::Trap` returns
- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: interpreter.rs:1543 (`ctx.pc += 4`) before interpreter.rs:1549 (`return StepResult::Trap`)
- **Symptom**: any trap handler that reads `ctx.pc` to find the faulting instruction sees CIA+4 instead
of CIA. The existing `tracing::warn!` compensates with `.wrapping_sub(4)`, confirming the asymmetry.
On real hardware, SRR0 = CIA (trapping instruction address). Current risk LOW (no handler inspects
pc), but HIGH if any SEH/exception-delivery path is added (critical for the C++ throw investigation).
- **Fix**: save CIA before incrementing, restore it when firing the trap:
```rust
let trap_pc = ctx.pc;
ctx.pc += 4;
if fired { ctx.pc = trap_pc; return StepResult::Trap; }
```
Alternatively store CIA in a separate `ctx.srr0`-equivalent field and leave `ctx.pc` at NIA.
- **Note**: `sc` correctly leaves `ctx.pc` at NIA (the return address) — that is a different and
correct design choice. The inconsistency between sc and trap is the bug.
### PPCBUG-064 — `sc` ignores `LEV` field; `sc 2` (HVcall) silently misdispatched
- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: interpreter.rs:915-918
- **Symptom**: `sc 2` (Xbox 360 hypervisor call) returns `StepResult::SystemCall` identically to
`sc 0`. Canary dispatches LEV=0 to `syscall_handler` and LEV=2 to `f.function()` (the HVcall
path). For pure game-title code (LEV=0 only) this is invisible; XDK kernel-mode components and
some HV-aware titles may use `sc 2`.
- **Fix**: decode the 7-bit LEV field (bits 20-26 of SC-form encoding), add a `HypervisorCall`
variant to `StepResult`, and dispatch accordingly.
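A minimal sketch of the LEV decode, assuming the SC-form layout stated above (LEV at PPC bits 20..=26). `ScKind` is a hypothetical stand-in for whatever variants `StepResult` ends up with; it is not the crate's type.

```rust
/// Hypothetical classification of an `sc` instruction by its LEV field.
#[derive(Debug, PartialEq)]
enum ScKind {
    SystemCall,
    HypervisorCall,
    Other(u32),
}

/// LEV occupies PPC bits 20..=26, i.e. raw bits 11..=5.
fn sc_lev(raw: u32) -> u32 {
    (raw >> 5) & 0x7F
}

fn classify_sc(raw: u32) -> ScKind {
    match sc_lev(raw) {
        0 => ScKind::SystemCall,
        2 => ScKind::HypervisorCall,
        lev => ScKind::Other(lev),
    }
}
```

`0x4400_0002` is plain `sc` (LEV=0) and `0x4400_0042` is `sc 2`, so the two words make a natural pair of test vectors.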
### PPCBUG-065 — `twi 31, r0, IMM` typed-trap not handled; SIMM type code discarded
- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: interpreter.rs:1532-1551 (trap arm)
- **Symptom**: `twi 31, r0, IMM` (TO=31=unconditional, RA=r0) is used by the Xbox 360 CRT/kernel
to encode typed C++ exceptions — the 16-bit SIMM carries the exception type discriminator. xenia-rs
fires the trap correctly but discards SIMM. The caller sees a generic `StepResult::Trap` with no
type information, preventing correct C++ SEH dispatch.
- **Canary reference**: `ppc_emit_control.cc:611-616` special-cases `RA==0 && TO==31` and calls
`f.Trap(type)` with the SIMM as the type code.
- **Fix**: add a `trap_type: Option<u16>` payload to `StepResult::Trap`. Detect `twi` with `to()==31`
and `ra()==0` and populate it with `instr.simm16() as u16`.
- **Note**: directly relevant to the Sylpheed `std::runtime_error` throw investigation
(project_xenia_rs_sylpheed_throw_2026_04_28.md) — the typed-trap SIMM carries the CRT exception
class that the kernel uses to route to the correct handler.
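The detection rule described in the fix can be sketched on the raw instruction word. `typed_trap_code` is a hypothetical free function; the real change would use the decoder's `to()`, `ra()` and `simm16()` accessors instead of manual shifts.

```rust
/// Returns the typed-trap code for `twi 31, r0, SIMM`, or None for any other word.
fn typed_trap_code(raw: u32) -> Option<u16> {
    let opcode = raw >> 26; // primary opcode, PPC bits 0..=5 (twi = 3)
    let to = (raw >> 21) & 0x1F; // TO field, PPC bits 6..=10
    let ra = (raw >> 16) & 0x1F; // RA field, PPC bits 11..=15
    if opcode == 3 && to == 31 && ra == 0 {
        Some(raw as u16) // SIMM, PPC bits 16..=31, kept as a raw 16-bit code
    } else {
        None
    }
}
```

Keeping the code as `u16` avoids any sign-extension question for type codes with the top bit set.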
### PPCBUG-066 — Stale frozen snapshots in ppc-manual for td/tdi/tw/twi
- **Severity**: LOW
- **Status**: applied (P7 manual regen, 2026-05-02)
- **Location**: `ppc-manual/branch/td.md`, `tdi.md`, `tw.md`, `twi.md`
- **Symptom**: all four show the old unconditional-trap stub (`// For now, just trace and continue`)
instead of the current TO-field-evaluating implementation.
- **Fix**: regenerate after PPCBUG-063 and PPCBUG-065 are resolved.
### PPCBUG-067 — Test gaps for trap and sc
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs `#[cfg(test)] mod tests`
- **Missing coverage**: `sc` smoke test (fires SystemCall, advances PC); `td` vs `tw` on 64-bit-clean
operands (width discrimination); `tdi`/`td` signed/unsigned LT/GT conditions; `tw 31, r0, r0`
unconditional `trap` encoding; `twi 31, r0, N` typed-trap; negative simm16 in `twi`.
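The width-discrimination case is the easiest one to get wrong, so a reference model may help when writing these tests. The sketch below uses the standard TO bit meanings; the constant and function names are hypothetical and are not taken from `trap.rs`.

```rust
// TO-field condition bits (TO[0] is the most significant of the five).
const TO_LT: u32 = 0b10000; // signed a < b
const TO_GT: u32 = 0b01000; // signed a > b
const TO_EQ: u32 = 0b00100; // a == b
const TO_LTU: u32 = 0b00010; // unsigned a < b
const TO_GTU: u32 = 0b00001; // unsigned a > b

fn fires(to: u32, sa: i64, sb: i64, ua: u64, ub: u64) -> bool {
    ((to & TO_LT) != 0 && sa < sb)
        || ((to & TO_GT) != 0 && sa > sb)
        || ((to & TO_EQ) != 0 && ua == ub)
        || ((to & TO_LTU) != 0 && ua < ub)
        || ((to & TO_GTU) != 0 && ua > ub)
}

/// Doubleword trap test (`td`/`tdi`): compares all 64 bits.
fn td_fires(to: u32, a: u64, b: u64) -> bool {
    fires(to, a as i64, b as i64, a, b)
}

/// Word trap test (`tw`/`twi`): compares only the low 32 bits.
fn tw_fires(to: u32, a: u64, b: u64) -> bool {
    let (a32, b32) = (a as u32, b as u32);
    fires(to, a32 as i32 as i64, b32 as i32 as i64, a32 as u64, b32 as u64)
}
```

Operands that differ only in the upper 32 bits distinguish `tw` from `td` under `TO_EQ`, and `0xFFFF_FFFF` versus `0` separates the signed and unsigned conditions.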
---
## Batch 3 — SPR / MSR / TB / FPSCR / VSCR moves (group 16)
Per-group report: `audit-out/group-16-spr-msr.md`.
Group 16 summary: the core paths are clean — `mfcr`, `mtcrf`, `mfspr`, `mtspr`, `mftb`, `mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`, `mfvscr`, `mtvscr` are all functionally ISA-correct. The `spr()` decoder accessor correctly inverts the PPC XFX half-swap encoding. The one MEDIUM finding is `mtmsrd` silently ignoring the `L=1` partial-MSR-write semantics. Five LOW test-gap findings cover near-total absence of unit tests for this entire group.
### PPCBUG-078 — `mtmsrd` L=1 partial-MSR-write not modelled
- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:1458-1461`
- **Symptom**: xenia-rs merges `mtmsr` and `mtmsrd` into a single body that unconditionally writes `ctx.msr = ctx.gpr[instr.rs()]`. PowerISA specifies that `mtmsrd` with instruction bit 15 (`L`) = 1 performs a partial update: only `MSR[EE]` (PPC bit 48, u64 bit 15) and `MSR[RI]` (PPC bit 62, u64 bit 1) are modified; all other MSR bits are preserved. Kernel code using `mtmsrd L=1` to re-enable external interrupts silently overwrites the entire MSR in xenia-rs. Canary acknowledges the same TODO.
- **Fix**:
```rust
PpcOpcode::mtmsrd => {
    let l = (instr.raw >> (31 - 15)) & 1;
    if l == 1 {
        // L=1 updates only EE (u64 bit 15) and RI (u64 bit 1); u64 bit 0 is LE.
        let mask: u64 = (1u64 << 15) | (1u64 << 1);
        let rs = ctx.gpr[instr.rs()];
        ctx.msr = (ctx.msr & !mask) | (rs & mask);
    } else {
        ctx.msr = ctx.gpr[instr.rs()];
    }
    ctx.pc += 4;
}
```
- **Test gap**: zero tests for `mtmsr` or `mtmsrd`.
### PPCBUG-079 — `mtspr` silent drop of unknown-SPR writes without value logging
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1430-1433`
- **Symptom**: Unknown SPR writes are silently discarded with only a `tracing::warn!()` that omits the value being written. Reduces debuggability; no correctness impact for known Xbox 360 titles.
- **Fix** (optional): `tracing::warn!("mtspr: unimplemented SPR {} <= 0x{:016x}", spr, val)`.
### PPCBUG-080 — `mfvscr` does not zero the upper 96 bits of VD per ISA
- **Severity**: LOW
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:2198-2201`
- **Symptom**: ISA requires `mfvscr VD` to place VSCR in the rightmost word of VD and zero bytes 0-11. xenia-rs copies the full 128-bit `ctx.vscr` into `ctx.vr[VD]`, leaving stale data in bytes 0-11 if `ctx.vscr` was populated from a non-zeroed vector. Canary explicitly zero-extends.
- **Fix**:
```rust
PpcOpcode::mfvscr => {
    let vscr_word = ctx.vscr.as_u32x4()[3];
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array([0, 0, 0, vscr_word]);
    ctx.pc += 4;
}
```
### PPCBUG-081 — Zero unit tests for `mfcr` / `mtcrf`
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:1436-1453`
- **Recommended additions**: full mfcr round-trip; `mtcrf 0xFF`; `mtcrf 0x80` (CR0 only); `mtcrf 0x38` (ABI CR2|CR3|CR4 restore).
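Expected values for these `mtcrf` tests can be derived from a small model of the FXM expansion. This sketch assumes a packed 32-bit CR image (the interpreter may store CR as separate fields); `fxm_to_mask` and `mtcrf_ref` are hypothetical names.

```rust
/// Expands the 8-bit FXM field into a 32-bit CR write mask.
/// FXM bit 0x80 selects CR0 (the most significant nibble), 0x01 selects CR7.
fn fxm_to_mask(fxm: u8) -> u32 {
    let mut mask = 0u32;
    for field in 0..8u32 {
        if (fxm & (0x80u8 >> field)) != 0 {
            mask |= 0xF000_0000u32 >> (field * 4);
        }
    }
    mask
}

/// Reference `mtcrf`: replaces only the selected CR fields with bits from rS.
fn mtcrf_ref(cr: u32, rs: u64, fxm: u8) -> u32 {
    let mask = fxm_to_mask(fxm);
    (cr & !mask) | ((rs as u32) & mask)
}
```

For `mtcrf 0x38` the mask covers CR2, CR3 and CR4 only, so a test can assert that CR0, CR1 and CR5..CR7 survive the write.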
### PPCBUG-082 — Minimal unit tests for `mfspr` / `mtspr`
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:1376-1435`
- **Note**: only DEC and TBL_WRITE covered; add LR, CTR, XER, TBL/TBU, VRSAVE.
### PPCBUG-083 — Zero unit tests for `mftb`
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:1462-1470`
### PPCBUG-084 — Zero interpreter-level round-trip tests for FPSCR move instructions
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:2678-2720`
- **Note**: `fpscr.rs` helper-level tests exist; interpreter dispatch (`mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`) is untested end-to-end.
### PPCBUG-085 — Zero unit tests for `mfvscr` / `mtvscr`
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:2198-2205`
IDs PPCBUG-086 and PPCBUG-087 are unallocated — reserved for group 16 follow-up findings.
---
## Batch 3 — cache + sync (group 17)
Per-group report: `audit-out/group-17-cache-sync.md`.
Group 17 summary: the cleanest group audited so far. Both `dcbz` and `dcbz128` have correct EA computation (ra=0 special case, 64-bit→u32 truncation, alignment masks `& !31` / `& !127`, byte counts 32/128). The nine no-op opcodes (dcbf, dcbi, dcbst, dcbt, dcbtst, icbi, sync, eieio, isync) are all listed in one arm and complete. The `dcbz128` Xbox 360 specific opcode (RT=1 bit distinguishes from dcbz) dispatches correctly. **0 HIGH, 0 MEDIUM, 2 LOW** findings.
### PPCBUG-088 — sync disasm ignores L field; `lwsync` (L=1) shows as "sync"
- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-rs/crates/xenia-cpu/src/disasm.rs:364`
- **Symptom**: The `PpcOpcode::sync` disasm arm outputs `"sync"` unconditionally regardless of the L field (PPC bit 10). When L=1 (word `0x7C2004AC`), the instruction should disassemble as `"lwsync"`. The `extended_mnemonics.json` golden already accepts `"sync"` as output for the lwsync case, meaning the test currently passes with the wrong string.
- **Impact**: Disassembly output for `lwsync` (very common in Xbox 360 acquire-barrier idioms) shows as `sync`. No interpreter impact; both L=0 and L=1 are correctly treated as no-op PC advance.
- **Fix**:
```rust
PpcOpcode::sync => {
    // L field at PPC bit 10
    if extract_bits(instr.raw, 10, 10) == 1 {
        base("lwsync", String::new(), 0)
    } else {
        base("sync", String::new(), 0)
    }
}
```
Update `extended_mnemonics.json` golden to add `"ext_mnemonic": "lwsync"` for that entry.
### PPCBUG-089 — Zero interpreter execution tests for group 17
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `xenia-rs/crates/xenia-cpu/src/interpreter.rs` (test module)
- **Symptom**: No `#[test]` covers `dcbz`, `dcbz128`, or any no-op (sync/isync/eieio/dcbf/icbi). A regression in dcbz byte count or alignment would go undetected.
- **Recommended additions**: `dcbz` with misaligned address (verifies 32-byte aligned zero), `dcbz128` with misaligned address (verifies 128-byte aligned zero), both ra=0 and ra!=0 cases, `sync`/`isync`/`dcbf` no-op PC-advance smoke tests.
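The expected zeroed range for each of these cases follows from the EA rules in the group summary. A minimal sketch, with `dcbz_range` as a hypothetical helper and `ra` passed as `None` when the RA field is 0:

```rust
/// Returns (start, length) of the block zeroed by `dcbz` (line = 32)
/// or `dcbz128` (line = 128). `line` must be a power of two.
fn dcbz_range(ra: Option<u64>, rb: u64, line: u32) -> (u32, u32) {
    // (rA|0) + rB, truncated to a 32-bit effective address.
    let ea = ra.unwrap_or(0).wrapping_add(rb) as u32;
    (ea & !(line - 1), line)
}
```

A test can pre-fill guest memory with a non-zero pattern, execute the opcode, and check that exactly the bytes in this range are zero.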
---
## Batch 3 — CR logical + CR moves (group 15)
Per-group report: `audit-out/group-15-cr-logical.md`.
Group 15 summary: **cleanest group audited to date**. All 8 CR logical ops (`crand`, `crandc`,
`creqv`, `crnand`, `crnor`, `cror`, `crorc`, `crxor`), `mcrf`, and `mcrxr` are ISA-correct.
The `cr_logical` helper's use of `fn(bool, bool) -> bool` prevents the `!u64` bit-pollution class
(PPCBUG-028..031 in group 7). CR bit indexing in `get_cr_bit`/`set_cr_bit` is correct (bit/4 =
field, bit%4 = within-field sub-index matching PPC MSB-0 numbering, with sub `{0=LT, 1=GT, 2=EQ,
3=SO}`). `mcrxr` correctly maps XER{SO,OV,CA} to CR{LT,GT,EQ} with SO=false and unconditionally
clears the XER bits. `mcrfs` nibble extraction, field shift formula (`28 - crfs*4`), and
CLEARABLE_MASK (all 14 ISA-clearable exception bits, no FEX/VX) are all correct. One MEDIUM ISA
violation: `mcrfs` omits VX summary recomputation. Two LOW findings: a misleading test comment and
zero coverage for all 8 CR logical ops + `mcrf`.
### PPCBUG-068 — `mcrfs` does not recompute VX summary bit after clearing VX* exception bits
- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:4250` (`ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK)`)
- **Symptom**: When `mcrfs` clears VX* exception bits (VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ,
VXVC, VXSOFT, VXSQRT, VXCVI) from any source field, the VX summary bit (FPSCR[2], `fpscr::VX
= 1<<29`) is left stale. If those VX* bits were the only contributors to VX, it should become
0 but remains 1. A subsequent `mcrfs cr0, 0` will then report VX=1 in CR0.EQ, misleading the
caller into thinking an invalid-operation exception is still active.
- **Fix**:
```rust
// After ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK); add:
if (ctx.fpscr & fpscr::VX_ALL) != 0 {
    ctx.fpscr |= fpscr::VX;
} else {
    ctx.fpscr &= !fpscr::VX;
}
// FEX recomputation omitted — xenia doesn't model enabled-exception dispatch.
```
- **Test gap**: existing test only covers crfS=0 (FX+OX) — no VX* bits involved. Add a test
that sets only VXSNAN, runs `mcrfs cr0, 1`, then verifies VX is now 0.
### PPCBUG-069 — `mcrfs` test comment claims OX(so)=0 but OX is set in the test
- **Severity**: LOW (cosmetic; the assert is correct, only the comment is wrong)
- **Status**: open
- **Location**: `interpreter.rs:5402`
- **Symptom**: Comment reads `"FX(lt)=1 and OX(so)=0"`. FPSCR was set to `(1<<31)|(1<<28)`,
which sets both FX and OX. The nibble is `0b1001`, so `so=true`. The assert `cr[2].as_u8()
== 0b1001` is correct; only the comment is wrong.
- **Fix**: `// FX(lt)=1, FEX(gt)=0, VX(eq)=0, OX(so)=1 → 0b1001 = 9`
### PPCBUG-070 — Zero execution tests for all 8 CR logical ops and `mcrf`
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Locations**: `interpreter.rs:1473-1484`
- **Missing minimum**: `crclr` idiom (`crxor BT,BT,BT`, BT=1 → 0), `crset` idiom
(`creqv BT,BT,BT`, BT=0 → 1), `crmove` idiom (`cror BT,BA,BA`), `crnot` idiom
(`crnor BT,BA,BA`, BA=1 → 0), cross-field `crand`/`crandc`, and a full `mcrf
cr0, cr3` field-copy + source-field-intact test.
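The idiom checks listed above can be prototyped against a simplified CR model. This sketch assumes a packed 32-bit CR with PPC MSB-0 bit numbering, which may differ from the interpreter's per-field storage; the function names echo the ones mentioned in the summary but these bodies are illustrative only.

```rust
/// Reads CR bit `bit` (0..=31, MSB-0 numbering) from a packed 32-bit CR.
fn get_cr_bit(cr: u32, bit: u32) -> bool {
    ((cr >> (31 - bit)) & 1) != 0
}

/// Returns `cr` with bit `bit` set to `v`.
fn set_cr_bit(cr: u32, bit: u32, v: bool) -> u32 {
    let m = 1u32 << (31 - bit);
    if v { cr | m } else { cr & !m }
}

/// Generic CR logical op using the `fn(bool, bool) -> bool` shape.
fn cr_logical(cr: u32, bt: u32, ba: u32, bb: u32, op: fn(bool, bool) -> bool) -> u32 {
    set_cr_bit(cr, bt, op(get_cr_bit(cr, ba), get_cr_bit(cr, bb)))
}
```

Because the operands are `bool`, a negation such as `!(a | b)` cannot leak stray high bits the way `!u64` can.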
---
## Pre-pass hints REFUTED by audit
These were flagged by the orchestrator's regex scan but the subagents found them to be safe:
- **`divwux` writeback** (interpreter.rs:390) — both operands cast to `u32` before division, `as u64` zero-extends correctly. **Clean.**
- **`mulhwx` intermediate cast** (interpreter.rs:349) — `((result >> 32) as i32 as i64 as u64) & 0xFFFF_FFFF` is redundant but the trailing mask saves correctness. Cosmetic only.
- **`mulhwux` writeback** (interpreter.rs:359) — `(result >> 32) & 0xFFFF_FFFF` clean unsigned. Clean.
- **CR0 stale pre-pass claim**: the pre-pass document quoted `result as i32 as i64`, but the live code uses `result as i64`. The pre-pass was right that the live form is 64-bit and wrong to imply that an i32 form was already in place. PPCBUG-020 is the real finding.
---
## Batch 4 — load float (group 23)
Per-group report: `audit-out/group-23-load-float.md`.
Group 23 summary: the double-precision load family (`lfd`, `lfdu`, `lfdux`, `lfdx`) is fully
ISA-correct — EA computation, endianness, update-form writeback, and bit-pattern fidelity are
all clean. The single-precision family (`lfs`, `lfsu`, `lfsux`, `lfsx`) has one HIGH bug:
Rust's `as f64` float cast compiles to x86 `CVTSS2SD` which unconditionally sets the IEEE quiet
bit in the output, silently converting f32 SNaN loads to f64 QNaN. The ISA requires the SNaN
to pass through unchanged. FPSCR.NI does not apply to loads (correct by omission). One LOW
test-gap finding. **2 IDs used (PPCBUG-128, PPCBUG-129). 8 IDs unallocated (PPCBUG-130..137).**
### PPCBUG-128 — lfs/lfsu/lfsx/lfsux silently quieten SNaN via `as f64` Rust float cast
- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1064 (lfs), 1070 (lfsx), 1087 (lfsu), 1093 (lfsux)
- **Symptom**: All four single-precision load arms use `mem.read_f32(ea) as f64` where
`read_f32` = `f32::from_bits(read_u32(ea))`. The `as f64` Rust float cast compiles to x86
`CVTSS2SD`, which unconditionally sets bit 51 of the f64 mantissa (the IEEE quiet/signalling
discriminator bit) for any NaN input. An f32 SNaN (e.g. `0x7F800001`) is loaded and written
to the FPR as the f64 QNaN `0x7FF8000020000000` instead of the SNaN `0x7FF0000020000000`.
**ISA requirement**: "A signalling NaN passes through unchanged into the FPR — it will signal
at the next FP arithmetic instruction." (lfs.md Special Cases). The FPR must hold the SNaN;
VXSNAN fires at the consuming arithmetic op, not at the load.
**Impact**: (a) Game code storing f32 SNaN sentinels (physics engines mark unset float slots
with SNaN) and then loading+inspecting them: `fpscr::is_snan(ctx.fpr[rd])` returns false
after the load, breaking sentinel detection. (b) Arithmetic ops consuming the loaded value
see a QNaN rather than SNaN, so VXSNAN is never set; games relying on VXSNAN to detect
uninitialized-read bugs get false negatives.
- **Canary parity**: Canary's JIT also uses CVTSS2SD via `f.Convert()`. Both emulators share
this deviation. The bug is a structural consequence of using semantic float widening rather
than a bit-pattern-preserving widening routine.
- **Fix**: replace the float cast with a bit-manipulation widening that preserves the SNaN bit:
```rust
fn widen_f32_bits_to_f64(raw32: u32) -> u64 {
    let sign = ((raw32 >> 31) as u64) << 63;
    let exp32 = ((raw32 >> 23) & 0xFF) as u64;
    let mant32 = (raw32 & 0x007F_FFFF) as u64;
    if exp32 == 0xFF {
        // NaN or Infinity: propagate mantissa left-shifted by 29 bits.
        // SNaN (bit22=0) stays SNaN (bit51=0); QNaN (bit22=1) stays QNaN (bit51=1).
        sign | (0x7FFu64 << 52) | (mant32 << 29)
    } else if exp32 == 0 {
        // ±Zero or subnormal f32.
        if mant32 == 0 { return sign; } // ±zero
        // Subnormal: normalize by finding the leading bit, then adjust the exponent.
        // A leading bit at mantissa bit 22 (shift = 0) has unbiased exponent -127.
        let shift = mant32.leading_zeros() - (64 - 23);
        let exp64 = (1023u64 - 127) - shift as u64;
        let mant64 = (mant32 << (shift + 1 + 29)) & 0x000F_FFFF_FFFF_FFFF;
        sign | (exp64 << 52) | mant64
    } else {
        // Normal f32 → normal f64: rebias by adding, so small exponents cannot underflow.
        let exp64 = exp32 + (1023 - 127);
        sign | (exp64 << 52) | (mant32 << 29)
    }
}
// In each lfs* arm:
ctx.fpr[instr.rd()] = f64::from_bits(widen_f32_bits_to_f64(mem.read_u32(ea)));
```
This function also correctly handles subnormal f32 → normal f64 widening (which the `as f64`
cast already gets right numerically, but now goes through a consistent code path).
- **Test gap**: add a test loading an f32 SNaN (`0x7F800001`) via `lfs` and asserting
`fpscr::is_snan(ctx.fpr[rd])` is `true` and bit 51 of `ctx.fpr[rd].to_bits()` is 0.
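A shorter alternative is possible if only the NaN path is hand-built. The sketch below relies on the fact that every non-NaN f32 is exactly representable as an f64, so the host cast is lossless there; `widen_f32_bits_nan_safe` is a hypothetical name and this variant has not been checked against the interpreter.

```rust
/// Widens raw f32 bits to raw f64 bits without quieting signalling NaNs.
fn widen_f32_bits_nan_safe(raw32: u32) -> u64 {
    let exp = (raw32 >> 23) & 0xFF;
    let mant = (raw32 & 0x007F_FFFF) as u64;
    if exp == 0xFF && mant != 0 {
        // NaN: build the f64 pattern by hand so f32 bit 22 maps to f64 bit 51 unchanged.
        let sign = ((raw32 >> 31) as u64) << 63;
        sign | (0x7FFu64 << 52) | (mant << 29)
    } else {
        // Zero, subnormal, normal, infinity: the widening cast is exact.
        (f32::from_bits(raw32) as f64).to_bits()
    }
}
```

The trade-off is that non-NaN values still go through the host FPU, which is fine for value fidelity but means there are two code paths to test instead of one.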
### PPCBUG-129 — Zero interpreter execution tests for all 8 float-load opcodes
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Locations**: interpreter.rs test module; `tests/disasm_goldens.rs:249-250` (disasm-only)
- **Symptom**: No `#[test]`-decorated function exercises any float-load interpreter arm.
A regression in EA computation, endianness, f32→f64 widening, or update-form writeback
would go undetected. The SNaN bug (PPCBUG-128) was undetected partly due to this gap.
- **Recommended minimum**:
1. `lfs` normal: `0x3F800000` (1.0f32) → assert `fpr[rd] == 1.0f64` exact.
2. `lfs` negative displacement: base minus 4.
3. `lfs` ra=0 path (absolute addressing).
4. `lfd` normal: store PI bits, assert exact bit equality via `.to_bits()`.
5. `lfd` SNaN: store `0x7FF0_0000_0000_0001u64`, assert exact bit equality after load.
6. `lfsu` / `lfsux` / `lfdu` / `lfdux`: verify loaded FPR value AND rA update address.
7. After PPCBUG-128 fix: `lfs` SNaN round-trip test.
IDs PPCBUG-130 through PPCBUG-137 are unallocated — no further bugs found in group 23.
---
## Files modified by the audit
- `xenia-rs/audit-prepass-findings.md` — Phase A pre-pass red flags (orchestrator regex output).
- `xenia-rs/audit-out/group-01-add-imm.md` — Group 1 report (Sonnet subagent).
- `xenia-rs/audit-out/group-02-add-reg.md` — Group 2 report.
- `xenia-rs/audit-out/group-03-sub-reg.md` — Group 3 report.
- `xenia-rs/audit-out/group-04-multiply.md` — Group 4 report.
- `xenia-rs/audit-out/group-05-divide.md` — Group 5 report.
- `xenia-rs/audit-out/group-06-logic-imm.md` — Group 6 report.
- `xenia-rs/audit-out/group-09-word-rotate.md` — Group 9 report.
- `xenia-rs/audit-out/group-13-branch.md` — Group 13 report.
- `xenia-rs/audit-out/group-14-trap-sc.md` — Group 14 report.
- `xenia-rs/audit-out/group-15-cr-logical.md` — Group 15 report.
- `xenia-rs/audit-out/group-16-spr-msr.md` — Group 16 report.
- `xenia-rs/audit-out/group-17-cache-sync.md` — Group 17 report.
- `xenia-rs/audit-out/group-18-load-byte.md` — Group 18 report.
- `xenia-rs/audit-out/group-19-load-halfword.md` — Group 19 report.
- `xenia-rs/audit-out/group-21-load-doubleword.md` — Group 21 report.
- `xenia-rs/audit-out/group-22-load-mlsr.md` — Group 22 report.
- `xenia-rs/audit-out/group-23-load-float.md` — Group 23 report.
- `xenia-rs/audit-out/group-24-store-byte-half.md` — Group 24 report.
- `xenia-rs/audit-out/group-26-store-doubleword.md` — Group 26 report.
- `xenia-rs/audit-findings.md` — this consolidated tracker.
**No source code under `xenia-rs/crates/` has been modified.**
---
## Batch 4 — load byte (group 18)
Per-group report: `audit-out/group-18-load-byte.md`.
Group 18 summary: **cleanest group audited to date — zero HIGH or MEDIUM bugs.** All four opcodes
(`lbz`, `lbzu`, `lbzx`, `lbzux`) are ISA-correct: EA computation (rA=0 special case, D-field
sign-extension, 32-bit EA truncation), zero-extension of the byte result to 64 bits, and
update-form writeback all match the ISA spec and Canary cross-reference. Two LOW findings only.
### PPCBUG-090 — lbzu/lbzux: rD==rA "invalid form" silently misloads rD
- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
- **Status**: open
- **Location**: interpreter.rs:951-956 (lbzu), 963-968 (lbzux)
- **Symptom**: When `rD == rA` (invalid form, UISA undefined), the byte load into `gpr[rD]` at
line 953/965 is immediately overwritten by the EA writeback at line 954/966. Net result:
`gpr[rD]` holds the EA, not the loaded byte. Canary has the same behaviour. No practical impact
under normal compiler output.
- **Recommendation**: add `debug_assert!(instr.rd() != instr.ra())` in debug builds.
### PPCBUG-091 — Zero interpreter execution tests for all four lbz* opcodes
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module; disasm_goldens.rs:247 (disasm-only, no execution)
- **Symptom**: No `#[test]` exercises lines 945-968. A regression in EA computation,
zero-extension, or the update writeback would go undetected.
- **Recommended minimum**: `lbz` with ra=0 + negative displacement; `lbzu` normal case (verify
both byte result and rA update); `lbzx` with ra=0; `lbzux` normal case. Each test should
assert `gpr[rD] <= 0xFF` to catch any future accidental sign-extension.
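A reference model for the D-form EA makes the expected addresses in these tests explicit. This is a sketch: `ea_d_form` and `lbz_ref` are hypothetical, and a flat byte slice stands in for guest memory.

```rust
/// D-form effective address: (rA|0) + sign-extended displacement, truncated to 32 bits.
fn ea_d_form(gpr: &[u64; 32], ra: usize, d: i16) -> u32 {
    let base = if ra == 0 { 0 } else { gpr[ra] };
    base.wrapping_add(d as i64 as u64) as u32
}

/// `lbz`-style load: the byte is zero-extended into the 64-bit GPR.
fn lbz_ref(gpr: &mut [u64; 32], mem: &[u8], rd: usize, ra: usize, d: i16) {
    let ea = ea_d_form(gpr, ra, d);
    gpr[rd] = mem[ea as usize] as u64;
}
```

Note that with ra=0 and a negative displacement the EA wraps to the top of the 32-bit address space, so that test needs memory mapped there or should assert on the EA alone.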
IDs PPCBUG-092, PPCBUG-093, PPCBUG-094 are unallocated — no further bugs found in group 18.
---
## Batch 4 — load halfword (group 19)
Per-group report: `audit-out/group-19-load-halfword.md`.
Group 19 summary: **4 HIGH bugs confirmed — all pre-pass flags validated.** The four `lha*` opcodes
(`lha`, `lhax`, `lhau`, `lhaux`) all use `as i16 as i64 as u64`, sign-extending a negative halfword
to 64 bits in violation of the 32-bit ABI. Every negative halfword load (common for `int16_t` PCM
samples, packed vertex deltas, `short[]` arrays) actively poisons the upper 32 bits of the
destination GPR — identical shape to the `addis` bug. The four `lhz*` opcodes and `lhbrx` are all
clean (`as u64` zero-extension; `swap_bytes() as u64` byte-reversal; correct endian handling; correct
EA computation and update writebacks). Two LOW findings: rD==rA invalid-form in update variants,
and zero unit tests for all nine opcodes.
### PPCBUG-095 — `lha`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:990
- **Symptom**: `mem.read_u16(ea) as i16 as i64 as u64` — memory `0x8000` writes
`0xFFFFFFFF_FFFF8000` instead of `0x00000000_FFFF8000`. Active GPR poisoning for every
negative halfword. Common trigger: `int16_t` struct fields, PCM samples, packed vertex deltas.
- **Fix**:
```rust
ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64;
```
- **Test gap**: zero unit tests. Add: memory `0x8000` → `gpr[rD] == 0x00000000_FFFF8000`;
memory `0x7FFF` → `gpr[rD] == 0x00000000_00007FFF`.
### PPCBUG-096 — `lhax`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:996
- **Symptom**: identical to PPCBUG-095. Indexed form emitted for array access with GPR index.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests.
### PPCBUG-097 — `lhau`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:1007
- **Symptom**: identical to PPCBUG-095. Update form emitted for auto-incrementing `short[]` loops;
poison accumulates across all iterations.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests. Add: verify both `gpr[rD]` (upper-32 = 0) and `gpr[rA]` (EA update).
### PPCBUG-098 — `lhaux`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:1013
- **Symptom**: identical to PPCBUG-095, update+indexed form.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests.
- **Note**: PPCBUG-095..098 are the same one-line fix at four sites. Fix session sweep:
`rg -n 'as i16 as i64 as u64' interpreter.rs` finds exactly these four lines.
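The effect of the two cast chains can be checked in isolation. The helper names below are hypothetical; the bodies are exactly the old and new expressions from PPCBUG-095 applied to a raw halfword.

```rust
/// Fixed chain: sign-extend to 32 bits, then zero-extend into the 64-bit GPR.
fn lha_writeback(half: u16) -> u64 {
    half as i16 as i32 as u32 as u64
}

/// Pre-fix chain, kept for comparison: sign-extends all the way to 64 bits.
fn lha_writeback_old(half: u16) -> u64 {
    half as i16 as i64 as u64
}
```

Any execution test that asserts `gpr[rD] <= 0x0000_0000_FFFF_FFFF` after loading a negative halfword distinguishes the two.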
### PPCBUG-099 — `lhau`/`lhaux`: rD==rA invalid-form silently destroys load result
- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
- **Status**: open
- **Location**: interpreter.rs:1005-1016
- **Symptom**: same as PPCBUG-090 (`lbzu`/`lbzux`) — EA writeback overwrites `gpr[rD]` when
`rD == rA`. Net: `gpr[rD]` holds EA, not the loaded value.
- **Recommendation**: `debug_assert!(instr.rd() != instr.ra())` in both arms.
### PPCBUG-100 — Zero execution tests for all nine halfword-load opcodes
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module
- **Symptom**: No `#[test]` exercises any of the 9 opcodes. The HIGH sign-extension bug would
have been caught by any test that checks `gpr[rD] <= 0x0000_0000_FFFF_FFFF`.
- **Recommended minimum**: `lha` with negative halfword (assert upper 32 zero), `lhz` same,
`lhau` verify both rD and rA, `lhzux` verify both rD and rA, `lhbrx` verify byte-swap.
IDs PPCBUG-101, PPCBUG-102, PPCBUG-103, PPCBUG-104 are unallocated — no further bugs found in group 19.
---
## Batch 4 — load word (group 20)
Per-group report: `audit-out/group-20-load-word.md`.
Group 20 summary: **1 HIGH bug (reservation invalidation never called), 1 MEDIUM (cross-thread
reservation isolation), 1 MEDIUM (lwa 64-bit sign-extension hazard), 3 LOW test gaps.** The
zero-extending family (`lwz`/`lwzu`/`lwzx`/`lwzux`) is entirely correct — `mem.read_u32(ea) as u64`
cleanly zero-extends; EA computation, update writebacks, and RA0 handling all match ISA and Canary.
`lwbrx` is correct: the double-swap (`from_be_bytes` then `swap_bytes()`) correctly produces a
little-endian word read, zero-extended. The sign-extending family (`lwa`/`lwax`/`lwaux`) is
ISA-correct for 64-bit mode but a 32-bit-ABI hazard — classified MEDIUM because `lwa` is a
64-bit-mode instruction unlikely to appear in Xbox 360 32-bit-ABI binaries. The HIGH finding is
that `ReservationTable::invalidate_for_write` is defined and unit-tested but **never called** from
any store instruction, breaking multi-threaded `lwarx`/`stwcx.` atomicity under `--parallel`.
### PPCBUG-105 — lwa / lwax / lwaux sign-extend to 64 bits; 32-bit-ABI hazard
- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:1032 (lwa), 1038 (lwax), 1043 (lwaux)
- **Symptom**: `mem.read_u32(ea) as i32 as i64 as u64` — a word with high bit set (e.g. `0x8000_0000`)
writes `0xFFFF_FFFF_8000_0000` to rD. ISA-correct for 64-bit-mode `lwa`. In 32-bit ABI, the poisoned
upper 32 bits produce wrong CA / CR results in downstream 64-bit unsigned compares — same shape as
the `addis` bug.
- **Likelihood**: LOW on real Xbox 360 32-bit-ABI binaries (compilers use `lwz` for word loads; `lwa`
is a 64-bit-mode instruction). Risk elevated if the binary contains 64-bit-mode kernel code.
- **Note**: Canary also uses `SignExtend(..., INT64_TYPE)` — both are ISA-correct. Pre-pass flagged
HIGH; audit downgrades to MEDIUM because `lwa` is unlikely in 32-bit-ABI Xbox 360 code.
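A minimal sketch of why the poisoned upper word matters downstream, assuming a consumer that compares the full 64-bit registers. All function names here are hypothetical stand-ins, not interpreter code.

```rust
/// `lwa`-style writeback as quoted in the symptom: sign-extend to 64 bits.
fn lwa_writeback(word: u32) -> u64 {
    word as i32 as i64 as u64
}

/// `lwz`-style writeback: zero-extend to 64 bits.
fn lwz_writeback(word: u32) -> u64 {
    word as u64
}

/// Unsigned less-than over the full 64-bit registers.
fn cmpl64_lt(a: u64, b: u64) -> bool {
    a < b
}

/// Unsigned less-than over the low 32 bits only (32-bit ABI view).
fn cmpl32_lt(a: u64, b: u64) -> bool {
    (a as u32) < (b as u32)
}
```

With `a = lwa_writeback(0x8000_0000)` and `b = lwz_writeback(0x9000_0000)`, the 32-bit view says `a < b`, but the 64-bit compare disagrees because `a` carries `0xFFFF_FFFF` in its upper half.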
### PPCBUG-106 — lwa no-update-form undocumented (LOW / informational)
- **Severity**: LOW
- **Status**: open
- **Location**: interpreter.rs:1029-1034
- **Symptom**: `lwa` arm has no RA writeback. Correct per ISA (no `lwau` in PowerISA). Undocumented.
- **Fix**: add comment `// No lwau in PowerISA; lwa is DS-form non-update only.`
### PPCBUG-107 — `invalidate_for_write` never called from stores; lwarx/stwcx. atomicity broken under `--parallel` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `reservation.rs:234` (definition, never called from interpreter); `interpreter.rs:1182-1278` (all store arms, none call it)
- **Symptom**: `ReservationTable::invalidate_for_write(addr)` is defined and correctly unit-tested but
no interpreter store arm calls it. Under M3 `--parallel` with the table enabled, a plain `stw` by
thread B to a cache line reserved by thread A does NOT clear thread A's table slot. Thread A's
subsequent `stwcx.` calls `t.try_commit()`, which succeeds — spurious success, violating
store-conditional atomicity. All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic
counters) built on `lwarx`/`stwcx.` are broken in multi-threaded mode.
- **Concrete scenario**: thread A: `lwarx r3, 0, r4` (reserves line). Thread B: `stw r5, 0(r4)`
(same address; should invalidate). Thread A: `stwcx. r6, 0, r4` → should fail (CR0.EQ=0) but
succeeds (CR0.EQ=1). Thread A's store silently overwrites thread B's store.
- **Fix**: in every store arm, before `mem.write_*`, add:
```rust
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
`has_active_reservers()` is a single `Relaxed` atomic load — negligible cost for non-atomic code
(common case returns false immediately). Alternative: inject the table into the memory layer so
`write_u32`/`write_u64` call it automatically.
- **Test gap**: add interpreter-level test: `lwarx` reserve a line, intervening `stw` to the same
line, `stwcx.` must fail (CR0.EQ=0).
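The concrete scenario above can be modeled with a toy table. This is a sketch under stated assumptions, not the real `ReservationTable`: the type `ToyReservationTable`, its slot layout, the 128-byte line mask, and `store_word` are all hypothetical, and the `is_enabled()` filter from the fix snippet is omitted for brevity.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

const TOY_TABLE_LINE_MASK: u32 = 127; // assumption: 128-byte reservation granule
const NO_RESERVATION: u64 = u64::MAX; // sentinel: slot holds no reservation

/// Toy model: one slot per guest thread plus a count of live reservations.
struct ToyReservationTable {
    slots: Vec<AtomicU64>,
    active: AtomicUsize,
}

impl ToyReservationTable {
    fn new(threads: usize) -> Self {
        let slots = (0..threads).map(|_| AtomicU64::new(NO_RESERVATION)).collect();
        Self { slots, active: AtomicUsize::new(0) }
    }

    /// lwarx side: record the reserved line for `thread`.
    fn reserve(&self, thread: usize, ea: u32) {
        let line = (ea & !TOY_TABLE_LINE_MASK) as u64;
        if self.slots[thread].swap(line, Ordering::SeqCst) == NO_RESERVATION {
            self.active.fetch_add(1, Ordering::SeqCst);
        }
    }

    /// Cheap pre-check so non-atomic code skips the slot scan.
    fn has_active_reservers(&self) -> bool {
        self.active.load(Ordering::Relaxed) != 0
    }

    /// Plain-store side: clear every slot that reserved the written line.
    fn invalidate_for_write(&self, ea: u32) {
        let line = (ea & !TOY_TABLE_LINE_MASK) as u64;
        for slot in &self.slots {
            if slot
                .compare_exchange(line, NO_RESERVATION, Ordering::SeqCst, Ordering::SeqCst)
                .is_ok()
            {
                self.active.fetch_sub(1, Ordering::SeqCst);
            }
        }
    }

    /// stwcx. side: always clears the slot; succeeds only if it still matched.
    fn try_commit(&self, thread: usize, ea: u32) -> bool {
        let line = (ea & !TOY_TABLE_LINE_MASK) as u64;
        let prev = self.slots[thread].swap(NO_RESERVATION, Ordering::SeqCst);
        if prev != NO_RESERVATION {
            self.active.fetch_sub(1, Ordering::SeqCst);
        }
        prev == line
    }
}

/// Plain big-endian word store with the invalidation guard in front of the write.
fn store_word(table: &ToyReservationTable, mem: &mut [u8], ea: u32, value: u32) {
    if table.has_active_reservers() {
        table.invalidate_for_write(ea);
    }
    let i = ea as usize;
    mem[i..i + 4].copy_from_slice(&value.to_be_bytes());
}
```

In this model, a reservation on `0x40` followed by `store_word` to `0x44` (same line) makes the subsequent `try_commit` fail, which is the behavior the requested interpreter-level test should assert.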
### PPCBUG-108 — Legacy per-ctx reservation path: cross-thread invalidation impossible (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: interpreter.rs:1148-1153 (stwcx legacy path)
- **Symptom**: When table is None/disabled, reservation state lives in per-thread `PpcContext` fields.
A store by thread B cannot clear `ctx_A.has_reservation`. Safe in strict lockstep (one host thread).
Broken under real parallelism with the table inadvertently disabled.
- **Fix**: add a `debug_assert!` in `lwarx`/`stwcx.` that table is enabled when multiple host threads
are active. The M3 scheduler should always enable the table before spawning a second host thread.
### PPCBUG-109 — Zero unit tests for lwa / lwax / lwaux
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module
- **Recommended minimum**:
- `lwa` with `0x8000_0000` → `gpr[rD] == 0xFFFF_FFFF_8000_0000`.
- `lwa` with `0x7FFF_FFFF` → `gpr[rD] == 0x0000_0000_7FFF_FFFF`.
- `lwax` with ra=0.
- `lwaux`: verify loaded value and rA update.
### PPCBUG-110 — Zero unit tests for lwbrx
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module
- **Recommended minimum**: memory `[0x11, 0x22, 0x33, 0x44]` at EA → `gpr[rD] == 0x4433_2211`; ra=0;
assert `gpr[rD] <= 0xFFFF_FFFF`.
### PPCBUG-111 — lwarx / stwcx test suite missing key cases
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:5167-5207 (two tests exist)
- **Missing**: `lwarx` ra=0; `stwcx.` without prior `lwarx` → CR0.EQ=0; second `lwarx` displaces
first; post-PPCBUG-107-fix store-invalidation test; `lwarx` zero-extension assertion.
IDs PPCBUG-112, PPCBUG-113, PPCBUG-114 are unallocated — reserved for group 20 follow-up.
---
## Batch 4 — load doubleword (group 21)
Per-group report: `audit-out/group-21-load-doubleword.md`.
Group 21 summary: **cleanest load group audited — zero HIGH bugs.** All six instructions (`ld`,
`ldu`, `ldux`, `ldx`, `ldbrx`, `ldarx`) are ISA-correct: 64-bit load, big-endian byte order,
EA computation (RA=0, DS-form, u32 truncation), update-form writebacks, and reservation tracking
all pass scrutiny against Canary and the ISA spec. `ldbrx`'s double-swap pattern was investigated
and confirmed correct (PPCBUG-115 informational). One MEDIUM documentation finding, two LOW findings.
### PPCBUG-115 — `ldbrx` byte-swap confirmed correct (informational)
- **Severity**: LOW (confirmed clean, informational only)
- **Status**: wontfix
- **Location**: `interpreter.rs:4157-4159`
- **Analysis**: `mem.read_u64` uses `u64::from_be_bytes` internally (confirmed in `heap.rs:404`
and interpreter's `TestMem`), so it returns the BE-decoded value. Calling `.swap_bytes()`
re-reverses to give the LE interpretation, which is exactly what `ldbrx` specifies. Canary
achieves the same result by skipping `ByteSwap` at the HIR level. Both approaches are correct.
See per-group report for full byte-level worked example.
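A short worked example of the double-swap, independent of the per-group report. The helper names are hypothetical; `read_u64_be` stands in for a `mem.read_u64` that decodes big-endian, as the analysis describes.

```rust
/// Stand-in for a big-endian `read_u64`: decodes 8 bytes as a BE doubleword.
fn read_u64_be(mem: &[u8], ea: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&mem[ea..ea + 8]);
    u64::from_be_bytes(bytes)
}

/// `ldbrx`-style load: BE decode followed by `swap_bytes()`, which is
/// equivalent to decoding the same bytes as little-endian.
fn ldbrx_model(mem: &[u8], ea: usize) -> u64 {
    read_u64_be(mem, ea).swap_bytes()
}
```

With the asymmetric bytes `[0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]`, the plain BE read gives `0x1122_3344_5566_7788` and the byte-reversed load gives `0x8877_6655_4433_2211`, matching `u64::from_le_bytes` on the same bytes.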
### PPCBUG-116 — `ld`/`ldx`/`ldu`/`ldux` as 32-bit-ABI poison sources (documentation)
- **Severity**: MEDIUM (awareness/documentation; no change to load instructions themselves)
- **Status**: open
- **Location**: `interpreter.rs:1017-1058`
- **Symptom**: These instructions correctly write full 64-bit values to the destination GPR.
Xbox 360 32-bit-ABI binaries legitimately emit them for TOC loads, vtable loads, and kernel
structure accesses — all of which may have non-zero upper 32 bits. Until PPCBUG-001..089
arithmetic truncation fixes land, such values can flow into 64-bit compares and corrupt CA
bits and CR fields — the inverse of the `addis` bug (pollution from memory side vs. sign-ext).
- **Key guard already in place**: PPCBUG-007's `subfcx` CA fix truncates operands to u32 before
the compare, correctly handling `ld`-originated 64-bit values. This is the most critical
downstream consumer and the fix is already specified.
### PPCBUG-117 — Stale frozen snapshot in `ppc-manual/memory/ldarx.md`
- **Severity**: LOW
- **Status**: applied (P7 manual regen, 2026-05-02)
- **Location**: `ppc-manual/memory/ldarx.md` (frozen snapshot section)
- **Symptom**: Snapshot uses old field name `ctx.reserved_addr`; live code uses
`ctx.reserved_line = ea & !RESERVATION_MASK` (M3 refactor). Cosmetic only.
- **Fix**: Regenerate snapshot after M3 field names settle.
### PPCBUG-118 — Zero functional tests for `ld`, `ldx`, `ldu`, `ldux`, `ldbrx`
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: `test_ldarx_stdcx_pair` covers `ldarx`/`stdcx` only. Five doubleword load
variants are untested. Recommended minimum: `ld` with positive DS, negative DS, and RA=0;
`ldx` basic; `ldu` with RA writeback check; `ldux` with RA writeback check; `ldbrx` with
asymmetric data to distinguish output from plain `ldx`.
IDs PPCBUG-119 through PPCBUG-122 are unallocated — reserved for group 21 follow-up.
---
## Batch 4 — load multiple/string (group 22)
Per-group report: `audit-out/group-22-load-mlsr.md`.
Group 22 summary: one structural HIGH bug (`lswx` is always a no-op due to missing XER TBC field),
one MEDIUM coupling bug (the write path discards TBC on `mtspr XER`), one MEDIUM ISA-form deviation
(`lmw` does not skip the write to RA when RA is in the destination range, unlike Canary), and two LOW findings. The `lswi` body itself
is correct; `lmw` core logic (loop bound, zero-extension, byte-packing, register wraparound) is clean.
Zero unit tests across all three opcodes.
### PPCBUG-123 — `lswx` XER TBC field not modeled; always loads 0 bytes
- **Severity**: HIGH
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `context.rs:235-237` (`xer()` method) + `interpreter.rs:4172`
- **Symptom**: `ctx.xer()` assembles only SO[31], OV[30], CA[29]; bits 0-28 are always zero.
`lswx` reads `ctx.xer() & 0x7F` expecting the XER TBC byte-count field at bits 0-6, but always
gets 0. The `while bytes_left > 0` loop never executes; **`lswx` is permanently a no-op** —
no bytes are loaded, no destination registers are written. The companion `stswx` at
`interpreter.rs:4191` has the identical pattern and is equally broken.
- **Root cause**: `PpcContext` has no `xer_tbc` field. Neither `xer()` nor `set_xer()` model
XER[25:31]. Any `mtspr XER, rN` that sets a non-zero byte count silently discards it (PPCBUG-124).
- **Cross-reference**: Canary marks `lswx` as `XEINSTRNOTIMPLEMENTED()` — xenia-rs implemented the
body but left the XER infrastructure incomplete.
- **Fix**:
1. Add `pub xer_tbc: u8` to `PpcContext`.
2. In `xer()`: `| (self.xer_tbc as u32)` for bits 0-6.
3. In `set_xer()`: `self.xer_tbc = (val & 0x7F) as u8`.
The `lswx` body is then correct as-is.
- **Test gap**: zero unit tests. After fix: `mtspr XER, r3` (r3=4) then `lswx r5, 0, r4` should
write exactly 4 bytes into r5 (high byte = first byte at EA).
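The three-step fix can be sketched on a stand-alone struct. `ToyXer` is a hypothetical stand-in for the XER-related part of `PpcContext`; only the field name `xer_tbc` and the bit positions come from the finding, and the `so`/`ov`/`ca` fields are assumed for illustration.

```rust
/// Hypothetical stand-in for the XER-related fields of `PpcContext`.
#[derive(Default)]
struct ToyXer {
    so: bool,
    ov: bool,
    ca: bool,
    xer_tbc: u8, // 7-bit byte count used by lswx/stswx
}

impl ToyXer {
    /// Assemble the 32-bit XER image, including the TBC field in bits 0-6.
    fn xer(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | ((self.xer_tbc as u32) & 0x7F)
    }

    /// Split a 32-bit XER image back into fields, keeping TBC.
    fn set_xer(&mut self, val: u32) {
        self.so = (val & (1 << 31)) != 0;
        self.ov = (val & (1 << 30)) != 0;
        self.ca = (val & (1 << 29)) != 0;
        self.xer_tbc = (val & 0x7F) as u8;
    }
}
```

After `set_xer(4)`, `xer() & 0x7F` yields 4, which is the value `lswx` needs for its byte loop to run.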
### PPCBUG-124 — `set_xer()` discards TBC on `mtspr XER` (structural coupling to PPCBUG-123)
- **Severity**: MEDIUM (must land with PPCBUG-123)
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `context.rs:239-244`
- **Symptom**: `set_xer()` writes only SO/OV/CA from the 32-bit value, silently discarding bits 0-28
(including the 7-bit TBC field). Any guest `mtspr XER, rN` with a non-zero byte count loses that
count; subsequent `lswx`/`stswx` see TBC=0. Fix is the same three-line change as PPCBUG-123.
### PPCBUG-125 — `lmw` missing RA-in-destination-range skip
- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:1515`
- **Symptom**: PowerISA declares `lmw rT, D(rA)` invalid when `rA` is in `[rT..31]`. Canary skips
the store to `rA` in that case (`if (i.D.RT + j == i.D.RA) continue`). xenia-rs pre-computes EA
before the loop (so EA values remain correct), but overwrites `rA` with the loaded word instead of
preserving it. Result differs from Canary for this invalid encoding. Any program that relies on RA
surviving a nominally invalid `lmw` will see the wrong value.
- **Fix**:
```rust
for r in instr.rd()..32 {
if r == instr.ra() { ea = ea.wrapping_add(4); continue; }
ctx.gpr[r] = mem.read_u32(ea as u32) as u64;
ea = ea.wrapping_add(4);
}
```
- **Test gap**: zero tests. Add: `lmw r28, 0(r28)` (RA=RT=28) — after fix, gpr[28] unchanged.
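A self-contained model of the skip, usable as a starting point for the requested test. The function `lmw_model` and its slice-based memory are assumptions for illustration; only the loop shape follows the fix above.

```rust
/// Toy `lmw rT, D(rA)`: loads words into rT..r31, skipping the write to rA
/// when rA falls inside the destination range (the Canary behavior).
fn lmw_model(gpr: &mut [u64; 32], mem: &[u8], rt: usize, ra: usize, d: i16) {
    // EA is computed once, before any register is overwritten.
    let base = if ra == 0 { 0 } else { gpr[ra] as u32 };
    let mut ea = base.wrapping_add(d as i32 as u32);
    for r in rt..32 {
        if r != ra {
            let i = ea as usize;
            let word = u32::from_be_bytes([mem[i], mem[i + 1], mem[i + 2], mem[i + 3]]);
            gpr[r] = word as u64; // zero-extend into the 64-bit GPR
        }
        ea = ea.wrapping_add(4); // advance even when the write is skipped
    }
}
```

For `lmw r28, 0(r28)`, `gpr[28]` keeps its base address while `r29..r31` receive the second through fourth words.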
### PPCBUG-126 — `lswi` uses `instr.rb()` instead of `instr.nb()` for the NB field
- **Severity**: LOW (maintenance hazard, not a correctness bug)
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:1340`
- **Symptom**: `instr.rb()` and `instr.nb()` both extract bits 16-20 and return identical values.
Using `rb()` misrepresents the operand as a register reference rather than a 5-bit immediate count.
The companion `stswi` at line 1359 has the same pattern. A future `rb()` type-system refactor
could break `lswi`/`stswi` silently.
- **Fix**: `instr.nb()` at both sites.
### PPCBUG-127 — Zero execution tests for lmw, lswi, lswx
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: No `#[test]` exists for any of the three opcodes. A regression in loop bounds,
byte-packing, EA computation, or the NB=0 special case would go undetected.
- **Recommended minimum**: `lmw r30, 0(r1)` (2-word load); `lswi r3, r4, 8` (2-word byte pack);
`lswi r31, r4, 8` (register wraparound → r31 and r0); `lswi r3, r4, 0` (NB=0→32 special case);
post-PPCBUG-123 fix: `lswx` with XER TBC=4 (1-word load), TBC=0 (no-op), TBC=5 (partial word).
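The `lswi` cases in the recommended minimum can be prototyped against a small model first. `lswi_model` is a hypothetical sketch of the byte-packing rules (high byte first, NB=0 meaning 32 bytes, register index wrapping from r31 to r0), not the interpreter arm itself.

```rust
/// Toy `lswi`: packs `nb_field` bytes (0 means 32) starting at `ea` into
/// successive registers from `rt`, four bytes per register, high byte first.
fn lswi_model(gpr: &mut [u64; 32], mem: &[u8], rt: usize, ea: usize, nb_field: u32) {
    let n = if nb_field == 0 { 32 } else { nb_field as usize };
    let mut r = rt;
    for (k, &byte) in mem[ea..ea + n].iter().enumerate() {
        let lane = k % 4;
        if lane == 0 {
            if k != 0 {
                r = (r + 1) % 32; // wrap from r31 to r0
            }
            gpr[r] = 0; // unfilled low bytes of a partial word stay zero
        }
        gpr[r] |= (byte as u64) << (24 - 8 * lane);
    }
}
```

With memory holding bytes `1, 2, 3, ...`, an 8-byte load into `r31` fills `r31` and `r0`, and a 5-byte load leaves the second register holding a single byte in its high position.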
---
## Batch 5 — store byte/halfword (group 24)
Per-group report: `audit-out/group-24-store-byte-half.md`.
Group 24 summary: **3 findings: 1 HIGH (cross-cutting reservation invalidation), 1 LOW/informational
(update-form zero-extension correct but undocumented), 1 LOW (zero test coverage).** EA computation,
value truncation (`as u8`, `as u16`), RA=0 special cases, update-form writeback zero-extension,
big-endian `mem.write_u16` path, and `sthbrx` byte-reverse logic are all ISA-correct. The single
HIGH finding is the systemic absence of `invalidate_for_write` calls — same class as PPCBUG-107,
now documented for all 9 byte/halfword store opcodes.
### PPCBUG-130 — All 9 store-byte/halfword opcodes missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1207` (stb), `1213` (stbu), `1219` (stbx), `1225` (stbux),
`1231` (sth), `1237` (sthu), `1243` (sthx), `1249` (sthux), `1337` (sthbrx)
- **Class**: same root cause as PPCBUG-107 (stw/stdcx family — `invalidate_for_write` never called
from any store arm).
- **Symptom**: Under `--parallel`, a `stb`, `sth`, or `sthbrx` (or any variant in this group) to a
cache line reserved by another thread via `lwarx`/`ldarx` does NOT clear the table slot.
The reserving thread's subsequent `stwcx.`/`stdcx.` spuriously succeeds even though an
intervening sub-word store has modified the line — violating store-conditional atomicity. Affects
any lock-free protocol that uses byte or halfword stores adjacent to or inside a `lwarx`/`stwcx.`
loop (e.g. byte-level spinlocks, tagged-pointer updates, audio ring-buffer flags).
- **Fix** (per PPCBUG-107 pattern): before each `mem.write_u8/u16`, add:
```rust
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
- **Note**: PPCBUG-107 is the canonical parent finding. PPCBUG-130 documents that the byte/halfword
group must be included in the same fix sweep.
### PPCBUG-131 — Update-form rA zero-extension correct but undocumented (LOW / informational)
- **Severity**: LOW (informational — behavior is correct)
- **Status**: open (documentation gap)
- **Locations**: `interpreter.rs:1216` (stbu), `1228` (stbux), `1240` (sthu), `1252` (sthux)
- **Symptom**: Each update-form arm writes `ctx.gpr[instr.ra()] = ea as u64` where `ea: u32`.
This zero-extends to 64 bits — correct in the 32-bit ABI (addresses are 32-bit; upper half must
be zero). No bug, but there is no comment explaining the deliberate zero-extension. A maintainer
who computes EA as `u64` throughout and drops the `as u32` intermediate would silently
sign-extend negative displacements into rA, mirroring the `addis` bug shape.
- **Fix**: add comment `// EA is u32; zero-extend into rA (32-bit ABI: upper 32 bits must be 0).`
at each update-form writeback line.
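The hazard described in the symptom can be shown with two hypothetical EA helpers, one keeping the `u32` intermediate and one computing in `u64` throughout. Neither name exists in the interpreter.

```rust
/// EA computed in 32 bits, then zero-extended for the rA writeback.
fn update_ea_u32(ra: u64, d: i16) -> u64 {
    (ra as u32).wrapping_add(d as i32 as u32) as u64
}

/// Hazard variant: EA computed in 64 bits, so a negative displacement
/// that wraps below zero leaves ones in the upper 32 bits.
fn update_ea_u64_hazard(ra: u64, d: i16) -> u64 {
    ra.wrapping_add(d as i64 as u64)
}
```

For `ra = 0x10` and `d = -0x20`, the first form yields `0x0000_0000_FFFF_FFF0` and the second yields `0xFFFF_FFFF_FFFF_FFF0`.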
### PPCBUG-132 — Zero unit tests for all 9 store-byte/halfword opcodes (LOW)
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: No `test_stb*` or `test_sth*` functions exist. Any regression in EA computation,
value truncation, update-form writeback order, or `sthbrx` byte-swap logic would be invisible.
- **Recommended minimum**: `stb` basic + ra=0; `stbu`/`stbux` with rA writeback check; `stbx`
ra=0; `sth` big-endian byte check (`0xDEAD` → `[0xDE, 0xAD]`); `sthu`/`sthux` writeback;
`sthbrx` byte-reversed check (`0xDEAD` → `[0xAD, 0xDE]`); post-PPCBUG-130 fix: `lwarx` + `stb`
to same line + `stwcx.` → CR0.EQ=0.
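The two byte-order checks can be expressed against a plain byte slice. The helpers below are hypothetical models of `sth` and `sthbrx`, assuming memory is a `&mut [u8]` and the source register is a 64-bit GPR value.

```rust
/// Toy `sth`: store the low halfword of `rs` in big-endian order.
fn sth_model(mem: &mut [u8], ea: usize, rs: u64) {
    mem[ea..ea + 2].copy_from_slice(&(rs as u16).to_be_bytes());
}

/// Toy `sthbrx`: store the low halfword of `rs` with its bytes reversed.
fn sthbrx_model(mem: &mut [u8], ea: usize, rs: u64) {
    mem[ea..ea + 2].copy_from_slice(&(rs as u16).swap_bytes().to_be_bytes());
}
```

Using a source value with non-zero upper bits, such as `0xFFFF_FFFF_1234_DEAD`, also confirms the `as u16` truncation.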
IDs PPCBUG-133 through PPCBUG-139 are unallocated — reserved for group 24 follow-up.
---
## Batch 5 — store word (group 25)
Per-group report: `audit-out/group-25-store-word.md`.
Group 25 summary: **8 findings: 5 HIGH (reservation invalidation per opcode), 0 MEDIUM, 3 LOW.**
Core arithmetic and semantics are entirely clean for all 6 opcodes. EA computation (RA=0 guards,
simm16 sign-extend, u32 truncation), value truncation (`as u32`), update-form writebacks
(`ea as u64` zero-extension), big-endian `mem.write_u32`, `stwbrx` byte-reversal, and `stwcx`
conditional-store logic (cache-line reservation check, CAS, CR0 update, reservation always
cleared) all match the ISA and Canary exactly. The `stwcx` manual snapshot is stale (uses old
`reserved_addr` field name; live code correctly uses `reserved_line` at cache-line granularity —
actually MORE correct than the snapshot). Dominant finding is the same systemic miss as PPCBUG-107
and PPCBUG-130: `invalidate_for_write` is never called from any plain store arm.
### PPCBUG-140 — stw: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1183-1188`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Under `--parallel` with the ReservationTable enabled, a plain `stw` by thread B
to a cache line reserved by thread A does not clear thread A's table slot. Thread A's
subsequent `stwcx.` spuriously succeeds (CR0.EQ=1) even though thread B has written the line.
All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic counters) built on
`lwarx`/`stwcx.` are broken in multi-threaded mode. `stw` is the most common store instruction —
every stack write, pointer store, and integer field write is affected.
- **Fix**: Before `mem.write_u32(ea, ...)`:
```rust
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
`has_active_reservers()` is a single `Relaxed` load, so the cost is negligible in the common non-atomic case.
### PPCBUG-141 — stwu: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1189-1194`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. `stwu r1, -N(r1)` is the canonical function-prologue
stack-allocation idiom emitted by every compiled function. A thread holding a reservation on
the stack region would see spurious `stwcx.` success after any prologue store.
- **Fix**: Same pattern as PPCBUG-140, inserted before `mem.write_u32`.
### PPCBUG-142 — stwx: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1195-1200`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. `stwx` is the indexed store used for array writes and
indirect dereferences — common in loops that may run concurrently with reservation holders.
- **Fix**: Same pattern as PPCBUG-140.
### PPCBUG-143 — stwux: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1201-1206`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. Less common than stw/stwu but still a plain store
that must participate in reservation invalidation.
- **Fix**: Same pattern as PPCBUG-140.
### PPCBUG-144 — stwbrx: missing `invalidate_for_write` call (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1568-1573`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. Byte-reversed stores (used for LE-payload GPU command
buffers, file format fields) are still plain stores with respect to the reservation protocol.
- **Fix**: Same pattern as PPCBUG-140. `ea` is already a `u32` at this point (line 1570).
### PPCBUG-145 — stwcx: stale manual snapshot uses `reserved_addr` (LOW)
- **Severity**: LOW (documentation only; live code is correct)
- **Status**: applied (P7 manual regen, 2026-05-02)
- **Location**: `ppc-manual/memory/stwcx.md` (frozen snapshot section)
- **Symptom**: The frozen snapshot shows `ctx.reserved_addr == ea` (exact-word comparison).
The live code at `interpreter.rs:1137-1153` uses `ctx.reserved_line == line` where
`line = ea & !RESERVATION_MASK` (cache-line comparison). The live code is MORE correct per
ISA (PowerISA 2.07B defines reservation at cache-line granularity). Snapshot reflects an
earlier implementation before M3 introduced `RESERVATION_MASK` and `reserved_line`.
Tests confirm live behavior is correct (`stwcx_succeeds_within_same_cache_line`).
- **Fix**: Regenerate the `stwcx.md` snapshot to show current field names and add a note on
the ISA cache-line granule.
### PPCBUG-146 — Zero unit tests for stwu / stwx / stwux / stwbrx (LOW)
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: Four of the six group-25 opcodes have zero dedicated unit tests.
- **Recommended minimum**:
- `stwu r3, -8(r1)`: verify memory at `r1-8` and `gpr[1]` updated to `old_r1 - 8`.
- `stwx ra=0`: store at `gpr[rb]`, verify memory and no RA writeback.
- `stwux`: indexed update — verify store and RA writeback.
- `stwbrx 0x11223344`: bytes at EA should be `[0x44, 0x33, 0x22, 0x11]`.
### PPCBUG-147 — stwcx test suite missing key cases (LOW)
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:5167-5208` (two existing tests)
- **Missing**:
- `stwcx.` without prior `lwarx` → CR0.EQ=0, memory not written.
- Post-PPCBUG-140-fix: `lwarx` then `stw` to same line then `stwcx.` → CR0.EQ=0.
- RA=0 form: `stwcx. rS, 0, rB`.
- Explicit memory check on failure path (assert memory unchanged).
IDs PPCBUG-148 and PPCBUG-149 are unallocated — reserved for group 25 follow-up.
---
## Batch 5 (continued) — store multiple/string (group 27)
Per-group report: `audit-out/group-27-store-mlsr.md`.
Group 27 summary: **4 findings: 2 HIGH, 1 MEDIUM, 1 LOW.** `stswx` is a permanent no-op (identical
root cause as PPCBUG-123 for `lswx` — XER TBC field not modeled; fixed as side effect of
PPCBUG-123/124). `stmw`, `stswi`, and `stswx` all omit `invalidate_for_write`, aggravated vs.
single-word stores because a single `stmw` can dirty multiple cache lines. `stswi` uses `instr.rb()`
instead of `instr.nb()` (maintenance hazard, same shape as PPCBUG-126 for `lswi`). Zero unit tests
across all three opcodes.
### PPCBUG-160 — stmw, stswi, stswx missing `invalidate_for_write`; multi-line atomicity exposure (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1521` (stmw), `interpreter.rs:1357` (stswi), `interpreter.rs:4189` (stswx)
- **Extends**: PPCBUG-107. The prior stated range `1182-1278` does not cover these three arms.
Multi-word instructions (stmw up to 128 bytes = 2 lines; stswx up to 127 bytes = ~2 lines) make
the probability of missing a reservation invalidation much higher than single-word stores.
- **Symptom**: thread B's `stmw` saves 18+ non-volatile registers across two cache lines. Thread A's
`lwarx` reservation on the second line is not cleared. Thread A's `stwcx.` spuriously succeeds.
Because `stmw` is the ABI-standard non-volatile register save, this is triggered constantly in
function prologues — any lock-free primitive inside a prologue/epilogue window is at risk.
- **Fix** (same pattern as PPCBUG-107): before each `mem.write_u32`/`mem.write_u8` call, add the
`invalidate_for_write` guard. See group-27 report for per-opcode code snippets.
- **Test gap**: `lwarx` reserve a line, `stmw` across that line, `stwcx.` must return CR0.EQ=0.
### PPCBUG-161 — `stswx` is a permanent no-op: XER TBC not modeled (HIGH)
- **Severity**: HIGH
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:4189` (`stswx` arm) + `context.rs:235-243` (`xer()`/`set_xer()`)
- **Companion**: PPCBUG-123 (lswx), PPCBUG-124 (mtspr XER). This finding covers the store side.
- **Symptom**: `ctx.xer() & 0x7F` always returns 0 (no `xer_tbc` field). `stswx` unconditionally
stores zero bytes. The byte-loop body is otherwise correct and requires no further changes.
- **Fix**: same three-line fix as PPCBUG-123 (add `xer_tbc: u8` to `PpcContext`; update `xer()`
and `set_xer()`). The `stswx` body is correct once TBC is live.
- **Test gap**: `mtspr XER` (TBC=5) + `stswx r3, 0, r4` → 5 bytes written big-endian.
### PPCBUG-162 — `stswi` uses `instr.rb()` instead of `instr.nb()` for NB field (MEDIUM)
- **Severity**: MEDIUM (maintenance hazard; not a runtime correctness bug today)
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:1359`
- **Companion**: PPCBUG-126 (`lswi` identical pattern at line 1340).
- **Symptom**: `instr.rb()` and `instr.nb()` extract the same bits 16-20, so values are equal now.
If `rb()` is ever given a newtype wrapper (e.g. `RegIdx`) to enforce register semantics, the cast
`instr.rb() as u32` will either fail or yield wrong semantics — silently treating a register index
as a byte count.
- **Fix**: `let nb = if instr.nb() == 0 { 32 } else { instr.nb() };`
### PPCBUG-163 — Zero unit tests for stmw, stswi, stswx (LOW)
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: No `#[test]` exists for any of the three opcodes. Regressions in loop bounds, byte
order, EA computation, NB=0 handling, or register wraparound are invisible.
- **Recommended minimum**: stmw 2-word and 32-word cases; stswi 4-byte, NB=0 (treated as 32), register
wraparound, and partial-word cases; stswx (post PPCBUG-123 fix) TBC=4, TBC=0, TBC=5. See group-27 report for full list.
ID PPCBUG-164 is unallocated — reserved for group 27 follow-up.
---
## Batch 5 (continued) — store doubleword (group 26)
Per-group report: `audit-out/group-26-store-doubleword.md`.
Group 26 summary: **0 HIGH, 2 MEDIUM, 2 LOW.** The core semantics of all six opcodes are
ISA-correct: `ds()` decoder extracts the DS-form displacement correctly; `mem.write_u64` handles
big-endian byte ordering; update-form writebacks are zero-extended and in the right order; `stdcx.`
CR0 encoding, reservation check, and table-path interaction all match the ISA. `stdbrx` correctly
applies `swap_bytes()`. No 32-bit writeback truncation issues (these are store ops, not ALU ops).
Two MEDIUM findings: (1) PPCBUG-150 extends PPCBUG-107 to the doubleword stores (same gap —
`invalidate_for_write` never called); (2) PPCBUG-151 identifies that `stwcx.` and `stdcx.` share
the same reservation slot without a width discriminator, allowing a `lwarx`+`stdcx.` or
`ldarx`+`stwcx.` cross-pair to succeed when it should fail. Four IDs used (PPCBUG-150..153).
### PPCBUG-150 — `std`/`stdu`/`stdx`/`stdux`/`stdbrx` do not call `invalidate_for_write` (scope extension of PPCBUG-107)
- **Severity**: MEDIUM (same bug class as PPCBUG-107, which is itself rated HIGH)
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**:
- `interpreter.rs:1258` (`std`)
- `interpreter.rs:1264` (`stdx`)
- `interpreter.rs:1269` (`stdu`)
- `interpreter.rs:1275` (`stdux`)
- `interpreter.rs:4163` (`stdbrx`)
- **Symptom**: When `--parallel` is active and the `ReservationTable` is enabled, any of these
five stores to an address another HW thread has reserved via `ldarx` will NOT invalidate that
thread's reservation. The `ldarx`-holding thread's `stdcx.` can subsequently succeed even though
the memory was overwritten — a classic LL/SC ABA gap. Fix session for PPCBUG-107 must include
these five sites.
- **Fix**: in each arm, after `mem.write_u64(ea, ...)`, add:
```rust
if let Some(t) = &ctx.reservation_table {
if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
### PPCBUG-151 — `stdcx.`/`stwcx.` reservation width not discriminated: cross-width pair silently succeeds
- **Severity**: MEDIUM
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:4119-4155` (`stdcx`) vs `interpreter.rs:1134-1180` (`stwcx`)
- **Symptom**: Both `stwcx.` and `stdcx.` match reservations using only `(has_reservation,
reserved_line)`. A `lwarx` reservation can be spuriously committed by `stdcx.`, or a `ldarx`
reservation by `stwcx.`, as long as the cache line matches. The ISA requires pairing — `lwarx`
must be committed by `stwcx.`, and `ldarx` by `stdcx.`. Cross-width commit reads the wrong width
from memory and writes back the wrong width, with no failure indication (CR0.EQ=1).
- **Fix**: add a `reservation_width: u8` field (4 or 8) to `PpcContext`. `stwcx.` requires
`reservation_width==4`; `stdcx.` requires `reservation_width==8`. In the table path, pack the
1-bit width flag into one of the spare bits of the 64-bit slot (bits 39-32 are always zero for
line addresses in the 32-bit guest address space).
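A minimal sketch of the per-context side of this fix. `ToyReservation` is a hypothetical stand-in for the three `PpcContext` fields involved, the 128-byte line mask is an assumption, and the table-path bit packing is not modeled.

```rust
const TOY_WIDTH_LINE_MASK: u32 = 127; // assumption: 128-byte reservation granule

/// Hypothetical stand-in for the reservation fields of `PpcContext`.
#[derive(Default)]
struct ToyReservation {
    has_reservation: bool,
    reserved_line: u32,
    reservation_width: u8, // 4 for lwarx, 8 for ldarx
}

impl ToyReservation {
    /// lwarx (width 4) or ldarx (width 8): record line and width.
    fn load_reserve(&mut self, ea: u32, width: u8) {
        self.has_reservation = true;
        self.reserved_line = ea & !TOY_WIDTH_LINE_MASK;
        self.reservation_width = width;
    }

    /// stwcx. (width 4) or stdcx. (width 8): succeeds only when the line
    /// and the width both match; the reservation is cleared either way.
    fn store_conditional(&mut self, ea: u32, width: u8) -> bool {
        let ok = self.has_reservation
            && self.reserved_line == (ea & !TOY_WIDTH_LINE_MASK)
            && self.reservation_width == width;
        self.has_reservation = false;
        ok
    }
}
```

In this model a `lwarx`-style reservation followed by a `stdcx.`-style commit fails, while matched pairs still succeed.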
### PPCBUG-152 — `stdu`/`stdux` no invalid-form guard for RS==RA (LOW)
- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this)
- **Status**: open
- **Location**: `interpreter.rs:1267-1278`
- **Symptom**: When `RA==RS`, the store writes the original RS value, then RA (==RS) is
overwritten with EA, destroying the source. ISA marks this invalid-form. Consistent with
policy of other update-form stores in groups 18-22.
- **Fix**: `debug_assert!(instr.ra() != 0 && instr.ra() != instr.rs())` in debug builds.
### PPCBUG-153 — Zero unit tests for std/stdu/stdx/stdux/stdbrx; stdcx. happy-path only (LOW)
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module (only `test_ldarx_stdcx_pair` at line 4629)
- **Missing coverage**: `std` with negative DS; `std` with RA=0; `stdu` update writeback; `stdx`
with RA=0; `stdux` indexed update; `stdbrx` byte-reversed output; `stdcx.` failure path (no
prior reservation or EA mismatch); `stdcx.` `has_reservation` cleared on failure.
- **Recommended minimum**: 6 tests — see per-group report for encodings.
IDs PPCBUG-154 through PPCBUG-159 are unallocated — reserved for group 26 follow-up.
---
## Batch 5 (continued) — store float (group 28)
Per-group report: `audit-out/group-28-store-float.md`.
Group 28 summary: **7 findings: 3 HIGH, 1 MEDIUM, 3 LOW.** EA computation, endianness, update-form
writebacks, and `stfiwx` integer-word extraction are all correct. Critical bugs: (1) `stfs*` never
raises FPSCR exception bits (VXSNAN, XX, OX, UX) required by PowerISA for double→single narrowing;
(2) `stfs*` ignores FPSCR.RN rounding mode, always using round-to-nearest-even; (3) all 9 FP store
arms omit `invalidate_for_write` (same class as PPCBUG-107). The `stfd*` family and `stfiwx` are
clean (bit-pattern stores with no conversion). Zero unit tests across all 9 opcodes.
**7 IDs used (PPCBUG-165..171). 3 IDs unallocated (PPCBUG-172..174).**
### PPCBUG-165 — stfs* does not raise FPSCR exception bits (VXSNAN, XX, OX, UX)
- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux)
- **Symptom**: PowerISA requires that `stfs` double→single narrowing raises FPSCR[VXSNAN] for SNaN
input, FPSCR[OX] on overflow to ±∞, FPSCR[UX] on underflow to ±0/denormal, and FPSCR[XX] when the
result is inexact. None of these bits are ever set. The narrowing is done via `ctx.fpr[instr.rs()] as f32`
(x86 `CVTSD2SS`); no FPSCR inspection or update follows. Games that poll FPSCR[OX] to detect
overflow (physics engines clamping large velocities), or FPSCR[VXSNAN] after sentinel SNaN writes,
get false negatives.
- **Canary parity**: Canary also omits these FPSCR updates for `stfs*`. Both share the deviation.
- **Fix**: after the narrowing, check `fpscr::is_snan(src)` → set `VXSNAN`; compare source vs.
f64 round-trip of narrowed value for inexact; compare src.is_finite() && f32.is_infinite() for
overflow. See group-28 report for illustrative code sketch.
### PPCBUG-166 — stfs* ignores FPSCR.RN; always uses round-to-nearest-even
- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
- **Symptom**: `ctx.fpr[instr.rs()] as f32` uses the host MXCSR rounding mode, never consulting
`ctx.fpscr & fpscr::RN_MASK`. Any game that configures FPSCR.RN to truncate/ceil/floor and then
stores via `stfs` gets the wrong f32 in memory (wrong by at most 1 ULP). The stfs.md spec
explicitly acknowledges this gap.
- **Canary parity**: Canary also ignores FPSCR.RN for stfs. Both share the deviation.
- **Fix**: read `ctx.fpscr & fpscr::RN_MASK` and set host MXCSR before narrowing, then restore.
Minimum viable: `debug_assert_eq!(ctx.fpscr & fpscr::RN_MASK, 0)` for debug-build visibility.
### PPCBUG-167 — All 9 FP store arms missing `invalidate_for_write` (PPCBUG-107 class)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux),
1308 (stfd), 1313 (stfdu), 1320 (stfdx), 1325 (stfdux), 1333 (stfiwx)
- **Symptom**: Same class as PPCBUG-107. Under M3 `--parallel`, an FP store by thread B to a
cache line reserved by thread A via `lwarx` does not clear thread A's reservation table slot.
Thread A's subsequent `stwcx.` spuriously succeeds. Rendering workers using FP stores to shared
transform/particle buffers co-located with spinlock sites are at risk.
- **Fix**: before each `mem.write_f32`/`write_f64`/`write_u32` in every FP store arm:
```rust
if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
if t.has_active_reservers() { t.invalidate_for_write(ea); }
}
```
Recommend a single sweep of all store groups (PPCBUG-107, 130, 160, 167) to avoid further drift.
### PPCBUG-168 — stfs* SNaN narrowing: `as f32` quietens SNaN without raising FPSCR.VXSNAN
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
- **Symptom**: When FRS holds an f64 SNaN (bit 51 = 0), `CVTSD2SS` sets the f32 quiet bit (bit 22),
producing a QNaN in memory, without raising FPSCR[VXSNAN]. The stored memory bytes are correct per
IEEE-754 (narrowing an SNaN produces a QNaN). The bug is the missing FPSCR signal, a subset of
PPCBUG-165. **Contrast with PPCBUG-128** (lfs stores wrong FPR bits — HIGH severity; here memory
bytes are right, only the flag is missing).
- **Note**: will be fixed as a side effect of the PPCBUG-165 fix. No independent code change needed.
### PPCBUG-169 — stfd* bit-pattern store: confirmed correct (informational)
- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Locations**: interpreter.rs:1305, 1311, 1317, 1323
- **Analysis**: `write_f64(ea, fpr)` → `write_u64(ea, fpr.to_bits())` → `val.to_be_bytes()`. Pure
bit-pattern, correct big-endian. SNaN preserved. EA computation and update-form writebacks all
correct. Canary parity confirmed. No bugs.
### PPCBUG-170 — stfiwx: confirmed correct (informational)
- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Location**: interpreter.rs:1329-1335
- **Analysis**: `write_u32(ea, fpr.to_bits() as u32)` correctly extracts the low 32 bits of the
64-bit FPR as a raw bit pattern (the integer word produced by `fctiw`/`fctiwz`) and stores
big-endian. RA=0 handled correctly. No FPSCR effects required. Canary parity confirmed. No bugs.
### PPCBUG-171 — Zero unit tests for all 9 store-float opcodes
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module
- **Symptom**: No `#[test]` covers any of the 9 FP store arms. Regressions in EA computation,
endianness, update-form writeback order, or double→single narrowing are invisible.
- **Recommended minimum** (10 tests): `stfd` normal + SNaN bit-exact; `stfdu` update writeback;
`stfs` round-trip (1.0); `stfs` overflow (→ ±∞); `stfsx` ra=0; `stfsux` update; `stfiwx` integer
word extract; post-PPCBUG-165 fix: SNaN → FPSCR.VXSNAN set; post-PPCBUG-166 fix: RN=truncate.
IDs PPCBUG-172 through PPCBUG-174 are unallocated — reserved for group 28 follow-up.
---
## Batch 6 — FPU single-precision (group 29)
Per-group report: `audit-out/group-29-fpu-single.md`.
**Context**: The live implementation is substantially more capable than the frozen ppc-manual
snapshots indicated. `to_single()` correctly dispatches on FPSCR.RN; `check_invalid_*` helpers
correctly set VXSNAN, VXISI, VXIMZ, VXZDZ, VXIDI, ZX; `update_after_op` sets OX, UX, and
FPRF. The remaining bugs are: (1) XX/FI/FR (inexact) never set anywhere; (2) fmadd/fmsub
*sx variants missing the VXISI check for the add-phase infinity collision (their *x double
siblings have the same gap); (3) fnmadd/fnmsub NaN sign bit incorrectly flipped by Rust `-`;
(4) fresx produces a full IEEE 1/b instead of the ~12-bit hardware estimate; (5) FPSCR.NI
flush-to-zero not modelled; (6) SNaN→QNaN propagation relies on host SSE behavior rather than
the ISA-canonical derivation.
**8 IDs used (PPCBUG-180..187). 12 IDs unallocated (PPCBUG-188..199).**
### PPCBUG-180 — XX / FI / FR bits never set across all FPU *sx opcodes (and double siblings)
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: `fpscr.rs:184-194` (`update_after_op`); affects interpreter.rs:2252-2494
- **Symptom**: `FPSCR[XX]` (inexact) should be set whenever the mathematical result of an
FP operation cannot be represented exactly in the destination format (single or double) and
a rounding step occurs. `FPSCR[FI]` (fraction inexact) and `FPSCR[FR]` (fraction rounded)
encode the direction. `update_after_op` sets `OX` (overflow to ±∞) and `UX` (subnormal
result) but has no inexact-detection logic. Since most `*sx` operations on arbitrary inputs
require rounding to single precision, XX is almost always wrong (false zero). Games using
FPSCR polling to check exactness receive false "exact" results.
- **Canary parity**: Canary's `UpdateFPSCR` also does not set XX/FI/FR. Both share this gap.
- **Fix**: In `update_after_op` (or a post-`to_single` helper), compare the pre-round f64
result with the post-round f64 result. If they differ, set `XX`; inspect the difference sign
to set `FR`; set `FI = FR || (result was not exactly representable)`.
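
The pre-round/post-round comparison described in the fix can be sketched as follows. This is a minimal standalone illustration, not the `update_after_op` implementation: `RoundFlags` and `round_to_single_flags` are hypothetical names, rounding is assumed to be nearest-even (Rust's `as f32`), FI is taken to mean "the rounded result differs from the input", and FR to mean "the magnitude was incremented". Because the f64 input is itself already a rounded host result, inexactness lost in that first rounding is not visible to this check.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RoundFlags {
    fi: bool, // result is inexact (XX would be set stickily from this)
    fr: bool, // rounding increased the magnitude
}

/// Round an f64 intermediate to single precision and report FI/FR.
/// Hypothetical helper; assumes round-to-nearest-even.
fn round_to_single_flags(pre: f64) -> (f32, RoundFlags) {
    let post = pre as f32;
    if pre.is_nan() {
        // NaN results are never "inexact"; invalid-operation bits cover them.
        return (post, RoundFlags { fi: false, fr: false });
    }
    let back = f64::from(post);
    let fi = back != pre;
    let fr = back.abs() > pre.abs();
    (post, RoundFlags { fi, fr })
}
```
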
### PPCBUG-181 — fmaddsx / fnmaddsx missing VXISI check for add-phase ±∞ collision
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Locations**: interpreter.rs:2339-2348 (fmaddsx), 2383-2392 (fnmaddsx)
- **Symptom**: When `FRA × FRC = +∞` and `FRB = -∞` (or vice versa), PowerISA §4.3.4
requires `FPSCR[VXISI]` to be set and the result to be a QNaN. The double-precision sibling
`fmaddx` (line 2327) correctly calls `fpscr::check_invalid_add(ctx, a * c, b, false)` after
the multiply-check. `fmaddsx` omits this call entirely — only `check_invalid_mul` runs.
Games using fused-madd in dot-product accumulators that might overflow to ±∞ (e.g. lighting
accumulators with very large normals) lose the VXISI signal.
- **Fix**:
```rust
// inside fmaddsx arm, after check_invalid_mul:
fpscr::check_invalid_add(ctx, a * c, b, false);
```
Same for fnmaddsx (same operand pair, same `false` sense for the add).
### PPCBUG-182 — fmsubsx / fnmsubsx missing VXISI check for subtract-phase ±∞ collision
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Locations**: interpreter.rs:2361-2370 (fmsubsx), 2405-2414 (fnmsubsx)
- **Symptom**: When `FRA × FRC = ±∞` and `FRB = ±∞` with the same sign, `(±∞) − (±∞)`
should fire `FPSCR[VXISI]`. Neither `fmsubsx` nor `fnmsubsx` calls `check_invalid_add`.
- **Fix**:
```rust
// inside fmsubsx arm, after check_invalid_mul:
fpscr::check_invalid_add(ctx, a * c, -b, false);
```
Same for fnmsubsx. The negated `b` turns the subtract into the add-form so that
`check_invalid_add(..., false)` uses the correct infinity-sign comparison.
### PPCBUG-183 — fnmaddsx / fnmsubsx NaN sign bit incorrectly flipped by Rust unary `-`
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Locations**: interpreter.rs:2388 (fnmaddsx), 2410 (fnmsubsx)
- **Symptom**: `to_single(ctx, -(a.mul_add(c, b)))` — Rust's unary `-f64` always flips the
IEEE sign bit, including when the value is NaN. PowerISA §4.3.2 specifies that the final
negation in `fnmadd`/`fnmsub` is NOT applied to a QNaN result: if the fused computation
yields a NaN (due to SNaN input, VXIMZ, or VXISI), the negation is skipped and the NaN is
propagated with its canonical sign unchanged. xenia-rs flips the sign bit of any NaN result,
producing a QNaN with the wrong sign. Observable by storing via `stfd` and inspecting bits.
Games using sign-bit NaN tagging (e.g. `0xFFC00000` vs `0x7FC00000` as distinct sentinels)
are affected.
- **Fix**:
```rust
// fnmaddsx arm:
let inner = a.mul_add(c, b);
let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
// fnmsubsx arm:
let inner = a.mul_add(c, -b);
let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
```
### PPCBUG-184 — fresx produces full-precision IEEE 1/b instead of ~12-bit hardware estimate
- **Severity**: HIGH
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: interpreter.rs:2481-2494
- **Symptom**: `fres` on Xenon hardware produces a reciprocal approximation via a 256-entry
LUT with linear interpolation, accurate to roughly 1/4096 relative error (~12 mantissa
bits). xenia-rs computes `to_single(1.0 / b)` — the fully IEEE-754 correctly-rounded
single-precision reciprocal. The result is up to ~4096× more accurate than hardware.
Newton-Raphson refinement code `x = fres(d); x = x*(2 - d*x)` is not broken by this (NR
converges even from an accurate seed), but code that checks the seed's error magnitude for
convergence termination, or that relies on `fres(d)*d ≠ 1.0` to decide whether to refine,
may take the wrong branch. Also, `fres(d)*d` on xenia is much closer to 1.0 than on hardware,
so a "was the estimate good enough?" check based on the residual will give wrong answers.
- **Canary parity**: Canary uses `f.Recip(f.Convert(frB, FLOAT32_TYPE))` — approximates by
first converting to f32 (quantizing the input), then applying the host reciprocal. Still
produces a fully-accurate IEEE single reciprocal rather than the 12-bit table estimate.
Both emulators share the deviation. Canary's conversion-first approach is slightly closer to
hardware (the input is quantized before the reciprocal), so if a future fix is desired,
Canary's approach is the better reference.
- **Fix (minimal viable)**: Pre-convert input to f32 to match Canary's quantization:
`let b32 = b as f32; to_single(ctx, 1.0_f64 / b32 as f64)`. This matches Canary but still
does not emulate the 12-bit LUT. Full fix requires an `fres` LUT matching Xenon's hardware
table (documented in Xbox 360 SDK / GamePPCLisa docs).
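
The residual argument above can be illustrated with a coarse seed. The sketch below is not the Xenon table: `coarse_recip` is a hypothetical stand-in that truncates the host reciprocal to 12 mantissa bits, used only to show how the seed residual and the post-refinement residual differ in size.

```rust
/// Stand-in for a low-precision reciprocal estimate: the host f32 reciprocal
/// with its mantissa truncated to 12 bits. NOT the Xenon hardware LUT.
fn coarse_recip(d: f32) -> f32 {
    let exact = 1.0_f32 / d;
    f32::from_bits(exact.to_bits() & !0x7FF)
}

/// One Newton-Raphson refinement step, x' = x * (2 - d * x), evaluated in f64.
fn nr_step(d: f64, x: f64) -> f64 {
    x * (2.0 - d * x)
}

/// Residual |x * d - 1| of `x` as an approximation of 1/d.
fn residual(d: f64, x: f64) -> f64 {
    (x * d - 1.0).abs()
}
```

With `d = 3.0`, the coarse seed leaves a residual on the order of 2^-14, while the full-precision host reciprocal leaves one several orders of magnitude smaller; a guest check that thresholds on the seed residual would therefore see different values on the two implementations.
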
### PPCBUG-185 — FPSCR.NI flush-to-zero not modelled; subnormal results propagate through *sx
- **Severity**: MEDIUM
- **Status**: open
- **Location**: All *sx arms in interpreter.rs; `fpscr.rs` does not define an `NI` constant
- **Symptom**: Xenon firmware sets `FPSCR.NI = 1` at boot. With NI=1, the Xenon FPU flushes
subnormal inputs and results to the appropriate signed zero before and after every
floating-point operation. xenia-rs inherits the host x86 IEEE-754 default (NI=0), which
propagates subnormals. Subnormal differences: (a) subnormal FPR inputs are used as-is by
xenia vs. treated as ±0 by hardware; (b) subnormal results are stored by xenia vs. flushed
to ±0 by hardware. `update_after_op` sets `UX` when the result is subnormal, but does NOT
flush it. Games with NI-dependent behavior — most Xbox 360 titles compiled with default
Xenon ABI settings — may see different float results in subnormal-touching paths.
- **Canary parity**: Canary also inherits host IEEE NI=0 semantics. Both share this gap.
- **Fix**: After `to_single` (or the double-precision result), check `ctx.fpscr & fpscr::NI_BIT`
(a constant that would need to be added) and, if set, flush subnormals: `if result.is_subnormal() { result =
result.signum() * 0.0 }`. Apply to inputs as well for strict correctness.
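
A sign-preserving flush can be written without any multiplication. The sketch below is standalone; `flush_if_ni` is a hypothetical name and the `ni` flag stands in for whatever test of the FPSCR NI bit the interpreter ends up using.

```rust
/// Flush a subnormal value to a zero of the same sign when NI mode is active.
/// Hypothetical helper; `ni` stands in for the FPSCR.NI test.
fn flush_if_ni(x: f64, ni: bool) -> f64 {
    if ni && x.is_subnormal() {
        0.0_f64.copysign(x)
    } else {
        x
    }
}
```

Normal values, zeros, infinities, and NaNs pass through unchanged, so the same helper could be applied to inputs and results alike.
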
### PPCBUG-186 — SNaN → QNaN propagation relies on host SSE; not ISA-canonical for all *sx
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:2252-2414 (all arithmetic *sx arms without explicit SNaN guard)
- **Symptom**: When an SNaN input reaches `faddsx`/`fsubsx`/`fmulsx`/`fdivsx`, the code calls
`check_invalid_add/mul/div` (correctly sets VXSNAN) but then performs the operation on the
raw SNaN value: `a + b`, `a * c`, etc. On x86-64 SSE2, the hardware `ADDSD`/`MULSD` ops
produce a QNaN from the first SNaN operand (bit 51 set, other mantissa bits preserved). This
matches ISA §4.3.2.2 for the common case. However, for `mul_add` (VFMADD231SD on AVX), the
SNaN propagation priority may differ: the ISA specifies FRA takes priority over FRB, but
hardware FMA may use a different priority for the three-operand form. The `fsqrtsx` and
`fresx` arms handle SNaN explicitly (via `is_snan` check) but do not synthesize the correct
QNaN result — they rely on `b.sqrt()` / `1.0/b` to produce a NaN, which the host does.
This is a latent risk; active wrong-result cases require bit-level NaN inspection.
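
One way to remove the host dependency is to select and quieten the NaN operand explicitly before the arithmetic runs. The sketch below is a standalone illustration with hypothetical names (`quiet`, `propagate_nan`); the operand priority is deliberately left to the caller, since the exact per-opcode order should be taken from the ISA rather than from this example.

```rust
const F64_QUIET_BIT: u64 = 1 << 51;

/// True for a signalling NaN (NaN with the quiet bit clear).
fn is_snan(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & F64_QUIET_BIT) == 0
}

/// Quieten a NaN by setting bit 51, preserving sign and payload.
fn quiet(x: f64) -> f64 {
    if x.is_nan() {
        f64::from_bits(x.to_bits() | F64_QUIET_BIT)
    } else {
        x
    }
}

/// Return the first NaN among `operands` (in caller-supplied priority order),
/// quietened, or `None` if no operand is a NaN.
fn propagate_nan(operands: &[f64]) -> Option<f64> {
    operands.iter().copied().find(|v| v.is_nan()).map(quiet)
}
```

An arm would call `propagate_nan` with its operands in ISA priority order and, on `Some`, write that value instead of evaluating `mul_add` on the raw NaN.
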
### PPCBUG-187 — Zero interpreter execution tests for all 10 group-29 opcodes
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module (no `#[test]` covers any *sx or fresx)
- **Symptom**: Regressions in rounding, FPSCR side effects, or operand-field decoding are
invisible to CI. The existing fpscr unit tests cover helper functions in isolation; no test
exercises the full `step()` path for any single-precision FPU opcode.
- **Recommended minimum** (12 tests — see group-29 report for encodings):
`fadds` exact; `fadds` VXISI; `fsubs` VXISI; `fmuls` 0×∞; `fdivs` ZX;
`fmadds` VXISI regression (PPCBUG-181); `fmsubs` VXISI regression (PPCBUG-182);
`fnmadds` NaN-sign (PPCBUG-183); `fnmsubs` NaN-sign (PPCBUG-183);
`fsqrts` negative input VXSQRT; `fsqrts` round-trip; `fres` basic reciprocal.
IDs PPCBUG-188 through PPCBUG-199 are unallocated — reserved for group 29 follow-up.
---
## Batch 6 (continued) — FPU arithmetic double (group 30)
Per-group report: `audit-out/group-30-fpu-double.md`.
Group 30 summary: **9 findings (PPCBUG-200..208). 2 MEDIUM cross-cutting, 3 MEDIUM opcode-specific, 4 LOW.** Result arithmetic is correct for all 10 opcodes. FPSCR infrastructure is partially wired: VXSNAN, OX, UX, ZX, VXISI (add/sub), VXIMZ, VXZDZ, VXIDI, VXSQRT all set correctly for the opcodes that need them. Critical gaps: (1) XX/FR/FI bits never set by any opcode — same gap as PPCBUG-180 but now confirmed on the double-precision path; (2) FPSCR.RN not honored for double arithmetic — single-precision has `round_to_single` but double has no equivalent; (3) fmsubx/fnmaddx/fnmsubx omit the VXISI check for ∞-collision in the add step; (4) fnmaddx/fnmsubx flip NaN sign bit via Rust `-` operator but ISA requires NaN sign preserved. frsqrtex uses full-precision 1/sqrt(b) instead of the hardware estimate — acceptable. All FMA forms use `f64::mul_add` for correct single-rounding semantics.
**9 IDs used (PPCBUG-200..208). 11 IDs unallocated (PPCBUG-209..219).**
### PPCBUG-200 — All group-30 opcodes: XX, FR, FI bits never set
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `fpscr.rs:184-194` (`update_after_op`); `interpreter.rs:2248,2268,2289,2310,2335,2357,2379,2401,2463,2510`
- **Symptom**: Same gap as PPCBUG-180 but confirmed for the double-precision path. `update_after_op` only tracks OX (overflow to infinity) and UX (subnormal). FPSCR[XX] (inexact sticky), FPSCR[FR] (round direction), and FPSCR[FI] (inexact for current op) are never updated by any group-30 opcode. Every double-precision arithmetic operation that rounds a non-representable result silently omits these bits.
- **Fix**: Same goal as PPCBUG-180, but the pre-/post-round comparison suggested there does not carry over, because no `to_single` step is involved for double. Inexactness must instead be derived from the host MXCSR exception flags after each f64 operation (mapped to FI/XX/FR) or from a post-op bit-level comparison of inputs vs. result.
- **Test gap**: Zero tests verify XX set after any inexact double-precision operation.
### PPCBUG-201 — All group-30 opcodes: FPSCR.RN not honored for double arithmetic
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:2242-2512` (all 10 arms)
- **Symptom**: Host f64 operators always use nearest-even (host MXCSR default). `fpscr.rs` has a complete `rounding_mode(ctx)` helper and directed rounding helpers for single-precision (`round_to_single`), but no equivalent for double arithmetic. Guest `mtfsfi` RN changes have no effect on faddx/fsubx/fdivx/fsqrtx etc.
- **Fix**: Wrap each double-precision arithmetic arm with an MXCSR round-mode set/restore when `ctx.fpscr & fpscr::RN_MASK != 0`. Fast path (RN=0) stays zero-cost.
- **Test gap**: No test changes RN and verifies directed rounding on any double arithmetic opcode.
### PPCBUG-202 — fmaddx: non-FMA `a * c` used in check_invalid_add can spuriously raise/miss VXISI
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:2332`
- **Symptom**: `check_invalid_add(ctx, a * c, b, false)` uses a separate two-rounding multiply to approximate the FMA intermediate product. When the true FMA intermediate is finite but the standalone product overflows to ±∞, VXISI fires spuriously. When the true intermediate is ±∞ but the standalone product is finite (extreme cancellation), VXISI is missed.
- **Fix**: Derive VXISI from input-value properties directly: if `(a.is_infinite() || c.is_infinite())` (product is mathematically infinite) and `b.is_infinite()` with opposing sign → VXISI.
- **Test gap**: No test covers the large-value cancellation case in fmaddx.
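
The input-based derivation can be sketched as a pure predicate. This is a standalone illustration, not the `fpscr` helper: `fma_vxisi` and its `subtract` flag are hypothetical, and the sketch assumes that NaN operands and the `0 × ∞` case are reported by other checks (VXSNAN, VXIMZ) rather than as VXISI.

```rust
/// True when `a * c (+|-) b` is an infinity-minus-infinity collision, judged
/// from the inputs alone (no rounded intermediate product is formed).
/// `subtract` selects the fmsub/fnmsub sense. Hypothetical helper.
fn fma_vxisi(a: f64, c: f64, b: f64, subtract: bool) -> bool {
    if a.is_nan() || b.is_nan() || c.is_nan() {
        return false; // NaN operands are handled by the SNaN/QNaN paths
    }
    let product_infinite = a.is_infinite() || c.is_infinite();
    if !product_infinite || a == 0.0 || c == 0.0 {
        return false; // finite product, or 0 x inf (VXIMZ, not VXISI)
    }
    if !b.is_infinite() {
        return false;
    }
    let product_negative = a.is_sign_negative() ^ c.is_sign_negative();
    let addend_negative = b.is_sign_negative() ^ subtract;
    product_negative != addend_negative
}
```

Because only operand classes and signs are inspected, a finite product that overflows when computed separately (for example `1e300 * 1e300`) no longer raises VXISI.
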
### PPCBUG-203 — fmsubx, fnmaddx, fnmsubx: VXISI never raised for ∞-collision in add/sub step
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Locations**: `interpreter.rs:2354` (fmsubx), `2376` (fnmaddx), `2398` (fnmsubx)
- **Symptom**: Same pattern as PPCBUG-181/182 for the double-precision variants. These three arms call only `check_invalid_mul` and omit `check_invalid_add`. Per ISA, all four FMA variants must raise VXISI when the add step yields ∞+∓∞. Example for fmsub: `A×C = +∞`, `B = +∞` → `+∞ − +∞` → VXISI. Currently the result NaN propagates silently with no FPSCR update. The fnmsub pattern is the canonical Newton-Raphson step, the most common FPU path in Xbox 360 graphics code.
- **Fix**: Add `fpscr::check_invalid_add(ctx, a * c, b, true)` for `fmsubx`/`fnmsubx` and `fpscr::check_invalid_add(ctx, a * c, b, false)` for `fnmaddx` (apply PPCBUG-202 sign-fix simultaneously).
- **Test gap**: Zero tests for VXISI on any of the three opcodes.
### PPCBUG-204 — fmaddx check_invalid_add sub-issue (sign logic reliant on imprecise product)
- **Severity**: LOW (sub-issue of PPCBUG-202)
- **Status**: open
- **Location**: `interpreter.rs:2332`
- **Symptom**: VXISI logic is internally consistent with the passed `a * c` value, but that value can have the wrong sign in extreme overflow/underflow cases. Resolve as part of PPCBUG-202.
### PPCBUG-205 — fnmaddx / fnmsubx: Rust `-` flips NaN sign bit; ISA requires NaN sign preserved
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Locations**: `interpreter.rs:2377` (fnmaddx), `interpreter.rs:2399` (fnmsubx)
- **Symptom**: Same pattern as PPCBUG-183 for the double-precision variants. Rust's unary `-` applied to a NaN result flips the IEEE-754 sign bit. PowerISA Book I §4.3.4 states the negation is not applied to NaN results. Title code using NaN sentinels (audio middleware, debug fills) receives sign-flipped NaN payloads.
- **Fix**:
```rust
let fma = a.mul_add(c, b); // fnmaddx
let result = if fma.is_nan() { fma } else { -fma };
// and analogously for fnmsubx
```
- **Test gap**: No test exercises fnmaddx/fnmsubx with NaN-producing inputs to check sign of result NaN.
### PPCBUG-206 — frsqrtex edge cases correct; no code change needed (informational)
- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Location**: `interpreter.rs:2496-2512`
- **Analysis**: ZX fires for ±0. VXSQRT guard correctly excludes -0.0. frsqrte(+∞)=+0 correct. Full-precision is acceptable over-precision.
- **Fix**: Add comment: `// Full-precision: hardware gives ~12-14 bit estimate. NR converges identically.`
- **Test gap**: Zero frsqrtex unit tests — add 4 (±0 inputs, negative input+VXSQRT, SNaN, +∞).
### PPCBUG-207 — FMA opcode OX logic correct, OX edge cases untested (informational)
- **Severity**: LOW (confirmed clean, informational)
- **Status**: wontfix
- **Location**: `interpreter.rs:2335,2357,2379,2401`
- **Analysis**: `inputs_were_finite` correctly suppresses OX when an input is already infinite. OX fires when all inputs are finite but the FMA result overflows — ISA-correct.
- **Test gap**: Zero tests for OX scenario in any FMA opcode.
### PPCBUG-208 — Zero tests for fsubx, fdivx, fmsubx, fnmaddx, fnmsubx, fsqrtx, frsqrtex
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: 7 of 10 group-30 opcodes have zero tests. `faddx` has 1 happy-path test; `fmulx` has 1; `fmaddx` has 1. None have FPSCR/Rc=1/edge-case coverage.
- **Recommended minimum** (12 tests): `fsubx` normal; `fsubx` VXISI; `fdivx` normal; `fdivx` ZX; `fdivx` VXZDZ; `fmsubx` normal; `fnmaddx` normal; `fnmsubx` normal; `fnmaddx` NaN-sign regression (PPCBUG-205); `fsqrtx` normal; `fsqrtx` negative+VXSQRT; `frsqrtex` positive.
IDs PPCBUG-209 through PPCBUG-219 are unallocated — reserved for group 30 follow-up.
---
## Pending batches
- Batch 2: groups 6-11 — logical immediate, logical register, sign-extend/CLZ, word rotate, doubleword rotate, shift.
- Batch 3: groups 12-17 — compare, branch, trap+sc, CR logical, SPR/MSR, cache+sync.
- Batch 4: groups 18-23 — loads (byte, halfword, word, doubleword, multiple/string, float).
- Batch 5 (partial): groups 24, 26, 27, 28 done; group 25 (store word) pending.
- Batch 6 (partial): groups 29, 30 done; group 31 (FPU convert/compare) pending.
- Batch 7: groups 32-34 — VMX integer (add/sub, compare/min/max, logical/shift).
- Batch 8: groups 35-38 — VMX permute/pack, VMX float, VMX multiply-sum, VMX load/store.
- Phase C: decoder field extractors, decoder opcode-lookup, disassembler formatter parity.
- Phase D: this file gets re-sorted by severity and finalized.
---
## Batch 6 (continued) — FPU sign/move/compare/convert/round (group 31)
Per-group report: `audit-out/group-31-fpu-misc.md`.
Group 31 summary: **9 findings (PPCBUG-221..231; IDs 220/222/226 retracted after analysis).
1 HIGH, 3 MEDIUM, 5 LOW.** The sign-bit manipulation family (`fabsx`, `fnegx`, `fnabsx`, `fmrx`)
and `fselx` are all ISA-correct — Rust arithmetic maps to bit-level operations that preserve SNaN
payloads. `fcmpu` is correct (FPRF and VXSNAN set; no spurious VXVC). The conversion group is
mostly correct for result values and overflow sentinels; the main gaps are FPSCR inexact/FR/FI
tracking (shared with groups 29/30) and one subtle NearestEven tie-breaking defect in
`round_to_i64` that affects `fctidx`. `fcmpo` silently omits VXSNAN/VXVC despite having a
comment acknowledging the gap.
**9 IDs used (PPCBUG-221, 223, 224, 225, 227, 228, 229, 230, 231). IDs 220/222/226 retracted.
IDs PPCBUG-232..239 unallocated.**
### PPCBUG-221 — `fctidx` / `round_to_i64` NearestEven tie-breaking uses f64::EPSILON; broken for |v| > 2^52
- **Severity**: HIGH
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `fpscr.rs:220-238` (`round_to_i64`, `NearestEven` case)
- **Symptom**: The tie-breaking code computes `diff = (v - v.trunc()).abs()` and tests
`(diff - 0.5).abs() < f64::EPSILON` to detect a half-integer. Above `|v| = 2^52`,
`v.trunc() == v` for all representable f64 values (all are exact integers), so `diff == 0.0`
and the tie-breaking branch is never taken — the code falls through to `v.round() as i64`,
which is round-half-away-from-zero instead of round-half-to-even. Every fctid call on a
large odd half-integer (e.g. `(2^52 + 1).5`) produces the wrong integer. In practice these
exact 0.5 cases are rare for large values but can appear in audio sample-count arithmetic
and physics fixed-point pipelines.
- **Fix**: replace the NearestEven arm with a fractional-part-only tie check that is exact for
|v| <= 2^52 and degenerates correctly to truncation above 2^52:
```rust
RoundingMode::NearestEven => {
let t = v.trunc();
let frac = v - t; // exact for |v| <= 2^52; ==0 above (already integer)
let fa = frac.abs();
if fa > 0.5 { t as i64 + if v >= 0.0 { 1 } else { -1 } }
else if fa < 0.5 { t as i64 }
else {
// Exact 0.5 tie — round to even.
let fi = t as i64;
if fi & 1 == 0 { fi } else { fi + if v >= 0.0 { 1 } else { -1 } }
}
}
```
- **Test gap**: add `round_to_i64` tests in `fpscr.rs:tests`: 0.5→0, 1.5→2, 2.5→2, 3.5→4,
-0.5→0, -1.5→-2. Existing tests cover 2.5→2 and 3.5→4 (currently accidentally correct).
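
As an alternative to hand-written tie detection, the standard library's ties-to-even rounding can serve as a reference oracle for the test cases listed above. The sketch below is standalone and assumes a toolchain where `f64::round_ties_even` is available (stable since Rust 1.77); `round_to_i64_nearest_even` is a hypothetical name, and the saturating `as i64` cast does not model the PPC overflow or NaN sentinels, which would remain the caller's job.

```rust
/// Round-half-to-even conversion using the standard library.
/// Hypothetical reference helper; out-of-range and NaN inputs follow Rust's
/// saturating cast rules, not the PPC sentinel values.
fn round_to_i64_nearest_even(v: f64) -> i64 {
    v.round_ties_even() as i64
}
```
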
### PPCBUG-223 — `fcmpo` omits FPSCR[VXSNAN] and FPSCR[VXVC] on NaN operands
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:2645-2675`
- **Symptom**: `fcmpo` mirrors `fcmpu` in setting FPRF and the CR field correctly, but it never
calls `fpscr::set_exception`. PowerISA requires: QNaN → `FPSCR[VXVC, VX, FX]`;
SNaN → additionally `FPSCR[VXSNAN]`. `fcmpu` correctly sets VXSNAN for SNaN; `fcmpo` does
not. A comment in the source acknowledges "not modeled yet."
- **Impact**: `fcmpo.` (Rc=1) checking CR1.FX after a NaN compare will see FX=0 instead of
FX=1. `mffsx` after `fcmpo` will not reflect VXVC. Xbox 360 CRT comparison primitives
(`islessgreater`, ordered relational operators) use `fcmpo`.
- **Fix**:
```rust
if fra.is_nan() || frb.is_nan() {
ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: false, so: true };
if fpscr::is_snan(fra) || fpscr::is_snan(frb) {
fpscr::set_exception(ctx, fpscr::VXSNAN | fpscr::VXVC);
} else {
fpscr::set_exception(ctx, fpscr::VXVC);
}
}
```
### PPCBUG-224 — `fcfidx` does not set FPSCR[XX/FX] for inexact i64→f64 conversion
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:2528-2536`
- **Symptom**: Only FPRF is updated. Per ISA, `fcfid` sets `FPSCR[XX, FX]` (and FR/FI) when
the i64 value has more than 53 significant bits and precision is lost. Only values with
`|v| > 2^53` can trigger inexact. Common trigger: large frame/sample counters, address values.
- **Fix**: after the conversion, compare `(result as i64) != (bits as i64)` and call
`fpscr::set_exception(ctx, fpscr::XX)` if inexact.
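
One caveat with a round-trip through `i64`: Rust's float-to-integer `as` cast saturates, so a value such as `i64::MAX` (which converts to 2^63 as f64) casts back to `i64::MAX` and looks exact. The sketch below, with hypothetical helper names, widens the comparison to `i128` to avoid that; it is a standalone illustration rather than the applied fix.

```rust
/// True when converting `v` to f64 loses precision.
/// Compares in i128 so that a result of 2^63 is not hidden by a saturating cast.
fn fcfid_is_inexact(v: i64) -> bool {
    let converted = v as f64;
    (converted as i128) != i128::from(v)
}

/// The i64 round-trip form, kept only to show the saturation pitfall.
fn naive_is_inexact(v: i64) -> bool {
    ((v as f64) as i64) != v
}
```
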
### PPCBUG-225 — `frspx` does not set FPSCR[XX/FX/FR/FI] on inexact rounding
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:2516-2527`
- **Symptom**: `update_after_op` sets OX/UX only. The ISA requires FR/FI/XX/FX on any f64→f32
rounding that is not exact. `frsp` is the canonical double→single-precision narrowing idiom
in compiler output — virtually every call is inexact.
- **Fix**: after `to_single`, compare result vs b; if different and both finite, call
`fpscr::set_exception(ctx, fpscr::XX | fpscr::FI | ...)` with FR set if magnitude increased.
### PPCBUG-227 — `fctiwx` rounding: `round_to_i32` inherits NearestEven defect via `round_to_i64`
- **Severity**: LOW
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `fpscr.rs:241-243`
- **Symptom**: `round_to_i32` calls `round_to_i64` then clamps. The PPCBUG-221 defect in
`round_to_i64` does not manifest for i32-range values (the epsilon check accidentally works
at this scale), but the structural fragility is inherited. Fixing PPCBUG-221 cures this.
- **Recommendation**: add unit tests `round_to_i32(0.5)==0`, `round_to_i32(1.5)==2`,
`round_to_i32(2.5)==2` to verify correct round-to-even behavior.
### PPCBUG-228 — Zero interpreter execution tests for fabsx/fnegx/fnabsx/fmrx/fselx/fcmpo/fcfidx/fctidx/fctidzx/frspx
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: 10 of the 13 group-31 opcodes have zero dedicated tests. `test_fcmpu` covers
only the ordered comparison `5.0 > 3.0`. `test_fctiwzx` covers one positive truncation.
`test_fadd`/`test_fmul` are group-30 tests, not group-31.
- **Recommended minimum**: SNaN-preservation test for fabsx/fnegx/fnabsx; fselx with NaN/0/1;
fcmpo QNaN→VXVC (after PPCBUG-223 fix); fcfidx exact and inexact; fctidx tie cases; frspx
inexact → XX set (after PPCBUG-225 fix); fctiwx nearest-even tie; fctiwzx NaN sentinel.
### PPCBUG-229 — `fctidx` / `fctidzx` do not set FPSCR[XX/FX] for inexact inputs
- **Severity**: LOW
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Locations**: `interpreter.rs:2537-2574`
- **Symptom**: Per ISA, float-to-integer conversions set `FPSCR[XX, FX]` when the source
value is not an integer (the fractional part is discarded). Neither opcode sets XX.
Shared root cause with PPCBUG-224/225.
### PPCBUG-230 — `fctiwx` / `fctiwzx` do not set FPSCR[XX/FX] for inexact inputs
- **Severity**: LOW
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Locations**: `interpreter.rs:2575-2612`
- **Symptom**: Same omission as PPCBUG-229 for the word-width conversion pair.
### PPCBUG-231 — `frspx` SNaN input result written as QNaN (host platform dependency)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2519-2524`
- **Symptom**: Rust's `as f32` (CVTSD2SS) can set the quiet bit on SNaN input, producing a
QNaN in the FPR. Per ISA, `frsp` on SNaN should quieten it — so the QNaN result is
correct in kind. The risk is that the exact QNaN bit-pattern may differ from PPC's
canonical quietening (which ORs bit 22 into the f32 mantissa). Game code inspecting the
NaN payload after frsp may see a different payload. Same structural root cause as
PPCBUG-128 (`lfs` SNaN quietening), but lower severity because frsp IS arithmetic.
IDs PPCBUG-232 through PPCBUG-239 are unallocated — no further bugs found in group 31.
---
## Batch 7 — VMX integer add/sub (group 32)
Per-group report: `audit-out/group-32-vmx-int-addsub.md`.
**Scope**: `vaddubm`, `vaddubs`, `vadduhm`, `vadduhs`, `vadduwm`, `vadduws`, `vaddsbs`, `vaddshs`,
`vaddsws`, `vaddcuw`, `vsububm`, `vsububs`, `vsubuhm`, `vsubuhs`, `vsubuwm`, `vsubuws`, `vsubsbs`,
`vsubshs`, `vsubsws`, `vsubcuw`.
**Overall verdict**: All 20 opcodes are arithmetically correct. No HIGH-severity bugs found.
Lane indexing (big-endian, PPC element 0 = `Vec128::bytes[0]`), saturation arithmetic, VSCR.SAT
sticky-set, and vaddcuw/vsubcuw carry/borrow semantics are all implemented correctly.
4 LOW-severity findings (2 test gaps, 1 code organization, 1 API hazard).
### PPCBUG-240 — 18 of 20 group-32 opcodes have zero interpreter-level tests
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: Only `test_vaddubs_saturates_and_sets_vscr_sat` covers any group-32 opcode.
`vaddubm`, `vsububm`, `vadduhm`, `vsubuhm`, `vadduwm`, `vsubuwm`, `vaddsbs`, `vsubsbs`,
`vadduhs`, `vsubuhs`, `vaddshs`, `vsubshs`, `vadduws`, `vsubuws`, `vaddsws`, `vsubsws`,
`vaddcuw`, `vsubcuw` — all 18 have no tests. No high risk today but no regression guard.
- **Recommended minimum**: wrap-around test (byte, halfword, word); sat-at-max and sat-at-min tests;
VSCR.SAT sticky-set across two successive saturating instructions; vaddcuw carry lane; vsubcuw
no-borrow lane.
### PPCBUG-241 — `vadduwm` / `vsubuwm` stranded in a separate section from the rest of group-32
- **Severity**: LOW (maintenance hazard)
- **Status**: open
- **Location**: `interpreter.rs:2090-2104` (stranded) vs. `interpreter.rs:2784` (§4a group-32 section)
- **Symptom**: The two word-modulo opcodes are matched 700 lines above the rest of the group, with
only a comment at line 2819 as a cross-reference. A future sweep of §4a for group-32 changes
would miss them.
- **Fix**: Move both arms into §4a and remove the comment at line 2819.
### PPCBUG-242 — `set_vscr_sat(false)` can non-stickily clear SAT from arithmetic handlers
- **Severity**: LOW (API hazard)
- **Status**: open
- **Location**: `context.rs:252-259`
- **Symptom**: `set_vscr_sat(bool)` accepts `false`, which would clear the sticky SAT bit. All
current arithmetic callers pass `true` only (inside `if sat { ... }` guards), so no mis-clear
occurs today. But the API is misleading — a future saturating handler that writes
`set_vscr_sat(lane_sat)` with `lane_sat = false` would silently clear a previously-set bit.
- **Fix**: Rename to `sticky_set_vscr_sat()` (no bool argument, always ORs). Retain
`force_vscr_sat(bool)` for `mtvscr`.
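
A minimal sketch of the proposed split follows. Only the two method names come from the fix above; the `Vscr` struct, its `sat` field, and `record_lane` are hypothetical stand-ins for whatever `context.rs` actually holds.

```rust
/// Hypothetical stand-in for the VSCR state kept in the CPU context.
#[derive(Default)]
struct Vscr {
    sat: bool,
}

impl Vscr {
    /// For saturating arithmetic handlers: can only set the bit, never clear it.
    fn sticky_set_vscr_sat(&mut self) {
        self.sat = true;
    }

    /// For `mtvscr` only: overwrites the bit with the value being moved in.
    fn force_vscr_sat(&mut self, value: bool) {
        self.sat = value;
    }
}

/// Example handler step: a non-saturating lane leaves an earlier SAT intact.
fn record_lane(vscr: &mut Vscr, lane_saturated: bool) {
    if lane_saturated {
        vscr.sticky_set_vscr_sat();
    }
}
```

With no `bool` parameter on the sticky setter, a handler cannot clear the bit by accident; the clearing path is confined to the explicitly named `force_vscr_sat`.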
### PPCBUG-243 — `vmx.rs` saturation helpers: u16/i16/u32/i32 variants have zero unit tests
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `crates/xenia-cpu/src/vmx.rs:705-799`
- **Symptom**: `vmx.rs` tests cover 5 cases of `sat_add/sub_i8/u8`. The 8 helpers for wider
types (`sat_add_u16`, `sat_sub_u16`, `sat_add_i16`, `sat_sub_i16`, `sat_add_u32`, `sat_sub_u32`,
`sat_add_i32`, `sat_sub_i32`) are mathematically correct but unguarded by any test. Recommended
additions listed in the per-group report.
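
As an illustration of the boundary cases such tests would pin down, here is a sketch of two of the wider helpers. The `(value, saturated)` return shape is an assumption; the real signatures in `vmx.rs` may differ.

```rust
/// Sketch: signed 16-bit saturating add that also reports saturation.
fn sat_add_i16(a: i16, b: i16) -> (i16, bool) {
    match a.checked_add(b) {
        Some(v) => (v, false),
        // Overflow is only possible when both operands share a sign,
        // so the sign of `b` tells us which bound was crossed.
        None => (if b < 0 { i16::MIN } else { i16::MAX }, true),
    }
}

/// Sketch: unsigned 32-bit saturating subtract; underflow clamps to 0.
fn sat_sub_u32(a: u32, b: u32) -> (u32, bool) {
    match a.checked_sub(b) {
        Some(v) => (v, false),
        None => (0, true),
    }
}
```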
IDs PPCBUG-244 through PPCBUG-274 are unallocated — no further bugs found in group 32.
---
## Batch 7 — VMX integer compare / min / max / avg (group 33)
Per-group report: `audit-out/group-33-vmx-int-compare.md`.
### PPCBUG-275 — All VC-form vector compare dot forms: `rc_bit()` reads wrong bit; CR6 never updated
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpequb.`, `vcmpequh.`, `vcmpgtsb.`, `vcmpgtsh.`, `vcmpgtub.`, `vcmpgtuh.`
- **Location**: `decoder.rs:75` + `interpreter.rs:3318`, `3331`, `3344`, `3357`, `3370`, `3383`
- **Symptom**: `rc_bit()` is implemented as `self.raw & 1 != 0` (reads LSB = bit 0 of the word).
For VC-form instructions the Rc flag is at **PPC bit 21 = LSB bit 10**, not bit 0. Bit 0 is
the LSB of the 10-bit XO field. All integer compare XO values are even (XO=6, 70, 518, 774, 582, 838),
so their bit 0 is always 0. The CR6 update block is **unconditionally dead** regardless of
whether the programmer wrote the dot form. `vcmpequb. vMask, vData, vNeedle` + `bc 12,26`
(branch on CR bit 26 = CR6.EQ, set when no lane matched; CR6.LT = all-true is CR bit 24) is the canonical AltiVec memchr idiom; it will always fall through.
- **Fix**:
```rust
// decoder.rs — add:
/// Rc bit for VC-form vector compare instructions (PPC bit 21 = LSB bit 10).
#[inline] pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }
```
Replace `instr.rc_bit()` with `instr.vc_rc_bit()` at interpreter.rs:3318, 3331, 3344, 3357,
3370, 3383.
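
The dead gate is easy to reproduce on a hand-assembled word. The sketch below uses the VC-form layout described in this entry (primary opcode 4, Rc at integer bit 10, 10-bit XO in bits 0-9); `encode_vc` is a hypothetical test helper, not a decoder function.

```rust
/// Hypothetical test helper: pack a VC-form instruction word.
fn encode_vc(vd: u32, va: u32, vb: u32, rc: bool, xo: u32) -> u32 {
    (4 << 26)
        | ((vd & 31) << 21)
        | ((va & 31) << 16)
        | ((vb & 31) << 11)
        | ((rc as u32) << 10)
        | (xo & 0x3FF)
}

/// The generic accessor: reads integer bit 0, which is the XO LSB here.
fn rc_bit(raw: u32) -> bool {
    raw & 1 != 0
}

/// The VC-form accessor from the fix: reads integer bit 10.
fn vc_rc_bit(raw: u32) -> bool {
    (raw >> 10) & 1 != 0
}
```

For `vcmpequb.` (XO=6) the generic accessor reports `false` even though the dot form was encoded, while `vc_rc_bit` distinguishes the two forms.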
### PPCBUG-276 — `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`: same VC-form Rc bug
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`
- **Location**: `interpreter.rs:2237`, `3396`, `3406`
- **Symptom**: Same root cause as PPCBUG-275. XO for vcmpequw=134, vcmpgtuw=646, vcmpgtsw=902 —
all even, bit 0 always 0. Word-compare dot forms never update CR6. `vcmpequw128` uses the
VMX128_R Rc encoding which also likely reads the wrong bit.
- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:2237, 3396, 3406. Separately verify
VMX128_R Rc bit position for `vcmpequw128` (may require its own extractor).
### PPCBUG-277 — Zero tests for all `vcmp*` dot forms and CR6 correctness
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: No test exercises any of the 10 integer vector compare opcodes. Critical missing:
`vcmpequb.` all-true → CR6.LT=1; `vcmpequb.` all-false → CR6.EQ=1; `vcmpgtsb` signed
boundary (0x80 vs 0x7F must yield false, not true); `vcmpgtsh` at 0x8000 vs 0x7FFF.
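
The missing cases can be sketched as below. Both functions are illustrative reimplementations rather than the interpreter's handlers; CR6 is modelled as a 4-bit field ordered LT, GT, EQ, SO from high to low, with LT meaning all lanes true and EQ meaning all lanes false.

```rust
/// Illustrative byte-lane signed greater-than compare.
fn vcmpgtsb(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        if (a[i] as i8) > (b[i] as i8) {
            out[i] = 0xFF;
        }
    }
    out
}

/// CR6 value a dot-form compare would write for a given result mask.
fn cr6_from_mask(mask: &[u8; 16]) -> u8 {
    let all_true = mask.iter().all(|&m| m == 0xFF);
    let all_false = mask.iter().all(|&m| m == 0);
    ((all_true as u8) << 3) | ((all_false as u8) << 1)
}
```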
### PPCBUG-278 — Zero tests for all 12 `vmax*` / `vmin*` opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: None of vmaxub/uh/uw/sb/sh/sw, vminub/uh/uw/sb/sh/sw are tested. Critical missing:
`vmaxsb(0x80, 0x7F)` = 0x7F (signed max of -128 and +127); `vminsb(0x80, 0x7F)` = 0x80.
Without these, signed vs unsigned confusion in min/max would not be caught.
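
A sketch of why the 0x80 / 0x7F pair is the discriminating input: the signed and unsigned interpretations order the two bytes in opposite directions. The lane functions are illustrative, not the interpreter's code.

```rust
/// Signed byte max/min: operands are reinterpreted as i8 before comparing.
fn vmaxsb_lane(a: u8, b: u8) -> u8 {
    (a as i8).max(b as i8) as u8
}

fn vminsb_lane(a: u8, b: u8) -> u8 {
    (a as i8).min(b as i8) as u8
}

/// Unsigned byte max: plain u8 ordering.
fn vmaxub_lane(a: u8, b: u8) -> u8 {
    a.max(b)
}
```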
### PPCBUG-279 — Zero tests for all 6 `vavg*` opcodes; no signed-boundary or rounding coverage
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module; `vmx.rs` test module
- **Symptom**: `avg_u8` through `avg_i32` helpers have no unit tests. Key rounding case:
`avg_u8(0, 1)` must be 1 (round up), not 0 (truncation). `avg_i32(i32::MIN, i32::MIN)` must
be `i32::MIN` without overflow.
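
Both requirements follow from computing `(a + b + 1) >> 1` in a wider type. The sketch below assumes that formulation; the helper bodies in `vmx.rs` may be written differently.

```rust
/// Rounded average of two unsigned bytes, widened to avoid overflow.
fn avg_u8(a: u8, b: u8) -> u8 {
    ((a as u16 + b as u16 + 1) >> 1) as u8
}

/// Rounded average of two signed words; the i64 intermediate keeps
/// `i32::MIN + i32::MIN + 1` representable, and `>>` on i64 is arithmetic.
fn avg_i32(a: i32, b: i32) -> i32 {
    ((a as i64 + b as i64 + 1) >> 1) as i32
}
```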
IDs PPCBUG-280 through PPCBUG-314 are unallocated — no further bugs found in group 33.
---
## Batch 6 — VMX integer logical / shift / rotate / select (group 34)
Per-group report: `audit-out/group-34-vmx-logic-shift.md`.
Group 34 summary: the bitwise logical ops (vand/vandc/vor/vxor/vnor and their 128 variants)
are all ISA-correct — Vec128 is `[u8; 16]` with no padding bits, so `!(u32)` flips exactly
32 bits per lane with no upper-bit pollution (the PPCBUG-029/030/031 class does not apply to
VMX register files). The per-lane shifts (vslb/vsrb/vsrab, vslh/vsrh/vsrah, vslw/vsrw/vsraw
and their 128 variants) all correctly mask the shift count to the lane width before shifting;
vsraw uses i32 arithmetic right shift which is correctly defined in Rust for shift-by-31.
The per-lane rotates (vrlb/vrlh/vrlw and 128 variants) are correct. The whole-register bit
shifts (vsl/vsr) and whole-register byte shifts (vslo/vsro and 128 variants) correctly
extract the shift count from VB.b[15] with the proper bit masks. vsel and vsel128 are correct
including the read-before-write ordering on vsel128's vc=vd aliasing.
**One HIGH bug found**: vrlimi128 extracts both the rotate-amount (z) field and the
blend-mask (IMM) field from the wrong bit positions of the instruction word.
**0 MEDIUM bugs with code change needed. 1 HIGH. 10 LOW (test gaps and informational).**
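
The count-masking behaviour summarised above can be sketched per lane as follows. These are illustrative one-lane functions with hypothetical names, assuming the count is masked to the lane width before use.

```rust
/// Byte shift-left: only the low 3 bits of the count are used.
fn vslb_lane(a: u8, sh: u8) -> u8 {
    a << (sh & 7)
}

/// Word arithmetic shift-right: count masked to 5 bits, sign bit replicated.
fn vsraw_lane(a: u32, sh: u32) -> u32 {
    ((a as i32) >> (sh & 31)) as u32
}

/// Halfword rotate-left: count masked to 4 bits.
fn vrlh_lane(a: u16, sh: u16) -> u16 {
    a.rotate_left((sh & 15) as u32)
}
```

Masking first also keeps the Rust shift amount below the lane width, so debug builds do not panic on an out-of-range shift.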
### PPCBUG-315 — vrlimi128 z and IMM fields extracted from wrong bit positions
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: interpreter.rs:3551-3552
- **Symptom**: `shift = ((instr.raw >> 16) & 0x3)` reads integer bits 16-17 (the low 2 bits
of the 5-bit IMM blend-mask field) instead of the 2-bit `z` (rotate) field at integer
bits 6-7. `mask = (instr.raw >> 2) & 0xF` reads integer bits 2-5 (the VD128h extension bits
and a reserved field) instead of the low 4 bits of IMM at integer bits 16-19.
**Every `vrlimi128` executes with a wrong rotate amount and a wrong per-word select mask.**
The only benign case is the degenerate encoding where `z == IMM[1:0]` and the garbage mask
happens to equal the intended mask — unlikely in real code.
- **VX128_4 field layout** (LSB-0 integer bit numbering after PPC big-endian byte-swap to host):
- `VD128l : 5` at integer bits 21-25 (PPC bits 6-10)
- `IMM : 5` at integer bits 16-20 (PPC bits 11-15): blend mask, 4 bits used
- `VB128l : 5` at integer bits 11-15 (PPC bits 16-20)
- `z : 2` at integer bits 6-7 (PPC bits 24-25): rotate amount 0..3
- `VD128h : 2` at integer bits 2-3 (PPC bits 28-29)
- **Fix**:
```rust
let shift = ((instr.raw >> 6) & 0x3) as usize; // z field: integer bits 6-7
let mask = (instr.raw >> 16) & 0xF; // IMM low 4 bits: integer bits 16-19
```
- **Canary reference**: `ppc_decode_data.h:585-608` `FormatVX128_4`; `ppc_emit_altivec.cc:1318,1324`.
- **Note**: the rotate logic (`b[(shift + i) % 4]`) and mask-select logic (`(mask >> (3-i)) & 1`)
in the interpreter body are ISA-correct — only the field extraction is wrong.
- **Test gap**: no interpreter execution test for vrlimi128 (PPCBUG-325).
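
The difference between the two extractions can be seen on a single hand-built word. This sketch follows the field layout listed in this entry; the function names are hypothetical, and `vrlimi_words` only mirrors the rotate and mask-select expressions quoted in the note above.

```rust
/// Pre-fix extraction: (shift, mask) taken from the wrong bit positions.
fn extract_old(raw: u32) -> (usize, u32) {
    (((raw >> 16) & 0x3) as usize, (raw >> 2) & 0xF)
}

/// Fixed extraction: z at integer bits 6-7, IMM low 4 bits at bits 16-19.
fn extract_fixed(raw: u32) -> (usize, u32) {
    (((raw >> 6) & 0x3) as usize, (raw >> 16) & 0xF)
}

/// Word-level rotate and blend: lane i takes b[(shift + i) % 4] when mask
/// bit (3 - i) is set, otherwise it keeps the existing destination word.
fn vrlimi_words(d: [u32; 4], b: [u32; 4], shift: usize, mask: u32) -> [u32; 4] {
    let mut out = d;
    for i in 0..4 {
        if (mask >> (3 - i)) & 1 != 0 {
            out[i] = b[(shift + i) % 4];
        }
    }
    out
}
```

For a word with IMM=0b1001 and z=2, the old extraction yields a rotate of 1 and an empty mask, so the destination would be left unchanged instead of receiving two rotated words.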
### PPCBUG-316 — Zero interpreter execution tests for vslb/vsrb/vsrab (LOW)
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:3440-3463
### PPCBUG-317 — Zero interpreter execution tests for vslh/vsrh/vsrah (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3472-3503
### PPCBUG-318 — vslo/vsro byte-shift count max is 15 (correct; informational)
- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- `N` is a 4-bit field; max shift is 15 bytes = 120 bits (not 128). VD retains
the 8 LSBs of VA in position [127:120] at N=15. ISA-correct.
### PPCBUG-319 — vsel128 vc=vd read-before-write ordering (correct; informational)
- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- `c = ctx.vr[vc]` is read before `ctx.vr[vd] = result`. Correctly sequenced.
### PPCBUG-320 — Zero interpreter execution tests for vslw/vsrw/vsraw/vrlw (+128 variants)
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:2108-2155
### PPCBUG-321 — Zero interpreter execution tests for vsl/vsr
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:3508-3521
### PPCBUG-322 — Zero interpreter execution tests for vslo/vsro (+128 variants)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3523-3541
### PPCBUG-323 — Zero interpreter execution tests for vand/vandc/vor/vxor/vnor (+128 variants)
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:1900-1944
### PPCBUG-324 — Zero interpreter execution tests for vsel/vsel128
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:1945-1967
### PPCBUG-325 — Zero interpreter execution tests for vrlb/vrlh/vrlw/vrlimi128 (+128 variants)
- **Severity**: LOW (test gap; fix PPCBUG-315 before writing vrlimi128 tests)
- **Status**: open
- **Location**: interpreter.rs:3464-3503, 2144-2155, 3550-3565
IDs PPCBUG-326 through PPCBUG-354 are unallocated — no further bugs found in group 34.
---
## Batch 8 — VMX permute / merge / splat / pack / unpack (group 35)
Per-group report: `audit-out/group-35-vmx-permute.md`.
**Summary**: 5 HIGH, 3 MEDIUM, 9 LOW. Four VX128_* field-extraction bugs; one missing post-pack permutation logic; VSCR.SAT and pack saturation semantics are all correct. Zero interpreter tests for any group-35 opcode.
### PPCBUG-360 — vperm128: VC register read from wrong field (vd128() instead of VX128_2 VC bits 23-25)
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:1979`
- **Symptom**: `vperm128` uses the VX128_2 instruction form. The permute-control register VC is a 3-bit field at PPC bits 23-25 (LSB integer bits 6-8). The code does `vc = instr.vd128()`, which reads PPC bits 6-10 + 28-29 (VD128l plus the VD128h extension), a completely different set of bits. Every `vperm128` therefore permutes with a control vector read from the wrong register, producing garbage output. `vperm128` is one of the most-used VMX128 ops in Xbox 360 graphics code (texture/vertex data layout).
- **Fix**:
```rust
// decoder.rs — add accessor:
#[inline] pub fn vc128_2(&self) -> usize { ((self.raw >> 6) & 0x7) as usize }
// interpreter.rs:1979 — replace:
vc = instr.vc128_2(); // VX128_2 VC field at PPC bits 23-25
```
- **ISA ref**: `ppc-manual/vmx/vperm.md`, VX128_2 encoding; `ppc_decode_data.h:541-561`; `ppc_emit_altivec.cc:1203-1204` (`VX128_2_VC`).
### PPCBUG-361 — vsldoi128: SH field MSB reads bit 4 (reserved) instead of bit 9
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:2012`
- **Symptom**: VX128_5 SH is a 4-bit field at LSB integer bits 6-9. Code does `((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)`. This reads bit 4 (a reserved field, always 0 in valid encodings) as the MSB of SH instead of bit 9. Shifts of 8-15 bytes silently resolve as shifts of 0-7 bytes. `vsldoi128` with `SH >= 8` (common in vector rotation patterns) always produces the wrong result.
- **Fix**:
```rust
let sh = ((instr.raw >> 6) & 0xF) as usize; // SH field: integer bits 6-9
```
- **ISA ref**: `ppc-manual/vmx/vsldoi.md`, VX128_5 encoding; `ppc_decode_data.h:609-634`; canary `VX128_5_SH`.
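
A sketch of the two SH extractions, plus the byte-level semantics they feed (take 16 bytes of the concatenation VA||VB starting at byte `sh`). The function names are hypothetical; the extraction expressions are the ones quoted in this entry.

```rust
/// Pre-fix extraction: MSB taken from integer bit 4 instead of bit 9.
fn sh_old(raw: u32) -> usize {
    (((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)) as usize
}

/// Fixed extraction: the 4-bit SH field at integer bits 6-9.
fn sh_fixed(raw: u32) -> usize {
    ((raw >> 6) & 0xF) as usize
}

/// Shift-left-double by octet immediate over the 32-byte value VA||VB.
fn vsldoi_bytes(a: [u8; 16], b: [u8; 16], sh: usize) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let j = i + sh;
        out[i] = if j < 16 { a[j] } else { b[j - 16] };
    }
    out
}
```

With SH=12 encoded, the old extraction returns 4, so the result starts at byte 4 of VA rather than byte 12.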
### PPCBUG-362 — vpermwi128: PERMh (high 3 bits of 8-bit PERM immediate) read from VD128l bits instead of bits 6-8
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:4089`
- **Symptom**: VX128_P PERM = `PERMl[4:0] | (PERMh[2:0] << 5)` where PERMl is at integer bits 16-20 and PERMh is at integer bits 6-8. Code does `(raw >> 16) & 0xFF` which reads bits 16-23. Bits 21-23 are VD128l[2:0] (VD128l occupies integer bits 21-25), not PERMh. The top 3 bits of the 8-bit PERM immediate are wrong; output word lane selections for lanes 0 and 1 are controlled by garbage bits. Same pattern as PPCBUG-315.
- **Fix**:
```rust
let imm = ((instr.raw >> 16) & 0x1F) | (((instr.raw >> 6) & 0x7) << 5); // VX128_P PERM
```
- **ISA ref**: `ppc_decode_data.h:664-686`; `ppc_emit_altivec.cc:1214`.
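
The contamination from the destination register number shows up directly on an encoded word. In this sketch the function names are hypothetical and the bit positions are those given in this entry.

```rust
/// Pre-fix extraction: 8 contiguous bits starting at integer bit 16.
fn perm_old(raw: u32) -> u32 {
    (raw >> 16) & 0xFF
}

/// Fixed extraction: PERMl from bits 16-20, PERMh from bits 6-8.
fn perm_fixed(raw: u32) -> u32 {
    ((raw >> 16) & 0x1F) | (((raw >> 6) & 0x7) << 5)
}
```

Encoding PERM=0xE4 with destination register 5 gives 0xA4 under the old extraction, because the low three bits of the register number replace PERMh.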
### PPCBUG-363 — vpkd3d128: post-pack permutation (pack + z fields) entirely absent; output always placed in wrong lane when pack != 0
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3783-3808`
- **Symptom**: Canary's `vpkd3d128` does three things: (1) pack VB by type, (2) permute the result with the existing VD register using a control determined by `pack` (IMM[1:0]) and `shift` (z field at integer bits 6-7), (3) store. Xenia-rs does only (1) and (3), skipping the entire lane-placement permutation. When `pack != 0`, the packed value must be merged into a specific 32-bit or 64-bit slot of VD — this merge never happens. `pack=0` is the only safe case. Most D3D vertex pack sequences use `pack=1` (32-bit slot) with varying `shift`.
- **Fix**: Extract `pack = uimm & 3` and `shift = (instr.raw >> 6) & 3` (z field), read existing `ctx.vr[vd]`, apply the permutation table from `ppc_emit_altivec.cc:2125-2188`, write back.
- **ISA ref**: `ppc_emit_altivec.cc:2088-2191`.
### PPCBUG-364 — vsldoi (non-128): correct; PPCBUG-365 — vsplt*: correct; informational
- **Severity**: LOW (wontfix)
- **Status**: wontfix
- `vsldoi` correctly extracts SH as `(raw >> 6) & 0xF`. `vspltb/vsplth/vspltw` correctly read UIMM from the VA position (integer bits 16-20, masked to lane width). No bugs.
### PPCBUG-366 — vspltisb / vspltish: sign-extension idiom is correct but non-obvious; future regression risk
- **Severity**: MEDIUM
- **Status**: open (clarity fix recommended)
- **Location**: `interpreter.rs:2059-2060`, `2064-2066`
- **Symptom**: `simm | !0x1F` where `simm` is typed `i8`/`i16` is functionally correct (Rust narrows `!0x1F` to the target type), but the pattern is fragile under refactoring. Recommend:
```rust
let simm = (((instr.raw >> 16) & 0x1F) as i32).wrapping_shl(27).wrapping_shr(27) as i8;
```
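
The 5-bit sign extension can also be checked in isolation. The sketch below is an equivalent shift-based form with a hypothetical name; it assumes the SIMM field sits at integer bits 16-20, as in the recommended line above.

```rust
/// Extract the 5-bit SIMM field and sign-extend it to i32.
fn simm5(raw: u32) -> i32 {
    let field = (raw >> 16) & 0x1F;
    // Move the field's sign bit to bit 31, then shift back arithmetically.
    ((field << 27) as i32) >> 27
}
```

Narrowing the result with `as i8` or `as i16` then preserves the value, since every output lies in -16..=15.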
### PPCBUG-367 — vupkhpx / vupklpx: channel replication vs zero-extend divergence; canary is unimplemented
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `vmx.rs:318-330`
- **Symptom**: `unpack_pixel_555` replicates 5-bit RGB channels (`r << 3 | r >> 2`) to fill 8 bits. ISA specifies zero-extension into bits 7:3, leaving bits 2:0 as zero. Replication yields values up to 7 higher per channel (for example 0xFF instead of 0xF8 for `r = 31`), diverging from hardware.
- **Fix**: `let r8 = r << 3;` (drop the `| r >> 2` replication term).
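
The size of the divergence is small but easy to quantify. Both functions below are illustrative, with hypothetical names, and operate on one 5-bit channel value.

```rust
/// Current behaviour: replicate the top bits into the low 3 bits.
fn expand5_replicate(r: u8) -> u8 {
    let r = r & 0x1F;
    (r << 3) | (r >> 2)
}

/// Behaviour after the fix above: zero-extend into bits 7:3.
fn expand5_zero(r: u8) -> u8 {
    (r & 0x1F) << 3
}
```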
### PPCBUG-368 — vpkpx: pack_pixel_555 channel assignment unverified against hardware; canary comparison inconclusive
- **Severity**: MEDIUM
- **Status**: open (needs hardware trace or more detailed canary analysis)
- **Location**: `vmx.rs:310-316`
- **Symptom**: The xenia-rs layout comment says R=bits 8-15, G=16-23, B=24-31. Canary's `vkpkx_in_low` uses different shift amounts (`>> 9` for R, `>> 6` for G, `>> 3` for B), suggesting either a different input layout assumption or the channels are swapped. Without a hardware reference, cannot determine which is authoritative.
### PPCBUG-369 — vpkd3d128 z-field not extracted (sub-issue of PPCBUG-363)
- **Severity**: LOW (tracked under PPCBUG-363)
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3785`
- The `z` field (VX128_4, integer bits 6-7) is never extracted. Correct extraction: `(instr.raw >> 6) & 0x3`.
### PPCBUG-370 — Zero interpreter tests for vperm / vperm128 (test gap)
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:1970-1995`
### PPCBUG-371 — Zero interpreter tests for vsldoi / vsldoi128 (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1997-2020`
### PPCBUG-372 — Zero interpreter tests for vpermwi128 (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:4087-4099`
### PPCBUG-373 — Zero interpreter tests for vmrghb / vmrglb / vmrghh / vmrglh (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3570-3600`
### PPCBUG-374 — Zero interpreter tests for vspltb / vsplth / vspltw / vspltw128 (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2022-2048`
### PPCBUG-375 — Zero interpreter tests for vspltisb / vspltish / vspltisw / vspltisw128 (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2050-2068`
### PPCBUG-376 — Zero interpreter tests for all vpk* (16 ops) + VSCR.SAT coverage (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3607-3718`
### PPCBUG-377 — Zero interpreter tests for vupkhsb / vupklsb / vupkhsh / vupklsh (test gap)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3722-3754`
### PPCBUG-378 — Zero interpreter tests for vpkd3d128 / vupkd3d128 (test gap; blocked on PPCBUG-363)
- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3783-3835`
IDs PPCBUG-379 through PPCBUG-419 are unallocated — no further bugs found in group 35.
---
## Batch 9 — VMX float arithmetic / compare / convert / estimate (group 36)
Per-group report: `audit-out/group-36-vmx-float.md`.
Group 36 summary: **21 findings (PPCBUG-420..440). 6 HIGH, 8 MEDIUM, 7 LOW.** The most
critical bugs are: (1) four VMX float compare VC-form opcodes use `rc_bit()` (bit 0) instead
of the correct VC-form Rc bit (bit 10) — CR6 is never updated, same root cause as PPCBUG-275;
(2) vmaddfp128 and vmaddcfp128 have their multiplicand and accumulator operands swapped —
every matrix multiply / Newton-Raphson step using these opcodes produces the wrong result;
(3) VMX128_R dot-form compares (vcmpeqfp128. etc.) decode as Invalid due to missing key4
entries in decode_op6.
**6 HIGH, 8 MEDIUM, 7 LOW. 21 IDs used (PPCBUG-420..440). 39 IDs unallocated (PPCBUG-441..479).**
### PPCBUG-420 — vcmpeqfp / vcmpgefp / vcmpgtfp: `rc_bit()` reads wrong bit; CR6 never updated
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpeqfp.`, `vcmpgefp.`, `vcmpgtfp.`
- **Location**: `interpreter.rs:1875`, `1885`, `1895`
- **Symptom**: `rc_bit()` = `self.raw & 1` reads LSB bit 0. For VC-form the Rc flag is at
PPC bit 21 = LSB bit 10. All XO values (vcmpeqfp=198, vcmpgefp=454, vcmpgtfp=710) have
bit 0 = 0, so CR6 is never updated for any float compare dot form. `vcmpeqfp.` + `bc 12,24`
(branch all-equal) always falls through.
- **Cross-reference**: PPCBUG-275 (identical root cause for integer vcmp). Canary reads
`i.VXR.Rc` (ppc_emit_altivec.cc:625, 633, 641).
- **Fix**: Add `pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }` to
`decoder.rs` and replace `instr.rc_bit()` at interpreter.rs:1875, 1885, 1895.
### PPCBUG-421 — vcmpbfp: `rc_bit()` reads wrong bit (VC-form); Rc gate permanently dead
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3428`
- **Symptom**: Same root cause as PPCBUG-420. XO=966, bit 0 = 0; CR6 update never fires
for `vcmpbfp.`. The CR6 value logic (`eq = !any_out`) is correct; only the gate is wrong.
- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:3428.
### PPCBUG-422 — vcmpeqfp128 / vcmpgefp128 / vcmpgtfp128 / vcmpbfp128: `rc_bit()` reads wrong bit (VX128_R-form)
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:1875`, `1885`, `1895`, `3428` (shared arms with non-128 forms)
- **Symptom**: For VX128_R-form, Rc is at PPC bit 27 = LSB bit 4 (confirmed from canary's
`VX128_R` bitfield: `uint32_t Rc : 1` at bit 4 from LSB). `rc_bit()` reads bit 0. Fix
PPCBUG-423 first (dot forms decode as Invalid before this even matters).
- **Fix**: Add `pub fn vx128r_rc_bit(&self) -> bool { (self.raw >> 4) & 1 != 0 }` and use
it in the VX128_R compare arms.
### PPCBUG-423 — vcmpeqfp128. / vcmpgefp128. / vcmpgtfp128. / vcmpbfp128.: dot forms decode as `Invalid`
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs:640-648` (decode_op6 VMX128 compare key4 table)
- **Symptom**: decode_op6 extracts `key4 = (bits22-24 << 3) | bit27`. When Rc=1, PPC bit 27
is set, making key4 = non-dot value + 1. Dot-form key4 values (1, 9, 17, 25, 33) are all
absent from the match table. Decoder returns `PpcOpcode::Invalid`. Any game shader using a
VMX128-form float compare dot form traps with unimplemented opcode.
- **Fix**: Add dot-form entries to the key4 match table mapping to the same opcodes (the
interpreter arm uses `instr.vx128r_rc_bit()` to conditionally update CR6):
```rust
0b000001 => return PpcOpcode::vcmpeqfp128,
0b001001 => return PpcOpcode::vcmpgefp128,
0b010001 => return PpcOpcode::vcmpgtfp128,
0b011001 => return PpcOpcode::vcmpbfp128,
0b100001 => return PpcOpcode::vcmpequw128,
```
### PPCBUG-424 — vmaddfp128: operand swap — computes VA×VB+VD instead of VA×VD+VB
- **Severity**: HIGH
- **Status**: applied (52ece4b, 2026-05-02)
- **Location**: `interpreter.rs:1771` (`r[i] = ai.mul_add(bi, di)`)
- **Symptom**: Canary (ppc_emit_altivec.cc:806-809) documents `(VD) <- (VA × VD) + VB` and
routes as `MulAdd(VA, VD, VB)`. Xenia-rs reads VA, VB, VD then computes
`ai.mul_add(bi, di)` = `VA × VB + VD` — VB and VD roles swapped. Every shader using
vmaddfp128 for matrix multiply or Newton-Raphson accumulation accumulates the wrong value.
The existing denorm-flush test aliases vA=vD=v2, making the swap invisible.
- **Fix**: `r[i] = ai.mul_add(di, bi);`
### PPCBUG-425 — vmaddcfp128: operand swap — computes VD×VB+VA instead of VA×VD+VB
- **Severity**: HIGH
- **Status**: applied (52ece4b, 2026-05-02)
- **Location**: `interpreter.rs:4065` (`r[i] = di.mul_add(bi, ai)`)
- **Symptom**: Canary (ppc_emit_altivec.cc:819) documents `(VD) <- (VA × VD) + VB`.
Xenia-rs computes `VD × VB + VA`. Both the first multiplicand and the addend are wrong.
- **Fix**: `r[i] = ai.mul_add(di, bi);`
- **Test gap**: zero tests for `vmaddcfp128`. Add test with distinct VA, VB, VD registers.
### PPCBUG-426 — vnmsubfp: two rounding steps instead of fused FMA; NaN sign may be flipped
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:1786` (`r[i] = bi - ai * ci`)
- **Symptom**: `vmaddfp` uses single-rounded `ai.mul_add(ci, bi)`, but `vnmsubfp` uses
`bi - ai * ci` (two operations, two rounding steps). ISA specifies a single fused operation.
Canary acknowledges the same limitation (ppc_emit_altivec.cc:1136). Additionally, the
implicit negation in subtraction may flip the sign bit of a NaN result (see PPCBUG-183).
- **Fix**: `r[i] = -ai.mul_add(ci, -bi);` — single FMA rounding: `-(ai*ci + (-bi))` = `bi - ai*ci`.
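
The two forms differ observably when the exact product falls halfway between two f32 values. The sketch below uses hypothetical function names and picks a = c = 1 + 2^-12 and b = 1 + 2^-11, so the exact product is b + 2^-24.

```rust
/// Two roundings: the product is rounded to f32 before the subtraction.
fn nmsub_two_step(a: f32, b: f32, c: f32) -> f32 {
    b - a * c
}

/// One rounding: a*c - b is formed exactly inside the FMA, then negated.
fn nmsub_fused(a: f32, b: f32, c: f32) -> f32 {
    -a.mul_add(c, -b)
}
```

With these inputs the two-step form rounds the product down to `b` and returns zero, while the fused form returns -2^-24.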
### PPCBUG-427 — vnmsubfp128: same two-rounding form as vnmsubfp
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:1803` (`r[i] = di - ai * bi`)
- **Symptom**: Same class as PPCBUG-426 for the VMX128 form.
- **Fix**: `r[i] = -ai.mul_add(bi, -di);`
### PPCBUG-428 — vrefp / vrefp128: full-precision 1/x instead of ~12-bit hardware estimate
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1853` (`r[i] = 1.0 / b[i]`)
- **Symptom**: Same class as PPCBUG-184 (fresx). Xenon vrefp provides ~12-bit accuracy;
xenia-rs computes full IEEE-754 division. Canary also uses full precision in practice.
### PPCBUG-429 — vrsqrtefp / vrsqrtefp128: full-precision 1/sqrt(x) instead of ~12-bit estimate
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1862` (`r[i] = 1.0 / b[i].sqrt()`)
- **Symptom**: Same class as PPCBUG-428 for reciprocal square root.
### PPCBUG-430 — vexptefp / vexptefp128: full-precision exp2(x) instead of ~12-bit estimate
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3934` (`r[i] = b[i].exp2()`)
- **Symptom**: Same class as PPCBUG-428. NaN/Inf edge cases may diverge.
### PPCBUG-431 — vlogefp / vlogefp128: full-precision log2(x) instead of ~12-bit estimate
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3944` (`r[i] = b[i].log2()`)
- **Symptom**: Same class as PPCBUG-428.
### PPCBUG-432 — vrfin / vrfin128: Rust `round()` is round-half-away-from-zero; ISA requires round-to-nearest-even
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:2172` (`r[i] = b[i].round()`)
- **Symptom**: `vrfin(0.5)` → ISA = 0.0; Rust = 1.0. `vrfin(2.5)` → ISA = 2.0; Rust = 3.0.
Canary uses `ROUNDPS` (an SSE4.1 instruction, not SSE2) in its round-to-nearest-even mode.
- **Fix**: Use `f32::round_ties_even()` (stable since Rust 1.77).
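
A sketch contrasting the two rounding modes on the tie cases listed in the symptom. The wrapper names are hypothetical, and `round_ties_even` needs a toolchain at or above the version noted in the fix.

```rust
/// Pre-fix behaviour: ties round away from zero.
fn round_half_away(x: f32) -> f32 {
    x.round()
}

/// Post-fix behaviour: ties round to the nearest even integer.
fn vrfin_lane(x: f32) -> f32 {
    x.round_ties_even()
}
```

The modes agree everywhere except on exact .5 fractions, which is why only tie inputs make useful regression tests.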
### PPCBUG-433 — vctsxs / vcfpsxws128: NaN input returns 0 instead of saturating to INT_MIN (0x80000000)
- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `vmx.rs:217` (`if x.is_nan() { return (0, true); }`)
- **Symptom**: AltiVec ISA: NaN in vctsxs saturates to INT_MIN (0x80000000). Xenia-rs returns 0.
- **Fix**: `if x.is_nan() { return (i32::MIN, true); }`
### PPCBUG-434 — vctuxs NaN → 0 is correct; informational
- **Severity**: LOW (wontfix)
- **Status**: wontfix
- **Location**: `vmx.rs:225`
- **Note**: Unsigned NaN saturates to 0 per ISA. Xenia-rs is correct. Add a comment.
### PPCBUG-435 — vaddfp / vsubfp / vmulfp128: subnormal inputs not flushed when VSCR.NJ=1
- **Severity**: MEDIUM (latent — Xbox 360 always boots with NJ=1)
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:1713`, `1729`, `1812`
- **Symptom**: VSCR.NJ=1 requires flush-to-zero for subnormal inputs. vmaddfp family correctly
calls `vmx::flush_denorm()`; plain add/sub/mul do not check VSCR.
### PPCBUG-436 — vmsum3fp128 / vmsum4fp128: per-product intermediates not individually flushed
- **Severity**: MEDIUM (latent)
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:4076`, `4083`
- **Symptom**: `flush_denorm` on final sum only. Per-lane products can be subnormal and
accumulate before the final flush.
### PPCBUG-437 — vmaddfp / vmaddfp128 / vmaddcfp128 / vnmsubfp128: subnormal output not flushed
- **Severity**: MEDIUM (latent)
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:1752-1754`, `1771-1773`, `4064-4067`, `1803-1805`
- **Symptom**: VSCR.NJ=1 requires flushing subnormal results. Inputs flushed; outputs are not.
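
The input and output flushing described in PPCBUG-435 through PPCBUG-437 can be sketched for a single lane as below. This `flush_denorm` is a stand-in written for illustration (the signature of the real `vmx::flush_denorm()` is not shown here), it assumes the sign of a flushed value is preserved, and `vaddfp_nj` is a hypothetical name.

```rust
/// Stand-in flush: a subnormal becomes a zero of the same sign.
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() {
        f32::from_bits(x.to_bits() & 0x8000_0000)
    } else {
        x
    }
}

/// One lane of an NJ=1 add: flush both inputs, then flush the result.
fn vaddfp_nj(a: f32, b: f32) -> f32 {
    flush_denorm(flush_denorm(a) + flush_denorm(b))
}
```

The last case in a test suite should use two normal inputs whose difference is subnormal, since that is the only way to exercise the output flush independently of the input flush.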
### PPCBUG-438 — Zero tests for vcmpeqfp / vcmpgefp / vcmpgtfp / vcmpbfp and dot forms
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
### PPCBUG-439 — Zero tests for vrfiz / vrfin / vrfip / vrfim and 128-bit variants
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:2158-2192`
### PPCBUG-440 — Zero tests for vctsxs / vctuxs / vcfsx / vcfux and 128-bit variants
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:3842-3923`
IDs PPCBUG-441 through PPCBUG-479 are unallocated — no further bugs found in group 36.
---
## Batch 8 — VMX integer multiply-sum / multiply-half / sums / special (group 37)
Per-group report: `audit-out/group-37-vmx-mulsum.md`.
**Note**: All opcodes in this group are `XEINSTRNOTIMPLEMENTED()` stubs in xenia-canary; correctness is derived from the IBM ISA and `ppc-manual/vmx/` snapshots. `vrlimi128` is already tracked as PPCBUG-315.
### PPCBUG-482 — `vmhaddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)
- **Severity**: WITHDRAWN
- **Status**: no bug
- **Note**: Draft analysis suggested >>16; the spec snapshot `ppc-manual/vmx/vmhaddshs.md`
explicitly shows `prod = (VA[i]*VB[i]) >> 15` and the pathological-case example confirms
`0x8000*0x8000 >> 15 = 32768`. Xenia-rs matches the spec exactly. No code change.
### PPCBUG-483 — `vmhraddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)
- **Severity**: WITHDRAWN
- **Status**: no bug
- **Note**: `ppc-manual/vmx/vmhraddshs.md` explicitly shows `(product + 0x4000) >> 15`.
Xenia-rs matches. No code change needed.
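
The two withdrawn findings rest on the `>> 15` formulas quoted from the spec snapshots. A one-lane sketch of both, with hypothetical function names and an assumed `(value, saturated)` return shape:

```rust
/// Clamp a 32-bit intermediate to i16 and report whether clamping occurred.
fn sat_i16(x: i32) -> (i16, bool) {
    if x > i16::MAX as i32 {
        (i16::MAX, true)
    } else if x < i16::MIN as i32 {
        (i16::MIN, true)
    } else {
        (x as i16, false)
    }
}

/// Multiply-high and add: (a*b >> 15) + c, saturated.
fn vmhaddshs_lane(a: i16, b: i16, c: i16) -> (i16, bool) {
    sat_i16(((a as i32 * b as i32) >> 15) + c as i32)
}

/// Rounded variant: ((a*b + 0x4000) >> 15) + c, saturated.
fn vmhraddshs_lane(a: i16, b: i16, c: i16) -> (i16, bool) {
    sat_i16(((a as i32 * b as i32 + 0x4000) >> 15) + c as i32)
}
```

The 0x8000 × 0x8000 case produces 32768 before clamping, which is the pathological input the spec example refers to.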
### PPCBUG-487 — vsumsws/vsum2sws/vsum4sbs/vsum4ubs/vsum4shs: VB operand mis-named as "c"/"VC"
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3249-3307`
- **Symptom**: All five vsum* handlers use a VX-form instruction (two operands: VA and VB).
The code names the VB source `c` and the comment references "vC" — implying a non-existent
third register operand. Only `instr.ra()` and `instr.rb()` are valid for VX form; there is
no `rc()`. The arithmetic is correct (rb() is called), but the naming misleads maintainers
into thinking there is a VA-form three-operand encoding.
- **Fix**: Rename `c` → `b` and update comments to say `VB` instead of `vC` in all five
handler bodies.
### PPCBUG-490 — Zero tests for all six vmsum* opcodes
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No unit test for `vmsumubm`, `vmsummbm`, `vmsumuhm`, `vmsumuhs`, `vmsumshm`,
`vmsumshs`. Critical missing: saturation + VSCR.SAT for `vmsumuhs`/`vmsumshs`; mixed-sign
byte product for `vmsummbm`; modulo wrap for `vmsumshm`.
### PPCBUG-491 — Zero tests for `vmhaddshs` and `vmhraddshs`
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for either multiply-high-add instruction. Key cases: `VA = 0x8000`,
`VB = 0x8000` (minus-one-times-minus-one saturating case); `VA = VB = 0x7FFF, VC = 0x7FFF`
(add post-shift result to max accumulator). Verify VSCR.SAT is set on saturation and clear
on non-saturating inputs.
### PPCBUG-492 — Zero tests for `vmladduhm`
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
### PPCBUG-493 — Zero tests for all eight `vmule*` / `vmulo*` opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for `vmuleub`, `vmuloub`, `vmulesb`, `vmulosb`, `vmuleuh`, `vmulouh`,
`vmulesh`, `vmulosh`. Key: even vs odd lane distinction (`vmulesh` vs `vmulosh`) is untested.
### PPCBUG-494 — Zero tests for all five vsum* opcodes
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for `vsumsws`, `vsum2sws`, `vsum4sbs`, `vsum4ubs`, `vsum4shs`.
Missing: zero-output-lanes verification for `vsumsws` (w[0..2] must be 0) and `vsum2sws`
(w[0], w[2] must be 0); VSCR.SAT on overflow for all signed/unsigned variants.
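
As an example of what the zero-lane and saturation checks would look like, here is a sketch of `vsumsws` on word lanes. It assumes the semantics stated in this section (sum of the four VA words plus VB word 3, saturated, written to word 3 with words 0 to 2 zeroed); the function is illustrative and returns the saturation flag alongside the result.

```rust
/// Illustrative vsumsws: returns the result vector and a saturation flag.
fn vsumsws(a: [i32; 4], b: [i32; 4]) -> ([i32; 4], bool) {
    // Accumulate in i64 so the five-term sum cannot overflow.
    let sum: i64 = a.iter().map(|&x| x as i64).sum::<i64>() + b[3] as i64;
    let clamped = sum.clamp(i32::MIN as i64, i32::MAX as i64);
    ([0, 0, 0, clamped as i32], clamped != sum)
}
```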
### PPCBUG-495 — `vsumsws` comment says "vC[3]" should say "VB[3]"
- **Severity**: LOW (cosmetic)
- **Status**: open
- **Location**: `interpreter.rs:3248`
IDs PPCBUG-480, PPCBUG-481, PPCBUG-482 (withdrawn), PPCBUG-483 (withdrawn), PPCBUG-484,
PPCBUG-485, PPCBUG-486, PPCBUG-488, PPCBUG-489, PPCBUG-496, PPCBUG-497, PPCBUG-498 are
either withdrawn (no bug found after re-examination), informational, or references to
existing IDs. IDs PPCBUG-499 through PPCBUG-509 are unallocated — no further bugs found
in group 37.
---
## Batch 8 — VMX load/store (group 38)
Per-group report: `audit-out/group-38-vmx-loadstore.md`.
**Opcodes**: lvebx, lvehx, lvewx, lvewx128, lvlx, lvlx128, lvlxl, lvlxl128, lvrx, lvrx128,
lvrxl, lvrxl128, lvsl, lvsl128, lvsr, lvsr128, lvx, lvx128, lvxl, lvxl128, stvebx, stvehx,
stvewx, stvewx128, stvlx, stvlx128, stvlxl, stvlxl128, stvrx, stvrx128, stvrxl, stvrxl128,
stvx, stvx128, stvxl, stvxl128.
Group 38 summary: The load family (lvx, lvxl, lvlx, lvrx, lvsl, lvsr, lvebx, lvehx, lvewx,
lvewx128 and all 128/LRU-hint variants) is arithmetically correct. EA computation, alignment
masking, big-endian byte ordering, RA=0 special cases, and lane indexing all match the ISA and
the `ea_indexed` helper. **5 HIGH bugs found**: the systemic `invalidate_for_write` gap
(PPCBUG-107 family) applies to ALL 16 VMX store opcodes, and `stvewx128` has an additional
severe memory-corruption bug (writes 16 bytes instead of 1 word). **2 MEDIUM**: a behavioral
divergence between lvebx/lvehx/lvewx and canary's full-line simplification (xenia-rs is
architecturally more correct), and an lvsr sh=0 documentation gap (the computation itself
is correct). **3 LOW**: two test-coverage gaps and one missing debug trace.
### PPCBUG-510 — `stvewx128` stores all 16 bytes instead of one word; 12-byte memory corruption (HIGH)
- **Severity**: HIGH
- **Status**: applied (cedee3c, 2026-05-02)
- **Location**: interpreter.rs:2776-2781
- **Symptom**: Uses `& !0xF` (16-byte alignment) then stores all 16 bytes of the vector.
ISA semantics: word-align EA, extract the word lane `(EA & 0xF) >> 2`, store 4 bytes only.
The non-128 `stvewx` (interpreter.rs:1675-1687) is correct — `stvewx128` was not updated
to match. Corrupts 12 adjacent bytes on every execution.
- **Canary reference**: `InstrEmit_stvewx_` (cc:170-185) — `ea & ~3`, extract lane, `ByteSwap`,
store 4 bytes only. `stvewx128` routes through the same helper as `stvewx`.
- **Fix**: mirror the `stvewx` body with `instr.vs128()` substituted for `instr.rs()`.
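
The word-element store semantics described above can be sketched on a flat byte buffer. This is a toy model, not the interpreter's code: the function name, the `&mut [u8]` memory, and the convention that `vec` holds the register bytes in big-endian (memory) order are all assumptions made for illustration.

```rust
/// Toy model of `stvewx`-style semantics (hypothetical helper, not the real handler).
/// `vec` holds the 16 register bytes in big-endian (memory) order.
fn store_vector_element_word(mem: &mut [u8], ea: u32, vec: &[u8; 16]) {
    let ea = (ea & !0x3) as usize; // word-align the effective address
    let lane = (ea & 0xF) >> 2; // which of the four word lanes is selected
    let src = &vec[lane * 4..lane * 4 + 4];
    mem[ea..ea + 4].copy_from_slice(src); // exactly four bytes are written
}
```

The point of the sketch is the write width: only the four bytes of the selected lane change, so the twelve neighbouring bytes of the 16-byte block are left untouched.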
### PPCBUG-511 — `stvx`, `stvx128`, `stvxl`, `stvxl128` missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:1598-1603 (stvx), 1605-1610 (stvx128), 1699-1705 (stvxl/stvxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Symptom**: Under `--parallel`, a 16-byte stvx to a reserved line does not clear the
reservation table slot. The reserving thread's `stwcx.` spuriously succeeds.
- **Fix**: per PPCBUG-107 pattern — add `invalidate_for_write(ea)` guard before the byte loop.
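
A minimal sketch of why the guard matters, assuming a single-slot reservation keyed by 128-byte line. The `Reservation` type, its methods, and the free-standing `stvx` function are hypothetical stand-ins; only the name `invalidate_for_write` and the 128-byte granule come from this document, and the real table layout is not shown here.

```rust
const LINE: u32 = 128; // reservation granule assumed in this sketch

/// Toy single-slot reservation model (hypothetical, for illustration only).
#[derive(Default)]
struct Reservation {
    line: Option<u32>,
}

impl Reservation {
    /// Models `lwarx`: remember the line that was reserved.
    fn reserve(&mut self, ea: u32) {
        self.line = Some(ea & !(LINE - 1));
    }
    /// Models the store-side guard: a write to the reserved line drops the reservation.
    fn invalidate_for_write(&mut self, ea: u32) {
        if self.line == Some(ea & !(LINE - 1)) {
            self.line = None;
        }
    }
    /// Models `stwcx.`: succeeds only if the reservation is still held.
    fn store_conditional(&mut self, ea: u32) -> bool {
        let ok = self.line == Some(ea & !(LINE - 1));
        self.line = None;
        ok
    }
}

/// A 16-byte vector store that first drops any reservation on its line.
fn stvx(res: &mut Reservation, mem: &mut [u8], ea: u32, vec: &[u8; 16]) {
    let base = (ea & !0xF) as usize;
    res.invalidate_for_write(base as u32);
    mem[base..base + 16].copy_from_slice(vec);
}
```

Without the `invalidate_for_write` call inside `stvx`, the later `store_conditional` in this model would still report success, which is the spurious `stwcx.` success described in the symptom.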
### PPCBUG-512 — `stvebx`, `stvehx`, `stvewx`, `stvewx128` missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:1655 (stvebx), 1664 (stvehx), 1675 (stvewx), 2776 (stvewx128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: PPCBUG-510 must be fixed in `stvewx128` before the invalidation call is added;
otherwise the invalidation covers the wrong, over-wide address range.
### PPCBUG-513 — `stvlx`, `stvlx128`, `stvlxl`, `stvlxl128` missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:2746-2749 (stvlx/stvlxl), 2751-2754 (stvlx128/stvlxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: partial stores can span a 128-byte line boundary when `ea & 0xF != 0` and
`n = 16 - shift` crosses the line; two `invalidate_for_write` calls may be needed.
### PPCBUG-514 — `stvrx`, `stvrx128`, `stvrxl`, `stvrxl128` missing `invalidate_for_write` (HIGH)
- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: interpreter.rs:2756-2759 (stvrx/stvrxl), 2761-2764 (stvrx128/stvrxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: stvrx at shift=0 is a no-op (no bytes written); guard can skip the call in
that case. Otherwise invalidate `ea & !0xF` (the preceding aligned block).
### PPCBUG-515 — `lvebx`, `lvehx`, `lvewx` implement element semantics; canary uses full-line load (MEDIUM)
- **Severity**: MEDIUM
- **Status**: open
- **Locations**: interpreter.rs:1613-1653
- **Symptom**: xenia-rs places the loaded byte/halfword/word into the correct lane and preserves
other lanes from VD (ISA-correct for the "undefined" lanes). Canary does a full aligned
16-byte `lvx`-style load that overwrites all lanes. Both are valid under the ISA's "undefined"
specification, but game code that has only been validated under canary may implicitly rely
on the canary behavior. The divergence is documented and no code change is required unless
canary compatibility becomes an explicit goal.
### PPCBUG-516 — `lvsr` sh=0 produces {16,17,...,31}; correct per ISA but undocumented (MEDIUM)
- **Severity**: MEDIUM (documentation gap — computation is correct)
- **Status**: open
- **Location**: interpreter.rs:2218-2226
- **Symptom**: When EA is 16-byte aligned, `lvsr` produces byte values all >= 16 (the "select
entirely from VB" identity for `vperm`). The formula `(16 - sh) + i` cannot overflow u8
because `sh <= 15` guarantees `(16 - sh) + 15 <= 31`. No computation bug — but there is no
comment explaining why values > 15 are correct. Add a comment and a `debug_assert!(sh <= 15)`.
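
The index vectors can be sketched as follows, including the suggested assertion. The function names are illustrative and are not the interpreter's handlers; the formulas are the ones quoted in this entry (`sh + i` for `lvsl`, `(16 - sh) + i` for `lvsr`).

```rust
/// Permute-control bytes produced by `lvsl` for a given effective address.
fn lvsl_indices(ea: u32) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    let mut out = [0u8; 16];
    for (i, b) in out.iter_mut().enumerate() {
        *b = sh + i as u8;
    }
    out
}

/// Permute-control bytes produced by `lvsr`.
/// Values above 15 select from the second `vperm` source, so an aligned
/// address (sh = 0) yields 16..=31, i.e. "take everything from VB".
fn lvsr_indices(ea: u32) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    debug_assert!(sh <= 15); // so (16 - sh) + 15 <= 31 and the u8 cannot overflow
    let mut out = [0u8; 16];
    for (i, b) in out.iter_mut().enumerate() {
        *b = (16 - sh) + i as u8;
    }
    out
}
```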
### PPCBUG-517 — Zero test coverage for lvlx/lvrx/stvlx/stvrx boundary edge cases (LOW)
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: vmx.rs tests (lines 756-792); interpreter.rs test module
- **Missing**: shift=15 for lvlx (1 byte loaded), shift=1 for lvrx (15 bytes), stvlx/stvrx
round-trip, stvrx at shift=0 confirmed no-op, full lvlx+lvrx+vor unaligned memcpy idiom
verified byte-exact.
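
A sketch of the unaligned-load idiom that such a test would pin, written against a plain byte slice. The helper names are hypothetical and do not refer to the functions in `vmx.rs`; the model assumes `lvlx` left-justifies the bytes from EA to the end of its 16-byte block and `lvrx` right-justifies the bytes from the block start up to EA, with zero fill in both cases.

```rust
/// `lvlx` model: bytes from `ea` to the end of its 16-byte block, left-justified.
fn load_vector_left(mem: &[u8], ea: usize) -> [u8; 16] {
    let offset = ea & 0xF;
    let mut out = [0u8; 16];
    out[..16 - offset].copy_from_slice(&mem[ea..ea + 16 - offset]);
    out
}

/// `lvrx` model: bytes from the block start up to (not including) `ea`,
/// right-justified. An aligned `ea` loads nothing and yields all zeros.
fn load_vector_right(mem: &[u8], ea: usize) -> [u8; 16] {
    let offset = ea & 0xF;
    let base = ea & !0xF;
    let mut out = [0u8; 16];
    out[16 - offset..].copy_from_slice(&mem[base..base + offset]);
    out
}

/// Unaligned 16-byte load: `lvlx` at `ea`, `lvrx` at `ea + 16`, then `vor`.
fn load_unaligned(mem: &[u8], ea: usize) -> [u8; 16] {
    let left = load_vector_left(mem, ea);
    let right = load_vector_right(mem, ea + 16);
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = left[i] | right[i];
    }
    out
}
```

Because the two partial loads fill disjoint byte positions, the OR reconstructs the sixteen bytes starting at `ea` exactly, whatever the alignment.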
### PPCBUG-518 — Zero interpreter-level execution tests for all 36 VMX load/store opcodes (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs test module
- **Missing**: lvx alignment masking, stvx byte-order verification, lvebx lane placement,
lvsl/lvsr permute index values, stvewx128 after PPCBUG-510 fix. 17 recommended minimum tests
enumerated in per-group report.
### PPCBUG-519 — `stvrx` aligned no-op is silent; no debug trace (LOW)
- **Severity**: LOW
- **Status**: open
- **Location**: vmx.rs:284-292 (`store_vector_right`)
- **Symptom**: shift=0 returns immediately with no trace event. Confusing during memory-
visibility debugging. Add `tracing::trace!` in debug builds.
IDs PPCBUG-520 through PPCBUG-559 are unallocated — no further bugs found in group 38.
---
## Phase C1 — Decoder field extractors
Per-group report: `audit-out/phase-c1-decoder-fields.md`.
Comprehensive audit of all `DecodedInstr` field accessors in `decoder.rs` lines 21-165, cross-checked against ISA form specs, Canary `FormatXxx` structs, and the interpreter's inline re-extraction. Phase B already found PPCBUG-040/046/275/315/360-363/420-422. Phase C1 adds 8 new findings (PPCBUG-560..567).
**Confirmed-clean** (no new finding): `op`, `rd`/`rs`/`rt`, `ra`, `rb`, `rc`, `simm16`, `uimm16`, `d`, `ds`, `li`, `bd`, `bo`, `bi`, `aa`, `lk`, `oe`, `to`, `mb`/`me` (M-form only), `sh`, `spr`, `crm`, `crfd`/`crfs`, `l`, `crbd`/`crba`/`crbb`, `nb`, `va128`/`vb128`/`vd128`/`vs128`, `extract_vx128_uimm5`.
### PPCBUG-560 — sh64() test helper wrong bit order; masks PPCBUG-040 from unit tests (HIGH)
- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `xenia-rs/crates/xenia-cpu/tests/disasm_goldens.rs:160-176` (function `rldicl`)
- **Symptom**: The `rldicl` test helper encodes `sh[5:1]` at PPC bits 16-20 and `sh[0]` at PPC bit 30. The ISA encodes `sh[4:0]` at PPC bits 16-20 and `sh[5]` at PPC bit 30. The wrong `sh64()` formula `(sh_lo << 1) | sh_hi` correctly inverts the wrong encoding, so the round-trip test passes, but the accessor mis-decodes real binary code.
**Counterexamples** (ISA-encoded input → `sh64()` output):
| True shift | sh64() result | Error |
|-----------|--------------|-------|
| 1 | 2 | +1 |
| 16 | 32 | +16 |
| 32 | 1 | -31 |
| 33 | 3 | -30 |
| 63 | 63 | 0 (coincidence) |
Only `sh=0` and `sh=63` decode correctly. All other shifts (1-62) are wrong against real code.
- **Fix for `sh64()`** (per PPCBUG-040):
```rust
pub fn sh64(&self) -> u32 {
    (extract_bits(self.raw, 30, 30) << 5) | extract_bits(self.raw, 16, 20)
}
```
- **Fix for test helper** (must be in same commit):
```rust
// Correct: sh_lo = sh & 0x1F → PPC bits 16-20; sh_hi = sh >> 5 → PPC bit 30
(30 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11)
| (mb_lo << 6) | (mb_hi << 5) | (0 << 2) | ((sh >> 5) << 1) | rc
```
- **Cross-reference**: PPCBUG-040 (primary finding). PPCBUG-560 is the test-infrastructure companion.
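
The counterexample table can be reproduced with a small standalone sketch. The `extract_bits` below is an assumed reimplementation using PPC bit numbering (bit 0 is the MSB), and `encode_sh`, `sh64_buggy`, and `sh64_fixed` are illustrative names rather than code from the repository.

```rust
/// Extract PPC bits `start..=end` (bit 0 is the MSB of the 32-bit word).
/// Assumed to match the semantics of the decoder's `extract_bits`.
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (((1u64 << width) - 1) as u32)
}

/// ISA encoding of the MD-form shift: sh[4:0] in PPC bits 16-20, sh[5] in bit 30.
fn encode_sh(sh: u32) -> u32 {
    ((sh & 0x1F) << 11) | ((sh >> 5) << 1)
}

/// The pre-fix formula quoted in the symptom.
fn sh64_buggy(raw: u32) -> u32 {
    (extract_bits(raw, 16, 20) << 1) | extract_bits(raw, 30, 30)
}

/// The corrected formula from the fix.
fn sh64_fixed(raw: u32) -> u32 {
    (extract_bits(raw, 30, 30) << 5) | extract_bits(raw, 16, 20)
}
```

Feeding ISA-encoded shifts through `sh64_buggy` gives the values in the table, and only 0 and 63 survive unchanged; `sh64_fixed` round-trips every shift from 0 to 63.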
### PPCBUG-561 — Missing `mb_md()` accessor on `DecodedInstr`; interpreter inlines wrong formula at 6 sites (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessor absent; `disasm.rs:1256` has correct local helper; `interpreter.rs` lines 696, 706, 716, 726, 736, 746 each inline the wrong formula
- **Symptom**: Interpreter uses `(instr.mb() << 1) | ((instr.raw >> 1) & 1)` which: (a) reads `SH5` (PPC bit 30, host bit 1) instead of `MB5` (PPC bit 26, host bit 5) as the high bit; (b) places the high bit at position 0 instead of position 5. `disasm.rs` has the correct version already — expose it as `DecodedInstr::mb_md()`.
- **Cross-reference**: PPCBUG-046 (primary finding).
- **Fix**:
```rust
// Add to decoder.rs:
#[inline] pub fn mb_md(&self) -> u32 {
    extract_bits(self.raw, 21, 25) | (extract_bits(self.raw, 26, 26) << 5)
}
```
Replace all 6 inline sites in `interpreter.rs` with `instr.mb_md()`.
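
A standalone sketch comparing the two formulas. It assumes `instr.mb()` is the M-form accessor for PPC bits 21-25 and that `extract_bits` uses PPC bit numbering; `encode_mb`, `mb_md`, and `mb_buggy` are free functions written for illustration, not the repository's methods.

```rust
/// Extract PPC bits `start..=end` (bit 0 is the MSB); assumed helper semantics.
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (((1u64 << width) - 1) as u32)
}

/// ISA encoding of the MD-form mask begin: mb[4:0] in PPC bits 21-25, mb[5] in bit 26.
fn encode_mb(mb: u32) -> u32 {
    ((mb & 0x1F) << 6) | ((mb >> 5) << 5)
}

/// Corrected extraction, as in the proposed accessor.
fn mb_md(raw: u32) -> u32 {
    extract_bits(raw, 21, 25) | (extract_bits(raw, 26, 26) << 5)
}

/// Pre-fix inline formula: low five bits shifted up, host bit 1 used as bit 0.
fn mb_buggy(raw: u32) -> u32 {
    (extract_bits(raw, 21, 25) << 1) | ((raw >> 1) & 1)
}
```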
### PPCBUG-562 — Missing `vc_rc_bit()` and `vx128r_rc_bit()` per-form Rc accessors (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — no per-form Rc accessors; `interpreter.rs` uses generic `rc_bit()` (bit 31) for both VC and VX128_R forms
- **Symptom**: Generic `rc_bit()` reads PPC bit 31 (LSB). VC-form Rc is at PPC bit 21 = `(raw >> 10) & 1`. VX128_R-form Rc is at PPC bit 27 = `(raw >> 4) & 1`. Using bit 31 for these forms means the CR6 update gate is permanently disabled for all dot-form VMX vector compares — root cause of PPCBUG-275/420/421/422.
- **Fix**:
```rust
/// Rc for VC-form vector compare (vcmpeqfp, vcmpgefp, vcmpgtfp, vcmpbfp, etc.) — PPC bit 21.
#[inline] pub fn vc_rc_bit(&self) -> bool { extract_bits(self.raw, 21, 21) != 0 }
/// Rc for VX128_R-form compare (vcmpeqfp128, vcmpgefp128, etc.) — PPC bit 27.
#[inline] pub fn vx128r_rc_bit(&self) -> bool { extract_bits(self.raw, 27, 27) != 0 }
```
- **Cross-reference**: PPCBUG-275 / PPCBUG-420 / PPCBUG-421 / PPCBUG-422.
### PPCBUG-563 — Missing `vx128_4_z()` and `vx128_4_imm()` for VX128_4 form (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessors absent; `interpreter.rs:3551-3552` (vrlimi128) reads wrong bit positions
- **Symptom**: VX128_4 form has `IMM` (5-bit) at PPC bits 11-15 (host bits 16-20) and `z` (2-bit) at PPC bits 24-25 (host bits 6-7). Interpreter `vrlimi128` uses `(raw >> 16) & 0x3` for shift (reads VB128l partial) and `(raw >> 2) & 0xF` for mask (reads VD128h region).
- **Fix**:
```rust
#[inline] pub fn vx128_4_imm(&self) -> u32 { extract_bits(self.raw, 11, 15) }
#[inline] pub fn vx128_4_z(&self) -> u32 { extract_bits(self.raw, 24, 25) }
```
- **Cross-reference**: PPCBUG-315.
### PPCBUG-564 — Missing `vx128_p_perm()` for VX128_P form; PERMh reads XO bits (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessor absent; `interpreter.rs:4089` (vpermwi128) uses `(raw >> 16) & 0xFF` which reads PERMl (correct) but uses XO/reserved bits 21-23 for PERMh instead of PPC bits 23-25
- **Symptom**: Top 3 bits of the 8-bit PERM selector are wrong for every `vpermwi128` instruction. Lane selections for words 0 and 1 are garbage.
- **Fix**:
```rust
#[inline] pub fn vx128_p_perm(&self) -> u32 {
    extract_bits(self.raw, 11, 15) | (extract_bits(self.raw, 23, 25) << 5)
}
```
- **Cross-reference**: PPCBUG-362.
### PPCBUG-565 — Missing `vx128_5_sh()` for VX128_5 form; vsldoi128 MSB reads reserved bit (MEDIUM)
- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` — accessor absent; `interpreter.rs:2012` (vsldoi128) uses `(raw >> 4) & 0x1` for the shift MSB (reads PPC bit 27 = reserved) instead of PPC bit 22 = host bit 9 = `(raw >> 9) & 1`
- **Symptom**: vsldoi128 shift amounts ≥ 8 (where the 4th bit matters) use a garbage bit. The correct 4-bit SH is at PPC bits 22-25 (host bits 6-9) = `(raw >> 6) & 0xF`.
- **Fix**:
```rust
#[inline] pub fn vx128_5_sh(&self) -> u32 { extract_bits(self.raw, 22, 25) }
```
- **Cross-reference**: PPCBUG-361.
### PPCBUG-566 — Missing XER TBC field accessor documentation for lswx/stswx (LOW)
- **Severity**: LOW
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `decoder.rs` — XER[25:31] (7-bit transfer byte count) is runtime state, not an instruction field; no accessor exists and no documentation notes the gap
- **Symptom**: `lswx`/`stswx` use XER[25:31] as their byte count. The interpreter has no way to read this via the normal accessor pattern. Not a bit-position error, but a structural gap.
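- **Recommendation**: add `ctx.xer_tbc() -> u8` to `PpcContext` returning `(ctx.xer() & 0x7F) as u8`. In PPC bit numbering XER[25:31] is the low-order 7 bits of the 32-bit XER, so no shift is needed; shifting right by 25 would read the high-order bits (including SO/OV/CA) instead. Document that these are the only instructions that read XER as a count operand.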
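
A minimal sketch of the byte-count extraction, written as a free function over a raw 32-bit XER value. The signature is an assumption for illustration; the real accessor would live on `PpcContext`. It relies on the fact that XER[25:31] in PPC numbering are the seven least-significant bits.

```rust
/// Transfer byte count used by `lswx`/`stswx`.
/// XER[25:31] in PPC bit numbering is the low 7 bits of the 32-bit XER value.
fn xer_tbc(xer: u32) -> u8 {
    (xer & 0x7F) as u8
}
```

With this extraction the summary-overflow, overflow, and carry flags in the high-order bits cannot leak into the count.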
### PPCBUG-567 — Zero unit tests pin any scalar field accessor (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `decoder.rs` unit tests; `tests/disasm_goldens.rs`
- **Symptom**: Phase 4 tests pin `va128`/`vb128`/`vd128`/`vs128` only. No test verifies: `sh64()` against ISA-encoded instructions (existing test validates wrong round-trip — PPCBUG-560), `mb_md()` (absent), `vc_rc_bit()`/`vx128r_rc_bit()` (absent), `ds()` for negative displacement, `spr()` for LR/CTR/XER beyond DEC.
- **Recommended additions**: decoder-level unit tests using ISA-correct encodings for `sh64`, `mb_md`, the two new Rc accessors, `ds` negative, `spr` for LR=8 and CTR=9. See phase-c1-decoder-fields.md for concrete encoding examples.
IDs PPCBUG-568 through PPCBUG-599 are unallocated — no further bugs found in Phase C1 scope.
---
## Phase C2 — Decoder opcode-lookup tables
Per-group report: `audit-out/phase-c2-decoder-lookup.md`.
**Methodology**: complete line-by-line comparison of all `decode_opNN` functions in
`xenia-rs/crates/xenia-cpu/src/decoder.rs` against
`xenia-canary/src/xenia/cpu/ppc/ppc_opcode_lookup_gen.cc`, plus cross-reference of
`ppc-manual/forms/` for VC, VX128_R, VX128_5, VA, VX128_3, VX128_4 forms.
**Overall verdict**: the decoder is structurally sound and entry-by-entry matches
Canary for all real Xbox 360 instructions, with one pre-known exception (PPCBUG-600 =
PPCBUG-423). Zero new wrong-entry bugs. One medium-severity cross-reference entry
(dot-form gap), one medium maintainability risk (key-ordering dependency), four LOWs
(test gaps, reserved-encoding misidentification, undocumented Invalid mapping for primary
opcode 9, undocumented fast-path).
### PPCBUG-600 — `decode_op6` key4: VMX128 compare dot-forms decode as Invalid (MEDIUM)
- **Severity**: MEDIUM (cross-reference for PPCBUG-423; same root cause, Phase C2 ID)
- **Status**: applied (52b05b1, 2026-05-01) (dup-of:423 for the fix; this ID is for Phase C2 tracking)
- **Location**: `decoder.rs:640-648` (`decode_op6`, key4 match table)
- **Symptom**: The VX128_R form places `Rc` at PPC bit 27. The key4 formula is
`(bits 22-24 << 3) | bit27`. When Rc=1 (dot-form), bit27=1 and key4 is odd.
Only even key4 values are in the table. Five dot-form encodings fall through to
`PpcOpcode::Invalid`:
- `vcmpeqfp128.` → key4=0b000001 (1), decodes as Invalid
- `vcmpgefp128.` → key4=0b001001 (9), decodes as Invalid
- `vcmpgtfp128.` → key4=0b010001 (17), decodes as Invalid
- `vcmpbfp128.` → key4=0b011001 (25), decodes as Invalid
- `vcmpequw128.` → key4=0b100001 (33), decodes as Invalid
- **Contrast**: standard VMX VC-form compares (op=4 key3) are correct because their
Rc bit (bit21) is outside the key3 window (bits 22-31). VX128_R uses a different
form where Rc is at bit27, which is inside the key4 window.
- **Fix**: Add 5 dot-form entries to key4 in `decode_op6`:
```rust
0b000001 => return PpcOpcode::vcmpeqfp128,
0b001001 => return PpcOpcode::vcmpgefp128,
0b010001 => return PpcOpcode::vcmpgtfp128,
0b011001 => return PpcOpcode::vcmpbfp128,
0b100001 => return PpcOpcode::vcmpequw128,
```
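The interpreter already gates the CR6 update on an Rc check for dot-forms, so the
decoder just needs to emit the right opcode. Note that for VX128_R this check must use
the per-form accessor (PPCBUG-562), not the generic bit-31 `rc_bit()`.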
- **See also**: PPCBUG-423 (Phase B original finding) for impact assessment and
full context.
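
The odd-key effect can be illustrated with the key4 formula exactly as stated in the symptom. Both `extract_bits` and the `word` builder are assumptions for the sketch; the real `decode_op6` may compute its key differently in detail.

```rust
/// Extract PPC bits `start..=end` (bit 0 is the MSB); assumed helper semantics.
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (((1u64 << width) - 1) as u32)
}

/// key4 as described in this entry: PPC bits 22-24 shifted left by 3, OR'd with bit 27.
fn key4(code: u32) -> u32 {
    (extract_bits(code, 22, 24) << 3) | extract_bits(code, 27, 27)
}

/// Hypothetical builder: place a 3-bit selector at PPC bits 22-24 and Rc at bit 27.
fn word(sel: u32, rc: u32) -> u32 {
    (sel << 7) | (rc << 4)
}
```

For every selector, setting Rc flips the lowest key bit, so a table that lists only even keys cannot match any dot-form.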
### PPCBUG-601 — `decode_op6` key ordering creates undocumented correctness dependency (MEDIUM)
- **Severity**: MEDIUM (maintainability risk; no current wrong-decode for real code)
- **Status**: open
- **Location**: `decoder.rs:603-637` (`decode_op6`, key1/key2/key3 dispatch)
- **Symptom**: key1 (bits 21-22 << 5 | bits 26-27), key2 (bits 21-23 << 4 | bits 26-27),
and key3 (bits 21-27) all overlap. Correctness depends on an implicit invariant:
vpkd3d128 and vrlimi128 (matched by key2) always have bits 26-27 = `01`, while all
15 key3 unary entries always have bits 26-27 = `11`. If a future instruction were
added to key2 with bits 26-27 = `11`, it would shadow a key3 entry. No comment in
the source documents this constraint.
- **Fix**: Add a comment block above the key2/key3 dispatches explaining the invariant:
```rust
// key2 matches bits 26-27 == 01 only (vpkd3d128, vrlimi128).
// key3 entries all have bits 26-27 == 11. No overlap is possible
// for any currently-defined Xbox 360 instruction.
```
### PPCBUG-602 — `decode_op4` vsldoi128 fallback: over-broad single-bit catch-all (LOW)
- **Severity**: LOW (only fires for reserved/undefined encodings in practice)
- **Status**: open
- **Location**: `decoder.rs:558-561`
- **Symptom**: The VX128_5 form for vsldoi128 is identified by op=4, bit27=1. The
dispatch uses a bare `if extract_bits(code, 27, 27) == 1` after the other tables,
rather than an exact VX128_5-form check. Reserved VA extended opcodes that happen
to have their key4 bit4 (= word bit27) set decode as vsldoi128 instead of Invalid.
Example: VA XO=0b100011 (35, reserved gap between vmladduhm=34 and vmsumubm=36)
— key4 misses, bit27=1 fires → decoded as vsldoi128. ISA specifies reserved
encodings should trap; this silently assigns a meaning.
- **Fix (optional)**: Strengthen to an exact match:
```rust
// VX128_5 form: SH@22-25, VA128h@26, XO=bit27. Bits 28-31 carry VD128h/VB128h.
// Only vsldoi128 uses this form. Verify the XO bit and absence of load/store marker.
if extract_bits(code, 27, 27) == 1 && extract_bits(code, 30, 31) != 0b11 {
    return PpcOpcode::vsldoi128;
}
```
Alternatively, accept current behavior and add a comment.
### PPCBUG-603 — Primary opcode 9 maps to Invalid; correct but undocumented (LOW)
- **Severity**: LOW (test gap / documentation only)
- **Status**: open
- **Location**: `decoder.rs:369` (the `_ => PpcOpcode::Invalid` arm of `lookup_opcode`)
- **Symptom**: Primary opcode 9 (`dozi` in original POWER ISA) is undefined on
Xenon/750CL and correctly decodes as Invalid. Canary also returns `PPC_DECODER_MISS`.
No comment documents this intentional absence.
- **Fix**: Add `// 9 = dozi (POWER-only, not present on Xenon)` comment near the
match, or explicitly add `9 => PpcOpcode::Invalid` with a comment.
### PPCBUG-604 — Zero decoder unit tests for decode_op5, decode_op6, decode_op30, decode_op63 (LOW)
- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `decoder.rs:897-1107` (test module)
- **Symptom**: The 10 existing decoder tests cover addi, lwz, branch, stw, ori, and
cache mechanics. None exercise VMX128 (op=5, op=6), rotate-doubleword (op=30), or
FPU (op=63) opcode paths. In particular, no test would have caught PPCBUG-600
(vcmpeqfp128 dot-form decodes as Invalid) before it caused a runtime trap.
- **Recommended minimum additions** (8 tests):
1. `vcmpeqfp128` (Rc=0) → decodes as `vcmpeqfp128`.
2. `vcmpeqfp128.` (Rc=1) → decodes as `vcmpeqfp128` (tests PPCBUG-600 fix).
3. `vcmpeqfp` (op=4, Rc=0) → key3 check, bit21=0.
4. `vcmpeqfp.` (op=4, Rc=1) → key3 check, bit21=1, same decode.
5. `vsldoi128` (op=4, bit27=1) → fallback fires.
6. `rldicl` (op=30) → decode_op30.
7. `fadd` (op=63, Rc=0) → arithmetic table.
8. `fadd.` (op=63, Rc=1) → same decode as fadd.
### PPCBUG-605 — `decode_op31` sradix fast-path is correct but undocumented (LOW)
- **Severity**: LOW (documentation gap only)
- **Status**: open
- **Location**: `decoder.rs:702-705`
- **Symptom**: The sradix pre-check uses bits 21-29 (9 bits). The subsequent main
table uses bits 21-30 (10 bits). Because no main-table entry has bits 21-29 =
0b110011101, the fast-path cannot shadow a legitimate main-table entry. However,
this is not documented in the source, and a reader might worry that sradix could conflict
with a future main-table entry at key 0b1100111010 (sh[5]=0) or 0b1100111011 (sh[5]=1).
In the XS-form, bit 30 carries sh[5] rather than XO, and Rc (bit 31) lies outside both windows.
- **Fix**: Add a comment: `// sradix: XS-form, XO=413 (bits 21-29=0b110011101).`
`// No main-table entry uses bits 21-30 starting with 0b110011101x.`
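
A sketch of the two key windows for an XS-form word with XO=413. `extract_bits` and `sradi_tail` are assumed helpers for illustration; `sradi_tail` fills in only the XO, sh[5], and Rc bits and leaves the other fields zero.

```rust
/// Extract PPC bits `start..=end` (bit 0 is the MSB); assumed helper semantics.
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (((1u64 << width) - 1) as u32)
}

/// XO (PPC bits 21-29), sh[5] (bit 30) and Rc (bit 31) of an XS-form word with XO=413.
fn sradi_tail(sh5: u32, rc: u32) -> u32 {
    (413 << 2) | (sh5 << 1) | rc
}

/// The 9-bit pre-check described in this entry.
fn is_sradi_fast_path(code: u32) -> bool {
    extract_bits(code, 21, 29) == 0b110011101
}

/// The 10-bit key the main table would see for the same word.
fn main_table_key(code: u32) -> u32 {
    extract_bits(code, 21, 30)
}
```

All four sh[5]/Rc combinations hit the 9-bit pre-check, and the corresponding 10-bit keys are the two values with prefix `0b110011101`, which is the constraint the proposed comment records.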
IDs PPCBUG-606 through PPCBUG-639 are unallocated — no further bugs found in Phase C2.
---
## Phase C3 — Disassembler formatter parity
Per-group report: `audit-out/phase-c3-disasm.md`.
**Methodology**: Full line-by-line audit of `disasm.rs:format()` and all ~70 per-class helpers.
Cross-referenced against `xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm_gen.cc`,
`tests/golden/extended_mnemonics.json`, and `tests/golden/base_mnemonics.json`.
Checked: mnemonic correctness (Rc/OE/LK/AA/L-field), operand formatting (signed vs unsigned,
hex vs decimal), simplified-mnemonic priority, branch-condition extended forms, VMX register
naming, VX128 field extraction, and golden test coverage.
**Overall verdict**: The formatter is structurally sound. All OE/Rc/LK/AA suffix handling, the
simplified mnemonic priority order, VMX 5-bit and VMX128 7-bit register naming, SPR mnemonics,
and CR-logical extended forms are correct. Two HIGH bugs found: the `bdnz`/`bdz` extended
mnemonic appends a spurious condition suffix, and the pre-existing `sync`/`lwsync` bug
(PPCBUG-088) is re-assessed as HIGH in disassembler scope. Three MEDIUM bugs: a missing
`fmt_bcctr` extended form, and decimal rather than hex output for SIMM immediates and for
D-form displacements (both diverge from Canary's output).
Several LOW findings for golden fixture correctness and edge cases.
**Key finding**: the disassembler's VX128 field extraction (vperm128 VC, vsldoi128 SH,
vpermwi128 PERM) is CORRECT in all three cases where the interpreter (PPCBUG-360/361/362)
has the wrong extraction. The disassembler was written independently and got them right.
### PPCBUG-640 — `fmt_bc`: pure `bdnz`/`bdz` emits `bdnzge`/`bdzge` (spurious condition suffix) (HIGH)
- **Severity**: HIGH
- **Status**: applied (d4f6ea7, 2026-05-02)
- **Location**: `disasm.rs:829-834`
- **Symptom**: For `bcx` with BO=16 (`bdnz`: decrement CTR, branch if CTR≠0, CR ignored):
- `decr = (16 & 4) == 0` = true
- `uncond = (16 & 16) != 0` = true
- Code falls into the `if decr` branch and computes `cond_name_opt` from `(cr_bit=0, cond_true=false)` → `Some("ge")`
- Emits: **`bdnzge`** — WRONG. ISA simplified form is `bdnz`.
For BO=18 (`bdz`): same path → **`bdzge`** — WRONG.
The bug is absent in `fmt_bclr` which has an explicit `if decr && uncond` guard at line 872
producing `bdnzlr`/`bdzlr` correctly. `fmt_bc` lacks this guard.
The golden fixture "bdnz 0x82000040" (PPCBUG-650 companion) pins the wrong output.
- **Fix**: In `fmt_bc`, inside the `if decr` block, gate the condition string on `!uncond`:
```rust
if decr {
    let z = if bo & 0x02 != 0 { "z" } else { "nz" };
    let cond_str = if uncond { "" } else { cond_name_opt.unwrap_or("") };
    let ext_mnem = format!("bd{z}{cond_str}{a}{l}");
    let ext_ops = format!("{cr}0x{target:08X}");
    with_ext(&base_mnem, base_ops, 8, &ext_mnem, ext_ops, 8)
}
```
Also update the golden fixtures (PPCBUG-650).
- **Impact**: All analysis-DB queries for `bdnz` loops (common in pixel-shader and vertex
processing loops) return zero rows; they are stored as `bdnzge`. Developers inspecting
loop structures see a misleading condition name on a CTR-only branch.
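
The BO decoding for CTR-decrementing branches can be sketched in isolation. `bd_ext_mnemonic` is a hypothetical helper that mirrors the naming scheme described in this entry (condition names appended to `bdnz`/`bdz`); it omits the `a`/`l` suffixes, the CR field prefix, and the target operand that the real `fmt_bc` handles.

```rust
/// Extended mnemonic stem for a CTR-decrementing `bc`.
/// Returns `None` when the BO value does not decrement CTR (BO bit 0x04 set).
fn bd_ext_mnemonic(bo: u32, bi: u32) -> Option<String> {
    let decr = (bo & 0x04) == 0; // CTR is decremented
    let uncond = (bo & 0x10) != 0; // CR condition is ignored
    if !decr {
        return None;
    }
    let z = if (bo & 0x02) != 0 { "z" } else { "nz" };
    let cond = if uncond {
        "" // CTR-only branch: no condition suffix
    } else {
        let cond_true = (bo & 0x08) != 0;
        match (bi & 3, cond_true) {
            (0, true) => "lt",
            (0, false) => "ge",
            (1, true) => "gt",
            (1, false) => "le",
            (2, true) => "eq",
            (2, false) => "ne",
            (_, true) => "so",
            (_, false) => "ns",
        }
    };
    Some(format!("bd{z}{cond}"))
}
```

With the `uncond` gate in place, BO=16 and BO=18 produce bare `bdnz` and `bdz`, while BO values that do test the CR bit keep their condition suffix.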
### PPCBUG-641 — `sync` emits `"sync"` for `lwsync` (L=1) — re-assessment of PPCBUG-088 (HIGH)
- **Severity**: HIGH (disassembler scope; PPCBUG-088 was LOW for interpreter scope)
- **Status**: open (see PPCBUG-088 for fix)
- **Location**: `disasm.rs:364`
- **Symptom**: `PpcOpcode::sync` always emits `"sync"`. The L-field at PPC bit 10 selects
`lwsync` (L=1, encoding `0x7C2004AC`). `lwsync` is the acquire barrier in every Xbox 360
spinlock. Every `lwsync` in the disassembly DB is stored as `mnemonic='sync'`.
`SELECT * WHERE mnemonic='lwsync'` returns zero rows regardless of binary content.
- **Note**: the golden fixture for lwsync (PPCBUG-649) currently pins the wrong output.
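
A minimal sketch of the L-field check, assuming the base mnemonic stays `sync` and `lwsync` is reported as the extended mnemonic. The function name is hypothetical, and only the single bit at PPC bit 10 is examined, as described in the symptom.

```rust
/// Extended mnemonic for a `sync`-family word, chosen from the bit at PPC bit 10.
fn sync_ext_mnemonic(raw: u32) -> Option<&'static str> {
    let l = (raw >> (31 - 10)) & 1;
    if l == 1 {
        Some("lwsync")
    } else {
        None // plain `sync`: no extended form
    }
}
```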
### PPCBUG-642 — `fmt_bcctr` missing extended form for CTR-decrement/ignore-CR BO values (MEDIUM)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:880-902`
- **Symptom**: `bcctrx` with BO=16 (decrement CTR, ignore CR) falls through to `base()` with
no extended form. `fmt_bclr` (the equivalent for bclrx) correctly handles the same case with
an explicit `decr && uncond` check at line 872, producing `bdnzlr`.
Note: `bcctr` with CTR-decrement is undefined by PowerISA; this encoding should never appear
in valid compiled code. The inconsistency is a maintenance concern rather than a runtime bug.
- **Fix**: Add a `decr && uncond` check before the `cond_branch_ext` call in `fmt_bcctr`,
mirroring lines 872-876 in `fmt_bclr`. Or add a comment explaining the ISA undefined status.
### PPCBUG-643 — SIMM immediate display: decimal diverges from Canary and real disassemblers (MEDIUM)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:946` (addi), `976` (addic), `989` (subfic), `990` (mulli),
`1003` (cmpi), `1048-1061` (fmt_ld/fmt_st), and all similar SIMM sites
- **Symptom**: SIMM immediates are formatted via Rust's `{imm}` (decimal). Canary uses
`"-0x{:X}"` / `"0x{:X}"` (signed hex) for every SIMM field, and IDA Pro also defaults to hex
(GNU objdump, by contrast, prints these operands in decimal). The inconsistency is also internal to xenia-rs:
`addis`/`oris`/`xoris` use hex (`0x{imm_u:X}`), but `addi`/`addic`/`mulli` use decimal.
This misleads analysis-DB queries that mix instructions (e.g. `addi r3, r1, -4` vs
`addis r3, r0, 0x8000`).
- **Impact**: Medium — the output is not *wrong* (the value is correctly computed), but
cross-referencing with Canary output or objdump requires manual conversion.
### PPCBUG-644 — D-form load/store displacement uses decimal instead of hex (MEDIUM)
- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:1053` (`fmt_ld`), `1061` (`fmt_st`), `1069` (`fmt_ds`)
- **Symptom**: `format!("{rn}, {d}({})", gpr(ra))` outputs decimal for the displacement.
Canary outputs `"-0x8(r1)"`, not `"-8(r1)"`, and IDA Pro likewise defaults to hex.
Affects 25+ D-form and DS-form opcodes. Negative displacements (-8, -16, etc.) are
especially confusing in decimal when reading stack frame accesses.
- **Fix**:
```rust
let d_str = if d < 0 { format!("-0x{:X}", -d) } else { format!("0x{:X}", d) };
base(mnem, format!("{rn}, {d_str}({})", gpr(ra)), 8)
```
Update all golden fixture rows with displacement values.
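
A small standalone helper in the same spirit, shown as a sketch. The name `fmt_signed_hex` and the `i32` parameter type are assumptions; using `unsigned_abs()` rather than unary minus keeps the helper well-defined even for the most negative input.

```rust
/// Format a signed displacement or immediate as signed hex, e.g. `-0x8` or `0x10`.
fn fmt_signed_hex(d: i32) -> String {
    if d < 0 {
        format!("-0x{:X}", d.unsigned_abs())
    } else {
        format!("0x{:X}", d)
    }
}
```

A D-form operand string would then be built as `format!("{}, {}({})", rn, fmt_signed_hex(d), base_reg)`, where `rn` and `base_reg` stand for the already-formatted register names.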
### PPCBUG-645 — `cntlzdx` Rc suffix: moot for valid encodings, but WONTFIX (LOW)
- **Severity**: LOW
- **Status**: wontfix
- **Location**: `disasm.rs:286`
- **Note**: `fmt_x_unary_rc` would emit `cntlzd.` for Rc=1, but valid `cntlzd` encodings
always have Rc=0. Canary emits `cntlzd` always. No impact for valid code.
### PPCBUG-646 — `fmt_rlwimi` inslwi/insrwi priority overlap: confirmed correct (LOW)
- **Severity**: LOW
- **Status**: wontfix
- **Note**: After careful analysis, the `inslwi` guard excludes `insrwi` overlap cases
(`sh != 31u32.wrapping_sub(me)`). Priority is correct. Informational only.
### PPCBUG-647 — `fmt_rlwinm` `extrwi` uses `wrapping_sub` which can give misleading results for invalid encodings (LOW)
- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1137`
- **Symptom**: `let b = sh.wrapping_sub(n) % 32;` — for invalid `sh < n` encodings,
`wrapping_sub` gives a large u32, `% 32` gives a confusing value. For all compiler-emitted
encodings `sh >= n` holds. Add `&& sh >= 32 - mb` to the guard to avoid the fallthrough.
### PPCBUG-648 — `fmt_mftb` TBR=268: ext mnemonic identical to base mnemonic (LOW)
- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1443`
- **Symptom**: `268 => with_ext("mftb", base_ops, 8, "mftb", gpr(rd), 8)` — base is `mftb`,
extended is also `mftb`. `display()` picks the extended form (omitting the `268` operand),
making it ambiguous vs. `mftbu`. Consider: either emit base-only (`mftb r3, 268`) or rename
the base to `mftb.raw` for disambiguation.
### PPCBUG-649 — Golden fixture for `lwsync` pins wrong output (no ext_mnemonic) (LOW)
- **Severity**: LOW (test coverage gap)
- **Status**: applied (2be25bd, 2026-05-02)
- **Location**: `tests/golden/extended_mnemonics.json`, entry "lwsync"
- **Symptom**: Fixture has `mnemonic: "sync"` and no `ext_mnemonic`. After PPCBUG-088/641
fix, expected output is `mnemonic: "sync"`, `ext_mnemonic: "lwsync"`. Current fixture
defeats regression detection — the test passes with wrong output.
### PPCBUG-650 — Golden fixtures for `bdnz`/`bdz` pin wrong extended mnemonic (LOW)
- **Severity**: LOW (companion to PPCBUG-640)
- **Status**: applied (d4f6ea7, 2026-05-02)
- **Location**: `tests/golden/extended_mnemonics.json`, rows "bdnz 0x82000040" and "bdz 0x82000040"
- **Symptom**: The two rows pin `ext_mnemonic: "bdnzge"` and `ext_mnemonic: "bdzge"`
respectively. After the PPCBUG-640 fix, the correct values are `"bdnz"` and `"bdz"`.
### PPCBUG-651 — `fmt_vmx128_pack_d3d` shared by `vpkd3d128` and `vrlimi128`: confirmed correct (LOW)
- **Severity**: LOW
- **Status**: wontfix
- **Note**: Both opcodes use VX128_4 form. Shared formatter outputs identical operand lists
(`vd, vb, imm, z`) which is correct for both. Informational only.
### PPCBUG-652 — Zero golden fixtures for any VMX128 opcode disassembly (LOW)
- **Severity**: LOW (test coverage gap)
- **Status**: open
- **Location**: `tests/golden/` — all three JSON files
- **Symptom**: No fixture pins the formatted output of any VMX128 instruction. Regressions
in VMX128 field extraction (e.g. a re-introduction of PPCBUG-360/361/362 in the disassembler)
would be invisible. Recommend adding at minimum: `vaddfp128`, `vperm128`, `vsldoi128`,
`vpkd3d128`, `vcmpeqfp128.`, `vmaddfp128`.
### PPCBUG-653 — `fmt_trap_imm` unconditional trap extended form: confirmed not-a-bug (LOW)
- **Severity**: LOW
- **Status**: wontfix
- **Note**: `twi 31, rA, IMM` (to=31) has no ISA simplified mnemonic unless RA=0 and IMM=0
(which matches `tw 31, r0, r0 = trap`). The `fmt_trap_imm` correctly emits base-only for
`twi 31, rA, N`. Informational.
### PPCBUG-654 — `fmt_rldimi` `insrdi` guard excludes valid `mb=0` (b=0) case (LOW)
- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1220`
- **Symptom**: Guard `if mb > 0` excludes `insrdi rA, rS, n, 0` (b=0 → mb=0). A valid
compiler-emitted `rldimi` with sh+mb+n=64 and mb=0 falls through to base form instead of
displaying the `insrdi` simplified mnemonic.
- **Fix**: Remove the `mb > 0` guard; the inner `n > 0` guard is sufficient to avoid
degenerate cases.
IDs PPCBUG-655 through PPCBUG-679 are unallocated — no further bugs found in Phase C3.
---
## Phase C4 — Post-merge audit corrections (2026-05-02)
### PPCBUG-700 — VMX128 register accessors disagreed with canary's bitfield layout (HIGH)
- **Severity**: HIGH (silent mis-decoding of any VMX128 instruction with a register >= 32)
- **Status**: applied
- **Locations**: `decoder.rs:138-160` (`va128`/`vb128`/`vd128`), `decoder.rs:80` (`vx128r_rc_bit`)
- **Discovery**: independent reviewer of the P3 phase merge, comparing our Rust accessors
against canary's `FormatVX128`/`VX128_2`/`VX128_4`/`VX128_5`/`VX128_R` bitfield struct
in `xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663`.
- **Symptom**: this entry contradicts the audit's own line 2958 ("confirmed-clean")
assessment. The previous audit miscounted bit-field offsets — under x86_64 LSB-first
C++ bitfield packing, the canary fields land at:
- `VA128 = VA128l(5) | VA128h(1)<<5 | VA128H(1)<<6` = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6 (3 fields, 7 bits)
- `VB128 = VB128l(5) | VB128h(2)<<5` = PPC[16-20] | PPC[30-31]<<5 (2 fields, 7 bits)
- `VD128 = VD128l(5) | VD128h(2)<<5` = PPC[6-10] | PPC[28-29]<<5 (2 fields, 7 bits)
- `Rc` (VX128_R only) = PPC[25] (host bit 6) — not PPC[27] as PPCBUG-422/562 prescribed.
The Rust code instead used `va128` = PPC[11-15] | PPC[29]<<5 (one high bit, wrong position);
`vb128` = PPC[16-20] | PPC[28]<<5 | PPC[30]<<6 (wrong positions); `vd128` = PPC[6-10] | PPC[21]<<5 |
PPC[22]<<6 (wrong positions); and `vx128r_rc_bit` = PPC[27].
- **Why it lurked**: the buggy convention was internally consistent with hand-crafted
test fixtures (which set bit 29 / 21 / 22 to encode "high" registers, matching the
buggy accessor). Real Xbox 360 game code follows canary's convention, so any production
encoding with VR >= 32 was silently mis-decoded — but no unit test exercised that path.
- **Fix**: rewrite the four accessors to canary's bit positions; rewrite the
`vmx128_test_word` helper and unit tests; re-encode the goldens for vmaddfp128/
vmaddcfp128/vnmsubfp128/vperm128/vsrw128/vpermwi128/vrlimi128. Drop the speculative
`key4_dt` dot-form dispatch in `decode_op6` (canary has no separate dot-form opcodes
for VX128_R compute ops; Rc is a runtime modifier). Update `encode_vpkd3d128` test
helper for canary's VD128h placement.
- **Cross-reference**: invalidates the audit's confirmed-clean note at line 2958.
Subsumes the partial fix-shape proposed in PPCBUG-422 (Rc-bit position).
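
The corrected bit positions listed under **Symptom** can be sketched as standalone accessors. This is an illustration only: `ppc_bits` is a hypothetical helper, and the real accessors in `decoder.rs` may be structured differently. PPC bit numbering is big-endian (bit 0 is the MSB of the 32-bit word).

```rust
/// Extract PPC bits `start..=end` of a 32-bit instruction word
/// (big-endian numbering: PPC bit 0 is host bit 31).
fn ppc_bits(word: u32, start: u32, end: u32) -> u32 {
    debug_assert!(start <= end && end <= 31);
    let width = end - start + 1;
    let mask = if width == 32 { u32::MAX } else { (1u32 << width) - 1 };
    (word >> (31 - end)) & mask
}

/// VA128 = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6 (7 bits).
fn va128(word: u32) -> u32 {
    ppc_bits(word, 11, 15) | (ppc_bits(word, 26, 26) << 5) | (ppc_bits(word, 21, 21) << 6)
}

/// VB128 = PPC[16-20] | PPC[30-31]<<5 (7 bits).
fn vb128(word: u32) -> u32 {
    ppc_bits(word, 16, 20) | (ppc_bits(word, 30, 31) << 5)
}

/// VD128 = PPC[6-10] | PPC[28-29]<<5 (7 bits).
fn vd128(word: u32) -> u32 {
    ppc_bits(word, 6, 10) | (ppc_bits(word, 28, 29) << 5)
}

/// Rc for the VX128_R form = PPC[25] (host bit 6).
fn vx128r_rc(word: u32) -> bool {
    ppc_bits(word, 25, 25) != 0
}
```

A fixture that sets host bit 2 (PPC bit 29) now contributes to `vd128`, not `va128`, which is why the hand-crafted test words had to be re-encoded.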
---
# May 2026 Comprehensive Audit (extends prior PPCBUG namespace)
**Started**: 2026-05-02. **Charter**: [audit-2026-05-charter.md](audit-2026-05-charter.md).
**Severity**: P0 blocker / P1 wrong-result / P2 spec drift / P3 cosmetic.
## ORACBUG (M01 — oracles and goldens)
Per-milestone report: [audit-out/m01-oracles.md](audit-out/m01-oracles.md).
### ORACBUG-001 — base_mnemonics.json self-derived circular
- **Severity**: P1
- **Status**: open
- **Location**: crates/xenia-cpu/tests/disasm_goldens.rs:70-88 (`build_rows`); fixture crates/xenia-cpu/tests/golden/base_mnemonics.json
- **Symptom**: every "expected" mnemonic/operands/etc. is captured from `xenia_cpu::disasm::format()` at golden-creation time and frozen. The frozen JSON is asserted against future runs of the same function. Detects regression-from-snapshot, not absolute correctness. Human-readable `label` field is never asserted.
- **Recommendation**: add canary-disasm differential (see M02) and POWERISA-derived parallel oracle for ~20 representative cases.
### ORACBUG-002 — extended_mnemonics.json self-derived circular
- **Severity**: P1
- **Status**: open
- **Location**: crates/xenia-cpu/tests/golden/extended_mnemonics.json (623 rows)
- **Symptom**: same as ORACBUG-001, with extra risk: extended mnemonic emission is decision-tree output (li, lis, mr, not, slwi, srwi, clrldi, blr, bctr, beq/bne, lwsync, …). A bug in the canonicalization decision tree is not caught.
### ORACBUG-003 — vmx128_registers.json self-derived + hand-coded raw bytes
- **Severity**: P1
- **Status**: open
- **Location**: crates/xenia-cpu/tests/disasm_goldens.rs:421-527
- **Symptom**: same circularity, plus 4-operand multiply-add cases (lines 513-519) bypass encoding helpers and use HARD-CODED u32 literals (0x146328F0, 0x14632930, 0x14632970). PPCBUG-700 demonstrated this risk: the prior buggy convention was internally self-consistent in fixtures and lurked until a manual canary cross-check.
### ORACBUG-004 — sylpheed_n2m.json structurally insufficient
- **Severity**: P0
- **Status**: open
- **Location**: crates/xenia-app/tests/golden/sylpheed_n2m.json
- **Symptom**: at `-n 2M` instructions all rendering metrics are 0 (packets/draws/swaps/resolves/render-targets/textures). Sylpheed's first VdSwap fires at ~18M cycles, so by construction the golden cannot detect changes in 11 of the 14 digest fields.
- **Risk**: this is the only end-to-end Sylpheed regression catcher in the workspace. Future fixes optimized to pass this gate are optimized against a blind oracle.
- **Recommendation**: add `sylpheed_n50m.json` (CI-feasible, captures VdSwap=1) and `sylpheed_n4b.json` (matches canonical reference invocation; commit-time gate).
### ORACBUG-005 — db_schema_golden.rs synthetic PE missing direct-branch coverage
- **Severity**: P3
- **Status**: open
- **Location**: crates/xenia-analysis/tests/db_schema_golden.rs:23-53
- **Symptom**: the synthetic PE has 4 instructions (mflr/nop/blr/nop). Direct-branch path of the DB writer (target_hex column population) is never exercised; only the indirect-only path is. Schema columns are correctly locked but coverage is thin.
### ORACBUG-006 — RunDigest missing high-leverage fields
- **Severity**: P2
- **Status**: open
- **Location**: crates/xenia-app/src/main.rs:1267-1306 (RunDigest struct + capture)
- **Symptom**: digest exposes 14 fields, missing several high-signal counters that already exist in the system: unique_pcs_executed, kernel_calls_per_export histogram, mmio_reads/writes, scheduler.deadlock_recoveries, scheduler.deadlock_halts, events_signaled, events_waited, events_with_zero_signals, lwarx_count, stwcx_success_count, stwcx_fail_count.
- **Risk**: M11's run-matrix can only diff coarse counters. Several "is the renderer chain alive?" probes are not captured.
### ORACBUG-007 — analysis-shim parity test inherits CIRCULAR provenance
- **Severity**: P2
- **Status**: open
- **Location**: crates/xenia-analysis/tests/disasm_goldens.rs:50-89 (check_fixture)
- **Symptom**: test does (a) shim-vs-cpu parity (good — catches drift) and (b) cpu-vs-fixture (inherits circularity from ORACBUG-001/002/003). The primary purpose (parity) is sound; only the secondary assertion is suspect.
### ORACBUG-008 — encode_vx128 helper lacks canary citation
- **Severity**: P3
- **Status**: open
- **Location**: crates/xenia-cpu/tests/disasm_goldens.rs:53-68
- **Symptom**: the encode helper currently encodes per canary's VX128 layout (post-PPCBUG-700) but lacks a comment block citing canary's `xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663`. A future "simplification" without canary cross-check could silently regress to the prior buggy convention.
## PPCBUG (M05 — scheduler + reservation + block_cache)
### PPCBUG-701 — Reservation generation 24-bit ring: false-match risk under long-delay paths (P3, latent)
- **Severity**: P3
- **Status**: open
- **Location**: crates/xenia-cpu/src/reservation.rs:67-83 (pack), :188-191 (next_gen mask)
- **Symptom**: `next_gen` is masked to 24 bits when packed (`& 0xFF_FFFF`). After 16,777,216 reservations, the generation wraps. If thread A's `lwarx` and its paired `stwcx.` are separated by ≥16M peer reservations on the same bank slot, and the bank still holds A's `(line, gen)` at commit time, `try_commit` will incorrectly succeed.
- **Risk**: very low under realistic workloads (reservation count between an lwarx-stwcx pair is typically <100, and same-bank displacement bumps `gen` regardless). Not observable on Sylpheed.
- **Recommendation**: defer until empirical evidence shows wraparound. If pursued, widen `gen` to 32 bits by stealing the line-address-low bits (low 7 bits of line are always zero — recoverable via masking).
- **Canary**: canary's bitmap model has the equivalent bit-aliasing risk at `RESERVE_BLOCK_SHIFT` granularity but no time-domain wrap.
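
The wraparound arithmetic can be checked in isolation. The sketch below models only the masked counter; `advance_gen` is a hypothetical helper, and the real `(line, gen)` packing in `reservation.rs` is not reproduced here.

```rust
/// Advance a generation counter by `steps` reservations, keeping only the
/// low `bits` bits (24 today; 32 if the widening recommendation is taken).
/// Assumes `bits <= 32`.
fn advance_gen(gen: u32, steps: u64, bits: u32) -> u32 {
    let mask = (1u64 << bits) - 1;
    ((u64::from(gen) + steps) & mask) as u32
}
```

After exactly 2^24 intervening reservations a 24-bit generation returns to its starting value, so a stale `(line, gen)` compares equal again; with 32 bits the same distance no longer aliases.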
### PPCBUG-702 — `invalidate_for_write` doc says collisions invalidate; code says they don't (P3, doc drift)
- **Severity**: P3
- **Status**: open
- **Location**: crates/xenia-cpu/src/reservation.rs:38-46 (doc) and :235-256 (code)
- **Symptom**: the file-level doc invariant 2 says "any plain store to a reserved line invalidates it (slot CASed to zero). Hash-collision side-effect: a store to a different line that maps to the same bank also invalidates" — but the actual code at :248-256 explicitly returns early when `bank_line != line`, leaving the reservation alone. The code is more correct (fewer spurious failures), but the doc contradicts it.
- **Recommendation**: update the file doc to describe the "tag-checked invalidation" actually implemented. No code change needed.
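
For the doc update, the "tag-checked invalidation" behavior can be illustrated with a toy table. Everything here is an assumption made for illustration (bank count, slot encoding as `line | 1`, method names other than `invalidate_for_write`); it is not the layout used in `reservation.rs`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const LINE_MASK: u64 = !0x7F; // 128-byte reservation granule
const BANKS: usize = 64;      // hypothetical bank count

struct ReservationTable {
    banks: Vec<AtomicU64>, // 0 = empty, otherwise `line | 1`
}

impl ReservationTable {
    fn new() -> Self {
        Self { banks: (0..BANKS).map(|_| AtomicU64::new(0)).collect() }
    }

    fn bank(&self, line: u64) -> &AtomicU64 {
        &self.banks[((line >> 7) as usize) % BANKS]
    }

    fn reserve(&self, ea: u64) {
        let line = ea & LINE_MASK;
        self.bank(line).store(line | 1, Ordering::Release);
    }

    fn is_reserved(&self, ea: u64) -> bool {
        let line = ea & LINE_MASK;
        self.bank(line).load(Ordering::Acquire) == (line | 1)
    }

    /// Clears the slot only if it still holds this line's tag. A store to a
    /// different line that hashes to the same bank leaves the slot alone.
    fn invalidate_for_write(&self, ea: u64) {
        let line = ea & LINE_MASK;
        let _ = self
            .bank(line)
            .compare_exchange(line | 1, 0, Ordering::AcqRel, Ordering::Relaxed);
    }
}
```

In this model a store to `0x3000` (same bank as `0x1000`, different line) does not disturb a reservation on `0x1000`, while a store anywhere inside the `0x1000` line clears it.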
### PPCBUG-703 — `--parallel` is non-deterministic; `XENIA_SCHED_SEED` does not regulate it (P3, doc gap)
- **Severity**: P3
- **Status**: open
- **Location**: crates/xenia-cpu/src/scheduler.rs:232-249, :710-734; crates/xenia-app/src/main.rs:2230-2415
- **Symptom**: `--parallel` workers race for the kernel mutex within each round; observable interleavings depend on host OS scheduling, not on `XENIA_SCHED_SEED`. The seed regulates ONLY the per-round slot-list shuffle, which has no effect under `--parallel` since workers race for the lock independently. Same-seed-same-input runs under `--parallel` produce different observable schedules.
- **Risk**: M11's bisection cannot reliably reproduce an observed regression under `--parallel`; lockstep must be used for bisection.
- **Recommendation**: document the determinism boundary in CLI help text. If true determinism is needed under `--parallel`, the kernel-mutex acquisition order must be re-introduced as a coordinator-driven sequence (a regression of the M3 perf goal).
### PPCBUG-704 — `icbi` is a no-op; correctness depends on `bump_page_version` from data-store path (P3, latent)
- **Severity**: P3
- **Status**: open
- **Location**: crates/xenia-cpu/src/interpreter.rs:1697-1701; crates/xenia-cpu/src/block_cache.rs:142-178
- **Symptom**: `icbi` (instruction cache block invalidate) is collapsed into the cache/sync no-op arm. Self-modifying code is currently caught only because every `write_u8/16/32/64` in `xenia-memory/src/heap.rs` unconditionally calls `bump_page_version`. If a future optimization makes `bump_page_version` conditional (e.g., distinguish data vs code pages, or skip bumping for non-instruction-page writes), `icbi` will need to actively bump the cache line.
- **Risk**: latent; no current SMC failure observed.
- **Recommendation**: add a comment in the cache/sync arm pointing at the implicit invariant: "icbi is correct because every store bumps page_version; if that changes, icbi must bump explicitly". Cross-references M06 memory invariants.
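
The implicit invariant can be shown with a toy model: a decoded-instruction cache validated against a per-page version that every store bumps. `Memory`, `BlockCache`, and their methods are hypothetical stand-ins, not the types in `heap.rs` or `block_cache.rs`.

```rust
use std::collections::HashMap;

const PAGE_SHIFT: usize = 12; // 4 KiB pages

struct Memory {
    bytes: Vec<u8>,
    page_versions: Vec<u32>,
}

impl Memory {
    fn new(size: usize) -> Self {
        Self { bytes: vec![0; size], page_versions: vec![0; (size >> PAGE_SHIFT) + 1] }
    }

    /// Every store bumps the page version; this is what makes a no-op icbi safe.
    fn write_u8(&mut self, addr: usize, value: u8) {
        self.bytes[addr] = value;
        let page = addr >> PAGE_SHIFT;
        self.page_versions[page] = self.page_versions[page].wrapping_add(1);
    }
}

struct BlockCache {
    // pc -> (page version at decode time, "decoded" byte)
    entries: HashMap<usize, (u32, u8)>,
}

impl BlockCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return the cached decode if the page version still matches,
    /// otherwise re-decode from memory and refresh the entry.
    fn fetch(&mut self, mem: &Memory, pc: usize) -> u8 {
        let version = mem.page_versions[pc >> PAGE_SHIFT];
        if let Some(&(cached_version, decoded)) = self.entries.get(&pc) {
            if cached_version == version {
                return decoded;
            }
        }
        let decoded = mem.bytes[pc];
        self.entries.insert(pc, (version, decoded));
        decoded
    }
}
```

If a store path ever skips the bump (modelled by writing `mem.bytes` directly), `fetch` keeps returning the stale decode, which is exactly the case where `icbi` would have to invalidate explicitly.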
### PPCBUG-705 — Phaser `phase: AtomicU32` wrap at 4 B rounds (P3, latent)
- **Severity**: P3
- **Status**: open
- **Location**: crates/xenia-cpu/src/phaser.rs:64, :128, :172
- **Symptom**: `phase` is `AtomicU32` and `fetch_add(1, Release)`. After 4,294,967,296 rounds the counter wraps. Wait-loop predicate `phase != pre_phase` is false at exact wraparound on a stalled arriver — appears as a missed wake at exact 2^32 round count.
- **Risk**: at xenia-rs's actual round rate (~10^4 rounds/sec) this requires ~5 days of continuous runtime. Not realistic.
- **Recommendation**: widen to `AtomicU64` next time the phaser API is touched. No urgency.
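
The wrap condition and the runtime estimate can be reproduced with plain integer arithmetic. The helper names below are hypothetical; only the `phase != pre_phase` predicate and the ~10^4 rounds/sec figure come from the finding.

```rust
/// Would a waiter that sampled `pre_phase` see a change after `rounds`
/// increments of a 32-bit counter? `AtomicU32::fetch_add` wraps on
/// overflow, which truncating `rounds` to u32 reproduces.
fn phase_advanced_u32(pre_phase: u32, rounds: u64) -> bool {
    pre_phase.wrapping_add(rounds as u32) != pre_phase
}

/// Same predicate with a 64-bit counter.
fn phase_advanced_u64(pre_phase: u64, rounds: u64) -> bool {
    pre_phase.wrapping_add(rounds) != pre_phase
}

/// Days of continuous runtime until a 32-bit phase counter wraps.
fn days_to_wrap_u32(rounds_per_sec: f64) -> f64 {
    2f64.powi(32) / rounds_per_sec / 86_400.0
}
```

At 10^4 rounds/sec `days_to_wrap_u32` gives just under 5 days, matching the estimate above; the 64-bit variant does not alias at the same distance.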
## PPCBUG (M02 — decoder/disasm)
- **PPCBUG-706** — Tracker drift; PPCBUG-088/641 (sync/lwsync) shown as `open` but disasm fix at `crates/xenia-cpu/src/disasm.rs:364-372` is already applied. P3 (tracker hygiene). Recommendation: flip both to `applied`. See `audit-out/m02-decoder-disasm.md`.
- **PPCBUG-707** — Disasm column-pad width inconsistent across opcode families (8/9/10/11/12/14) and divergent from canary's single `kNamePad=11` (`xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm.h:22`). P3 cosmetic; ~150 call sites in `disasm.rs`. Affects every textual diff with canary. See `audit-out/m02-decoder-disasm.md`.
- **PPCBUG-708** — `fmt_bc`/`fmt_bclr`/`fmt_bcctr` base form uses CR-bit names (`crb()`) for BI; canary emits raw BI integer (`ppc_opcode_disasm_gen.cc:158-186`). Extended forms unaffected. P3 cosmetic; 3 lines to change. See `audit-out/m02-decoder-disasm.md`.
- **PPCBUG-709** — `mfspr`/`mtspr`/`mftb` base form emits symbolic SPR name (`LR`/`CTR`); canary emits raw SPR integer (`ppc_opcode_disasm_gen.cc:1601-1602`). Extended forms (`mflr`/`mtctr`/etc.) unaffected. P3 cosmetic. See `audit-out/m02-decoder-disasm.md`.
- **PPCBUG-710** — `decoder.rs:79` has a stale doc-comment claiming `vx128r_rc_bit` reads PPC bit 27 (host bit 4); the immediately following lines 80-82 correctly say PPC bit 25 (host bit 6). The code is correct; the comment contradicts itself. P3 doc hazard. Recommendation: delete line 79.
- **PPCBUG-711** — `decoder.rs:183-199` (`extract_vx128_uimm5`) has a 17-line doc comment narrating the pre-PPCBUG-700 buggy convention; references "First-Pixels M3" without citing the PPCBUG IDs. P3 cleanup. Recommendation: trim to 3-4 lines, move history to `audit-findings.md`.
## PPCBUG (M04 — FPSCR + VMX)
- **PPCBUG-712** — `crates/xenia-cpu/src/overflow.rs:29-102`: 64-bit overflow helpers (`add_ov_64`, `sub_ov_64`, `adde_ov_64`, `sum_overflow_64`, `neg_ov_64`) are dead code; interpreter inlines all 32-bit i128 overflow checks for the 32-bit ABI. P3 cosmetic. See `audit-out/m04-fpscr-vmx.md`.
- **PPCBUG-713** — `crates/xenia-cpu/src/interpreter.rs:3848-3852` (`vcmpbfp`/`vcmpbfp128`): CR6.LT never set when all lanes are out-of-bounds. Canary's `f.UpdateCR6(f.Or(gt, lt))` (`ppc_emit_altivec.cc:579`) sets LT = all-true(out-mask). xenia-rs hardcodes `lt: false`. P2; coupled with PPCBUG-421 (Rc-bit position) — both must land together. See `audit-out/m04-fpscr-vmx.md`.
- **PPCBUG-714** — `crates/xenia-cpu/src/{fpscr.rs,interpreter.rs}`: `VXSOFT` constant defined (`fpscr.rs:51`) but no setter anywhere. Software-triggered only via mtfsf paths, which were not verified to honour the bit. P3. See `audit-out/m04-fpscr-vmx.md`.
- **PPCBUG-715** — `crates/xenia-cpu/src/interpreter.rs:2681,2694,2736,2750`: `fmsubx`/`fmsubsx`/`fnmsubx`/`fnmsubsx` compute `a.mul_add(c, -b)`. Rust's unary `-` flips the sign bit of a NaN `b`, corrupting NaN-payload propagation. Distinct from PPCBUG-205 which fixed the *output* negation; this is the *input* negation. P2; recommendation: replace `-b` with `if b.is_nan() { b } else { -b }`. See `audit-out/m04-fpscr-vmx.md`.
- **PPCBUG-716** — `crates/xenia-cpu/src/fpscr.rs:320-325` (`update_cr1`): maps FPSCR[FX]→CR1.lt, [FEX]→CR1.gt, [VX]→CR1.eq, [OX]→CR1.so. Logic matches canary `CopyFPSCRToCR1` (`ppc_hir_builder.cc:491-501`), but reuse of generic CrField field names without a comment block tying fx→lt invites future confusion. P3 docs. See `audit-out/m04-fpscr-vmx.md`.
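
For PPCBUG-715, the effect of unary `-` on a NaN operand and the recommended guard can be sketched as follows. `negate_preserving_nan` and `fmsub_sketch` are hypothetical names; the sketch does not model FPSCR updates or the single-precision rounding of the `s` forms.

```rust
/// Negate `b` unless it is a NaN, in which case pass it through unchanged
/// so that its sign bit and payload survive.
fn negate_preserving_nan(b: f64) -> f64 {
    if b.is_nan() { b } else { -b }
}

/// Sketch of the `a * c - b` shape with the guarded input negation.
fn fmsub_sketch(a: f64, c: f64, b: f64) -> f64 {
    a.mul_add(c, negate_preserving_nan(b))
}
```

For a quiet NaN with bits `0x7FF8_0000_0000_1234`, plain `-b` produces `0xFFF8_0000_0000_1234` (sign bit flipped), while the guarded helper returns the original bit pattern.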
## PPCBUG (M03 — interpreter)
- **PPCBUG-720** — `interpreter.rs:118` `addi` truncates result to 32 bits (`as u32 as u64`); canary `ppc_emit_alu.cc:103-115` does full 64-bit add. Charter only documents `addis` truncation, not `addi`. P1. [REGRESSION-CANDIDATE] See `audit-out/m03-interpreter.md`.
- **PPCBUG-721** — `interpreter.rs:138-152` `addic`/`addicx` operate on 32-bit narrowed operands; CA from `result32 < ra32`. Canary `ppc_emit_alu.cc:117-135` is fully 64-bit via `AddDidCarry`. P1. [REGRESSION-CANDIDATE]
- **PPCBUG-722** — `interpreter.rs:155-163` `subfic` 32-bit-only; canary `ppc_emit_alu.cc:459-466` is 64-bit. P1. [REGRESSION-CANDIDATE]
- **PPCBUG-723** — `interpreter.rs:165-172` `mulli` casts product `as u32` discarding bits [32:63]; canary uses 64-bit signed multiply (low 64 of 128-bit product per ISA). P2.
- **PPCBUG-724** — `interpreter.rs:1244,4594` `stwcx`/`stdcx` width-discriminator (`reservation_width == 4/8`) is stricter than canary (`ppc_emit_memory.cc:868-908` no width check) and stricter than PowerISA. Reopen of PPCBUG-151. P0. [REGRESSION-CANDIDATE — STRONG] Bisect around `a107ac9`.
- **PPCBUG-725** — `interpreter.rs:1665` `mtmsrd` L=1 mask is `EE | RI` (0x8001); canary `ppc_emit_control.cc:828-837` uses `EE` only (0x8000). P2.
- **PPCBUG-726** — `interpreter.rs:737-748` `rlwimix` zeroes RA[0:31] via `as u32 ... as u64`; canary `ppc_emit_alu.cc:1010-1033` preserves RA[0:31] via 64-bit OR with `MASK(MB+32, ME+32)`. P2.
- **PPCBUG-727** — `interpreter.rs:2901,2922` `fctidx`/`fctidzx` overflow boundary `val >= (i64::MAX as f64)` mis-flags (2^63 - 1024, 2^63) as overflow due to f64 precision (i64::MAX rounds up to 2^63 in f64). P3.
- **PPCBUG-728** — `interpreter.rs:1705-1724` `dcbz`/`dcbz128` only call `invalidate_for_write(ea)` once. Confirmed sufficient (32B fits in 128B line; dcbz128 IS a 128B line). WONTFIX, informational guard for future widening.
- **PPCBUG-729** — `interpreter.rs:1117,1124,1130` `lwa`/`lwax`/`lwaux` correctly sign-extend per hotfix `f1166d0`. CLEARED, verification only.
- **PPCBUG-730** — Reservation granule is 128 bytes (Xenon-correct) vs canary's byte-granular `real_addr(EA)`. Documented, recommendation: append to charter §"Known Intentional Divergences from Canary". P3 informational.
- **PPCBUG-731** — `interpreter.rs:908-938` `bcx` LR write timing in both AA paths. Confirmed equivalent to canary. P3 informational.
- **PPCBUG-732** — `interpreter.rs:962-981` `bcctrx` correctly omits CTR decrement (CTR is target). Confirmed equivalent to canary. P3 informational.
- **PPCBUG-733** — `interpreter.rs:1610` `mtspr CTR` truncates input to 32 bits (`val as u32 as u64`); `mfspr CTR` returns 64-bit. Canary `ppc_emit_control.cc:792` stores full 64-bit. PowerISA: CTR is 64-bit SPR. P2.
- **PPCBUG-734** — `interpreter.rs:2980-3040` `fcmpu`/`fcmpo` correctly distinguish ordered/unordered VXSNAN/VXVC. Canary `ppc_emit_fpu.cc:329-367` has bug — `bool ordered` parameter never read. P3 (Rust is more correct); recommend appending to charter §"Known Intentional Divergences from Canary".
- **PPCBUG-735** — `interpreter.rs:441,450,459,476,493,617,681,689,706,720,769,779,789,799,809,819` 64-bit Rc-form ALU ops (`mulld.`, `mulhd.`, `mulhdu.`, `divd.`, `divdu.`, `cntlzd.`, `sld.`, `srd.`, `srad.`, `sradi.`, `rldicl.`, `rldicr.`, `rldic.`, `rldimi.`, `rldcl.`, `rldcr.`) call `update_cr_signed(0, x as i64)` — full 64-bit signed view; canary `ppc_hir_builder.cc:397-421` `UpdateCR(n, v)` does `Truncate(v, INT32_TYPE)` first — always 32-bit. CR0 disagrees with canary on values that change sign between i32 and i64 view. P1. [REGRESSION-CANDIDATE — STRONG]
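
The CR0 disagreement in PPCBUG-735 comes down to which signed view of the result is compared with zero. A minimal sketch, using `std::cmp::Ordering` as a stand-in for the LT/GT/EQ bits (the function names are hypothetical and SO is ignored):

```rust
use std::cmp::Ordering;

/// CR0 comparison as xenia-rs currently does it for 64-bit Rc forms:
/// full 64-bit signed view.
fn cr0_view_i64(result: u64) -> Ordering {
    (result as i64).cmp(&0)
}

/// CR0 comparison after truncating to 32 bits first, which is what the
/// finding attributes to canary's `UpdateCR`.
fn cr0_view_i32(result: u64) -> Ordering {
    (result as u32 as i32).cmp(&0)
}
```

The two views agree for small values but diverge whenever bit 31 or the upper word matters: `0x0000_0000_8000_0000` is GT as i64 and LT as i32, and `0x0000_0001_0000_0000` is GT as i64 and EQ as i32.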
## MEMBUG (M06 — memory subsystem)
**Headline**: write-visibility verdict = **NOT broken at the memory layer** (same-thread store/load is mechanically sound; BST paradox cause is upstream — see M03 candidates). 9 findings; 1 P1, 4 P2, 4 P3. See `audit-out/m06-memory.md`.
- **MEMBUG-001** — `crates/xenia-memory/src/heap.rs:155-171` `bump_page_version` Release fence on `page_versions[idx]` correctly publishes the prior data store on x86_64 (TSO) and on weaker hosts via Release-store ordering. Doc-only risk: any future code that publishes via `page_versions` without first executing the data store *and* the Release-store inside `bump_page_version` would silently lose the visibility edge. P2 docs.
- **MEMBUG-002** — `crates/xenia-memory/src/heap.rs:8` hardcodes `PAGE_SIZE = 4096` for the entire 4 GB. Canary uses 4K/64K/16MB across 9 distinct heaps (`memory.cc:222-242`). Consequence: `PageEntry::region_page_count` is in 4K units rather than heap-native units — guest queries that walk `region_page_count * page_size` overshoot for 64K-heap-allocated regions. Latent. P2.
- **MEMBUG-003** — `crates/xenia-memory/src/heap.rs:184-202` no physical-address aliasing across `0xA0000000`/`0xC0000000`/`0xE0000000`. Canary maps all three onto the same physical-membase view (`memory.cc:235-242`). A guest CPU write to one alias is invisible at another. Risk: `MmGetPhysicalAddress`-shape round-trips and DMA-buffer aliasing return stale bytes. **P1**, latent.
- **MEMBUG-004** — `crates/xenia-memory/src/heap.rs` `is_mapped` accepts addresses in `0xFFD00000-0xFFFFFFFF`; canary `LookupHeap` (`memory.cc:434`) returns null. Latent — corrupt high-byte pointers don't fault. P2.
- **MEMBUG-005** — `crates/xenia-memory/src/platform.rs:31` always commits with `PROT_READ | PROT_WRITE`; xenia-rs cannot fault on writes to guest-read-only-protected pages. Matches canary's `emit_inline_mmio_checks` mode (no host-level protect enforcement). P3 informational.
- **MEMBUG-006** — `crates/xenia-gpu/src/mmio_region.rs:62-67,108-115` unmapped GPU MMIO reads/writes log at `tracing::trace!`; should be `warn` (rate-limited per `(reg_index, kind)` pair) so renderer-divergence first-line observability doesn't require enabling trace globally. P2.
- **MEMBUG-007** — `crates/xenia-memory/src/heap.rs:434-436,450-452,467-469` cross-page `bump_page_version` guard verified correct for all access widths. P3 informational.
- **MEMBUG-008** — `icbi`-correctness invariant (cross-references PPCBUG-704): every data store must `bump_page_version`. If any future perf optimization makes that conditional, `icbi` (currently no-op) must be made explicit. P3 documentation.
- **MEMBUG-009** — Static analysis: 29 distinct callers of `sub_82454770` (intrusive-list-merge validator); only the BST registration through `sub_82175E68 → sub_82175F10` trips the throw. Confirms the renderer-blocker is NOT a memory-layer issue — every list-merge operation would fail uniformly if it were. P3 informational, supports M06 verdict.
## XAMBUG (M08 — XAM)
- **XAMBUG-001** — `crates/xenia-kernel/src/xam.rs:204-208` `xam_task_schedule` allocates a handle and returns 0 without ever invoking the callback. Canary `xam_task.cc:43-81` spawns an `XThread` that runs the callback (which typically signals `XTASK_MESSAGE.event_handle`). Sylpheed callsite confirmed at thunk `0x8284dafc` ← `sub_824a9710` (`0x824a9a10`). Likely cause of one or more parked-waiter handles in M10. P0 candidate.
- **XAMBUG-002** — `crates/xenia-kernel/src/xam.rs` async XAM exports (`XamContentCreate`, `XamContentClose`, `XamContentDelete`, `XamContentCreateEnumerator`, `XamContentSetThumbnail`, `XamContentGetCreator`, `XamShowKeyboardUI`, `XamShowDeviceSelectorUI`, `XamShowMessageBoxUIEx`, `XamShowGamerCardUIForXUID`, `XamEnumerate`, `XMsgStartIORequest`, `XMsgStartIORequestEx`) are all `stub_success` and never touch `overlapped_ptr`. Canary completes the overlapped via `CompleteOverlappedImmediate` / `CompleteOverlappedDeferredEx` and returns `X_ERROR_IO_PENDING` (`xam_content.cc:418-422`, `xam_msg.cc:64-67`, `xam_ui.cc:382-389`). Any wait on the overlapped event hangs forever. P0 candidate.
- **XAMBUG-003** — `crates/xenia-kernel/src/xam.rs:45` `XamUserGetSigninState` is `stub_return_zero` (always 0 = "not signed in"). Canary `xam_user.cc:90-104` returns `signin_state` (typically 1 = signed-in offline) when a profile exists. Sylpheed callsite confirmed at thunk `0x8284db3c` ← `sub_824a9c90`. Boot guard `bl XamUserGetSigninState; cmpwi r3,0; beq <bail>` would force the bail branch. P1, possibly P0.
- **XAMBUG-004** — `crates/xenia-kernel/src/xam.rs:232-239` `xam_user_get_xuid` returns `0` (success) with xuid=0. Canary `xam_user.cc:30-67` returns `X_E_NO_SUCH_USER` when the user isn't signed in. P1.
- **XAMBUG-005** — `crates/xenia-kernel/src/xam.rs:241-248` `xam_user_get_name` returns 0 (success) with empty buffer. Canary `xam_user.cc:137-164` returns `X_ERROR_NO_SUCH_USER` when the user isn't signed in. P1.
- **XAMBUG-006** — `crates/xenia-kernel/src/xam.rs:192-200` `XamLoaderLaunchTitle`/`XamLoaderTerminateTitle` return normally with `gpr[3]=0`. Canary `xam_info.cc:380-432` explicitly does not return — calls `kernel_state()->TerminateTitle()`. Sylpheed has 2 callsites for `XamLoaderTerminateTitle`. Returning normally allows the title to keep executing past a fatal-exit path. P1.
- **XAMBUG-007** — `crates/xenia-kernel/src/xam.rs:257-273` `xam_get_execution_id` heap-allocates a 24-byte struct on every call and writes hardcoded `title_id=0x535107D4`, `media_id=0x2D2E2EEB`, `version=0`, `base_version=0`, `disc=1/1`. Canary `xam_info.cc:321-336` writes the *guest pointer to the existing XEX `EXECUTION_INFO` opt-header*. Hardcoded bytes diverge from real header for `version`/`base_version`; per-call leaks. P1.
- **XAMBUG-008** — `crates/xenia-kernel/src/xam.rs:212-228` `xam_alloc` ignores `flags`. Canary `xam_info.cc:434-455` notes `0x00100000` controls zero-fill; canary always uses `SystemHeapAlloc` which zero-fills. Severity depends on whether xenia-rs's `state.heap_alloc` zero-fills: P1 if not, P2 if yes.
- **XAMBUG-009** — `crates/xenia-kernel/src/xam.rs:73-74` `XamUserCreateAchievementEnumerator` and `XamUserCreateStatsEnumerator` are `stub_success` and don't fill `*handle_ptr`. Canary `xam_user.cc:580-647` and `:1025-1059` create real `XEnumerator` objects. Game reads stale memory as the handle; subsequent `XamEnumerate` returns `0x12` only by happy coincidence. P2.
- **XAMBUG-010** — `crates/xenia-kernel/src/xam.rs:77-82` UI dialog exports (`XamShowSigninUI`, `XamShowKeyboardUI`, `XamShowDeviceSelectorUI`, `XamShowGamerCardUIForXUID`, `XamShowDirtyDiscErrorUI`, `XamShowMessageBoxUIEx`) are all `stub_success` and never write `result_ptr->ButtonPressed`. Canary fills the result and completes overlapped (`xam_ui.cc:322-419`). Game reads stale ButtonPressed → may take wrong dialog branch. P2.
- **XAMBUG-011** — `crates/xenia-kernel/src/xam.rs:305-307` `XGetAVPack` returns `0x16` (=22), outside canary's documented range 0..8. Canary `xam_info.cc:35-46` defaults to `8` (HDMI). Comment in `xam_info.cc:248-251` warns games may PAL-check against `{3,4,6,8}` — `0x16` matches none. Recommend changing to `8`. P2.
- **XAMBUG-012** — `crates/xenia-kernel/src/xam.rs:50` `XamEnumerate` returns `0x12` (`ERROR_NO_MORE_FILES`). Canary `xam_enum.cc:25-32` returns `X_ERROR_INVALID_HANDLE` for unknown handle and `WriteItems` for valid ones. xenia-rs is "convenient happy path" only because XAMBUG-009 means no real handle exists. P2.
- **XAMBUG-013** — `crates/xenia-kernel/src/xam.rs:275-277` `XamGetSystemVersion` returns `0x20000000`. Canary `xam_info.cc:229-237` returns `0` with explicit "pretend old" comment; both arbitrary, both `kStub`. Could affect symbol-loading branches in title code. P3.
- **XAMBUG-014** — `crates/xenia-kernel/src/xam.rs:309-311` `XGetGameRegion` returns `0xFF` (8-bit). Canary `xam_info.cc:256-277` returns 16-bit values from a 109-entry country table (e.g. `0x0101` for Japan, `0xFFFF` for "all"). Sylpheed J probably masks fine but the value is structurally wrong. P3.
- **XAMBUG-015** — `crates/xenia-kernel/src/xam.rs:317-328` `XGetVideoMode` writes only 5 fields (20 bytes). Canary's `X_VIDEO_MODE` struct is larger; trailing fields left with stale stack data on the guest side. P3.
- **XAMBUG-016** — `crates/xenia-kernel/src/xam.rs:142-162` (`xam_input_get_state`) only bumps `state.input_packet_number` when `gamepad_key != last_input_bytes`. Fake-pad steady state keys to 0; `packet_number` stays 0 forever. Games that detect "input never changed since startup" via packet_number monotonicity may misbehave. Canary likewise increments only on a real change, so the behavior matches in spirit. Sylpheed unaffected at boot. P3.
## KRNBUG (M07 — kernel HLE)
Per-milestone consolidated report: [audit-out/m07-kernel-hle.md](audit-out/m07-kernel-hle.md). Sub-reports under `audit-out/m07{a,b,c,d}-*.md` retain local sub-prefixes; master IDs unified below.
### Headline P0 / P1
- **KRNBUG-017 (P0 under `--parallel`)** — Kf-spinlock no-op (KfAcquireSpinLock/Release, KeRaiseIrql, KeLowerIrql). Lockstep tolerates this; `--parallel` allows concurrent guest CS entry → state corruption invisible to existing tests. M07a, exports.rs.
- **KRNBUG-Vd-04 (P0)** — VdSwap bypasses PM4 ring; canary writes Type-0 fetch-constant patch + PM4_XE_SWAP into reserved slot, ours fills NOPs and calls `state.gpu.notify_xe_swap` directly. Most plausible cause of swaps=2→swaps=1 regression. M07c, exports.rs `vd_swap`.
- **KRNBUG-008 (P1)** — ExCreateThread ignores `xapi_thread_startup` parameter. Canary invokes the prologue callback before user entry; we skip it. M07b.
- **KRNBUG-011 (P1)** — ExCreateThread ignores creation_flags bit 0x80 (guest_object return). M07b.
- **KRNBUG-013 (P1)** — ExGetXConfigSetting `stub_success` writes nothing into output buffer; Sylpheed reads garbage stack memory during early boot. M07b.
- **KRNBUG-Mm cluster (P1)** — MmAllocatePhysicalMemoryEx ignores all attribute bits (protect, page_size, range, alignment, WC/NoCache). Pool family entirely unregistered. M07c.
- **KRNBUG-D08 (P1 candidate)** — VSYNC_INSTR_PERIOD = 150_000 calibrated for ~10 MIPS lockstep; under `--parallel` (~24× slower) drops to ~2.5 Hz wall. Plausible swap-regression contributor. M07d, interrupts.rs.
### Other P1 / P2 / P3
77 KRNBUG IDs total filed across M07a/b/c/d. Severity distribution: 3 P0, 11 P1, 28 P2, 35 P3.
Full list and rationale in sub-reports. M07-lead consolidation at `audit-out/m07-kernel-hle.md`. Highlights:
- **Nt/Ke/Kf**: KRNBUG-005 (NtAllocateVirtualMemory ignores flags), KRNBUG-008 sub-prefix-a (NtCreateFile drops desired_access/share/disposition), KRNBUG-014 (DPC family unimplemented).
- **Rtl/Ex**: 35+ canary-table Rtl* ordinals unregistered (KRNBUG-001 sub-prefix-b; needs trace-handles audit to triage), CS stale-owner override (KRNBUG-004 sub-prefix-b).
- **Ob/Mm/Vd**: ObReferenceObjectByName + ObOpenObjectByName + ObTranslateSymbolicLink unregistered, ExFreePool / MmFreePool entirely missing, VdGetCurrentDisplayInformation/VdQueryVideoFlags/VdInitializeScalerCommandBuffer/VdInitializeEngines all stubbed.
- **Xex/misc**: XexCheckExecutablePrivilege always 0, XexGetProcedureAddress ignores string-name path, sprintf/_vsnprintf produce empty buffers (KRNBUG-D12).
## XAMBUG (M08 — XAM)
Per-milestone report: [audit-out/m08-kernel-xam.md](audit-out/m08-kernel-xam.md). 16 XAMBUG IDs.
### XAMBUG-001 (P0 candidate) — XamTaskSchedule never invokes callback
**Location**: `crates/xenia-kernel/src/xam.rs:204-208`. Returns 0 without spawning the task. Canary spawns an `XThread` to run the callback; the callback typically signals an `XTASK_MESSAGE.event_handle`. **Strong candidate for one or several of the 4 parked-waiter handles** (0x1004, 0x100c, 0x15e4, 0x42450b5c). Sylpheed callsite confirmed at `sub_824a9710` / 0x824a9a10.
### XAMBUG-002 (P0 candidate) — 13 async XAM exports never complete overlapped
**Location**: xam.rs Content*, Show*UI, XMsgStartIORequest*, XamEnumerate. All `stub_success` and never call `CompleteOverlappedImmediate` / `Deferred` on `overlapped_ptr`. Any guest wait on the overlapped event hangs.
### XAMBUG-003 (P1, possibly P0) — XamUserGetSigninState returns 0
**Location**: xam.rs. xenia-rs returns 0; canary returns 1 (signed-in offline by default). Sylpheed boot guard would force the bail branch.
### Other 13 XAMBUG IDs
XAMBUG-004..016: P1 for 004..007, P1 or P2 for 008 (depending on whether the heap zero-fills), P2 for 009..012, P3 for 013..016. Highlight: XAMBUG-016 (P3) packet_number never increments in fake-pad steady state because key stays 0.
## MEMBUG (M06 — memory subsystem)
Per-milestone report: [audit-out/m06-memory.md](audit-out/m06-memory.md). 9 MEMBUG IDs (1 P1, 4 P2, 4 P3).
### Verdict: write-visibility NOT BROKEN
Same-thread store→load through `crates/xenia-memory/src/heap.rs` is mechanically sound. Both paths derive raw `*mut u8`/`*const u8` pointers from the same `membase` mapping; no per-thread cache, no write-back buffer, no block-cache layer that returns stale data bytes (block cache only caches *decoded instructions*, never data). The `bump_page_version` Release-store comes *after* the byte store and is a cross-thread visibility primitive; same-thread program order trivially observes the just-written byte.
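
The store-then-Release-bump ordering described above can be sketched in safe Rust. This is a simplified model under stated assumptions: `PublishedByte` is a hypothetical type that uses atomics for the data byte, whereas the real heap writes through raw pointers derived from `membase`.

```rust
use std::sync::atomic::{AtomicU32, AtomicU8, Ordering};

/// One byte of data plus the version counter that publishes it.
struct PublishedByte {
    data: AtomicU8,
    version: AtomicU32,
}

impl PublishedByte {
    fn new() -> Self {
        Self { data: AtomicU8::new(0), version: AtomicU32::new(0) }
    }

    /// Data store first, then the Release bump that makes it visible to any
    /// thread that Acquire-loads the new version.
    fn write(&self, value: u8) {
        self.data.store(value, Ordering::Relaxed);
        self.version.fetch_add(1, Ordering::Release);
    }

    /// Returns the new version and the data if the version moved past `seen`.
    fn read_if_changed(&self, seen: u32) -> Option<(u32, u8)> {
        let version = self.version.load(Ordering::Acquire);
        if version == seen {
            None
        } else {
            Some((version, self.data.load(Ordering::Relaxed)))
        }
    }
}
```

On a single thread the read trivially sees the written byte through program order; the Release/Acquire pair only matters for a second thread that observes the bumped version.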
**BST paradox** at `sub_82175E68 → sub_82175F10` is OPEN but not a memory bug. Both registrar and validator run on the same HW slot in the same scheduler round. Likely upstream causes: M03 PPCBUG-720..735 (interpreter 32/64-bit truncation bugs) corrupting the comparison feeding the validator, or constructor-side logic in `sub_821766A0`/`sub_825ED268`.
### MEMBUG-003 (P1) — physical-address aliasing across cached/write-combine ranges not implemented
**Location**: `crates/xenia-memory/src/heap.rs`. The 0xA000_0000 (write-back), 0xC000_0000 (write-combine), 0xE000_0000 (uncached) virtual ranges are all distinct mappings in xenia-rs but should alias the same physical memory. Latent risk for any DMA-buffer round-trip; not currently observed to break Sylpheed but is a correctness gap.
### Other MEMBUG IDs
MEMBUG-001..009. Highlights: MEMBUG-002 P2 (MMIO aperture single-bit-mask fast-path doesn't validate against region table on hit), MEMBUG-005 P2 (no protection-fault path; reads of unmapped memory return 0), MEMBUG-007 P3 (Be<T> serde missing round-trip test).
## GPUBUG (M09 — GPU pipeline)
Per-milestone consolidated report: [audit-out/m09-gpu.md](audit-out/m09-gpu.md). Sub-reports under `audit-out/m09{a,b,c}-*.md`. 33 IDs; severity: 6 P0, 12 P1, 8 P2, 7 P3.
### Headline P0
- **GPUBUG-001 (P0)** — VdSwap kernel-bypass: `vd_swap` zero-fills 64-dword reserved ring slot with NOPs and calls `state.gpu.notify_xe_swap` directly. Canary writes Type-0 fetch-constant patch + PM4_XE_SWAP into the slot and lets the CP consume it. PM4_XE_SWAP opcode handler at `gpu_system.rs:1232` is dead code at runtime. **Confirms KRNBUG-Vd-04. Most plausible cause of swaps=2→swaps=1 regression.**
- **GPUBUG-100 / shader-005 (P0)** — operand modifiers (swizzle/abs/neg) never read from word-1 in WGSL interpreter; every ALU instruction executes against unmodified operands.
- **GPUBUG-101 / shader-006 (P0)** — `c#` constant-register selector bit masked off; every shader reads `r[low7]` (temp) instead of constants. WVP matrix etc. never read.
- **GPUBUG-102 / shader-007 (P0)** — vertex fetch never applies GpuSwap endian; big-endian VBs decode as garbage on little-endian host.
- **GPUBUG-103/104/105 / draw-008/009/010 (P0)** — 8 of 26 draw_state register addresses misdecoded: VGT_DRAW_INITIATOR, VGT_DMA_BASE, VGT_DMA_SIZE, PA_SC_WINDOW_SCISSOR_TL/BR (reading SCREEN_SCISSOR), RB_COLOR_INFO_1/2/3, PA_SU_VTX_CNTL, index_size from bit 8 instead of bit 11.
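For GPUBUG-102, the effect of each swap mode can be illustrated on a single 32-bit word. This is a minimal sketch, assuming the conventional meaning of the mode names (bytes within each 16-bit half, bytes within the 32-bit word, 16-bit halves within the word). The `GpuSwap` enum and `gpu_swap` function below are illustrative names, not the xenia-rs definitions.

```rust
/// Endian-swap modes, named after the ones listed for GPUBUG-102.
/// Illustrative type; the real enum in the codebase may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GpuSwap {
    None,
    K8In16,
    K8In32,
    K16In32,
}

/// Apply the swap to one 32-bit word as fetched from guest memory.
pub fn gpu_swap(value: u32, mode: GpuSwap) -> u32 {
    match mode {
        GpuSwap::None => value,
        // Swap the two bytes inside each 16-bit half: AABBCCDD -> BBAADDCC.
        GpuSwap::K8In16 => ((value << 8) & 0xFF00_FF00) | ((value >> 8) & 0x00FF_00FF),
        // Reverse all four bytes: AABBCCDD -> DDCCBBAA.
        GpuSwap::K8In32 => value.swap_bytes(),
        // Swap the two 16-bit halves: AABBCCDD -> CCDDAABB.
        GpuSwap::K16In32 => value.rotate_left(16),
    }
}
```

A vertex fetch that skips this step hands big-endian guest words to the host unchanged, which matches the "decode as garbage" symptom described above.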
### Headline P1
- **GPUBUG-006 (P1)** — `sync_with_mmio` Relaxed-load on WPTR; broken Release/Acquire pair; latent under `--parallel`.
- **GPUBUG-shader-002 (P1)** — D3D9 legacy `Inf*0=+0` not honored. Canary documents same divergence as causing white-screen in 4D5307E6.
- **GPUBUG-301 (P1)** — `read/write_sample_64bpp` doubles pitch but `surface_pitch_tiles()` already pre-doubles for 64bpp → quadruple stride for 64bpp resolves. Tests bypass `from_register_file` so don't catch this.
- **GPUBUG-304 (P1)** — `bind_primary_texture` hardcodes `version_when_uploaded: 0` so guest writes never invalidate uploaded textures.
- **GPUBUG-305 (P1)** — texture cache missing K1555, K24_8, K_8, K1010102, K10_11_11, `_AS_*` formats; bound to magenta stub.
- Plus 7 more P1 in shader/draw_state region (GPUBUG-106..112).
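The Release/Acquire pairing that GPUBUG-006 refers to can be sketched with a one-slot ring. This is an illustration under stated assumptions: `Ring`, `slot`, and `publish_and_consume` are hypothetical names, and the real `sync_with_mmio` path is more involved. The point is only that the consumer's load of the write pointer must be `Acquire` to pair with the producer's `Release` store; with a `Relaxed` load the payload read is not guaranteed to observe the producer's write.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical one-slot ring: a payload word plus a write pointer.
pub struct Ring {
    slot: AtomicU32,
    wptr: AtomicU32,
}

/// Producer writes the payload, then publishes it by bumping `wptr`.
/// Consumer waits on `wptr`, then reads the payload.
pub fn publish_and_consume(payload: u32) -> u32 {
    let ring = Arc::new(Ring {
        slot: AtomicU32::new(0),
        wptr: AtomicU32::new(0),
    });

    let producer = {
        let ring = Arc::clone(&ring);
        thread::spawn(move || {
            ring.slot.store(payload, Ordering::Relaxed);
            // Release: everything written before this store becomes visible
            // to a thread that observes the new wptr with an Acquire load.
            ring.wptr.store(1, Ordering::Release);
        })
    };

    // Acquire pairs with the Release store above. A Relaxed load here would
    // break the pairing, which is the defect class GPUBUG-006 describes.
    while ring.wptr.load(Ordering::Acquire) == 0 {
        thread::yield_now();
    }
    let seen = ring.slot.load(Ordering::Relaxed);

    producer.join().unwrap();
    seen
}
```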
### Other P2/P3
15 IDs. Highlights: GPUBUG-002 (P2) PM4 type-3 coverage 35/47 not 47/47 as memory file claimed — missing COND_EXEC, WAIT_REG_EQ, WAIT_REG_GTE, EVENT_WRITE_CFL plausibly hit by Sylpheed; GPUBUG-302 (P2) RenderTargetKey::is_64bpp returns wrong format set; GPUBUG-303 (P2) CPU-side TextureCache::ensure_cached is dead code.
### Verdict
**Renderer-blocker explanation**: The GPU pipeline is structurally wrong at multiple stages (shader operand decode + constant selector + vertex endian + 8 register addresses + VdSwap bypass). `draws=0` and swap regression both fall out of this class of failure. Combined fix queue: GPUBUG-001 + GPUBUG-100..105 must land together — partial fixes likely won't unblock visible rendering.
## XMODBUG (M10 — cross-module seams)
Per-milestone consolidated report: [audit-out/m10-cross-module.md](audit-out/m10-cross-module.md). Sub-reports under `audit-out/m10-x{1..5}-*.md`. 22 IDs; severity: 1 P0, 6 P1, 5 P2, 10 P3.
### Headline P0
- **XMODBUG-013 (P0)** — Missing fetch-constant patch in VdSwap. Re-confirms KRNBUG-Vd-04 / GPUBUG-001 from the seam perspective. Frontbuffer slot 0 retains stale texture descriptor; Sylpheed bloom/blur path reads garbage. Strongest single P0 cause of swap regression.
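The two packet kinds involved in the VdSwap slot (a Type-0 register write and a Type-3 opcode packet) differ only in their header word. The sketch below assumes the commonly documented PM4 header layout (packet type in bits 31:30, `count - 1` in bits 29:16, register index in bits 14:0 for Type-0, opcode in bits 14:8 for Type-3). The helper names are hypothetical, and the register index and opcode are left as parameters because their concrete values are not stated in this document.

```rust
/// Build a Type-0 header: write `count` consecutive registers starting at
/// `base_index`. Layout is an assumption based on the usual PM4 encoding.
pub fn pm4_type0(base_index: u16, count: u16) -> u32 {
    assert!(count >= 1 && count <= 0x4000, "count must fit in 14 bits");
    ((count as u32 - 1) << 16) | (base_index as u32 & 0x7FFF)
}

/// Build a Type-3 header: `opcode` followed by `count` payload dwords.
pub fn pm4_type3(opcode: u8, count: u16) -> u32 {
    assert!(count >= 1 && count <= 0x4000, "count must fit in 14 bits");
    (3u32 << 30) | ((count as u32 - 1) << 16) | ((opcode as u32 & 0x7F) << 8)
}

/// Extract the packet type (0..=3) from a header word.
pub fn pm4_packet_type(header: u32) -> u32 {
    header >> 30
}

/// Number of payload dwords that follow a Type-0 or Type-3 header.
pub fn pm4_payload_len(header: u32) -> u32 {
    ((header >> 16) & 0x3FFF) + 1
}
```

A ring consumer that sees only NOP fill never reaches either header, which is consistent with the finding that the swap handler was unreachable at runtime.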
### Headline P1
- **XMODBUG-001 (P1)** — `stwcx`/`stdcx` data write happens AFTER `try_commit` clears the slot. Race window: another HW thread can lwarx the cleared slot, read pre-write data, and commit. Latent under `--parallel`.
- **XMODBUG-002 (P1)** — `GuestMemory::write_bulk` (used by `NtReadFile` and XEX loader) skips both `bump_page_version` and reservation invalidation. Latent if any code-bearing memory is bulk-written.
- **XMODBUG-010 (P1)** — `CP_INT_STATUS` never produced from GPU side; only synthetic vsync interrupts ever reach the kernel. Real CP-side events (EOP, RSC, IB-end) missing.
- **XMODBUG-011 (P1)** — `VSYNC_INSTR_PERIOD` fragile proxy. Re-confirms KRNBUG-D08 from seam perspective.
- **XMODBUG-012 (P1)** — `notify_xe_swap` synthetic interrupts displace real CP interrupts in 4-deep queue.
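For XMODBUG-002, the fix shape is "bump the version of every page the bulk write touches". A minimal sketch follows, assuming 4 KiB pages and a flat per-page version table; `PAGE_SHIFT`, `pages_touched`, and `bump_versions` are hypothetical names, and the actual page size and table layout in `GuestMemory` may differ.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicU32, Ordering};

/// Assumption: 4 KiB pages. The real subsystem may use another granularity.
const PAGE_SHIFT: u32 = 12;

/// Page indices covered by the byte range [addr, addr + len).
/// Computed in u64 so a range ending at the top of the address space
/// does not overflow.
pub fn pages_touched(addr: u32, len: u32) -> Range<u64> {
    if len == 0 {
        return 0..0;
    }
    let first = (addr as u64) >> PAGE_SHIFT;
    let last = (addr as u64 + len as u64 - 1) >> PAGE_SHIFT;
    first..last + 1
}

/// Bump the version counter of every touched page. The Release ordering
/// mirrors the description of `bump_page_version` as a cross-thread
/// visibility primitive.
pub fn bump_versions(versions: &[AtomicU32], addr: u32, len: u32) {
    for page in pages_touched(addr, len) {
        if let Some(v) = versions.get(page as usize) {
            v.fetch_add(1, Ordering::Release);
        }
    }
}
```

The off-by-one risk is in the last page: a two-byte write at `0x0FFF` straddles pages 0 and 1, so both counters must move.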
### Other P2/P3
15 IDs. Notable:
- **XMODBUG-005 (P2)** — `nt_close` on a handle with parked waiters silently strands them.
- **XMODBUG-003 (P2)** — no MemoryBarrier around reserved ops; latent on non-x86 hosts.
- **XMODBUG-021 (P2)** — WaitAll partial-satisfaction false-wake (semantic gap, not a race).
- **XMODBUG-022 (P2)** — force-wake path doesn't scrub waiter lists like timed-wake path does.
### Verdict
The renderer plateau and swap regression are explained by a **multi-causal failure** at the GPU pipeline + kernel-↔-GPU seam. Combined fix queue: KRNBUG-Vd-04 / GPUBUG-001 / XMODBUG-013 (VdSwap rewrite to write real PM4 sequence) + GPUBUG-100..102 (shader operand decode + constant-register selector + vertex fetch endian) + GPUBUG-103..105 (8 register addresses) must land coherently to unblock visible rendering.
The 4 parked-waiter handles remain unexplained at this audit's depth. M11 follow-up should run the `--trace-handles` audit at -n 5B and pivot to PPC-level trace if no signal exists.
## SWAPBUG (M11 — swap-regression bisection)
Per-milestone report: [audit-out/m11-runs.md](audit-out/m11-runs.md).
### SWAPBUG-001 (P0) — PPCBUG-001 addi 32-bit truncation regresses swaps=2 → 1
- **Severity**: P0 — direct cause of the headline `swaps=2 → 1` regression that motivated this entire audit.
- **Status**: open (audit-only; fix decision left to follow-up).
- **Location**: `crates/xenia-cpu/src/interpreter.rs:114-118` — the single `as u32 as u64` cast at the end of the `addi` opcode arm.
- **Bisection trail**:
- Phase-level: pre-P1/P1/P2/P3 → swaps=2; **P4/d945aea** → swaps=1.
- Internal P4 commits: `145a7a4` → swaps=2; **`bf8208e`** ("PPCBUG-001/002/003/004/005/007 4b immediate ALU truncation") → swaps=1.
- Hunk-level (revert each PPCBUG individually within bf8208e): only **PPCBUG-001 revert restores swaps=2**. PPCBUG-002/003/004/005/007 reverts leave swaps=1.
- **Mechanism**: addi is the most common opcode (282k uses, 3.4% of all instructions in sylpheed.db). Adding `as u32 as u64` to its writeback truncates the upper 32 bits of the result. Sylpheed has at least one control-flow site that depends on the un-truncated 64-bit value.
- **Cross-references**: confirms M03 PPCBUG-720 prediction ("addi/addic/subfic truncate to 32 bits without canary parity"). The fix is canary-divergent — canary does NOT truncate addi.
- **Recommendation**: revert the addi truncation. Re-examine the test `addi_li_neg_one_zero_extends_upper` to assert canary semantics, not the over-truncated form. Independently re-examine the addis truncation (which IS deliberate per the addis fix memory file but may have its own broader implications).
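The difference between the two `addi` writeback forms can be shown in isolation. This is a sketch, not the interpreter arm itself: `addi_full` and `addi_truncating` are hypothetical helper names, and the `rA = 0` special case of `addi` is ignored.

```rust
/// Full 64-bit add with a sign-extended 16-bit immediate
/// (the canary-parity behaviour described in this finding).
pub fn addi_full(ra: u64, simm: i16) -> u64 {
    ra.wrapping_add(simm as i64 as u64)
}

/// The regressing form: same add, then `as u32 as u64` on the result,
/// which zeroes the upper 32 bits.
pub fn addi_truncating(ra: u64, simm: i16) -> u64 {
    addi_full(ra, simm) as u32 as u64
}
```

With `ra = 0` and `simm = -1`, the full form yields `0xFFFF_FFFF_FFFF_FFFF` while the truncating form yields `0x0000_0000_FFFF_FFFF`. Any later 64-bit compare or shift on that register can therefore take a different path.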
### SWAPBUG-002 (P2) — PPCBUG-004 mulli truncation affects IRQ delivery anomalously
- **Severity**: P2 — anomalous side effect, not blocking.
- **Status**: open.
- **Location**: `crates/xenia-cpu/src/interpreter.rs` mulli arm (changed in `bf8208e`).
- **Symptom**: reverting mulli truncation alone (on top of bf8208e) drops interrupts_delivered from 629 to 101 at -n 100M lockstep. `swaps` stays at 1. This is the opposite direction from SWAPBUG-001: here the revert lowers a metric instead of restoring one.
- **Mechanism (hypothesis)**: a mulli result is consumed by an instruction-count or frame-count computation that controls vsync injection target selection or some early-boot loop iteration count.
- **Recommendation**: no immediate action; investigate as part of M07d KRNBUG-D08 / XMODBUG-011 vsync-timing audit.
## ANLBUG (M11 — analysis crate)
### ANLBUG-001 (P2) — `xenia-rs dis` does not create SQL views by default
- **Severity**: P2 — feature mismatch between tests and CLI.
- **Status**: open.
- **Location**: `crates/xenia-app/src/main.rs:3189` — `w.create_sql_views()` is gated on `--analyze=Sql` or `--analyze=Both`. Default is `Rust`, which skips view creation.
- **Symptom**: regenerated `sylpheed.db` has none of the application views (`v_branch_xrefs`, `v_call_graph`, `v_function_first_instruction`, `v_imports_called`, `v_reachability_from_entry`). The schema-golden test creates them; the user-facing CLI does not.
- **Cross-reference**: ORACBUG-005 (M01) — schema test uses synthetic 4-instr PE; doesn't catch this gap.
- **Recommendation**: either always create views in `--db` mode, or document the requirement clearly in CLI help.
---
## Fix session 2026-05-03 — outcome
Single-session fix sprint executed against this audit's recommended
queue. 12 IDs closed across 11 commits + 9 merge commits on master.
Branch lineage: each phase a topic branch, merged with `--no-ff` to
preserve hunk-bisect lineage; all branches deleted post-merge.
| Phase | Commit | IDs closed | Severity | Notes |
|-------|--------|------------|----------|-------|
| A | `9ab986e` | SWAPBUG-001 / PPCBUG-001 | P0 | addi 32-bit truncation revert. swaps 1→2 confirmed. |
| B | `1f416aa` | ORACBUG-004 (partial: ORACBUG-006) | P0 | sylpheed_n50m stable-digest golden + `--stable-digest` CLI flag. n4b deferred (canonical invocation pathologically slow per audit). |
| C | `82f3d61` | KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013 | 3× P0 | VdSwap PM4 ring path (writes Type-0 fetch-constant patch + Type-3 PM4_XE_SWAP into ring memory at WPTR). Direct `notify_xe_swap` retained as idempotent safety net. |
| D1 | `78ea81c` | GPUBUG-101 | P0 | ALU src1/2/3_sel temp-vs-constant selector decoded from word-0 bits 29-31. |
| D2 | `c5c6713` | GPUBUG-100 (abs deferred) | P0 | per-operand component-relative swizzle + negate decoded from word-1. abs flag (dual-meaning bit 7 / word-2) intentionally deferred. |
| D3 | `ec2d955` | GPUBUG-102 | P0 | per-format `gpu_swap` endian byte-swap on vertex fetch (kNone/k8in16/k8in32/k16in32). |
| E | `8723d68` | GPUBUG-103, GPUBUG-104, GPUBUG-105 | 3× P0 | 8 register addresses re-validated against canary `register_table.inc`; index_size bit 8→11; PA_SU_VTX_CNTL 0x2083→0x2302. |
| F1 | `e7d0fcf` | KRNBUG-017 | P0-under-parallel | Kf*SpinLock + KeReleaseSpinLockFromRaisedIrql + KeTryToAcquireSpinLockAtRaisedIrql now write the lock value to guest memory. |
| G1 | `8fc1b1d` | GPUBUG-006 | P1 | sync_with_mmio Acquire/Release pairs the producer-side Release at mmio_region.rs:78. |
| G2 | `780e854` | XMODBUG-002 | P1 | GuestMemory::write_bulk now bumps page_versions for every page it touches. |
### Headline outcome
| Metric | Pre-sprint | Post-sprint | Goal | Met? |
|-----------------------|-----------:|------------:|-----:|------|
| `swaps` (-n 100M) | 1 | 2 | ≥2 | ✅ |
| `draws` (-n 100M) | 0 | 0 | >0 | ❌ (multi-causal — see below) |
| Tests passing | 551 | 556 | ≥551 | ✅ |
| Renderer plateau | locked | partially unblocked | unblocked | partial |
The audit's central prediction — **Phases C+D+E together unlock
`draws > 0`** — was not met empirically at -n 100M lockstep. The
plateau persists because:
- `shader_blobs_live` stays at 0 after 100M. The game has not yet
issued IM_LOAD; resource-loader worker threads are still parked.
- The audit's parked-waiter analysis (`project_xenia_rs_audit_2026_05_02.md`,
4 handles 0x1004 / 0x100c / 0x15e4 / 0x42450b5c) remains
unresolved. Phase F1 (Kf-spinlock) lands but doesn't unblock
these handles; XAMBUG-001 was ruled out by M10-X2.
### Phases attempted but deferred
- **F2 (XAMBUG-001 XamTaskSchedule callback spawn)**: per audit
M10-X2, ruled out as the parked-waiter cause. Bug is real but
doesn't move the renderer-plateau needle within this sprint.
Implementing the XThread spawn for the callback is moderate
complexity (~45 min); deferred to a follow-up session.
- **F3 (XAMBUG-002 overlapped completion helper)**: requires new
infrastructure (`KernelState::complete_overlapped`) plus wiring 13
async XAM stubs. Substantial. Deferred.
- **G2 (KRNBUG-D08 / XMODBUG-011 VSYNC wall-clock)**: switching from
instruction-count proxy to wall-clock would destabilize the
lockstep digest's `interrupts_delivered` field (which the existing
full-digest sylpheed_n2m oracle still tracks). Deferred to allow
paired oracle-update.
- **G3 (PPCBUG-720/721/722 addic/addic./subfic revert)**: verified
canary directly (`xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc:117-136`)
— canary uses **full 64-bit add with sign-extended immediate**,
not the "i32 → i64 → u64" path the Plan agent suggested. The
current xenia-rs 32-bit ABI workaround is plausibly correct for
Xbox 360 user mode (per the addis pattern). The "PPCBUG" label
may itself be wrong; defer until canary semantics are
re-confirmed against a known-good Sylpheed code-path trace.
- **KRNBUG-Mm cluster** (P1 sweep): substantial implementation
work (proper protect/page_size/range honoring in
MmAllocatePhysicalMemoryEx; per-heap offsets in
MmGetPhysicalAddress; real Mm tracking for
MmFreePhysicalMemory). Deferred.
### Sprint acceptance criteria
| # | Criterion | Met? |
|---|-----------|------|
| 1 | Phase A: SWAPBUG-001 reverted, swaps=2 confirmed | ✅ |
| 2 | Phase B: sylpheed_n50m + n4b goldens | ✅ partial — n50m landed; n4b deferred (perf) |
| 3 | Phases C+D+E: 100M lockstep produces `draws > 0` | ❌ multi-causal |
| 4 | Phase F: ≥1 of 4 parked-waiter handles signals | ❌ — F1 alone insufficient |
| 5 | Phase G: ≥3 P1 groups landed | ❌ partial — 2 landed (G1, G2-XMODBUG-002) |
| 6 | `cargo test --workspace --release` ≥557 | ❌ — 556 (off by 1; new sylpheed_oracles is ignore-gated) |
| 7 | audit-findings.md marked applied | ✅ this section |
| 8 | Memory file updated | ✅ (separate file) |
| 9 | Workspace clean; no skipped/ignored tests added | ⚠ — sylpheed_n50m is `#[ignore]` per design (3-min run) |
| 10 | All work merged to master | ✅ — no dangling branches |
### Recommended next session
1. **Investigate parked-waiter handles directly** at -n ≥4B with
`--trace-handles`. The audit's hypothesis is that one of the
4 handles' producer never fires; pinpoint the producer code-path
to identify the missing kernel-side signal.
2. **Phase G2 + matching n2m oracle re-baseline**: switch VSYNC to
wall-clock and re-baseline interrupts_delivered together as a
single commit pair.
3. **F2/F3** if appetite is there for new XAM infrastructure;
non-zero chance one of the unblocked completions is the missing
producer for one of the 4 parked handles.
4. **Resume KRNBUG-Mm cluster** for proper memory-protect /
range / per-heap honoring; required before canary-disambiguating
the addic/subfic class (canary semantics are a 64-bit add against
guest memory the Mm layer doesn't fully model yet).
---
## Follow-up session 2026-05-03 — outcome
Three audit IDs closed across 3 commits, merged to master with `--no-ff`.
HEAD: `8668550`. Tests: 556 → 561 (+5 from new wall-clock + ghost-trail tests).
### Audit IDs landed
| ID | Commit | Description |
|---|---|---|
| **GPUBUG-DRAIN-001** | `7a1b6b3` | VdSwap PM4 fallback warning silenced under `--parallel`. New `drain_until_wptr(target, time_budget)` mirrors canary's `WorkerThreadMain` predicate; vd_swap skips PM4 ring injection (unreliable when ring backs up under --parallel) and uses direct `notify_xe_swap`. The slot-0 fetch-constant patch is deferred (GPUBUG-FETCH-PATCH-001). DrainFence handler publishes the digest mirror before reply (was racing the CPU's post-drain digest_snapshot read). |
| **KRNBUG-AUDIT-001** | `d1105aa` | Diagnostic instrumentation: `--trace-handles-focus=<LIST>` flag + per-handle DIAGNOSIS report. `record_signal` falls through to ghost-trail capture for focused handles even when no `record_create` exists. Producer-class classification (GuestExport / KernelInternal). Distinguishes "guest never tried" from "signal landed but missed waiter" in one run. |
| **KRNBUG-D08** | `27d3608` | V-sync wall-clock under `--parallel`. Lockstep stays on the deterministic instruction-count proxy (sylpheed goldens unchanged). `--parallel` switches to wall-clock via `tick_vsync_wallclock`, raising delivered v-syncs from ~2 → 17 at -n 30M. INTERRUPT_QUEUE_CAP=4 still bottlenecks burst delivery. |
### Parked-waiter producer-trace finding
Empirical run at -n 500M lockstep with the new
`--trace-handles-focus=0x1004,0x100c,0x15e4,0x42450b5c`:
```
handle=0x00001004 kind=Event/Manual waiters=1 signaled=false
signal_attempts=0 (primary=0, ghost=0) waits=1 wakes=0
created cycle=0 tid=1 lr=0x824a9f6c src=NtCreateEvent
timeline: cycle=0 tid=10 lr=0x824ac578 src=do_wait_single[wait]
GuestExport=0 KernelInternal=0 waits=1
=> producer is a missing kernel signal source (or BST-paradox upstream)
```
Same shape for 0x100c and 0x15e4. 0x42450b5c shows `<UNCREATED>` +
`<AUDIT_BLIND>` (waiter parked via a non-`do_wait_single` path).
**Conclusion**: hypothesis (A) confirmed for 3 of the 4 handles. The
producer code path is genuinely missing — NO Nt/KeSetEvent /
KePulseEvent / KeReleaseSemaphore call EVER targets these handles
during 500M instructions of execution. The PPC-vs-Rust traversal
paradox (BST-bug from `project_xenia_rs_sylpheed_event_chain_2026_04_29`)
is **NOT** the cause for these specific handles. The 3 handles share
the same creator (lr=0x824a9f6c, tid=1, all at cycle=0) and the same
wait-call wrapper (lr=0x824ac578) — likely 3 sibling worker threads
all waiting for "work to do" notifications that never come. Most
likely producer-class candidates for next session:
- File I/O completion (`signal_io_completion_event`) — currently a
real implementation but possibly never reached; trace `NtReadFile`
paths to see if completion events would target these handles.
- XAM async task completion — F2/F3 deferred from prior sprint.
- Audio buffer-complete — `XAudioRegisterRenderDriverClient` is a
one-shot stub.
- Timer DPCs — `KeSetTimer` real impl but APC delivery may be
routing wrong.
### Acceptance criteria
| # | Criterion | Met? |
|---|-----------|------|
| 1 | Phase 1: zero "PM4_XE_SWAP not consumed" warnings under canonical invocation | ✅ |
| 2 | Phase 2: per-handle DIAGNOSIS for all four parked handles | ✅ |
| 3 | Phase 3: vsync rate restored under --parallel; n2m golden untouched | ✅ partial — rate up but FIFO cap=4 still bottlenecks |
| 4 | cargo test ≥556 | ✅ 561 |
| 5 | All work merged to master | ✅ |
| 6 | **STRETCH** ≥1 of 4 handles signals | ❌ (the diagnostic data explains why: the producer is missing; this is not a wake-eligibility bug) |
| 7 | **STRETCH** draws > 0 at -n 100M lockstep | ❌ — gating remains parked-waiter handles |
### Recommended next session
1. **Producer hunt** for the 3 Event/Manual handles. With the
diagnostic baked in, a focused hunt: identify the guest function
at `lr=0x824ac578` (the shared wait-call wrapper), walk its
callers, find what kernel signal source SHOULD be wired for each
handle. Likely starting points: file I/O completion
(`signal_io_completion_event`), XamTaskSchedule callback (F2),
XAudio buffer-complete.
2. **Raise INTERRUPT_QUEUE_CAP** for `--parallel` workloads — the
3044 dropped vsyncs at -n 30M --parallel suggest the FIFO is the
next bottleneck.
3. **F2/F3** (XAM async completion) per the still-deferred list,
especially if Phase 2 of next session pinpoints a missing XAM
producer.
4. **GPUBUG-FETCH-PATCH-001**: re-enable the PM4_TYPE0
fetch-constant patch via a side-channel (GpuCommand variant)
when draws actually start firing — relevant for bloom/blur N+1.
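Recommendation 2 above concerns a bounded interrupt FIFO that drops entries once full. A minimal sketch of that behaviour, assuming a drop-newest policy and a plain drop counter; `InterruptQueue` and its methods are hypothetical, and the real queue behind `INTERRUPT_QUEUE_CAP` may use a different policy.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded FIFO of pending interrupt sources.
pub struct InterruptQueue {
    cap: usize,
    pending: VecDeque<u32>,
    dropped: u64,
}

impl InterruptQueue {
    pub fn new(cap: usize) -> Self {
        Self { cap, pending: VecDeque::with_capacity(cap), dropped: 0 }
    }

    /// Enqueue one interrupt. Returns false and counts a drop when full
    /// (drop-newest is an assumption made for this sketch).
    pub fn push(&mut self, source: u32) -> bool {
        if self.pending.len() >= self.cap {
            self.dropped += 1;
            return false;
        }
        self.pending.push_back(source);
        true
    }

    /// Deliver the oldest pending interrupt, if any.
    pub fn pop(&mut self) -> Option<u32> {
        self.pending.pop_front()
    }

    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```

With a capacity of 4, any burst longer than four entries between two deliveries loses the excess, so raising the cap only helps if the consumer eventually drains faster than the producer fills.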
## Producer-hunt session 2026-05-03
### XAMBUG-PRODUCER-001 — XamTaskSchedule was a no-op stub
**Status:** fixed. Hypothesis falsified for the parked-waiter set.
**Site:** `crates/xenia-kernel/src/xam.rs:204` (pre-fix).
**Canary parity:** `xenia-canary/src/xenia/kernel/xam/xam_task.cc:43-80`.
The pre-fix stub allocated a handle, logged it, and returned
`STATUS_SUCCESS` — it never spawned a thread. Replaced with a
canary-faithful implementation: allocates a `ThreadImage`, allocates
a `KernelObject::Thread` handle, and routes through
`Scheduler::spawn` with `entry=callback`, `start_context=message_ptr`
(canary's third positional `XThread` arg). Stack sized as
`max(0x4000, page-aligned 0x10_0000)`.
**Verification:**
- Unit test `xam::tests::xam_task_schedule_spawns_real_thread`
confirms the spawned thread's `pc == callback` and `gpr[3] == message_ptr`.
- Workspace tests: 561 → 562 green.
- `--stable-digest -n 100M` lockstep: `instructions=100000002`
unchanged from baseline (interpreter determinism preserved).
- `--trace-handles-focus=0x1004,0x100c,0x15e4 -n 500M`: no
`kernel.calls{name=XamTaskSchedule}` counter appears — the call
site at `0x824a9a10` is **never reached** within 500M
instructions. Boot stalls earlier on the parked handles.
**Outcome:** the 3 focus handles still show
`signal_attempts=0 (primary=0, ghost=0)` after 500M instructions.
The XAM-task hypothesis is therefore **falsified for this run** —
XamTaskSchedule cannot be the missing producer for these specific
handles, because Sylpheed's only call site to it isn't reached
before the deadlock.
The fix lands regardless: the stub was a real correctness bug that
will manifest the moment the call site is reached (post-deadlock-resolution).
### Recommended next producer candidate
`XAudioRegisterRenderDriverClient` (currently a one-shot stub, called
once per the metric counter). Audio buffer-complete callbacks are a
known signal source on Xbox 360 audio engines; the stub may be
hiding the producer for one of the 3 handles. If that lead is also
falsified, escalate to file I/O completion (`signal_io_completion_event`
already real but possibly mis-routed) or Timer DPC delivery.
### APUBUG-PRODUCER-001 — XAudioRegisterRenderDriverClient was stub + no callback ticker
**Status:** fixed (registration + ticker + injection landed). Hypothesis
falsified for handles `0x1004` / `0x100c` / `0x15e4`.
**Site:** `crates/xenia-kernel/src/exports.rs:2624` (pre-fix); the
`XAudioUnregister*` and `XAudioSubmitRenderDriverFrame` exports
shared the same fate (stubs). New module: `crates/xenia-kernel/src/xaudio.rs`.
**Canary parity:**
- `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_audio.cc:56-93`
(the three exports — register reads `callback_ptr[0..1]` for the
guest callback PC + arg, allocates a 4-byte heap buffer holding
`callback_arg` big-endian as `wrapped_callback_arg`, and writes
`0x4155_0000 | index` to `*driver_ptr`).
- `xenia-canary/src/xenia/apu/audio_system.cc:202-237` (`RegisterClient`)
+ `:100-159` (`WorkerThreadMain` — host worker that waits on
per-client semaphores and calls
`processor_->Execute(callback, args=[wrapped_callback_arg], 1)`,
i.e. r3 = wrapped pointer).
- `xenia-canary/src/xenia/apu/xaudio2/xaudio2_audio_driver.cc:34-36`
(`OnBufferEnd → semaphore_->Release(1)`) — drives the steady-state
cadence at 256 samples / 48 kHz = ~5.33 ms.
**Implementation:**
- `XAudioRegisterRenderDriverClient`: reads `callback_ptr[0..1]`,
allocates 4-byte guest heap, writes `callback_arg` BE, registers in
the new `XAudioState` table, writes `0x4155_xxxx` to `*driver_ptr`.
- `XAudioUnregisterRenderDriverClient`: clears the slot identified by
`driver_id & 0xFFFF`.
- `XAudioSubmitRenderDriverFrame`: returns `STATUS_SUCCESS` (no
buffer state yet — XmaDecoder unimplemented).
- `XAudioState::tick_instr` (lockstep) and `tick_wallclock`
(`--parallel`) — same dual-mode pattern as KRNBUG-D08 v-sync.
`XAUDIO_INSTR_PERIOD = 48_000` and `XAUDIO_PERIOD = 5.333 ms`
approximate canary's frame rate.
- `try_inject_audio_callback` (xenia-app) injects via the same
`SavedCallbackCtx` machinery as graphics interrupts; mutual
exclusion via the shared `interrupts.saved` slot. r3 is set to
`wrapped_callback_arg` per canary `processor_->Execute`.
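The cadence arithmetic and the lockstep ticker can be sketched as follows. The frame period is derived from the 256 samples at 48 kHz quoted above; `InstrTicker` is a hypothetical stand-in for the instruction-count mode of `XAudioState`, shown only to illustrate how a deterministic proxy can fire once per period.

```rust
use std::time::Duration;

pub const SAMPLES_PER_FRAME: u64 = 256;
pub const SAMPLE_RATE_HZ: u64 = 48_000;

/// Wall-clock length of one audio frame: 256 / 48_000 s, about 5.333 ms.
pub fn frame_period() -> Duration {
    Duration::from_nanos(SAMPLES_PER_FRAME * 1_000_000_000 / SAMPLE_RATE_HZ)
}

/// Hypothetical deterministic ticker keyed on retired-instruction count.
pub struct InstrTicker {
    period: u64,
    next_fire: u64,
}

impl InstrTicker {
    pub fn new(period: u64) -> Self {
        assert!(period > 0);
        Self { period, next_fire: period }
    }

    /// Returns how many callbacks became due at `instr_count`.
    /// Catches up if the caller skipped past several periods.
    pub fn tick(&mut self, instr_count: u64) -> u32 {
        let mut due = 0;
        while instr_count >= self.next_fire {
            due += 1;
            self.next_fire += self.period;
        }
        due
    }
}
```

Because the ticker depends only on the instruction count, two lockstep runs fire at identical points, which is what keeps the goldens stable.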
**Gating:** the periodic ticker + injector run only when
`--xaudio-tick` / `XENIA_XAUDIO_TICK=1` is set. Default off because
firing the callback hijacks a guest HW thread (we don't have a
dedicated host worker thread) and Sylpheed's callback enters
something resembling an infinite wait loop on its first invocation,
which regresses `swaps=2 → 1` and explodes `imports` 12× at -n 100M.
Default-off preserves all existing lockstep goldens
(`sylpheed_n50m.json` etc.).
**Verification:**
- Workspace tests: 562 → 576 green (10 in `xaudio.rs` + 4 in
`exports.rs`).
- `--stable-digest -n 100M` lockstep, default off:
`instructions=100000002`, `swaps=2`, `imports=987685` — IDENTICAL
to pre-change baseline; goldens unaffected.
- `--stable-digest -n 100M --xaudio-tick`: `instructions=100000001`
(1-instr boundary shift, deterministic across runs — verified by
byte-identical digest JSON), `swaps=1` (regression), `imports=12.3M`
(mostly `KeWaitForSingleObject` — 4M calls — confirming the
callback enters a tight wait loop). 1 audio callback fires
(`xaudio.callback.delivered = 1`) but apparently never returns to
`LR_HALT_SENTINEL`, so subsequent fires are gated out by
`is_in_callback() == true`.
- `--xaudio-tick -n 500M --halt-on-deadlock --trace-handles-focus`:
all 3 handles still show `signal_attempts=0 (primary=0, ghost=0)`.
**Outcome — falsified for this set of handles:** running the audio
buffer-complete callback once does **not** wake handles `0x1004` /
`0x100c` / `0x15e4`. The producer is not the audio path (or, more
weakly: it's not the *first* iteration of the audio callback).
**Side effects worth noting for the next session:**
1. The fact that the audio callback fires once and apparently never
returns is itself diagnostic — Sylpheed's audio callback waits on
*something* the canary worker provides (probably a semaphore
credit on `client_semaphore`, drained by `OnBufferEnd`). Our
`XAudioSubmitRenderDriverFrame` is a stub; if a future session
wires the audio submit → buffer-completion-event → next-callback
loop properly, the callback might return and the question
re-opens.
2. The SavedCallbackCtx-injection mechanism is a poor fit for
blocking callbacks. Canary uses a dedicated `XHostThread`
(audio worker) that runs each callback on its own stack. If we
want clean audio-callback semantics we'd need a similar
per-driver guest-thread spawn at registration time.
### Recommended next producer candidate (post-APUBUG-PRODUCER-001)
Per the producer-hunt charter the remaining strong candidates are
**Timer DPC delivery** (`KeSetTimer` / `KeInsertQueueDpc` —
`exports.rs` has stubs/partials) and **file I/O completion event
routing**. Timer DPC is the next-strongest because the parked
handles are explicit `Event/Manual`s with no current waker, and
Xbox 360 timer-driven DPCs are a common signal source.
### KRNBUG-AUDIT-002 — multi-frame stack capture at handle creation
**Status:** landed (diagnostic only; no behaviour change). Walker
verified end-to-end against the analysed call graph for every
captured frame.
**Site:** `crates/xenia-kernel/src/audit.rs` (new
`record_create_with_stack`, new `created_stack: Vec<(u32,u32)>` on
`HandleAuditTrail`); `crates/xenia-kernel/src/state.rs` (new
`audit_create_with_ctx` helper + free function
`walk_guest_back_chain(sp, lr, mem, max)`); `nt_create_event` /
`nt_create_semaphore` / `nt_create_timer` / `xam_task_schedule` now
route through the new helper. Dump in `crates/xenia-app/src/main.rs`
prints `created stack (N frames)` under the per-handle FOCUS report.
**Why it exists:** KRNBUG-AUDIT-001 told us the producer is missing
for handles `0x1004` / `0x100c` / `0x15e4` (later corrected to
`0x15e0` — see below) but couldn't tell us *which subsystem owns
each handle*. The wrapper at `lr=0x824a9f6c` is the same
`silph::Event` ctor for 83 unique callers, so the immediate LR is
useless for subsystem identification. The new walker captures up to
6 stack frames at create time, gated on the focus set so the cost
is one `HashSet::contains` on the unfocused hot path.
**Walker correctness:** PPC EABI back-chain (`[r1] = prev_sp`,
saved-LR-of-prev-frame at `[prev_sp - 8]`). Frame 0 is the live
`(ctx.gpr[1], ctx.lr)` since the wrapper hasn't spilled its own LR
yet. Sentinels: 0, 0xFFFFFFFF, self-loop. Read-only via
`MemoryAccess::read_u32` — guest memory and CPU state are not
mutated, so lockstep determinism is unaffected.
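The back-chain rule above can be sketched as a small read-only walker. This is an illustration, not the code in `state.rs`: the `GuestRead` trait, the `HashMap`-backed memory, and the name `walk_back_chain_sketch` are assumptions made so the example is self-contained.

```rust
use std::collections::HashMap;

/// Minimal read-only view of guest memory (hypothetical trait).
pub trait GuestRead {
    fn read_u32(&self, addr: u32) -> Option<u32>;
}

impl GuestRead for HashMap<u32, u32> {
    fn read_u32(&self, addr: u32) -> Option<u32> {
        self.get(&addr).copied()
    }
}

/// Collect up to `max` (sp, lr) frames. Frame 0 is the live (sp, lr).
/// Each further frame follows [sp] to the previous stack pointer and reads
/// that frame's saved LR at [prev_sp - 8]. Stops on 0, 0xFFFF_FFFF,
/// a self-loop, or an unreadable address.
pub fn walk_back_chain_sketch<M: GuestRead>(
    sp: u32,
    lr: u32,
    mem: &M,
    max: usize,
) -> Vec<(u32, u32)> {
    let mut frames = Vec::new();
    if max == 0 {
        return frames;
    }
    frames.push((sp, lr));
    let mut cur = sp;
    while frames.len() < max {
        let prev = match mem.read_u32(cur) {
            Some(p) => p,
            None => break,
        };
        if prev == 0 || prev == 0xFFFF_FFFF || prev == cur {
            break;
        }
        let saved_lr = match mem.read_u32(prev.wrapping_sub(8)) {
            Some(l) => l,
            None => break,
        };
        frames.push((prev, saved_lr));
        cur = prev;
    }
    frames
}
```

The walker only reads, so it cannot perturb guest state; the sentinel checks keep a corrupted chain from looping.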
**Verification:**
- Workspace tests: 576 → 581 (+5: 2 new in `audit.rs` exercising the
`record_create_with_stack` path including the disabled-no-op case;
3 new in `state.rs` exercising synthetic 3-level back-chain,
self-loop sentinel, zero sentinel).
- `--stable-digest -n 50M` lockstep oracle (`sylpheed_n50m`):
bit-identical to checked-in golden (re-confirmed twice).
- End-to-end: every captured frame's saved-LR matches a `bl`
instruction one address earlier in the named function (cross-checked
against `sylpheed.db`'s `instructions` table for all 18 captured
PCs across handles `0x1004` / `0x100c` / `0x15e0`).
### Producer-trace finding (KRNBUG-AUDIT-002 deliverable)
Run: `exec sylpheed.iso --halt-on-deadlock --trace-handles-focus=0x1004,0x100c,0x15e0,0x42450b5c -n 500_000_000`.
**0x1004 (tid=10 waiter):** static C++ ctor → 8-instance pool
```
[0] sub_824A9F18 +0x54 silph::Event ctor wrapper (83 callers)
[1] sub_821783D8 +0x120 per-instance subsystem-init (RtlInitializeCSAndSpinCount + Event ctor)
[2] sub_8217C850 +0x58 single per-pool-element bridge ctor
[3] (no func) +0x14 static ctor at 0x8280F810; calls sub_8217C850 EIGHT times
[4] sub_824ACB38 +0xb8 the CRT static-init driver (walks 0x82870010..0x828708d4)
[5] entry_point +0x60 the standard CRT entry stub
```
The 8-instance call from frame 3 is the smoking gun: `0x8280F810`
is a single C++ static constructor that builds an 8-element array
of objects, each of which gets its own Critical Section + Event +
worker thread. This is a **thread pool**, constructed before
`main()` runs.
**0x100c (tid=2 waiter):** runtime init in `main()` → singleton
```
[0] sub_824A9F18 +0x54 silph::Event ctor wrapper
[1] sub_82181750 +0x70 per-instance subsystem-init (same shape: CS + Event)
[2] sub_821800D8 +0x3c single-call bridge ctor
[3] sub_82181C20 +0x38 subsystem driver
[4] sub_8216EA68 +0x3c (top-level main; called from entry_point + 0x194 with r3=r4=r5=0)
[5] entry_point +0x198 right after `bl 0x8216EA68`
```
Different code cluster (`0x82181xxx`), single instance, constructed
**inside `main()` itself** — not from C++ static init. This is a
runtime-allocated singleton subsystem.
**0x15e0 (tid=16 waiter):** runtime init via a third distinct cluster
```
[0] sub_824A9F18 +0x54 silph::Event ctor wrapper
[1] sub_821701C8 +0x48 per-instance subsystem-init (CS + Event, callees mirror 0x100c's path)
[2] sub_8216F618 +0x44 bridge
[3] sub_821707C0 +0x38 driver
[4] (no func) +0x? 0x821C5418 — analyser missed this function entry
[5] sub_82172BA0 +0x1ec upper-level subsystem driver
```
Third distinct C++ class in cluster `0x82170xxx`. Same per-instance
shape (CS + Event + worker thread); different call site than 0x100c.
**Cross-check on the project memory list:** the prior memory listed
the third handle as `0x15e4`; the actual handle on this run is
`0x15e0` (off-by-4 in the prior session's transcription). The
parked-waiter set as of HEAD `9d45efe` is:
| Handle | Tid | Waits via | Trail status | Note |
|--------|-----|-----------|--------------|------|
| 0x1004 | 10 | `do_wait_single` | primary | static-ctor pool (8 entries) |
| 0x100c | 2 | `do_wait_single` | primary | runtime singleton |
| 0x15e0 | 16 | `do_wait_single` | primary | runtime singleton, distinct class |
| 0x12f4 | 13,14 | `do_wait_single` | primary | shared Semaphore — 2 waiters |
| 0x15f8 | 18 | `do_wait_multiple` | primary | Event/Auto |
| 0x1038 | 4 | `do_wait_multiple` | primary | Event/Auto, in WaitAny[0x1038, 0x103c] |
| 0x10b0 | 5 | `do_wait_multiple` | primary | Event/Auto, in WaitAny[0x10b0, 0x10b4] |
| 0x42450b5c | 6 | (non-`do_wait_single`) | `<UNCREATED>` `<AUDIT_BLIND>` | guest-memory addr (heap range), bypasses Nt-side audit entirely |
**0x42450b5c is qualitatively different.** Any address `>= 0x40000000`
lies in the guest user heap range; it is not a kernel handle-table value
(those start at 0x1000 and increment by 4). tid=6 is parking on a
guest pointer — almost certainly an embedded `KEVENT` reached via
`KeWaitForSingleObject(*PDISPATCHER_HEADER)` rather than via a
handle. Our audit didn't see the wait either (`waits=0` while
`waiter_count=1`), so the wait path itself bypasses
`do_wait_single`. Treat as a separate bug class.
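
The handle-versus-pointer distinction drawn above can be sketched as a small classifier. This is a minimal illustration, not the emulator's code: the enum, the function name, and the constants are hypothetical, and the only facts taken from the text are "handles start at 0x1000 and step by 4" and "addresses `>= 0x40000000` are guest pointers". No upper bound on the handle table is modelled.

```rust
/// How a wait target value is interpreted (illustrative names only).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum WaitTarget {
    /// Looks like a handle-table value: at least 0x1000 and a multiple of 4 above it.
    KernelHandle(u32),
    /// At or above the guest user heap base: a raw object pointer, not a handle.
    GuestPointer(u32),
    /// Neither shape matches.
    Unknown(u32),
}

const HANDLE_BASE: u32 = 0x1000;
const HANDLE_STRIDE: u32 = 4;
const USER_HEAP_BASE: u32 = 0x4000_0000;

pub fn classify_wait_target(value: u32) -> WaitTarget {
    if value >= USER_HEAP_BASE {
        WaitTarget::GuestPointer(value)
    } else if value >= HANDLE_BASE && (value - HANDLE_BASE) % HANDLE_STRIDE == 0 {
        WaitTarget::KernelHandle(value)
    } else {
        WaitTarget::Unknown(value)
    }
}
```

Under these assumptions every row of the table except the last classifies as a handle, while `0x42450b5c` falls on the pointer side, which is why it needs its own audited wait path.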
### Subsystem identification
All three Event/Manual creators (sub_821783D8, sub_82181750,
sub_821701C8) follow the **identical 4-callee pattern**:
1. `RtlInitializeCriticalSectionAndSpinCount` (init the per-instance CS)
2. `sub_824A9F18` (the silph::Event ctor wrapper → `NtCreateEvent`)
3. One or two silph-internal helpers (`sub_82172370`, `sub_824AA3E0`,
`sub_8217E948`, `sub_82274C70`, etc.), which initialize a queue
and spawn a worker thread

Each parked worker runs the same prologue: `silph::Thread::SetProcessor(CURRENT, 5)`
(via `sub_824AA658(r3=-2, r4=5)`), then an `lwarx`/`stwcx.` CAS spinlock,
`RtlEnterCriticalSection`, and a check for queued work.
This is the canonical **work-queue worker pattern**: a producer
posts a message to an instance's queue (under the CS) and signals
the Event; the worker wakes, drains, parks again. The producer
that should call `Nt/KeSetEvent(handle)` is **never executed**
within 500M instructions for any of the 3 handles.
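
The worker pattern described above can be sketched with host-side primitives. This is an analogy under stated assumptions, not guest code: a `Mutex` stands in for the per-instance critical section, a `Condvar` for the Event, and `Dispatcher`, `spawn_worker`, and `post` are hypothetical names. A `None` message is an invented shutdown marker so the example terminates.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// One queue instance: the Mutex plays the critical section, the Condvar the Event.
pub struct Dispatcher {
    queue: Mutex<VecDeque<Option<u32>>>, // None = shutdown marker (illustrative)
    event: Condvar,
}

/// Worker: park until signalled, drain the queue, park again.
pub fn spawn_worker(d: Arc<Dispatcher>) -> thread::JoinHandle<Vec<u32>> {
    thread::spawn(move || {
        let mut done = Vec::new();
        loop {
            let mut q = d.queue.lock().unwrap();
            while q.is_empty() {
                // Parked here until a producer signals. With no producer,
                // the worker never leaves this wait.
                q = d.event.wait(q).unwrap();
            }
            while let Some(msg) = q.pop_front() {
                match msg {
                    Some(job) => done.push(job),
                    None => return done,
                }
            }
        }
    })
}

/// Producer: push under the lock, then signal the event.
pub fn post(d: &Dispatcher, msg: Option<u32>) {
    d.queue.lock().unwrap().push_back(msg);
    d.event.notify_one();
}
```

If `post` is never called, the worker stays inside `wait` indefinitely, which is the host-side analogue of the three parked handles reporting zero signal attempts.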
The PE's RTTI string table lists thread-related class names in the
`SilpheedSCS` namespace: `WorkHudThread2`, `WorkHudThreadTaskCaller`,
`COLLISION_THREAD_PARAM`, `UPDATER_THREAD_PARAM`, `CRenderCommandQueue`,
`CTaskUpdater`. The 8-element static-init pool for 0x1004 most
plausibly maps to one of the multi-instance worker classes in this
list (likely `WorkHudThread2` family — the only `Thread`-suffixed
multi-instance candidate); the singletons 0x100c and 0x15e0 most
plausibly map to two of `CTaskUpdater` / `CRenderCommandQueue` /
similar singletons. Without a live debugger pass to read the
vtable+RTTI block at the `this` pointer of each worker, the exact
class assignment is heuristic.
### Recommended next session — surgical producer hunt
The producer for each Event is the **call site that owns the
matching message-push code path**. Steps:
1. **For each Event handle (0x1004, 0x100c, 0x15e0)**, dump the
first 12 bytes at the `this` pointer to read the vtable address
(`this` is in `r3` at the worker's ABI entry and is captured in
the wait diagnostic as the first arg). Then read the word just below
the vtable (slot index -1, i.e. `vtable - 4` bytes) to reach the RTTI
Complete Object Locator and, through it, the Type Descriptor, which
gives the class name.
This pinpoints exactly which `SilpheedSCS::*` class each
subsystem is.
2. **Then xref the class name** in the binary to find the
producer-side method (`Push*`, `Submit*`, `EnqueueMessage*`,
`Notify*`). That method's signal call (likely
`silph::Event::Set` → `NtSetEvent` thunk) is what should fire.
3. **Trace that producer's caller chain** to find the upstream
gate. Two failure modes are equally likely:
- **(A)** The producer DOES run but signals via `KeSetEvent` on
an embedded `KEVENT` field (not the handle-table side), and
our HLE `KeSetEvent` doesn't route to the handle-table waiter
list. Same family as 0x42450b5c. Smoking gun: `kernel.calls`
metric for `KeSetEvent` is non-zero but the audit shows zero
signals.
- **(B)** The producer is gated by an upstream condition that
doesn't trigger — e.g. UI-system message that never arrives,
timer-DPC that never fires, vsync interrupt with the wrong
APC routing. Smoking gun: `kernel.calls{KeSetEvent}` is zero
for that handle.
4. **0x42450b5c** is a separate bug. Add a parallel
`audit_create_with_ctx` hook to whichever wait path tid=6 takes
(it's NOT `do_wait_single`); the function span at PC=0x824cd4f4
isn't even in the analyser's `functions` table. Likely the
`KeWaitForSingleObject(*PDISPATCHER_HEADER)` wrapper. Once the
wait path is audited, repeat the producer-trace.
The walker is reusable: any handle added to `--trace-handles-focus`
will get a 6-frame stack at creation time. Add new candidates
freely — cost on the unfocused hot path is one `HashSet::contains`.
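
A minimal sketch of the focus-gated walker described above, assuming a toy word-addressed memory. It relies only on the PowerPC convention that the word at `0(r1)` holds the caller's stack pointer. The function names and the `HashMap` memory model are hypothetical, and the real walker also records link-register and saved-register slots, which are omitted here.

```rust
use std::collections::{HashMap, HashSet};

/// Follow the PowerPC back-chain starting at `sp`: the word at 0(r1)
/// holds the caller's stack pointer. `mem` maps address -> 32-bit word.
pub fn walk_back_chain(mem: &HashMap<u32, u32>, sp: u32, max_frames: usize) -> Vec<u32> {
    let mut frames = Vec::new();
    let mut cur = sp;
    while frames.len() < max_frames {
        frames.push(cur);
        match mem.get(&cur) {
            // Stacks grow down, so a valid caller frame sits at a higher address.
            Some(&prev) if prev > cur => cur = prev,
            // Zero, unmapped, or non-increasing link: end of chain.
            _ => break,
        }
    }
    frames
}

/// Capture a 6-frame stack only for focused handles; everything else
/// pays a single set lookup.
pub fn stack_for_new_handle(
    focus: &HashSet<u32>,
    handle: u32,
    mem: &HashMap<u32, u32>,
    sp: u32,
) -> Option<Vec<u32>> {
    if !focus.contains(&handle) {
        return None;
    }
    Some(walk_back_chain(mem, sp, 6))
}
```

The monotonic `prev > cur` check doubles as a cheap guard against cycles in a corrupted chain.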
### KRNBUG-AUDIT-003 — vtable/RTTI class probe + dispatcher identification
**Status:** landed (diagnostic only; no behaviour change). Verified
end-to-end against 5 unit tests + the producer-trace pass at -n 500M.
**Site:** `crates/xenia-kernel/src/state.rs` — new `ClassReadout`
enum + `read_class_at_this(this, mem)` + `probe_create_stack_classes(
ctx, frames, mem)` + private helpers (`is_likely_guest_heap_ptr`,
`is_likely_image_ptr`, `read_ascii_cstring`).
`crates/xenia-kernel/src/audit.rs` — extended `HandleAuditTrail` with
`created_class_probes: Vec<String>` + new
`record_create_with_stack_and_probes`.
`crates/xenia-app/src/main.rs` — `dump_thread_diagnostic` now takes
`&GuestMemory`; FOCUS report prints WAIT-THREAD blocks with per-frame
back-chain + saved register slots + class probes.
**Why it exists:** AUDIT-002 gave us back-chain frames at handle
creation. AUDIT-003's promise was "recover the dispatcher's MSVC C++
class name via vtable[-4] → COL → TypeDescriptor" so the producer
hunt could read "who should call `Class::Submit` but doesn't"
instead of "who should signal handle X."
**Probe correctness:** MSVC RTTI traversal (`vtable[-4]` = COL,
`COL+0x0c` = TypeDescriptor, `TypeDescriptor+8` = NUL-terminated
mangled name starting `.?A`). False-positive guard: at least the
first two vtable slots must be image-range function pointers. This
rejects the CRT static-init iterator pattern where `r31` holds a
pointer into the init-fn array and `[r31]` is a function PC, not a
vtable.
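
The traversal and its false-positive guard can be sketched over a flat big-endian memory window. This is an illustration only: `Window` and `class_name_at` are hypothetical stand-ins (the landed helper is `read_class_at_this`, whose exact signature is not reproduced here), and the offsets are the ones stated above (`vtable - 4` for the COL, `COL + 0x0c` for the TypeDescriptor, `TypeDescriptor + 8` for the name).

```rust
/// Flat big-endian guest memory window (stand-in for the emulator's memory type).
pub struct Window {
    pub base: u32,
    pub bytes: Vec<u8>,
}

impl Window {
    fn be32(&self, addr: u32) -> Option<u32> {
        let off = addr.checked_sub(self.base)? as usize;
        let b = self.bytes.get(off..off.checked_add(4)?)?;
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }

    fn cstr(&self, addr: u32, max: usize) -> Option<String> {
        let off = addr.checked_sub(self.base)? as usize;
        let tail = self.bytes.get(off..)?;
        // Require a NUL terminator within `max` bytes.
        let len = tail.iter().take(max).position(|&c| c == 0)?;
        let s = std::str::from_utf8(&tail[..len]).ok()?;
        s.is_ascii().then(|| s.to_string())
    }
}

/// Resolve the mangled class name for the object at `this`, or None.
pub fn class_name_at(this: u32, mem: &Window, image: &std::ops::Range<u32>) -> Option<String> {
    let vtable = mem.be32(this)?;
    // Guard: the first two vtable slots must be function pointers in the image.
    let slot0 = mem.be32(vtable)?;
    let slot1 = mem.be32(vtable.checked_add(4)?)?;
    if !image.contains(&slot0) || !image.contains(&slot1) {
        return None;
    }
    let col = mem.be32(vtable.checked_sub(4)?)?;
    let type_desc = mem.be32(col.checked_add(0x0c)?)?;
    let name = mem.cstr(type_desc.checked_add(8)?, 256)?;
    name.starts_with(".?A").then_some(name)
}
```

A struct whose first word is a sentinel such as `0xFFFFFFFF` fails at the first read, so it is rejected without ever reaching the RTTI checks.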
**Verification:**
- Workspace tests: 581 → **586** (+5: 4 new in `state.rs` exercising
RTTI-intact / RTTI-stripped / non-object / `read_ascii_cstring`
termination + 1 integration test for `probe_create_stack_classes`).
- `--stable-digest -n 100M` lockstep oracle:
`instructions=100000002` (unchanged).
- `sylpheed_n50m` golden: passes.
- End-to-end: 500M producer-trace run captured at
`audit-runs/audit-003/run-500m-v4.txt`. RC=0.
### KRNBUG-AUDIT-003 finding — dispatcher addresses + decisive xref audit
**Run:** `exec sylpheed.iso --halt-on-deadlock --trace-handles-focus=
0x1004,0x100c,0x15e0,0x42450b5c -n 500_000_000`.
**Handle 0x100c — dispatcher at `0x828F3D08`:**
Confirmed three ways:
1. Per-frame saved-r31 capture at handle creation:
```
frame=1 lr=0x821817c0 saved-r31=0x828f3d08 ← per-instance ctor
frame=2 lr=0x82180114 saved-r31=0x828f3d08 ← bridge ctor (same value)
```
2. Disassembly of `sub_82181750` at +0x14:
`addis r11, r0, 0x828F; addi r31, r11, 15624` ⇒
`r31 = 0x828F3D08` (the `this` for the per-instance ctor).
3. Field-level write tracking via `xrefs.kind=write`:
`pc=0x82181778 in sub_82181750 — stw r11, 0(r31)` writes -1 to
`[this+0]`.
**`[this+0] = -1` is decisive: this is a hand-rolled POD job-queue
struct, not a C++ polymorphic class.** No vtable means no RTTI;
"class name" doesn't exist in MSVC mangled form. The probe correctly
rejected 0x828F3D08 as a class candidate.
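
The address materialisation in evidence item 2 can be checked by hand. A minimal sketch, assuming the usual semantics where `addis` with `rA = 0` loads the shifted immediate and `addi` adds a sign-extended 16-bit immediate, both in 32-bit arithmetic; `lis_addi` is a hypothetical helper name.

```rust
/// Result of `addis rD, 0, hi` followed by `addi rD, rD, lo` in 32-bit arithmetic.
/// `lo` is sign-extended before the add, which is why a negative low half
/// borrows from the high half.
pub fn lis_addi(hi: u16, lo: i16) -> u32 {
    let upper = (hi as u32) << 16;
    upper.wrapping_add(lo as i32 as u32)
}
```

With `hi = 0x828F` and `lo = 15624` (`0x3D08`) this yields `0x828F3D08`, matching the saved-r31 value captured at handle creation.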
Field layout (from sub_82181750 disasm):
```
[this+ 0] = -1 ; sentinel (not a vtable)
[this+ 4..12] = 0
[this+20] = 0 (halfword)
[this+36] = 0
[this+40] = 7 ; count or version
[this+44..(44+256)] ; sub-region init by `bl 0x8284DCEC`
[this+72] = thread_handle ; set after thread spawn
[this+76] = event_handle ; = 0x100c, set after silph::Event ctor
[this+88..104] = 0
```
Worker is `sub_82181830`: receives r3=this, copies r28=this and
r29=&this[44], does `silph::Thread::SetProcessor(CURRENT, 5)`,
then `lwarx`/`stwcx.` on `&this[80]`. Wait-side telemetry confirms:
the parked thread's spilled r28-r31 area has 0x828F3D08 (=r28 base)
and 0x828F3D34 (= base+44 = r29).
**Handle 0x15e0 — dispatcher at `0x828F4070`:**
Confirmed via xref table. Same shape as 0x100c (POD job queue, not
a C++ class). Constructed by `sub_821701C8` + `sub_8216F618`.
**Handle 0x1004 — 8-instance pool, member addresses still TBD.**
The MSVC ctors for the per-instance and bridge functions did not
preserve `this` in r31 across the call into `silph::Event::Ctor`,
so the saved-r31 chain captured at create time shows
stack-relative pointers (frames 1, 2, 5) and the CRT init-fn
iterator pointer 0x82870180 (frames 3, 4) instead of the pool
member's `this`. Recovering the 8 pool addresses requires hooking
`sub_8217C850`'s entry to capture r3 at each of its 8 calls from
the static ctor at `0x8280F810`.
**Handle 0x42450b5c — separate bug class.** Heap-allocated
(0x4xxxxxxx is user-heap range), parks via non-`do_wait_single`
path. AUDIT-003's image-rdata-focused probe doesn't apply. Track
under a separate audit ID.
**Decisive xref audit — producer is unreached:**
```
0x828F3D08 (handle 0x100c) — 4 references in static analysis:
pc=0x82180100 in sub_821800D8 (kind=ref) — bridge ctor
pc=0x8218176c in sub_82181750 (kind=ref) — per-instance ctor
pc=0x82181778 in sub_82181750 (kind=write) — per-instance ctor init
pc=0x8284caa4 in sub_8280C2C0 (kind=ref) — CRT init driver
0x828F4070 (handle 0x15e0) — 5 references:
pc=0x8216f650 in sub_8216F618 (kind=ref) — bridge ctor
pc=0x8216f674 in sub_8216F618 (kind=ref) — bridge ctor
pc=0x821701e4 in sub_821701C8 (kind=ref) — per-instance ctor
pc=0x82170330 in sub_821701C8 (kind=ref) — per-instance ctor
pc=0x8284c9a4 in sub_8280C2C0 (kind=ref) — CRT init driver
```
**Every xref is in a ctor or the CRT.** No producer code references
either dispatcher base. Confirms AUDIT-001/002's `signal_attempts=0`:
the producer is unreached, not broken. The static analysis would
miss producers that operate via a `this` register passed through a
function arg (no constant-load), but the simple
"`load_const dispatcher_addr; call submit(this, work)`" pattern
**is not present** in the binary for 0x828F3D08 / 0x828F4070.
**Recommendation for next session (no implementation here):**
1. Investigate the call chain `main() → sub_82181C20 → sub_821800D8 → sub_82181750`.
sub_82181C20 is a subsystem driver — it constructs the queue and
should ALSO wire it into a feeder. If the feeder is itself a
static-init that's never invoked, the trail leads back to the
CRT init array driver (`sub_824ACB38`, walks
0x82870010..0x828708D4) and whatever scheduling subsystem is
supposed to drive those.
2. Hook `sub_8217C850` entry under `--trace-handles-focus=0x1004` to
capture r3 at each of its 8 calls — those are the pool member
`this` addresses for handle 0x1004's 8-instance pool.
3. Treat 0x42450b5c independently. AUDIT-002's hook missed it because
the parking site (PC=0x824cd4f4) isn't routed through `do_wait_single`.
Open KRNBUG-AUDIT-004 for that wait path.
---
### KRNBUG-AUDIT-004 — `--ctor-probe` PC hook + `--dump-addr` struct dump; producer-indirection layer identified; "8-instance pool" hypothesis falsified
**Status**: landed on master (no-ff merge of feature branch
`dispatcher-probe-audit/p0-ctor-probe-and-struct-dump`). Diagnostic-
only, read-only, lockstep-preserved (`instructions=100000002` at
`-n 100M --stable-digest`).
**Tests**: 586 → **588**.
**What landed (`crates/xenia-kernel/src/state.rs`):**
- `pub ctor_probe_pcs: HashSet<u32>` field on `KernelState` (default
empty).
- `pub fire_ctor_probe_if_match(hw_id, mem)` — fast-rejects when set
is empty; on match prints a one-shot record `CTOR-PROBE pc=...
tid=... hw=... cycle=... sp=... r3=... lr=...` plus an 8-frame
back-chain with saved-r31/r30 per frame. Pure read.
- `pub dump_addrs: Vec<u32>` field for end-of-run struct dumps.
- 2 unit tests: empty-set no-op, set-membership invariant.
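
A minimal sketch of the probe gate described in the list above, assuming a simplified record. `ProbeSet` and `fire_if_match` are hypothetical names; the landed helper is `fire_ctor_probe_if_match(hw_id, mem)` on `KernelState`, and its record carries more fields (`hw`, `cycle`, `sp`, back-chain) than the reduced line built here.

```rust
use std::collections::HashSet;

/// Set of guest PCs that should emit a probe record when reached.
#[derive(Default)]
pub struct ProbeSet {
    pcs: HashSet<u32>,
}

impl ProbeSet {
    pub fn from_pcs(pcs: impl IntoIterator<Item = u32>) -> Self {
        Self { pcs: pcs.into_iter().collect() }
    }

    /// Returns the record to print, or None. Pure read: no state is mutated.
    pub fn fire_if_match(&self, pc: u32, tid: u32, r3: u32, lr: u32) -> Option<String> {
        // Fast reject: with no probes armed the hot path pays one branch.
        if self.pcs.is_empty() || !self.pcs.contains(&pc) {
            return None;
        }
        Some(format!(
            "CTOR-PROBE pc={pc:#010x} tid={tid} r3={r3:#010x} lr={lr:#010x}"
        ))
    }
}
```

Returning the record instead of printing it keeps the gate easy to unit-test for both the empty-set no-op and the membership case.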
**What landed (`crates/xenia-app/src/main.rs`):**
- `--ctor-probe=0x8217C850,0x82181750,...` CLI flag (and
`XENIA_CTOR_PROBE`). Parsed into `kernel.ctor_probe_pcs` at
`cmd_exec_inner` startup.
- `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` CLI flag (and
`XENIA_DUMP_ADDR`). Each address gets a 128-byte hex+be32+ASCII
dump at end-of-run, after the per-handle FOCUS report.
- `worker_prologue` calls `fire_ctor_probe_if_match` after reading
`pc` and before any thunk-dispatch / step-block branch.
`dump_thread_diagnostic` consumes `kernel.dump_addrs`.
**Decisive findings (corrects KRNBUG-AUDIT-002/003):**
1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
Probe ran at `-n 50M --halt-on-deadlock` with PCs
`[0x821783D8, 0x82181750, 0x821701C8]` (the per-instance ctors
for handles 0x1004 / 0x100c / 0x15e0 respectively). Each fired
**EXACTLY ONCE**:
```
CTOR-PROBE pc=0x821783d8 tid=1 hw=0 cycle=1401430 r3=0x828f3ec0 ← handle 0x1004
CTOR-PROBE pc=0x82181750 tid=1 hw=0 cycle=5363599 r3=0x828f3d08 ← handle 0x100c
CTOR-PROBE pc=0x821701c8 tid=1 hw=0 cycle=9203618 r3=0x828f4070 ← handle 0x15e0
```
Handle 0x1004 has a SINGLE dispatcher at **0x828F3EC0**, not 8
pool members. The earlier "called 8 times" claim came from
counting raw entries to the OUTER getter `sub_8217C850`, but
`sub_8217C850` is a Meyers-style singleton-getter — its inner
`bl 0x821783D8` (the per-instance ctor) is gated on a one-shot
init flag at `[0x828F48D8] bit 0`. Subsequent `sub_8217C850`
calls just return the existing slot pointer.
2. **The producer indirection layer IS the singleton-getter
itself.** Static byte-scans of `.rdata` and `.data` show 0 hits
for the dispatcher addresses 0x828F3D08 / 0x828F4070 — so no
registry table holds them. But the `xrefs` table for the OUTER
getters reveals:
```
sub_821800D8 (outer for 0x828F3D08, handle 0x100c): 6 callers
0x821802d8 (sub_82180158+0x180) ← non-create-chain
0x821806e0 (sub_821805C8+0x118) ← non-create-chain
0x82180b28 (sub_82180A10+0x118) ← non-create-chain
0x82180ea0 (sub_82180D90+0x110) ← non-create-chain
0x82181254 (sub_821810E0+0x174) ← non-create-chain
0x82181c54 (sub_82181C20+0x34) ← create-chain ONLY
sub_8216F618 (outer for 0x828F4070, handle 0x15e0): 5 callers
0x8216f9d4 (sub_8216F818+0x1BC) ← non-create-chain
0x8216fc08 (sub_8216F9F0+0x218) ← non-create-chain
0x821700b8 (sub_8216FF70+0x148) ← non-create-chain
0x821700f4 (sub_821700E0+0x14) ← non-create-chain
0x821707f4 (sub_821707C0+0x34) ← create-chain ONLY
```
The non-create-chain consumers all share the **canonical
producer pattern**:
```
bl outer_singleton_getter ; r3 = dispatcher ptr
lwz r3, OFFSET(r3) ; r3 = an event handle / queue field
bl 0x824AA1D8 ; signal/wake function
```
For 0x100c the offset is 80 (= 0x50); for 0x15e0 the offset is
36 (= 0x24).
So **interpretation (2) of the audit charter is confirmed**:
producers reference the dispatchers via a function-call layer of
indirection, not through direct address materialization. The
xref-table audit in AUDIT-003 (which only catches direct
constant-loads of the dispatcher base) was **necessary but not
sufficient** — it correctly saw "no direct producer references"
but missed the singleton-getter indirection.
3. **Dispatcher struct layouts** (128-byte dumps at `-n 50M
--halt-on-deadlock`):
```
0x828F3D08 (handle 0x100c, per-instance ctor sub_82181750):
+0x00 = 0xFFFFFFFF ; queue head/tail sentinel
+0x28 = 0x00000007 ; capacity = 7
+0x2C = 0x01000000 ; init flag
+0x3C = 0xFFFFFFFF ; secondary sentinel
+0x48 = 0x00001010 ; thread_handle (worker thread)
+0x4C = 0x0000100C ; event_handle (= self handle 0x100c)
+0x50 = 0x00000000 ; producer reads this — currently 0
+0x70 = 0x00000001 ; refcount?
+0x74 = 0x828F3D08 ; self-pointer
0x828F4070 (handle 0x15e0, per-instance ctor sub_821701C8):
+0x00 = 0x01000000 ; init flag
+0x10 = 0xFFFFFFFF ; queue sentinel
+0x1C = 0x000015E4 ; sibling-handle (NOT in our parked
; set — possibly a thread handle)
+0x20 = 0x000015E0 ; event_handle (= self handle 0x15e0)
+0x24 = 0x00000000 ; producer reads this — currently 0
+0x40 = 0xFFFFFFFF ; secondary sentinel
0x828F3EC0 (handle 0x1004, per-instance ctor sub_821783D8):
+0x00 = 0x01000000 ; init flag
+0x10 = 0xFFFFFFFF ; queue sentinel
+0x20 = 0x40541BC0 ; heap pointer (sub-buffer #1)
+0x30 = 0x00000014 ; size 20
+0x34 = 0x0000002F ; size 47
+0x38 = 0x414F5F60 ; heap-range payload (or two halfwords)
+0x3C = 0x40211CA0 ; heap pointer (sub-buffer #2)
+0x44 = 0x405418C0 ; heap pointer (sub-buffer #3)
+0x50 = 0x40111840 ; heap pointer (sub-buffer #4)
+0x58 = 0xFFFFFFFF ; sentinel
+0x5C = 0xFFFFFFFF ; sentinel
+0x76 = 0x000012AC ; possibly thread id
+0x78 = 0x00001004 ; event_handle (= self handle 0x1004)
```
The 0x1004 dispatcher is **noticeably different**: it owns 4
guest-heap sub-buffers in 0x4xxxxxxx range, suggesting it
manages a more complex resource than the other two (which are
pure POD job queues). The +0x78 location of the event_handle
differs from 0x100c's +0x4C and 0x15e0's +0x20, so each
subsystem has its own struct layout (no shared base class).
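
Finding 1 hinges on the difference between getter calls and constructor runs. A minimal host-side sketch of a one-shot singleton getter, using `std::sync::OnceLock` in place of the guest's init-flag bit; `Dispatcher`, `construct`, `get_dispatcher`, and the counter are hypothetical names for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for the dispatcher struct; only one field is modelled.
pub struct Dispatcher {
    pub event_handle: u32,
}

/// Counts how many times the inner constructor actually runs.
pub static CTOR_RUNS: AtomicUsize = AtomicUsize::new(0);
static SLOT: OnceLock<Dispatcher> = OnceLock::new();

/// Inner per-instance constructor (the analogue of the gated `bl`).
fn construct() -> Dispatcher {
    CTOR_RUNS.fetch_add(1, Ordering::SeqCst);
    Dispatcher { event_handle: 0x1004 }
}

/// Outer getter: constructs on first call, then returns the same slot.
pub fn get_dispatcher() -> &'static Dispatcher {
    SLOT.get_or_init(construct)
}
```

Calling `get_dispatcher` eight times runs `construct` once and returns the same address each time, so counting entries to the outer getter overstates the number of instances.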
**Reproduce:**
```bash
cargo run --release -p xenia-app -- exec 'sylpheed.iso' \
--halt-on-deadlock \
--trace-handles-focus=0x1004,0x100c,0x15e0 \
--ctor-probe=0x821783D8,0x82181750,0x821701C8 \
--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0 \
-n 50000000
```
Trace files saved at:
- `audit-runs/audit-004/run-50m-probe.txt` (outer-getter probes)
- `audit-runs/audit-004/run-50m-probe-v2.txt` (inner-ctor probes — singleton hypothesis confirmed)
**Recommendation for next session (do not implement a fix):**
Hook the entry of each non-create-chain consumer site for handle
0x100c (5 sites: 0x821802d8, 0x821806e0, 0x82180b28, 0x82180ea0,
0x82181254) and for handle 0x15e0 (4 sites: 0x8216f9d4, 0x8216fc08,
0x821700b8, 0x821700f4) using `--ctor-probe=...`. If any fire, then
the producer DOES execute and the failure mode is in the wake/signal
chain (probably `lwz r3, OFFSET(r3)` reads zero — see dispatcher dump
[+0x50] = 0 for 0x100c, [+0x24] = 0 for 0x15e0 — and the wake
function 0x824AA1D8 is then called with handle=0). If none fire,
the producer chain is gated upstream (likely a feature flag, init
phase, or RPC handler that never fires). Either way, the next
diagnostic narrows the bug surface dramatically.
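
The two-way decision described above can be sketched as a small triage over the probe hit count and a 128-byte dispatcher dump. This is a hedged illustration: `be32_at`, `Verdict`, and `triage` are hypothetical names, and the dump is assumed to be the raw big-endian bytes produced by `--dump-addr`.

```rust
/// Read a big-endian u32 from a raw struct dump, or None if out of range.
pub fn be32_at(dump: &[u8], offset: usize) -> Option<u32> {
    let b = dump.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// No consumer site fired: the producer chain is gated upstream.
    ProducerUnreached,
    /// A site fired but the field the `lwz` reads is zero.
    SignalsNullHandle,
    /// A site fired and would pass this value to the wake function.
    SignalsHandle(u32),
}

pub fn triage(probe_hits: u32, dump: &[u8], field_offset: usize) -> Option<Verdict> {
    if probe_hits == 0 {
        return Some(Verdict::ProducerUnreached);
    }
    match be32_at(dump, field_offset)? {
        0 => Some(Verdict::SignalsNullHandle),
        h => Some(Verdict::SignalsHandle(h)),
    }
}
```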
---
### KRNBUG-AUDIT-005 — `--pc-probe` extended syntax + canary kernel-call diff; `XexCheckExecutablePrivilege` stub gates init flow
**Status**: landed on master (no-ff merge of feature branch
`canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade`). Diagnostic-
only, read-only, lockstep-preserved (`run digest matches golden` at
`-n 50M --stable-digest`).
**Tests**: 588 → **588** (unchanged; existing ctor-probe tests cover the
shared infrastructure).
**What landed (`crates/xenia-kernel/src/state.rs`):**
- `pub pc_probe_consumers: HashMap<u32, (u32, u32)>` field on
`KernelState` (default empty). Maps a probe PC to a
`(dispatcher_addr, offset)` pair; on hit the helper additionally
logs `[disp+off]` — what the producer's `lwz r3, OFFSET(r3)` is
about to read after `bl outer_getter` returns the dispatcher in r3.
- `fire_ctor_probe_if_match` extended to read+print the consumer
field when present. Pure load — does not mutate guest state.
**What landed (`crates/xenia-app/src/main.rs`):**
- `--pc-probe` clap alias on `--ctor-probe` (semantically clearer
name; both share parser/storage).
- Extended token syntax `PC@DISPATCHER:OFFSET` parsed via existing
`parse_hex_u32`. Plain `PC` form still works (backward compatible).
- `XENIA_PC_PROBE` env var as alias for `XENIA_CTOR_PROBE`.
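
A minimal sketch of parsing the `PC@DISPATCHER:OFFSET` token form, with the plain `PC` form kept as a fallback. The names `ProbeSpec`, `parse_num`, and `parse_probe_token` are hypothetical and do not reproduce the landed `parse_hex_u32`. One assumption is made explicit in the code: values with a `0x` prefix are read as hex and bare values as decimal, which matches offsets written as `80` and `36` elsewhere in this file.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct ProbeSpec {
    pub pc: u32,
    /// Optional (dispatcher_addr, offset) pair for the consumer-field readout.
    pub consumer: Option<(u32, u32)>,
}

/// Assumption: "0x"-prefixed values are hex, anything else is decimal.
fn parse_num(s: &str) -> Option<u32> {
    let s = s.trim();
    match s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        Some(hex) => u32::from_str_radix(hex, 16).ok(),
        None => s.parse().ok(),
    }
}

pub fn parse_probe_token(tok: &str) -> Option<ProbeSpec> {
    match tok.split_once('@') {
        // Plain `PC` form: backward compatible, no consumer readout.
        None => Some(ProbeSpec { pc: parse_num(tok)?, consumer: None }),
        Some((pc, rest)) => {
            let (disp, off) = rest.split_once(':')?;
            Some(ProbeSpec {
                pc: parse_num(pc)?,
                consumer: Some((parse_num(disp)?, parse_num(off)?)),
            })
        }
    }
}
```

A token with `@` but no `:` is rejected outright, so a mistyped probe is reported at startup instead of silently degrading to the plain form.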
**What landed (`audit-runs/audit-005/`):** one-shot diagnostic
artifacts — not part of the repo build:
- `canary.log` — copy of `/home/fabi/xenia_canary_windows/xenia.log` from a Lutris launch of Sylpheed; oracle for what should happen
- `ours.log` — our trace at `-n 500M` with the 9-PC probe + `probe_calls=trace` filter (838 MB, 5.6 M lines)
- `diff.py` — kernel-call sequence diff (set-diff + first-divergence window); deletable after the audit
- `probe-test-10m.log` — initial smoke test confirming probe wiring
**Reproduce:**
```bash
cargo run --release -p xenia-app -- \
--log-filter='probe_calls=trace,xenia=warn' \
exec sylpheed.iso \
--halt-on-deadlock \
--trace-handles-focus=0x1004,0x100c,0x15e0 \
--pc-probe=0x821802D8@0x828F3D08:80,0x821806E0@0x828F3D08:80,\
0x82180B28@0x828F3D08:80,0x82180EA0@0x828F3D08:80,\
0x82181254@0x828F3D08:80,0x8216F9D4@0x828F4070:36,\
0x8216FC08@0x828F4070:36,0x821700B8@0x828F4070:36,\
0x821700F4@0x828F4070:36 \
-n 500_000_000 \
2> audit-runs/audit-005/ours.log
python3 audit-runs/audit-005/diff.py --max 100 --window 30
```
**Decisive findings:**
1. **Failure mode (α) for KRNBUG-AUDIT-004 confirmed: the producers never run.** All 9
non-create-chain producer call sites for handles 0x100c
(5 sites at `0x821802D8 / 0x821806E0 / 0x82180B28 / 0x82180EA0 /
0x82181254`) and 0x15e0 (4 sites at `0x8216F9D4 / 0x8216FC08 /
0x821700B8 / 0x821700F4`) **fire 0×** at -n 500M
(`grep -c CTOR-PROBE ours.log` prints 0). The producer code path is
not reached. This rules out the two alternatives in which a producer
does run: `lwz` reading zero from the dispatcher field, and the wake
function being called with a stale handle. The bug is upstream,
in the control flow that should lead the guest to those producer
functions.
2. **Upstream control-flow divergence located: `XexCheckExecutablePrivilege`
stub returning 0.** Set-diff of kernel-call sequences across our
500M-instruction run vs canary's full Sylpheed boot
(`canary.log`, ~5.3K lines, post-`swaps=2` boot loop reached)
identifies **11 exports that canary calls and we don't**:
```
ExTerminateThread (×2)
KeReleaseSemaphore (×268) ← we use Nt* equivalents
KeResetEvent (×1)
NtDeviceIoControlFile (×2)
ObCreateSymbolicLink (×1)
XGetAVPack (×1) ← gated by priv-10 check
XamTaskCloseHandle (×1)
XamTaskSchedule (×1) ← AUDIT-002 producer candidate
XamUserReadProfileSettings (×2)
XeCryptSha (×1)
XeKeysConsolePrivateKeySign (×1)
```
`XGetAVPack` has exactly one caller (`xrefs` table): site
`0x824AB5A0` inside `sub_824AB578`. The 5 instructions immediately
preceding it are:
```
824ab58c addi r3, r0, 10 ; privilege bit 10
824ab590 addi r31, r0, 0
824ab594 bl 0x8284DEFC ; XexCheckExecutablePrivilege
824ab598 cmpli 0, r3, 0x0
824ab59c bc 12, eq, 0x824AB724 ; if r3==0, skip whole block
; (XGetAVPack + crypto + Nt writes)
```
Our impl `crates/xenia-kernel/src/exports.rs:193`:
```rust
state.register_export(Xboxkrnl, 0x0194, "XexCheckExecutablePrivilege",
stub_return_zero);
```
`stub_return_zero` returns r3=0 unconditionally → guest takes
the `bc 12, eq, 0x824AB724` branch and skips the entire
AV/crypto/save-data init block.
The OTHER call site (`sub_824A9710`, `0x824A99A0`) queries
privilege bit **11**:
```
824a999c addi r3, r0, 11
824a99a0 bl 0x8284DEFC ; XexCheckExecutablePrivilege(11)
824a99a4 cmpli 0, r3, 0
824a99a8 bc 4, eq, 0x824A9A60 ; bne — skip block if priv set
```
The polarity differs: this site gates `XamTaskSchedule` etc. on
the **privilege-NOT-set** path. Because the one stub answers 0 at both
call sites, the guest always takes the privilege-not-set arm of *every*
privilege-gated branch, whether or not the title actually holds the privilege.
3. **Cascade reaches the parked-waiter handles.** Trace evidence:
our `probe_calls` log shows `lr=0x824A97E4` (a hit from the
error path inside `sub_824A9710` *after* `sub_824ABA98` returned
negative NTSTATUS). The canary log shows all 11 missing exports
firing in a single contiguous boot phase between `XexCheckExecutablePrivilege`
and the worker-thread spawn — i.e. the init phase that sets up
the dispatcher data structures is exactly the phase we skip.
This explains **why the dispatcher fields read zero** (AUDIT-004
dump: `[0x828F3D08+0x50] = 0`, `[0x828F4070+0x24] = 0`): the
ctors run (we counted those), but the *producers* that would
populate those fields with a non-zero handle never execute,
because the upstream init flow that registers them is gated
by the privilege checks.
4. **Note on the diff: canary's log is filtered.** Canary's config
has `log_high_frequency_kernel_calls = false`, which suppresses
most `Rtl*`, `Mm*`, `Ke*`-internal calls from the log. The
"called in OURS but not canary" set (23 entries, headed by
`NtWaitForSingleObjectEx ×1.5M`) is dominated by this filter
difference — it is **not** a bug surface. The directionally
meaningful side of the diff is "called in CANARY but not OURS"
(above): canary's log includes every low-frequency call, so any
absence on our side is a real divergence.
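
The directional set-diff used in findings 2 and 4 can be sketched as follows. The audit's actual tool is `diff.py`; this is an independent illustration in Rust with hypothetical function names, and it assumes export names have already been extracted from each log into a flat sequence.

```rust
use std::collections::BTreeMap;

/// Count how often each export name appears in a call sequence.
pub fn call_counts<'a>(calls: impl IntoIterator<Item = &'a str>) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for name in calls {
        *counts.entry(name).or_insert(0) += 1;
    }
    counts
}

/// Exports the reference run calls at least once and our run never calls,
/// with the reference call count, sorted by name.
pub fn only_in_reference<'a>(
    reference: &BTreeMap<&'a str, usize>,
    ours: &BTreeMap<&'a str, usize>,
) -> Vec<(&'a str, usize)> {
    let mut out = Vec::new();
    for (&name, &count) in reference {
        if !ours.contains_key(name) {
            out.push((name, count));
        }
    }
    out
}
```

Only the reference-minus-ours direction is computed here, since the reverse direction is dominated by the logging-filter difference described in finding 4.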
**Stop conditions check:**
- Canary itself does NOT stall at swaps=2 — it reaches a steady
frame loop with `XamInputGetCapabilities` polling, texture loads,
`KeReleaseSemaphore` ticks. The diff was informative.
- First divergence is dense early-CRT noise (~3 entries in), but
the meaningful divergence anchored to a concrete export
(`XGetAVPack`, deterministically gated by a one-line stub) was
recoverable via set-diff. Did not need to narrow scope further.
**Recommendation for next session (do not implement a fix here — this
is the read-only audit deliverable):**
Replace `stub_return_zero` for `XexCheckExecutablePrivilege` with a
real implementation. The XEX header's privilege bitmask is parsed
during XEX load (see `crates/xenia-xex/`); `KernelState` already
holds the loaded `image_base`. Implementation outline:
- Parse `XEX_HEADER_EXECUTION_INFO` / privilege bits at load time
into `KernelState` (or surface via `Vfs` already-loaded XEX
metadata).
- `xex_check_executable_privilege(priv_id) -> u32`:
return 1 if bit `priv_id` is set in the title's privilege bitmask,
else 0. Match canary's encoding (privilege IDs are 0..7F; canary
reads `PrivilegeFlags[i/8] & (1 << (i%8))` from the XEX execution
info).
Validation after the fix:
1. Re-run `audit-runs/audit-005/diff.py` — `XGetAVPack`,
`XamTaskSchedule`, `XeCryptSha`, etc. should appear in our
sequence and the divergence should advance several hundred
calls past the priv-check.
2. Re-run with the 9-PC probe armed at -n 500M — at minimum, the
ctor-probe firings change, and ideally one or more of the 9
producer sites starts firing.
3. If producer sites fire, dispatcher fields `[0x828F3D08+0x50]` /
`[0x828F4070+0x24]` become non-zero (use `--dump-addr`).
4. Lockstep golden `crates/xenia-app/tests/golden/sylpheed_n50m.json`
will likely change (`imports` count goes up, `swaps` may advance);
regenerate the golden under `--stable-digest` and treat that as
the new lockstep anchor.
If after the fix the producer is reached and dispatcher fields
populate, the parked-waiter deadlock should resolve — or surface
the next layer of bugs (e.g. signaling code reads non-zero handle
but `wake_eligible_waiters` fails).
### KRNBUG-XEX-001 — `XexCheckExecutablePrivilege` real impl (P0 fix landed)
**Branch:** `xex-check-privilege/p0-real-impl` (no-ff merged to master).
**Status:** LANDED 2026-05-04. Closes the priv-stub side of KRNBUG-AUDIT-005.
**Implementation.** Replaced `stub_return_zero` at
`crates/xenia-kernel/src/exports.rs:193` with a real implementation
that reads the XEX `XEX_HEADER_SYSTEM_FLAGS` (key `0x00030000`)
bitmap. Mirrors canary's `XexCheckExecutablePrivilege_entry`
[xboxkrnl_modules.cc:22-39](../xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_modules.cc#L22-L39):
`(flags >> priv) & 1` for `priv < 32`, else 0.
Plumbing:
- `xenia-xex/src/header.rs`: added `header_keys::SYSTEM_FLAGS = 0x00030000`.
- `xenia-xex/src/loader.rs`: added `get_system_flags(&Xex2Header) -> u32`.
- `xenia-kernel/src/state.rs`: added `pub xex_system_flags: u32` (init 0)
+ `xex_priv_logged: HashSet<u32>` (one-shot log gate per priv).
- `xenia-app/src/main.rs`: wired
`kernel.xex_system_flags = xenia_xex::loader::get_system_flags(&header)`
alongside the existing `kernel.image_base = base` line in `cmd_exec_inner`.
Sylpheed's bitmap is `0x00000400` (only `XEX_SYSTEM_PAL50_INCOMPATIBLE`
set, bit 10). So priv 10 → 1, priv 11 → 0. Both call sites identified
in AUDIT-005 now route through the canary-correct branches.
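
The rule quoted above, `(flags >> priv) & 1` for `priv < 32` and 0 otherwise, can be sketched together with the two branch polarities from AUDIT-005. The function names are hypothetical and this is not the landed export body; the gate helpers only model which arm each call site takes.

```rust
/// Returns 1 if bit `privilege` is set in the system-flags bitmap, else 0.
/// Privilege IDs of 32 and above always report 0.
pub fn check_executable_privilege(system_flags: u32, privilege: u32) -> u32 {
    if privilege < 32 {
        (system_flags >> privilege) & 1
    } else {
        0
    }
}

/// Priv-10 site: `beq` skips the block, so it runs only when the bit is set.
pub fn av_block_runs(system_flags: u32) -> bool {
    check_executable_privilege(system_flags, 10) != 0
}

/// Priv-11 site: `bne` skips the block, so it runs only when the bit is clear.
pub fn task_block_runs(system_flags: u32) -> bool {
    check_executable_privilege(system_flags, 11) == 0
}
```

With flags `0x00000400` both blocks run, while an all-zero answer (the old stub's behaviour) skips the priv-10 block only.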
**Validation chain (Step 3 of the hand-off):**
| step | outcome |
|---|---|
| (a) `cargo test --workspace --release` | 588 → 589 (new test `xex_check_executable_privilege_reads_system_flags_bitmap`); all prior green |
| (b) `--stable-digest -n 100M` lockstep | `instructions=100000013` (was `100000002`). 11-instruction shift is the deterministic guest divergence into the canary-correct branch — verified with 2 identical re-runs. NOT nondeterminism. |
| (c) AUDIT-005 9-PC probe at -n 500M | All 9 producer probe sites still 0×. **BUT** `kernel.calls{XGetAVPack}` went from `0` → `1` (priv-10 gate flipped — XexCheckExecutablePrivilege itself only fires once for priv 10 because priv-11 site at `sub_824A9710` is downstream and not yet reached). |
| (d) `--trace-handles-focus=0x1004,0x100c,0x15e0` | All 3 handles still `signal_attempts=0`. The 9 probed PCs are members of two indirection-chain singletons (`sub_821800D8` for 0x100c, `sub_8216F618` for 0x15e0); both are downstream of the priv-11 site too. |
| (e) Canary kernel-call diff | 10 of the 11 missing exports remain absent. Only `XGetAVPack` was unlocked. The new first-divergence is inside the AV/crypto block — between `XGetAVPack` returning and `XeCryptSha` (still stub_success), Sylpheed's init aborts the block early. |
| (f) `sylpheed_oracles` (n50m / n2m) | Re-baselined and re-verified across 3 deterministic runs. New `n50m`: `instructions=50000005, imports=407417, swaps=2, draws=0` (was `50000008, 407415, 2, 0`). |
**Decisive interpretation.** The fix is **correct but partial**. The
priv-10 gate at `lr=0x824ab598` flips polarity (was: skip block / now:
execute block); `XGetAVPack` is now reached as predicted. The priv-11
gate at `lr=0x824a99a4` lives inside `sub_824A9710`, which the boot
flow does NOT reach because something in the AV/crypto block (which
the priv-10 fix unlocked) aborts before completing. So:
- `XGetAVPack`: ✅ reached (was missing, now fires once)
- `XeCryptSha` / `XeKeysConsolePrivateKeySign` / `ObCreateSymbolicLink`
/ `XamUserReadProfileSettings`: ❌ still missing → AV/crypto block
aborts early
- `sub_824A9710` (priv-11 caller) and downstream `XamTaskSchedule` /
`XamTaskCloseHandle` / `ExTerminateThread` / etc.: ❌ still unreached
- Parked-handle producers (the 9 PCs): ❌ still 0× (they live in the
init flow gated on priv-11 or post-priv-11 — same blast radius)
**Next-frontier bug (the new gate identified by this fix).** Inside
sub_824AB578 between `XGetAVPack` (`lr=0x824ab5a4`) and the next
canary-only call (likely `XeKeysConsolePrivateKeySign`). The
candidates are:
1. **`XGetAVPack` returns the wrong value.** Our impl returns `0x16`
(`crates/xenia-kernel/src/xam.rs:382-384`). Canary returns
`cvars::avpack` (default 8 = HDMI). A comment in canary's
`xam_info.cc:250-251` warns: "if the result is not 3/4/6/8 they
explode with errors". `0x16` (= 22) is not in the accepted set, so
this is the strongest candidate for the next blocker.
2. **`XeCryptSha` / `XeKeysConsolePrivateKeySign` are `stub_success`**
(`exports.rs:188-189`). Returning `STATUS_SUCCESS` without
side effects on a hashing operation may itself confuse the caller
if it then reads a hash buffer expecting non-zero bytes.
Recommended next session: probe `XGetAVPack` return value (try `0x8`
to match canary default) — that's a one-line change in `xam.rs:383`.
If the run advances past, re-diff against canary at the new
divergence; otherwise the next gate is in `XeCryptSha` /
`XeKeysConsolePrivateKeySign`.
**Trace artifacts:** `audit-runs/post-priv-fix/ours.log` (5.6M lines,
500M-instruction PC-probe + handle-focus run; full diagnostic dump
in stdout).
---
### KRNBUG-XAM-001 — `XGetAVPack` returned non-canary `0x16`; canary default is `8` (HDMI)
**Status:** LANDED 2026-05-04. Closes the first follow-up of
KRNBUG-XEX-001 (the `XGetAVPack` arm flipped 0→1 by the priv-10 fix
exposed this as the next gate).
**One-line fix.** `crates/xenia-kernel/src/xam.rs:382-384`:
```rust
fn xget_avpack(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
ctx.gpr[3] = 8;
}
```
Was `0x16`. Canary's `XGetAVPack_entry` returns `cvars::avpack`
(`xam_info.cc:252`); the cvar is `DEFINE_int32(avpack, 8, ...)`
(`xam_info.cc:35`). Canary's inline comment at `xam_info.cc:250-251`:
*"Games seem to use this as a PAL check — if the result is not
3/4/6/8 they explode with errors if not in PAL mode."* `0x16` (=22)
is not in `{3, 4, 6, 8}`, so Sylpheed's caller treated the response
as invalid.
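A minimal sketch of the acceptance check implied by canary's comment, assuming the guest does a plain membership test against `{3, 4, 6, 8}`. The names `ACCEPTED_AVPACKS` and `avpack_accepted` are illustrative and do not appear in the guest or in canary.

```python
# Values canary's comment lists as accepted by games (xam_info.cc:250-251).
# Assumption: the guest check is a simple set-membership test.
ACCEPTED_AVPACKS = frozenset({3, 4, 6, 8})


def avpack_accepted(value: int) -> bool:
    """Return True if an XGetAVPack result falls in the accepted set."""
    return value in ACCEPTED_AVPACKS


old_value = 0x16  # previous return value (decimal 22)
new_value = 8     # canary default (HDMI)

old_ok = avpack_accepted(old_value)
new_ok = avpack_accepted(new_value)
```

Under that assumption `old_ok` is false and `new_ok` is true, which matches the observed change at the call site.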
**Tests.** 589 → 590. New unit test `xget_avpack_returns_hdmi` asserts
`r3 == 8`. Constant-return change; one assertion is enough.
**Validation chain (Step 3 of the hand-off):**
| step | outcome |
|---|---|
| (a) `cargo test --workspace --release` | 589 → 590; all green. |
| (b) `--stable-digest -n 100M` lockstep | `instructions=100000010, import_calls=987686, swaps=2`. 3-run identical (counter sets bit-equal). Was `100000013, 407417, 2`. The 2.4× import-call jump is the deterministic guest divergence into the canary-correct branch (the AV/crypto block now executes more thunks), not non-determinism. |
| (c) AUDIT-005 9-PC probe at -n 500M | All 9 producer probe sites still 0× (`grep -c CTOR-PROBE = 0`). |
| (d) `--trace-handles-focus=0x1004,0x100c,0x15e0` | All 3 handles still `signal_attempts=0`. The producers live deeper in the init flow than what `XGetAVPack` alone unlocks. |
| (e) Canary kernel-call diff (set-diff `audit-runs/post-fix/ours-500m.log` vs `audit-runs/audit-005/canary.log`) | 11 → **10** canary-only exports. The single match unlocked is `XGetAVPack` (canary=1, ours=1). Remaining absent: `ExTerminateThread ×2`, `KeReleaseSemaphore ×268`, `KeResetEvent ×1`, `NtDeviceIoControlFile ×2`, `ObCreateSymbolicLink ×1`, `XamTaskCloseHandle ×1`, `XamTaskSchedule ×1`, `XamUserReadProfileSettings ×2`, `XeCryptSha ×1`, `XeKeysConsolePrivateKeySign ×1`. |
| (f) `sylpheed_oracles` (n50m) | Re-baselined: `instructions=50000004, imports=407416, swaps=2, draws=0` (was `50000005, 407417, 2, 0`). 3 deterministic re-runs. Orphan golden `sylpheed_n2m.json` (no test refers to it) also re-baselined for hygiene. |
**Decisive interpretation.** The fix is **correct but partial**. The
`XGetAVPack` return value is now in the canary-accepted set, and
the call site at `0x824ab5a0` reaches it, but the rest of the AV/crypto
block at `sub_824AB578` between `XGetAVPack` returning (`lr=0x824ab5a4`)
and `XeCryptSha` does not execute. The cascade walked exactly **one**
step.
**Telemetry signal — `lr=0x824a97e4` post-fix.** A new `RtlNtStatusToDosError(r3=0xc0000011 ...)` (`STATUS_END_OF_FILE`)
fires from `lr=0x824a97e4` immediately after `XGetAVPack` returns.
That PC is **inside** `sub_824A9710` (the priv-11 site), so the
priv-11-caller IS being entered (probably via fall-through from a
caller of `sub_824AB578`'s post-AV block), but the priv-11 query
itself never fires — there's a precondition between block entry and
priv-11 that fails. Almost certainly a downstream sub of the
AV/crypto block (one of `sub_824ABA98` and friends from AUDIT-005's
disasm) returns negative NTSTATUS, which routes here.
**Next-frontier bug (the new gate identified by this fix).** Between
`XGetAVPack` (`lr=0x824ab5a4`) and `XeCryptSha`. Two candidates:
1. **The 4 unreached siblings inside `sub_824AB578`** —
`XeCryptSha`, `XeKeysConsolePrivateKeySign`, `NtDeviceIoControlFile ×2`,
`ObCreateSymbolicLink`. The crypto pair is `stub_success`;
`NtDeviceIoControlFile` has a real implementation, but its caller may
not be reached. The next step is to diff sub_824AB578 step by step from
`0x824ab5a4` onward to find the failing precondition.
2. **`sub_824ABA98` returning negative NTSTATUS** (the AUDIT-005
call site referenced from `lr=0x824a97e4`). If the AV/crypto
block calls `sub_824ABA98` and gets a negative return, control
transfers to the error path that triggers the
`RtlNtStatusToDosError(c0000011)` we observe. That PC is the
tail signal — finding what's upstream of it is the next probe.
**What did NOT change** (per the AUDIT-004 diagnosis chain):
- The 9 producer-callsite PCs for handles `0x100c` (5 sites) and
`0x15e0` (4 sites): still 0× hits.
- The 3 parked-waiter handles `0x1004 / 0x100c / 0x15e0`:
still `signal_attempts=0`.
- `swaps=2` plateau, `draws=0`: unchanged.
**Trace artifacts:** `audit-runs/post-fix/ours-500m.log` (5.6M lines,
500M-instruction PC-probe + handle-focus run, post-AV-pack-fix).
Same probe configuration as KRNBUG-AUDIT-005's `audit-runs/audit-005/ours.log`,
re-runnable with the command in that finding's "Reproduce" block.
**Reproduce the canary set-diff:**
```bash
python3 - <<'PY'
import re
from pathlib import Path
from collections import Counter
HERE = Path('audit-runs/audit-005')
CR = re.compile(r'^d>\s+[0-9A-Fa-f]+\s+([A-Z][A-Za-z0-9_]+)\(')
OR_ = re.compile(r'probe_calls.*?call=([A-Za-z_][A-Za-z0-9_]*)')
def extract(p, rx):
out = Counter()
with open(p, errors='replace') as f:
for line in f:
m = rx.search(line)
if m: out[m.group(1)] += 1
return out
canary = extract(HERE/'canary.log', CR)
ours = extract('audit-runs/post-fix/ours-500m.log', OR_)
for n in sorted(set(canary) - set(ours)):
print(f' {canary[n]:>5} {n}')
PY
```
### KRNBUG-IO-002 — `nt_query_volume_information_file` block size (LANDED, gate hypothesis FALSIFIED)
**Status:** applied (branch `xboxkrnl-vol-allocunit/p0-65536-cluster`,
single squash commit). Tests 591 → 592. Lockstep
`instructions=100000010, swaps=2, draws=0` deterministic across two
reruns. sylpheed_n50m oracle still matches its existing golden.
**The fix.** `crates/xenia-kernel/src/exports.rs:1241-1269`,
`nt_query_volume_information_file` class-3 (FileFsSizeInformation)
branch, was returning `(total=0x100000, free=0,
sectors_per_unit=1, bytes_per_sector=2048)`. Replaced with the
canary-NullDevice byte-identical `(total=0x10, free=0x10,
sectors_per_unit=0x80, bytes_per_sector=0x200)` (product = 65536).
Reference: `xenia-canary/src/xenia/vfs/devices/null_device.h:38-46`.
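The "product = 65536" figure is the allocation-unit size in bytes, `sectors_per_unit * bytes_per_sector`. A small sketch comparing the old and new replies follows; the dict layout and the helper name are illustrative, and only the four field values come from this finding.

```python
def allocation_unit_bytes(sectors_per_unit: int, bytes_per_sector: int) -> int:
    """Bytes per allocation unit as a guest would derive it from the reply."""
    return sectors_per_unit * bytes_per_sector


# Field values quoted in this finding; the dict layout is illustrative only.
old_reply = dict(total=0x100000, free=0, sectors_per_unit=1, bytes_per_sector=2048)
new_reply = dict(total=0x10, free=0x10, sectors_per_unit=0x80, bytes_per_sector=0x200)

old_unit = allocation_unit_bytes(old_reply["sectors_per_unit"], old_reply["bytes_per_sector"])
new_unit = allocation_unit_bytes(new_reply["sectors_per_unit"], new_reply["bytes_per_sector"])
```

The old reply implies a 2048-byte unit; the new one implies the 65536-byte unit named in the branch (`p0-65536-cluster`).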
**The cascade hypothesis.** AUDIT-006 predicted that fixing this would
unblock seven canary-only kernel exports — the priv-11 query at
`sub_824A9710` would fire, `XamTaskSchedule` at `lr=0x824a9a10` would
fire, the Cache0 callback thread would spawn, and dispatcher 0x100c's
producer would finally fire (closing the 6-session producer hunt).
**The cascade DID NOT FIRE.** Fresh 500 M trace at
`audit-runs/post-IO-002/ours.log` (692 MB, 5.6 M lines):
| Metric | Pre-IO-002 (audit-006) | Post-IO-002 |
|---|---|---|
| canary-only kernel exports | 7 | **7 (identical set)** |
| `XexCheckExecutablePrivilege` calls | 1 (priv=0xA only) | **1** (still no priv=0xB) |
| `XamTaskSchedule` calls | 0 | **0** |
| `KeResetEvent / ObCreateSymbolicLink / KeReleaseSemaphore / ExTerminateThread / XamTaskCloseHandle / XamUserReadProfileSettings` | 0 | **0** |
| `NtQueryVolumeInformationFile` calls | 16 | **16** (no new sites reached) |
| `swaps` | 2 | 2 |
| `draws` | 0 | 0 |
| Worker thread spawns | 19 | 18 (within noise) |
| `imports` at -n 100M (stable digest) | 987686 | **987630** (-56) |
**Diagnostic.** All 16 `NtQueryVolumeInformationFile` calls in our trace
originate from a single LR `0x82611f38`, a downstream consumer that
**completes successfully** in both pre- and post-fix runs. The audit-006
premise that `sub_824ABA98`/`sub_824A9710` consume the volume-info reply
at the priv-11 gate is therefore likely incorrect, *or* the gate consumes
a different information class via a different export entirely.
**Stop-condition triggered.** Per the IO-002 task brief, this session
landed the correct fix (it makes our reply byte-identical to canary's
NullDevice and survives every test we have) but did not pivot to a
second fix. The branch is kept because the value change is correct
and introduces no regression; it is, however, **not load-bearing for
the priv-11 gate**.
**Next-session next-gate hypotheses (untested, ranked by likelihood):**
1. **`sub_824A9710` early-exit probe.** Per AUDIT-005 instrumentation
the priv-11 site has never fired in any session. Use `--pc-probe` on
the entry of `sub_824A9710` and probe each conditional branch within
it; whichever branch exits the function before the priv-11
`XexCheckExecutablePrivilege` call site is the actual gate.
2. **Different info-class.** `nt_query_information_file` (class 5
`FileStandardInformation`, class 22 etc.) or
`nt_query_full_attributes_file` may be the actual consumer. The
16 calls at LR `0x82611f38` are *not* the gate even though they
complete successfully.
3. **Mis-attributed disasm.** AUDIT-005's identification of
`sub_824ABA98 = VerifyDirBlockSize` came from disasm reading; IO-001's
runtime trace already invalidated parts of that attribution.
Re-disassemble `sub_824A9710` with `xenia-rs dis --json --at 0x824a9710`
and walk every conditional that might exit before the priv-11 query.
4. **A different IOCTL.** `NtDeviceIoControlFile` is now reachable
(KRNBUG-IO-001 unblocked it); some FsCtl response we return may be
the new gate.
**Trace artifacts:**
- `audit-runs/post-IO-002/ours.log` — 500 M trace, post-fix
- `audit-runs/post-IO-002/canary.log` — copy of the audit-006 canary oracle
- `audit-runs/post-IO-002/diff.py` — copy of audit-006 diff tool
- `audit-runs/post-IO-002/lock_n100m_run{1,2}.json` — bit-identical lockstep digests
- `audit-runs/post-IO-002/canary_only.txt` — set-difference output (the 7-entry list)
- `audit-runs/post-IO-002/canary_exports.txt`, `ours_exports.txt` — sorted unique export names
---
## KRNBUG-AUDIT-007 — branch-probe instrumentation + sub_824A9710 exit-branch identification (2026-05-04)
### Outcome
**`--branch-probe` instrumentation landed (read-only diagnostic). Runtime trace decisively identified the priv-11 gate.**
- 592→592 tests; lockstep `instructions=100000010, swaps=2, draws=0` deterministic across reruns
(`audit-runs/audit-007/lock_post_branchprobe.json` ≡ `lock_post_branchprobe_run2.json`
≡ `audit-runs/post-IO-002/lock_n100m_run1.json`).
- Branch: `investigate-sub-824a9710/p0-branch-probe` — kept (instrumentation is reusable).
### Decisive runtime evidence
`audit-runs/audit-007/sub_824A9710-trace.log`:
```
BRANCH-PROBE pc=0x824a9710 tid=1 hw=0 cycle=5363003 r3=0x00000000 lr=0x824a9acc
BRANCH-PROBE pc=0x824a97e0 tid=1 hw=0 cycle=5369559 r3=0xc0000034 lr=0x824a9940
BRANCH-PROBE pc=0x824a9a98 tid=1 hw=0 cycle=5369562 r3=0x00000002 lr=0x824a97e4
```
The probe at `0x824a97e0` (the failure landing pad) captured `r3=0xC0000034`, `lr=0x824a9940` (= the
`cmpi 0,r3,0` PC after `bl sub_824ABD88` at `0x824a993c`). This pinpoints:
- **Exit branch**: `0x824a9944` (`bc 12, lt, 0x824A97E0`) — taken because r3 was 0xC0000034 < 0.
- **Responsible bl**: `0x824a993c` → `sub_824ABD88` first call.
- **Status code**: `0xC0000034` = `STATUS_OBJECT_NAME_NOT_FOUND`.
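The `bc 12, lt` branch is taken because the guest compares `r3` as a signed 32-bit value, so any NTSTATUS with the top bit set is negative. A short sketch of that interpretation (helper names are illustrative):

```python
def as_signed32(value: int) -> int:
    """Reinterpret a 32-bit register value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def is_failure(status: int) -> bool:
    """Mirror `cmpi 0,r3,0` followed by a branch-if-less-than."""
    return as_signed32(status) < 0


STATUS_SUCCESS = 0x00000000
STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034

signed_value = as_signed32(STATUS_OBJECT_NAME_NOT_FOUND)
```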
### Root-cause chain through sub_824ABD88
The function-detector's `end_address=0x824abe3c` for sub_824ABD88 was a truncation artifact;
the function actually runs to `0x824ac184`. Within that range the `0xC0000034` is **HARDCODED**
at `0x824abea8-0x824abeac`:
```
0x824abe90 bl NtDeviceIoControlFile (FsCtlCode=0x74004, out_buf=r1+160, out_len=16)
0x824abe94 cmpi 0, r3, 0
0x824abe98 bc 12, lt, 0x824abeb8 # if r3 < 0 → failure cleanup (NOT taken; stub returned 0 = success)
0x824abe9c ld r10, 168(r1) # load doubleword from [out_buf+8]
0x824abea0 cmpi cr6, 1, r10, 0 # 64-bit cmp r10 == 0
0x824abea4 bc 4, 4*cr6+eq, 0x824abeb0 # if NOT eq, skip the assignment
0x824abea8 addis r3, r0, 0xC000 # r3 = 0xC0000000
0x824abeac ori r3, r3, 0x34 # r3 = 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND)
0x824abeb0 cmpi cr6, 0, r3, 0
0x824abeb4 bc 4, 4*cr6+lt, 0x824abecc # if NOT lt → success path; r3 < 0 → NOT taken
0x824abeb8 or r28, r3, r3 # save 0xC0000034
0x824abebc lwz r3, 96(r1)
0x824abec0 bl NtClose
0x824abec4 or r3, r28, r28 # restore failure status
0x824abec8 b 0x824abe34 # epilogue → return 0xC0000034
```
The game expects the IOCTL response's upper 8 bytes to be non-zero. Our
`NtDeviceIoControlFile` is registered as `stub_success` at
`crates/xenia-kernel/src/exports.rs:90` — returns 0 (SUCCESS) but writes nothing
into the OUT buffer. The fresh stack frame has zero at `[r1+168]`, so the check
at `0x824abea4` falls through to the hardcoded failure assignment.
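The check at `0x824abe94`-`0x824abeac` can be modelled as follows. This is a sketch of the control flow read from the disassembly above, not guest code: `gate_status` is a hypothetical name, and the buffer is treated as 16 big-endian bytes starting at `r1+160`.

```python
import struct

STATUS_SUCCESS = 0x00000000
STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034


def gate_status(ioctl_status: int, out_buf: bytes) -> int:
    """Model of the post-IOCTL check in sub_824ABD88 (sketch only)."""
    if ioctl_status & 0x80000000:
        # r3 < 0: the IOCTL itself failed, status is propagated unchanged.
        return ioctl_status
    # ld r10, 168(r1): big-endian doubleword at out_buf + 8.
    (upper,) = struct.unpack_from(">Q", out_buf, 8)
    if upper == 0:
        # Hardcoded failure assignment at 0x824abea8-0x824abeac.
        return STATUS_OBJECT_NAME_NOT_FOUND
    return ioctl_status


stub_result = gate_status(STATUS_SUCCESS, bytes(16))            # untouched buffer
filled_result = gate_status(STATUS_SUCCESS, bytes(15) + b"\x01")  # non-zero upper half
```

With a zeroed buffer (what `stub_success` leaves behind) the model returns `0xC0000034`; any non-zero doubleword at offset 8 lets the success status through.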
### Canary reference
`audit-runs/post-IO-002/canary.log` lines 1196-1209 show canary calls
`NtDeviceIoControlFile(handle, ..., FsCtlCode=0x74004, ..., out_buf, out_len=16)`,
gets a populated 16-byte response (whose upper 8 bytes are non-zero), then proceeds
through 17× NtWriteFile zero-fill, NtClose, NtCreateFile (Cache0\), NtQueryVolumeInformationFile
class=3, NtClose, and finally **`XexCheckExecutablePrivilege(0x0000000B)`** — the
priv-11 site that has never fired in our run. Immediately followed by
**`XamTaskSchedule(824A93C8, 828A28F0, ...)`** — the canary-only export hunt's
gate-pivot call.
The IOCTL implementation in canary lives in `xenia-canary/src/xenia/vfs/devices/null_device.{h,cc}`
(`NullDevice::IoControl`) — the device's `IoControl` writes the structured payload
that the game-side check consumes.
### Next session: KRNBUG-IO-003
**Where:** `crates/xenia-kernel/src/exports.rs:90` — replace the
`stub_success` registration with a real `nt_device_io_control_file`.
**Minimum viable fix:** for FsCtlCode=0x74004, write any non-zero u64 at
`[out_buf+8]`. That alone clears the gate.
**Canary-faithful fix:** mirror `NullDevice::IoControl` for FsCtlCodes
`0x70000` (8-byte response, consumed at `sub_824ABD88:0x824abe3c` for a
log2/shift count) and `0x74004` (16-byte response, partition geometry).
Fall through to `STATUS_NOT_IMPLEMENTED` for unrecognized codes so future
divergences surface.
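A rough sketch of the minimum viable shape, assuming a handler that sees the FsCtlCode and a mutable output buffer. The function name, the constant name and the payload value `1` are placeholders: the real payload should come from canary's `NullDevice::IoControl`, which is not reproduced here. The buffer-length check is an addition of this sketch.

```python
import struct

STATUS_SUCCESS = 0x00000000
STATUS_NOT_IMPLEMENTED = 0xC0000002
STATUS_BUFFER_TOO_SMALL = 0xC0000023

FSCTL_0x74004 = 0x74004  # code from this finding; the constant name is illustrative


def handle_ioctl(code: int, out_buf: bytearray) -> int:
    """Hypothetical handler: clears the gate for 0x74004, rejects everything else."""
    if code == FSCTL_0x74004:
        if len(out_buf) < 16:
            return STATUS_BUFFER_TOO_SMALL
        # Placeholder payload: any non-zero u64 at offset 8 satisfies the guest check.
        struct.pack_into(">Q", out_buf, 8, 1)
        return STATUS_SUCCESS
    # Unrecognized codes fail loudly so future divergences surface.
    return STATUS_NOT_IMPLEMENTED


buf = bytearray(16)
status = handle_ioctl(0x74004, buf)
```

Returning `STATUS_NOT_IMPLEMENTED` for unknown codes keeps the diagnostic value of the old stub without silently reporting success.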
**Falsifiable cascade prediction:**
- `XexCheckExecutablePrivilege` count: **1 → 2** (priv=0xA + priv=0xB).
- `XamTaskSchedule` count: **0 → 1**.
- canary-only export count: **7 → ≤ 3**.
- Worker thread spawn at `ExCreateThread(entry=0x82181830, ctx=0x828F3D08)` —
the parked-handle 0x100c producer fires.
- `swaps=2 draws=0` plateau persists (renderer is multi-causal).
**Failure modes to watch for:**
- (α) Re-running `--branch-probe` should show a NEW exit branch in
`sub_824A9710` (one of `0x824a996c`, `0x824a9998`, `0x824a9a18`) if a downstream
helper has its own unimplemented dependency.
- (β) sub_824ABA98's analogous failure path (called at 0x824a9950, 0x824a9990)
may surface if its own kernel-call dependencies are stubs.
- (γ) `nt_write_file` against the synth empty-file Cache0 path needs to handle
the 17× zero-fill loop; if our implementation rejects writes to a zero-byte
file, the cascade stalls just past the IOCTL fix.
### Files added / modified (instrumentation only)
- `crates/xenia-kernel/src/state.rs` — added `branch_probe_pcs: HashSet<u32>`
field + `fire_branch_probe_if_match(hw_id)` method emitting a single compact
`BRANCH-PROBE` line per fire (pc, tid, hw, cycle, r3, lr, cr0/cr6). Sister to
`fire_ctor_probe_if_match`; no back-chain walk. ~40 LOC.
- `crates/xenia-app/src/main.rs` — `--branch-probe` CLI flag (env var
`XENIA_BRANCH_PROBE`), parser, and call in `worker_prologue`. ~30 LOC.
### Probe-machinery limitation
The probe fires only when the **block head** at the matched PC is dispatched —
mid-block PCs in the request set don't trigger because the prologue runs once
per block, not once per instruction. In this trace: function entries, failure
landing pads (`0x824a97e0`), and external-call return PCs (`0x824a9a98`) all
hit. Internal `bc` PCs (`0x824a9944`, `0x824a9958`, ...) were silent. The data
captured was sufficient — the failure landing PC + LR pair uniquely identified
the upstream branch — but if a future audit needs every-branch coverage, the
helper call would need to move from `worker_prologue` into the per-instruction
step loop (or a custom block-scan that flags branches matching the request
list).
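The limitation can be illustrated with a toy model in which the probe check runs once per dispatched block and compares only the block head. The block layout below is hypothetical (chosen to resemble the PCs in this trace), and `fired_probes` is not a function in the codebase.

```python
def fired_probes(blocks, requested):
    """Return requested PCs that fire when only block heads are checked.

    blocks: list of basic blocks, each a list of PCs with the head first.
    """
    fired = []
    for block in blocks:
        head = block[0]
        if head in requested:
            fired.append(head)
    return fired


# Hypothetical block layout for illustration.
blocks = [
    [0x824A9710, 0x824A9714],  # function entry is a block head
    [0x824A9940, 0x824A9944],  # the bc at 0x824a9944 sits mid-block
    [0x824A97E0],              # failure landing pad is a block head
]
requested = {0x824A9710, 0x824A9944, 0x824A97E0}

hits = fired_probes(blocks, requested)
silent = requested - set(hits)
```

In this model the mid-block `bc` PC ends up in `silent`, matching the behaviour described above; a per-instruction check would iterate over every PC in each block instead of only `block[0]`.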
### Trace artifacts (re-runnable)
- `audit-runs/audit-007/sub_824A9710-trace.log` — 5 BRANCH-PROBE lines + thread diagnostics.
- `audit-runs/audit-007/sub_824A9710-trace.err` — full kernel-call trace + counter dump.
- `audit-runs/audit-007/lock_post_branchprobe.json`, `lock_post_branchprobe_run2.json` — lockstep digests.
Re-run command:
```
PROBE_LIST="0x824a9aa0,0x824a9128,0x824a9710,0x824a9778,0x824a9788,0x824a9790,0x824a97dc,0x824a97e0,0x824a9824,0x824a9828,0x824a9840,0x824a9850,0x824a985c,0x824a9870,0x824a9880,0x824a9888,0x824a9918,0x824a9944,0x824a9958,0x824a996c,0x824a9998,0x824a999c,0x824a99a0,0x824a99a8,0x824a9a10,0x824a9a18,0x824a9a60,0x824a9a78,0x824a9a98"
./target/release/xenia-rs exec sylpheed.iso --halt-on-deadlock \
--branch-probe="$PROBE_LIST" -n 500_000_000 \
> audit-runs/audit-007/sub_824A9710-trace.log \
2> audit-runs/audit-007/sub_824A9710-trace.err
```
## KRNBUG-IO-003 — `NtDeviceIoControlFile` real implementation (LANDED 2026-05-04)
### Outcome
**Replaced `stub_success` registration with a real `nt_device_io_control_file` mirroring canary `NullDevice::IoControl` for FsCtlCodes 0x70000 + 0x74004.**
- 592 → 594 tests; lockstep `instructions=100000019 imports=987524 swaps=2 draws=0` deterministic across run1/run2/run3 (`audit-runs/post-IO-003/lock_n100m_run{1,2,3}.json`, all byte-identical).
- Branch: `xboxkrnl-ioctl/p0-fsctl-mountinfo` (no-ff merge).
- `sylpheed_n50m` golden re-baselined `instructions=50000004→50000003`, `imports=407362→407255`. `sylpheed_n2m` unchanged.
### Audit-007 prediction scorecard
| # | Prediction | Pre | Post | Held? |
|---|---|---|---|---|
| (a) | `cargo test --workspace --release` green | 592 | 594 | ✓ |
| (b) | Lockstep determinism preserved | bit-identical | bit-identical (run1≡run2≡run3) | ✓ |
| (c) | `XexCheckExecutablePrivilege` count: 1 → ≥2 | 1 | 2 | ✓ |
| (c) | `XamTaskSchedule` count: 0 → ≥1 | 0 | 1 | ✓ |
| (e) | canary-only exports: 7 → ≤3 | 7 | 3 | ✓ |
| (d) | 0x100c worker spawn (handle goes from UNCREATED to created+signaled) | UNCREATED | UNCREATED | **✗** |
| (d) | 0x1004 signal_attempts > 0 | 0 | 0 | **✗** |
| (d) | 0x15e0 signal_attempts > 0 | 0 | 1 (primary=1, "not stuck") | ✓ (new) |
| (f) | Worker thread spawn count: 19 → higher | 19 | 19 | **✗** |
5/8 predictions held; cascade fired but not as far as audit-007 expected. Specifically:
- The priv-11 query DOES fire → flows into `XamTaskSchedule` → 0x15e0 semaphore-signal pump now runs.
- The audit-006 `canary_only.txt` 7 entries reduce by 4 (`KeResetEvent`, `ObCreateSymbolicLink`, `XamTaskCloseHandle`, `XamTaskSchedule`). Still missing: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`.
- `XeCryptSha` (1) and `XeKeysConsolePrivateKeySign` (1) also now fire (were 0).
- Per-handle 0x100c stays UNCREATED: the producer chain that should spawn the 0x100c worker is gated downstream of where IO-003 unblocks. The 3 canary-only entries that remain (`ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`) are the next clue: any of those could be the next gate.
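The 7 → 3 reduction is a plain set difference over the export names listed in this tracker. A sketch, with the sets transcribed by hand from the audit-006 list and the scorecard (variable names are illustrative):

```python
# The 7 canary-only exports from audit-006, as listed in the IO-002 table.
audit_006_canary_only = {
    "ExTerminateThread",
    "KeReleaseSemaphore",
    "KeResetEvent",
    "ObCreateSymbolicLink",
    "XamTaskCloseHandle",
    "XamTaskSchedule",
    "XamUserReadProfileSettings",
}

# Exports that fire in our run after IO-003.
now_firing = {
    "KeResetEvent",
    "ObCreateSymbolicLink",
    "XamTaskCloseHandle",
    "XamTaskSchedule",
}

remaining = audit_006_canary_only - now_firing
```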
### Files modified
- `crates/xenia-kernel/src/exports.rs:90` — `stub_success` → `nt_device_io_control_file`.
- `crates/xenia-kernel/src/exports.rs` (new function) — body mirrors `xboxkrnl_io.cc:645-678` + `null_device.cc` (canary). Stack args 9-10 read from `[sp+0x54]` / `[sp+0x5C]` per Xbox 360 PowerPC ABI (parameter save area at sp+0x14 + 64 bytes spill = sp+0x54, confirmed by disasm of caller `sub_824ABD88` at `0x824abe04-0x824abe10` and `0x824abe70-0x824abe78`).
- `crates/xenia-kernel/src/exports.rs` (new tests) — `nt_device_io_control_file_drive_geometry` (FsCtlCode 0x70000) + `nt_device_io_control_file_partition_info_unblocks_gate` (FsCtlCode 0x74004 — asserts OUT+8 ≠ 0, the gate condition).
- `crates/xenia-app/tests/golden/sylpheed_n50m.json` — re-baselined.
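The stack-argument offsets follow from the arithmetic quoted above. A sketch, assuming eight register arguments (r3..r10) and an 8-byte slot stride; the stride is inferred from `0x5C - 0x54` rather than stated in the finding, and the helper name is illustrative.

```python
PARAM_SAVE_BASE = 0x14  # base offset quoted in this finding
REGISTER_ARGS = 8       # r3..r10 carry the first eight arguments
SLOT_BYTES = 8          # assumption: inferred from 0x5C - 0x54


def stack_arg_offset(arg_index: int) -> int:
    """Offset from sp of a 1-based argument that did not fit in registers."""
    if arg_index <= REGISTER_ARGS:
        raise ValueError("arguments 1..8 are passed in registers")
    spill = REGISTER_ARGS * SLOT_BYTES  # 64 bytes of register spill area
    return PARAM_SAVE_BASE + spill + (arg_index - REGISTER_ARGS - 1) * SLOT_BYTES


arg9 = stack_arg_offset(9)
arg10 = stack_arg_offset(10)
```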
### Trace artifacts
- `audit-runs/post-IO-003/lock_n100m_run{1,2,3}.json` — three byte-identical 100M lockstep runs.
- `audit-runs/post-IO-003/lock_n500m.json` — 500M lockstep digest (`instructions=500000010 imports=5629676`).
- `audit-runs/post-IO-003/exec_trace_focus_500m.log` — `--trace-handles-focus=0x1004,0x100c,0x15e0` at 500M.
### Next session candidates
The 0x100c worker still doesn't spawn. Three of audit-006's canary-only entries (`ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`) remain canary-only — any of them may be the next downstream gate. Re-running `--branch-probe` against `sub_824A9710` would now show a new exit branch (the priv-11 site fires, so the failure mode has shifted).
## KRNBUG-AUDIT-008 — IO-003 model reset; next gate is β-class job-submitter unreached (DIAGNOSTIC 2026-05-05)
### Outcome
**Model reset on IO-003 cascade.** Branch-probe trace at the post-priv-11 cluster decisively shows the 0x100c worker IS spawned as `tid=3` with `ctx=0x828F3D08, entry=0x82181830`, parked on lifecycle event handle `0x1020` (signals=0). The IO-003 audit memory's "0x100c UNCREATED" claim was wrong; the handle audit already had `handle=0x00001020 waiters(tid)=[3]` but the trace dump didn't connect tid=3 to the 0x100c dispatcher. Same correction applies to the 0x1004 worker (tid=11).
The actual next gate is **β-class** (internal-sub unreached): the 5 non-create-chain callers of `sub_821800D8` (job-submitter shims with the pattern `bl outer_getter; lwz r3, 80(r3); bl sub_824AA1D8`) are never called. Their parent functions live in the **0x82287000-0x82292FFF module range** — likely renderer / scene-graph subsystem.
### Decisive runtime evidence
`audit-runs/audit-008/branch-probe.trace`:
```
BRANCH-PROBE pc=0x824a9a14 tid=1 cycle=5378562 -- main: post-XamTaskSchedule
BRANCH-PROBE pc=0x824a93c8 tid=2 cycle=0 r3=0x828a28f0 -- spawned thread enters callback (matches canary's 0x824A93C8/0x828A28F0)
BRANCH-PROBE pc=0x824a9540 tid=2 cycle=4232 -- spawned thread post-StfsCreateDevice cmpi
BRANCH-PROBE pc=0x824a9a44 tid=1 cycle=5378576 -- main: post-KeWaitForSingleObject(0x8287094C)
BRANCH-PROBE pc=0x824a9a4c tid=1 cycle=5378579 -- main: post-KeResetEvent
BRANCH-PROBE pc=0x824a9a98 tid=1 cycle=5378596 -- main: sub_824A9710 epilogue
BRANCH-PROBE pc=0x824a9acc tid=1 cycle=5378609 -- main: sub_824A9AA0 return
BRANCH-PROBE pc=0x8216eaa0 tid=1 cycle=5378617 -- main: bl sub_82181C28 callsite
BRANCH-PROBE pc=0x82181c28 tid=1 cycle=5378618 -- main entered sub_82181C28
BRANCH-PROBE pc=0x821800d8 tid=1 cycle=5378630 -- main entered sub_821800D8 (singleton getter for 0x100c)
BRANCH-PROBE pc=0x82181750 tid=1 cycle=5378645 r3=0x828f3d08 -- main entered sub_82181750 ctor
BRANCH-PROBE pc=0x821817c0 tid=1 cycle=5378712 r3=0x00001020 -- post-sub_824A9F18 (lifecycle event=0x1020)
BRANCH-PROBE pc=0x82181830 tid=3 cycle=0 r3=0x828f3d08 lr=0xbcbcbcbc -- 0x100C WORKER SPAWNED
BRANCH-PROBE pc=0x82181838 tid=3 cycle=1 -- past entry thunk
BRANCH-PROBE pc=0x821817fc tid=1 cycle=5378786 r3=0x00001024 -- main: post-sub_82172370, thread handle=0x1024
BRANCH-PROBE pc=0x82180120 tid=1 cycle=5378951 -- main: post-atexit
BRANCH-PROBE pc=0x82181c58 tid=1 cycle=5378957 r3=0x828f3d08 -- main: bl sub_821800D8 returned
```
### Mechanical chain (cross-checked vs disasm)
1. main (sub_8216EA68) returns from sub_824A9AA0 at cycle 5378609.
2. main calls `sub_82181C28` at `0x8216eaa0` (cycle 5378617). `sub_82181C28` is a Meyers singleton getter that checks `[0x828F3D98]` flag.
3. First call → flag is 0 → falls through to `bl sub_821800D8` at `0x82181c54`.
4. `sub_821800D8` is the 0x100c singleton getter. Checks `[0x828F3D78]` flag bit 0. First call → bit 0 is 0 → falls through to `bl sub_82181750` at `0x82180110`.
5. `sub_82181750` is the constructor. With `r3 = this = 0x828F3D08` (the dispatcher).
6. Constructor calls `bl sub_824A9F18` (allocates a lifecycle event); returns r3=0x1020.
7. Constructor calls `bl sub_82172370` at `0x821817f8` (the ExCreateThread wrapper) with r3=0x20000 (stack), r4=0x82181830 (entry), r5=0x828F3D08 (ctx).
8. Worker thread spawns as tid=3 at PC=0x82181830 → through the entry thunk → 0x82181838 (worker body).
9. Worker body reads `[0x828F3D08+76] = 0x1020` (lifecycle event handle), waits on it.
10. **Wait never satisfied** — handle 0x1020 has `signals=0, waits=1, wakes=0` in the handle-audit dump.
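The chain above reduces to a lazily constructed singleton whose constructor spawns a worker that blocks on a lifecycle event. A host-side analogy using Python threads follows; the class and function names are illustrative, and the bounded `timeout` exists only so the sketch terminates, whereas the guest worker waits indefinitely.

```python
import threading


class Dispatcher:
    """Analogy for the 0x100c dispatcher: the ctor creates an event and a worker."""

    def __init__(self):
        self.lifecycle_event = threading.Event()
        self.woke = False
        self.worker = threading.Thread(target=self._worker_body, daemon=True)
        self.worker.start()

    def _worker_body(self):
        # Bounded wait so the sketch ends; records whether a signal arrived.
        self.woke = self.lifecycle_event.wait(timeout=0.2)

    def submit(self):
        # Stand-in for the submit primitive that signals the event.
        self.lifecycle_event.set()


_instance = None


def get_dispatcher():
    """Flag-guarded getter: construct on first call, reuse afterwards."""
    global _instance
    if _instance is None:
        _instance = Dispatcher()
    return _instance


parked = get_dispatcher()
parked.worker.join()       # nobody called submit(): the wait times out

signaled = Dispatcher()
signaled.submit()
signaled.worker.join()     # submit() was called: the wait is satisfied
```

The first instance models our run (worker created, never signaled); the second models what a job-submitter call would do.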
### Where the gate actually is
`sub_821800D8` xrefs:
| caller PC | from func | role |
|-----------|-----------|------|
| 0x82181c54 | sub_82181C28 | **create chain** (ran successfully — see trace above) |
| 0x821802d8 | sub_82180158 | job-submitter shim |
| 0x821806e0 | sub_821805C8 | job-submitter shim |
| 0x82180b28 | sub_82180A10 | job-submitter shim |
| 0x82180ea0 | sub_82180D90 | job-submitter shim |
| 0x82181254 | sub_821810E0 | job-submitter shim |
Each shim is a 5-instruction leaf (`bl getter; lwz r3, 80(r3); bl sub_824AA1D8`) — the canonical "get-then-enqueue" pattern. `sub_824AA1D8` is the universal dispatcher-submit primitive that signals the lifecycle event.
The 5 shims' parent functions are in the **0x82287000-0x82292FFF module range** (sub_82292838, sub_822878A8, sub_8228D760, sub_822900A8, sub_822919C8, sub_8228FDB8). This module is downstream of the cache-init code we've been working on and is almost certainly renderer/scene-graph related.
### Discipline gate
Per task brief (audit-008 session), gate fails on:
- Box 1: gate is NOT a single stubbed import (β-class, not α-class).
- Box 4: no sharp 4-dim cascade prediction can be written without first identifying which submitter should fire first.
**Hand back. No fix this session.**
### Follow-up probe set for next session
Probe parent functions of the 5 shims to find which path actually fires:
```
PROBE_LIST="0x82292838,0x822878a8,0x8228d760,0x822900a8,0x822919c8,0x8228fdb8,\
0x82180158,0x821805c8,0x82180a10,0x82180d90,0x821810e0,\
0x824aa1d8"
```
Whichever target fires identifies the producer path; whichever doesn't names the gate.
### Trace artifacts
- `audit-runs/audit-008/branch-probe.trace` — 17 BRANCH-PROBE lines (clean extract).
- `audit-runs/audit-008/probe-100m.log` — full stdout.
- `audit-runs/audit-008/probe-100m.err` — full stderr trace.
### Files modified
None. KRNBUG-AUDIT-007's `--branch-probe` machinery was sufficient. No git commit — no code changes.
## KRNBUG-AUDIT-009 — renderer cluster fully unreached; gate is structurally above 0x82287xxx-0x82294xxx (DIAGNOSTIC 2026-05-05)
### Outcome
**Stop condition 1 triggered.** Branch-probed 21 PCs: the 12 proposed by AUDIT-008 (6 renderer-cluster parents + 5 shims + 1 dispatcher) plus the AUDIT-005 9-PC producer-callsite set. **0/21 fired** at -n 500M. The 0x82287000-0x82294000 cluster is not entered at all. The gate is structurally above the cluster, outside its call boundary, so a deeper renderer-side probe would land on dead code. Per task brief stop condition 1, hand back with a higher-up probe set; no Phase 2 attempted.
### Decisive runtime evidence
`audit-runs/audit-009/probe-500m.err`:
- `branch probes armed: 21 (0x8216f9d4 ... 0x824aa1d8)`
- `BRANCH-PROBE` line count in stderr: **0**.
- `instructions=500000010 import_calls=5629676 unimplemented=0` — completed without halt.
- Final state, main: `tid=1 hw=0 state=Ready pc=0x822f1c60 lr=0x822f1be0` — inside `sub_822F1AA8` (frame-poll loop, between two `XNotifyGetNext` calls at 0x822f1bdc / 0x822f1c14). LR-of-LR points back into the same function.
- Counters: `XNotifyGetNext=1,489,741`, `NtWaitForSingleObjectEx=1,489,801`, `NtWaitForMultipleObjectsEx=865,493`, `RtlEnter/LeaveCriticalSection=889,109` each, `VdSwap=2`. Main is service-loop polling forever; no forward progress past frame-poll.
- 18 worker threads spawned (parity with audit-008 baseline + 2 new entry trampolines for 0x822c6870 / 0x824563e0 / 0x823dde30 / 0x823ddb50 that weren't catalogued before): tid=3 (0x100c worker, ctx=0x828F3D08, parked on lifecycle event 0x1020), tid=11 (0x1004 worker, ctx=0x828F3EC0, parked on event 0x1004), tid=17 (0x15e0 worker, ctx=0x828F4070, parked on event 0x15F4 — confirms post-IO-003 spawn at the new tid).
- canary-only kernel exports unchanged from audit-008: `{ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings}` (3 entries).
- `signal_attempts=0` on parked handles 0x1004, 0x100c (= event 0x1020), 0x15e0 (= event 0x15F4), 0x10c4. Same parked state as audit-008.
### Mechanical interpretation
- **Box 1 of the 12 PCs (parents):** `sub_82292838, sub_822878A8, sub_8228D760, sub_822900A8, sub_822919C8, sub_8228FDB8` — never entered.
- **Box 2 of the 12 PCs (shims):** `sub_82180158, sub_821805C8, sub_82180A10, sub_82180D90, sub_821810E0` — never entered. (These are leaf shims with the `bl outer_getter; lwz r3, OFFSET(r3); bl sub_824AA1D8` pattern.)
- **Box 3 (universal dispatcher):** `sub_824AA1D8` — never entered. The dispatcher serves both 0x100c and 0x15e0 clusters; its non-entry confirms NEITHER cluster's job-submit path runs, not just the 0x100c side.
- **AUDIT-005 9-PC producer callsites (5 × 0x100c shims + 4 × 0x15e0 shims):** never entered.
This eliminates the audit-008 working hypothesis that the gate sat among the 5 known callers of `sub_821800D8`. The gate is at least one level higher — above the cluster's external entry boundary.
### Cluster shape (from sylpheed.db xrefs)
The 0x82287000-0x82294000 cluster is **internally cohesive but externally unreachable via direct calls**. Its level-1 root functions (where call hierarchy starts within the cluster) have only self-call xrefs — i.e. the cluster is reached only via indirect calls (function pointers / vtables) from outside. The 6 candidate parents from audit-008 sit deep enough that ANY upstream gate looks the same from their level.
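The "reached only via indirect calls" reading can be checked mechanically against an xref table. The sketch below uses only the caller relationships listed in this finding, so the data is partial, and it is not a query against `sylpheed.db`; function and variable names are illustrative.

```python
# callee -> set of direct callers, transcribed from this finding (partial data).
XREFS = {
    "sub_82293448": {"sub_82293448"},
    "sub_822919C8": {"sub_822919C8"},
    "sub_82292D80": {"sub_82293448"},
    "sub_822851E0": {"sub_82284BA0", "sub_82290BC8"},
    "sub_82286BC8": {"sub_822851E0"},
}
# Functions the finding treats as in-cluster.
CLUSTER = set(XREFS) | {"sub_82284BA0", "sub_82290BC8"}


def level1_roots(xrefs):
    """Functions whose only recorded caller is themselves."""
    return {f for f, callers in xrefs.items() if callers == {f}}


def direct_external_entries(xrefs, cluster):
    """In-cluster functions with at least one direct caller outside the cluster."""
    return {f for f, callers in xrefs.items() if f in cluster and callers - cluster}


roots = level1_roots(XREFS)
entries = direct_external_entries(XREFS, CLUSTER)
```

An empty `entries` set together with non-empty `roots` is the signature of a cluster entered only through function pointers or vtables.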
External entry points worth probing next:
- `sub_82293448` (0x82293448) — level-1 root, only self-recursion xrefs.
- `sub_822919C8` (0x822919C8) — level-1 root, only self-recursion xrefs.
- `sub_82288028` (0x82288028) — 8 callers, all in-cluster, but a hub.
- `sub_82292D80` (0x82292D80) — 1 caller, in-cluster (sub_82293448).
- `sub_822851E0` (0x822851E0) — has 2 in-cluster callers (sub_82284BA0, sub_82290BC8); reached transitively from `sub_82289FD0`.
- `sub_82286BC8` (0x82286BC8) — only sub_822851E0 calls it.
NEW thread entry trampolines spawned post-IO-003 (these didn't exist in audit-008's tid set; mid-run kernel-call telemetry shows ExCreateThread at these PCs):
- 0x822c6870 (tid=14 + tid=15, parallel duplicates, ctx=0x828f3300)
- 0x824563e0 (tid=16, ctx=0x828f3e70)
- 0x823dde30 (tid=18, ctx=0x828f3c4c)
- 0x823ddb50 (tid=19 + tid=20, parallel duplicates, ctx=0x828f3c88)
These are likely XAM/system-event dispatchers, not renderer producers, but their entries are unprobed — worth folding into the next probe set to confirm they are not the missing edge.
### Why main parks at sub_822F1AA8
main's call sequence (from xrefs of sub_8216EA68): the priv-11/cache-init cluster (`sub_824A9AA0`), the 0x100c create chain (`sub_82181C28`), `sub_82181298` (a 964-byte function — likely 0x1004 create chain), then a series of `sub_8216E858 / sub_82448470 / sub_8216F218 / sub_82448XXX` calls (probably config / xconfig / atexit), then finally:
```
0x8216ecc4: sub_822F17F0 (684 bytes — pre-poll setup, calls sub_82611CD8/sub_825F1000/sub_825F14D0/sub_824C1A38/sub_824BD460×2/sub_824BD580×2/sub_824B3798/sub_824B40B0/sub_824C2BF8/sub_824CE348/sub_824C76D0/sub_824CE4D0)
0x8216eccc: sub_822F1AA8 (frame-poll #1) ← we are here, looping forever
0x8216ee10: sub_822F1AA8 (frame-poll #2) — never reached
```
Two interpretations are plausible:
1. **sub_822F1AA8 is a finite poll** that exits when XNotifyGetNext returns a particular notification (e.g. dashboard signin completion / profile load). Some XAM event main expects is never delivered.
2. **sub_822F1AA8 is an event pump for the FIRST half of init**, calling work-items that should drive the renderer subsystem. If the work-items are dispatched here and the dispatch path goes via an indirect call into the 0x82287xxx cluster, then the missing edge is a function-pointer/vtable that's never populated.
Both interpretations are consistent with the 0/21 probe data. Probing the entry of sub_822F1AA8's CALLEE list (the calls inside the 1.49M-iteration loop) will discriminate.
### Discipline gate
| # | Condition | Pass? |
|---|---|---|
| 1 | Phase 1 named a single failing kernel/xam import (α) or a narrow internal-sub bug | **NO** — 0 PCs fired |
| 2 | Canary impl small (<80 LOC) | N/A |
| 3 | Sharp 4-dim cascade prediction | **NO** — no candidate fix |
| 4 | No new ABI plumbing | N/A |
| 5 | Fix doesn't touch renderer subsystem | N/A |
**Gate fails on box 1 + 3. STOP. Hand back per stop condition 1.** No code changes this session.
### Follow-up probe set for next session
```
PROBE_LIST=
# Renderer-cluster level-1 roots (never entered if gate is above):
0x82293448,0x822919c8,0x82288028,0x82292d80,0x822851e0,0x82286bc8,
# Newly spawned thread entry trampolines (unprobed, may be system-side):
0x822c6870,0x824563e0,0x823dde30,0x823ddb50,
# Main's frame-poll loop entry + its callee list (XNotifyGetNext consumer):
0x822f1aa8,0x822f1be0,0x822f1c14,0x822f1d00,
# Main's continuation (only fires if main exits frame-poll #1):
0x822f1638,0x821506b8,0x8216f088,0x82150ef8,
0x82173360,0x82173530,0x8216f170,0x824a9ad8
```
Whichever entries fire bound the live path; whichever don't bound the gate.
If `sub_822F1AA8` fires once but never exits → main is stuck waiting for an XAM notification or critical-section signal. Look for which `XamNotifyCreateListener`-registered ID the loop expects.
If `sub_822F1AA8` fires AND exits → main reaches `sub_822F1638` etc.; gate is further down.
If the cluster level-1 roots fire → gate is INSIDE the cluster (renderer β-recursion), and the brief's "no renderer fixes" rule binds.
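The three outcomes above form a small decision table. A minimal Rust sketch of it follows, assuming a hypothetical `ProbeSummary` that records which probe groups fired; the type and field names are illustrative and do not exist in the codebase. "Exits" is inferred here from main's continuation probes (`sub_822F1638` etc.) firing.

```rust
/// Where the boot gate sits, given which probe groups fired.
#[derive(Debug, PartialEq, Eq)]
pub enum GateLocation {
    /// Cluster level-1 roots fired: the gate is inside the renderer cluster.
    InsideRendererCluster,
    /// Frame-poll fired and main reached its continuation: gate is further down.
    PastFramePoll,
    /// Frame-poll fired but main never left it: waiting on a notification/signal.
    StuckInFramePoll,
    /// Nothing fired: the gate is above everything in the probe set.
    AboveProbeSet,
}

/// Hypothetical per-run summary (names are assumptions, not project types).
pub struct ProbeSummary {
    pub cluster_roots_fired: bool,
    pub frame_poll_fired: bool,
    pub continuation_fired: bool,
}

/// Apply the decision rules in the same order as the prose above.
pub fn classify_gate(s: &ProbeSummary) -> GateLocation {
    if s.cluster_roots_fired {
        GateLocation::InsideRendererCluster
    } else if s.frame_poll_fired && s.continuation_fired {
        GateLocation::PastFramePoll
    } else if s.frame_poll_fired {
        GateLocation::StuckInFramePoll
    } else {
        GateLocation::AboveProbeSet
    }
}
```

The cluster check comes first because a firing level-1 root bounds the gate regardless of what the frame-poll probes report.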
### Trace artifacts
- `audit-runs/audit-009/probe-500m.log` — final state + thread diag + handle audit + full counter table.
- `audit-runs/audit-009/probe-500m.err` — full stderr trace (kernel call log, 187 KB).
- `audit-runs/audit-009/branch-probe.trace` — empty (0 BRANCH-PROBE lines emitted).
Re-run command:
```
cd xenia-rs
PROBE="0x82292838,0x822878a8,0x8228d760,0x822900a8,0x822919c8,0x8228fdb8,\
0x82180158,0x821805c8,0x82180a10,0x82180d90,0x821810e0,0x824aa1d8,\
0x821802d8,0x821806e0,0x82180b28,0x82180ea0,0x82181254,\
0x8216f9d4,0x8216fc08,0x821700b8,0x821700f4"
./target/release/xenia-rs exec sylpheed.iso \
--halt-on-deadlock --branch-probe="$PROBE" \
--trace-handles-focus=0x1004,0x100c,0x15e0,0x1020,0x10c4 \
-n 500000000 \
> audit-runs/audit-009/probe-500m.log 2> audit-runs/audit-009/probe-500m.err
```
### Files modified
None. KRNBUG-AUDIT-007's `--branch-probe` machinery was sufficient. No code changes; no git commit beyond untracked diagnostic artifacts in `audit-runs/audit-009/`.
---
## KRNBUG-AUDIT-010 — XNotify delivery diff: 4 missing startup notifications gate dispatcher invocation (DIAGNOSTIC 2026-05-05)
### Status
**READ-ONLY DIAGNOSTIC**. Branch (α) — canary delivers 4 specific
startup notifications we don't. Discipline gate fails on box 3
(L1-root prediction); no fix landed. Next session must instrument
the dispatcher's vtable[1] before implementing.
### Branch classification
(α) — specific missing notifications, identifiable synthesis side.
### Verified ground truth
Our impl:
- `crates/xenia-kernel/src/xam.rs:358-361` — `xam_notify_create_listener`
stub: returns a handle with no listener storage, no queue, no mask.
- `crates/xenia-kernel/src/xam.rs:363-366` — `xnotify_get_next` stub:
always returns r3=0.
- `crates/xenia-kernel/src/objects.rs:14-77` — `KernelObject` has no
`NotifyListener` variant.
Canary:
- `xenia-canary/src/xenia/kernel/kernel_state.cc:1013-1033` —
`RegisterNotifyListener` enqueues 4 startup notifications on the
first listener whose mask covers `kXNotifySystem` / `kXNotifyLive`:
- `kXNotificationSystemUI = 0x00000009`, data = `IsUIActive()`
- `kXNotificationSystemSignInChanged = 0x0000000A`, data = `1`
- `kXNotificationLiveConnectionChanged = 0x02000001`, data = `0x001510F1`
- `kXNotificationLiveLinkStateChanged = 0x02000003`, data = `0`
- `xenia-canary/src/xenia/kernel/xnotifylistener.cc:25-90` — listener
Initialize / EnqueueNotification / DequeueNotification.
- `xenia-canary/src/xenia/kernel/xam/xam_notify.cc:22-95` —
`XamNotifyCreateListener` and `XNotifyGetNext` real impls.
Runtime — canary (`/home/fabi/xenia_canary_windows/xenia.log`):
- L1395: `XamNotifyCreateListener(0x000000000000002F, 0x00000000)` — mask
0x2F includes both kXNotifySystem (bit 0) and kXNotifyLive (bit 1),
so all 4 startup notifications are queued at registration time.
- L2787: `XamUserReadProfileSettings(0, 0, 0, 0, 8, ...)` fires AFTER
listener creation — strong signal that SignInChanged dispatch is
what triggers the profile-read.
Runtime — ours (audit-009 / -n 500M):
- `kernel.calls{XamNotifyCreateListener} = 1` ✓
- `kernel.calls{XNotifyGetNext} = 1,489,741` — the loop hammers it
~1.5M times in 500M instr; gets r3=0 every call.
- 0/21 renderer-cluster + producer probe PCs fire.
- `XamUserReadProfileSettings` remains canary-only.
### Consumer-side dispatch path (sylpheed.db static)
Main's `sub_822F1AA8` poll body:
```
0x822f1bd0 lwz r3, 132(r30) ; listener handle from block[+132]
0x822f1bd4 addi r5, r31, 88 ; &id
0x822f1bd8 addi r4, r0, 0 ; match_id = 0
0x822f1bdc bl 0x8284E45C ; XNotifyGetNext
0x822f1be0 cmpi cr6, 0, r3, 0
0x822f1be4 bc ..., 0x822F1C20 ; if 0, jump past dispatch
0x822f1be8 lwz r3, 7944(r25) ; mem[0x828E1F08] = outer
0x822f1bec lwz r5, 84(r31) ; data (assuming r6 = &data = r31+84 was set up before this excerpt)
0x822f1bf0 lwz r4, 88(r31) ; id (r5 = &id = r31+88 above)
0x822f1bf4 lwz r11, 0(r3) ; outer.vtable
0x822f1bf8 lwz r11, 4(r11) ; vtable[1] = OnNotify
0x822f1bfc mtspr CTR, r11
0x822f1c00 bcctrl 20, lt ; call OnNotify(this, id, data)
```
Construction:
- `sub_8216EA68` (main) → `sub_822F2758(&outer)` at 0x8216ECAC.
- `sub_822F2758` at 0x822f2788: `outer.vtable = 0x820AD894`.
- → `sub_82150EF8(288)` allocates `block`.
- → `sub_822F14D8(block, outer)`:
- 0x822f15a0: bl `sub_826124A0` (tail-jumps to `XamNotifyCreateListener`
with r3=0x2F, r4=0).
- 0x822f15a8: `block[+132] = listener_handle`.
- 0x822f15c8: `mem[0x828E1F08] = outer`.
- 0x822f27b8 back in caller: `outer[+4] = block`.
### vtable resolution from .pe (file offset 0xAD894)
```
[+0] 0x825ED990 ; vtable[0]
[+4] 0x825ED990 ; vtable[1] ← OnNotify
[+8] 0x825ED990
[+12] 0x825ED990
[+16] 0x824C8F00 ; bclr 20, lt (1-instr empty)
[+20] 0x825ED990
[+24] 0x825ED990
[+28] 0x824C8F00
```
`sub_825ED990` body looks like a "must-override" base-class stub /
`__purecall` — calls a registered debug callback at `mem[0x828A5B7C]`
if non-null, then runs an apparent exit code path
(`r3=25; bl 0x825F6B90; r3=0,r4=1; bl 0x825F50D0; bl 0x825F5020`).
**Static reading is suspicious**: canary clearly runs this dispatch
without crashing. Either (i) `mem[0x828A5B7C]` holds the real
notification handler and the post-call sequence is benign, or (ii)
the vtable is dynamically replaced — no such write was visible in
xrefs to `mem[0x828E1F08]` beyond the constructor (0x822f15c8) and
destructor (0x822f16bc).
### Discipline gate
| Box | Status |
|-----|--------|
| 1. Specific missing notification + canary file:line | ✅ |
| 2. Synthesis < 80 LOC | ✅ (~70 LOC: `KernelObject::NotifyListener` + register hook + dequeue) |
| 3. Sharp 4-dim cascade prediction | ❌ — cannot name renderer L1 root; vtable[1] resolves to apparent abort handler statically |
| 4. No renderer/GPU code changes | ✅ |
**Box 3 fails. STOP. Diagnostic-only.**
### Next session — Phase 1.5 probe before implementing
1. Temporarily patch `xnotify_get_next` to return one synthetic
notification (e.g. `id=0x0A, data=1`) on first call.
2. Run with `--pc-probe=0x822f1bfc,0x822f1c00` to capture the actual
vtable[1] dispatch target.
3. Read off the runtime target. Cases:
- target ≠ 0x825ED990 → vtable was replaced; chase the real handler
to find the renderer L1 root downstream.
- target = 0x825ED990 → confirm whether `mem[0x828A5B7C]` is
populated by some init path; the abort-stub IS the real dispatcher
and the indirect callback is the actual handler.
4. Revert the temporary stub. Now the prediction is sharp; land the
real implementation.
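Step 1 amounts to a one-shot stub. A minimal sketch, assuming a hypothetical `OneShotNotify` holder; the real stub lives in `xam.rs` and works on guest registers, which this sketch does not model.

```rust
/// Hypothetical one-shot source: yields a single synthetic notification,
/// then behaves like the current always-empty stub.
pub struct OneShotNotify {
    /// Pending (id, data) pair; `None` once it has been delivered.
    pending: Option<(u32, u32)>,
}

impl OneShotNotify {
    pub fn new(id: u32, data: u32) -> Self {
        Self { pending: Some((id, data)) }
    }

    /// First call returns the synthetic notification (the "r3 = 1" case);
    /// every later call returns `None` (the "r3 = 0" case).
    pub fn get_next(&mut self) -> Option<(u32, u32)> {
        self.pending.take()
    }
}
```

Because `Option::take` clears the slot, the dispatch arm is exercised exactly once and the poll loop then falls back to its existing behaviour, which keeps the probe run comparable with the audit-009 baseline.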
### Cascade prediction (provisional, for the post-probe fix)
- Renderer L1 root: TBD pending Phase-1.5 probe.
- Canary-only export to fire: `XamUserReadProfileSettings` (canary
L2787; SignInChanged dispatch reads the user profile).
- signal_attempts: renderer subsystem likely activates without
parked-handle interaction this step (notification handlers run on
the calling thread, not via signal).
- draws delta: NO this step. Boot horizon advances one hop, not yet
to a draw-emitting subsystem.
### Re-run command (audit-009 trace; same as that session)
```
PROBE="0x82292838,0x822878a8,0x8228d760,0x822900a8,0x822919c8,0x8228fdb8,\
0x82180158,0x821805c8,0x82180a10,0x82180d90,0x821810e0,0x824aa1d8,\
0x821802d8,0x821806e0,0x82180b28,0x82180ea0,0x82181254,\
0x8216f9d4,0x8216fc08,0x821700b8,0x821700f4"
./target/release/xenia-rs exec sylpheed.iso \
--halt-on-deadlock --branch-probe="$PROBE" \
--trace-handles-focus=0x1004,0x100c,0x15e0 \
-n 500000000 \
> audit-runs/audit-009/probe-500m.log 2> audit-runs/audit-009/probe-500m.err
```
### Files modified
None. New artifact: `audit-runs/audit-010/findings.md`.
## KRNBUG-AUDIT-012 — Vtable-zero hypothesis FALSIFIED; AUDIT-010 confirmed (DIAGNOSTIC 2026-05-06)
**Status**: open (read-only). Master HEAD `50a4887` unchanged in working tree.
### Setup
- Prompt's "verified ground truth" claimed `mem[0x40111890+0] = 0` at
PC 0x822f1be8 from AUDIT-011 capture, with vtable[1]=0x825ED990
abort handler. Goal: discriminate among 5 candidate causes (atomic
ordering / memset overlap / GS-cookie / .rdata mapping / destructor).
- Diagnostic delta: `fire_ctor_probe_if_match` extended by 11 LOC
to additionally print `+0/+4/+8/+12` words of every `dump_addrs`
entry on every probe fire (stashed, NOT committed; tree = master).
- Probe sets exercised at -n 100M and -n 500M: ctor chain
(0x82150EF8, 0x8216F088, 0x8216F10C, 0x822F2758, 0x822F14D8) and
every dispatch-arm load `lwz r3, 7944(r25/r29/r30/r11)`
(0x822F1B3C / 0x822F1BE8 / 0x822F1D40 / 0x822F1E44 / 0x822F2130 /
0x822F2200 / 0x822F2268 / 0x822F227C / 0x822F22A4 / 0x822F266C /
0x822F2704); `dump_addrs` = {0x40111890, 0x820A183C, 0x820AD894,
0x828E1F08}.
### Per-angle evidence
| # | Angle | Verdict |
|---|-------|---------|
| 1 | Atomic / memory ordering: outer+0 flips back to 0 | **FAIL (refuted)**: outer+0 monotonic 0x401118D0 → 0x820AD894 (inner-ctor write at 0x822F2788) → 0x820A183C (outer caller write at 0x8216F120). Stays at 0x820A183C through every subsequent fire. Sampled at every probe through end-of-run. Never zeroed. |
| 2 | Memset/memcpy overlap | **FAIL (refuted)**: same evidence as 1. No bulk-zero event covers outer+0 after ctor. Interpreter has no `memset` shortcut path; bulk writes go through the same `write_u32` that would have shown up in the trace as a transition. |
| 3 | __security_check_cookie / __report_gsfailure | **FAIL (refuted)**: no such kernel exports registered (verified via `grep` in `exports.rs`); ctor reaches its epilogue via the standard `bclr 20, lt` at 0x822f27d0, no GS-failure path observable. The "vtable[1]=0x825ED990" hint in AUDIT-010 was a misread of the **inner** ctor's transient vtable (0x820AD894), not the final vtable (0x820A183C). |
| 4 | .rdata mapping fidelity | **FAIL (refuted)**: dump@0x820A183C reads `[+0..+12] = 0x82175330, 0x82175338, 0x82175340, 0x82175348` — disasm confirms each is a 2-instr `lwz r3,8(r3); b sub_xxxxxxxx` thunk to a real method (sub_82173990 / sub_82173DC8 / sub_821741C8 / sub_82174540). .rdata maps cleanly. |
| 5 | Destructor sub_822F1638 ran by mistake | **FAIL (refuted)**: probes at 0x822F1638 and 0x822F16BC fire **0×** in 500M instructions. Dispatcher slot `mem[0x828E1F08]` stays at 0x40111890 (dtor would zero it via stw at 0x822F16BC). Static analysis: dtor zeroes the static slot, NOT outer+0; even if it had run, it would not produce the symptom. |
**Result**: ALL FIVE angles refute the AUDIT-011 vtable-zero claim. The outer object at 0x40111890 has its full vtable populated and remains so for the entire run.
### Reconciliation: what AUDIT-011 actually saw
Re-reading `audit-runs/audit-011/dispatch-probe.log`:
- Final state reports tid=1 stuck at PC `0x8284E45C`, **not** at 0x822F1BE8.
- `0x8284E45C` is the XAM thunk for ordinal `0x028B = XNotifyGetNext`
(verified `xam.rs:72`). The bl at 0x822F1BDC enters this thunk; the
immediately-following compare `cmpi cr6, 0, r3, 0` (0x822f1be0)
decides whether to dispatch (the `beq` at 0x822f1be4 falls through to PC 0x822F1BE8 only when r3 != 0).
- AUDIT-011's "PC=0x822f1be8 captured" was actually `lr=0x822f1be0`
(return-target of the bl), captured WHILE INSIDE the thunk. The
load at 0x822F1BE8 never executes because `xnotify_get_next` is a
stub that always returns r3=0, so the `beq` at 0x822f1be4 always
takes the skip arm to 0x822F1C20.
- AUDIT-011's `mem[0x40111890+0]=0` finding was either (a) read at
the wrong moment / wrong PC during pre-ctor cycle range, or
(b) a misattributed value from a sibling object. The 100M/500M
re-runs decisively show outer+0 = 0x820A183C from cycle ~5.53M
onward, monotonic.
### Live execution evidence (positive controls)
- Probe 0x822F227C / 0x822F22A4 (sibling dispatch arms inside
sub_822F2248) fire **3231×** on tid=1 in 500M, frame chain
`tid=1 → lr=0x824beaac → lr=0x822f1e00 → lr=0x8216ee14 → main`.
→ A renderer-adjacent callback dispatcher IS executing per-frame.
- Probe 0x822F1D40 fires 1×.
- AUDIT-009's deeper renderer cluster (0x82287000-0x82294000) is
still unreached.
- 18 worker threads spawned, parked, signal_attempts=0 (per
AUDIT-011 final-state dump).
### Bug class (1 of 5)
**None of the five.** AUDIT-011's vtable-zero observation is not reproducible. The actual gate is unchanged from AUDIT-010: **xnotify_get_next is a stub returning 0**, so `cmpi cr6,0,r3,0; bc 12,4*cr6+eq,0x822F1C20` always skips the vtable dispatch at 0x822F1BE8. Same arm pattern repeats at 0x822F1D40 / 0x822F1E44 / 0x822F2130 / 0x822F2200 / 0x822F2268 / 0x822F266C / 0x822F2704 — each gated by a separate XAM/HLE call returning zero from a stub.
### Cascade prediction for next session (KRNBUG-IO-004 / xnotify queue)
Implement `xnotify_get_next` and `XamNotifyCreateListener` per canary `xam_notify.cc`:
- Replay AUDIT-010's Phase-1.5 probe, but with the corrected vtable: the bcctrl at 0x822f1c00 should call `mem[mem[0x40111890+0]+4]` = the `0x82175338` thunk → `sub_82173DC8`. Read sub_82173DC8 in `sylpheed.db` to identify the real handler before landing.
- Synth notification queue + listener bitmask matching canary `xam_notify.cc`.
- Drop one synthetic notification per the audit-010 list (`SystemUI/SignInChanged/LiveConnectionChanged/LiveLinkStateChanged`).
- Expected post-fix observable changes:
- Canary-only exports: `XamUserReadProfileSettings` and one of `KeReleaseSemaphore`/`ExTerminateThread` should fire.
- Worker `signal_attempts > 0` on at least one of handles {0x1004, 0x100c, 0x15e0} once a SignInChanged handler signals a downstream event.
- draws delta: still 0 this step (renderer L2 cluster not yet reached).
- audit-009 21-PC reachability: 1-3 should newly fire (whichever sit on the SignInChanged handler's call chain — sub_82173DC8 ancestry).
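The corrected dispatch target above is a double dereference. The sketch below replays it over a stand-in for guest memory, using only the addresses recorded in this entry; `GuestMem` and `resolve_virtual` are illustrative names, not project APIs.

```rust
use std::collections::HashMap;

/// Minimal stand-in for guest memory: address -> already-decoded u32 word.
pub struct GuestMem(pub HashMap<u32, u32>);

impl GuestMem {
    pub fn read_u32(&self, addr: u32) -> Option<u32> {
        self.0.get(&addr).copied()
    }
}

/// Resolve the target of `lwz r3,slot; lwz r11,0(r3); lwz r11,4*n(r11); bcctrl`.
pub fn resolve_virtual(mem: &GuestMem, static_slot: u32, n: u32) -> Option<u32> {
    let object = mem.read_u32(static_slot)?; // mem[0x828E1F08] = outer
    let vtable = mem.read_u32(object)?; // outer+0
    mem.read_u32(vtable.wrapping_add(n.wrapping_mul(4))) // vtable[n]
}

/// Snapshot of the words observed in the audit-012 dumps.
pub fn audit_012_snapshot() -> GuestMem {
    let mut m = HashMap::new();
    m.insert(0x828E_1F08, 0x4011_1890); // dispatcher slot -> outer
    m.insert(0x4011_1890, 0x820A_183C); // outer+0 -> final vtable
    m.insert(0x820A_183C, 0x8217_5330); // vtable[0]
    m.insert(0x820A_1840, 0x8217_5338); // vtable[1]
    GuestMem(m)
}
```

Returning `Option` makes an unpopulated slot or vtable show up as `None` rather than as a silent zero, which is the distinction the vtable-zero hypothesis turned on.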
### Files modified
None on master. Diagnostic patch (state.rs, +11 LOC) stashed locally as `audit-012 dump-on-probe extension`. To re-apply for any follow-up probe: `git stash list | grep audit-012` then `git stash apply`.
Trace artifacts: `audit-runs/audit-012/probes-100m.{log,err}`, `audit-runs/audit-012/dispatch-500m.{log,err}`.
### Discipline gate
| Box | Status |
|-----|--------|
| 1. Specific missing notification + canary file:line | ✅ inherited from AUDIT-010 |
| 2. Synthesis < 80 LOC | ✅ inherited |
| 3. Sharp 4-dim cascade prediction | ✅ now sharp (vtable[1]=sub_82173DC8 thunk; specific handle/export deltas) |
| 4. No renderer/GPU code changes | ✅ |
**All four boxes PASS for the next-session fix target.** Pure diagnostic this session.
---
## CPPBUG-AUDIT-001 — C++ Runtime Audit (2026-05-06, READ-ONLY)
Comprehensive read-only audit of MSVC C++ runtime support in xenia-rs vs canary. Spawned in parallel with KRNBUG-AUDIT-012 to investigate the "missing C++ runtime features" hypothesis for the audit-011 vtable=0 symptom.
### Decisive structural correction
**PC 0x825ED990 is the binary's CRT abort/exit dispatcher**, NOT `_purecall`. Disasm at 0x825ED990..0x825ED9DC walks a 23-entry exit-handler table at `[0x828B2D08]` keyed by signal=25, calls atexit at `[0x828A5B7C]`, then `sub_825F50D0(0,1)` and `sub_825F5020()` (raises via `sub_824AA640`/`sub_824AA710`). This is the MSVC `abort()`/`_amsg_exit` equivalent, which corrects audit-010's "apparent __purecall/abort handler" attribution.
**Sylpheed's CRT is statically linked.** The only kernel imports relevant to the C++ runtime are `KeTlsAlloc/Get/Set/Free`, `RtlInitializeCriticalSection`, `RtlRaiseException`, and `__C_specific_handler`. The C++ runtime question is narrower than initially feared.
### Top-3 candidates for vtable=0 — ALL REFUTED by audit-012
1. `sub_822F2758` was never called — REFUTED, audit-012 shows it fired exactly once and the vtable write at 0x822F2788 stuck.
2. Ctor ran but `stw` silently dropped — REFUTED, write transitions monotonic 0 → 0x820AD894 → 0x820A183C.
3. Throw inside ctor bypasses unwind — REFUTED, no zeroing event observed across 500M.
### Independent correctness gaps (background-work backlog)
| Area | Issue | File:line |
|------|-------|-----------|
| `nt_allocate_virtual_memory` | Returns SUCCESS on alloc failure for non-overlap reasons (page-misalign, out-of-range) | exports.rs:622-625 |
| `heap.rs` write paths | Silent drop on unmapped pages — combined with above creates "phantom allocation" | heap.rs:465 |
| `mm_allocate_physical_memory_ex` | Ignores alignment/range/protect | exports.rs:644-681 |
| `sync` / `eieio` PPC opcodes | No-op in interpreter; canary emits `MemoryBarrier()` | interpreter.rs:1697 vs canary ppc_emit_memory.cc:749-757 |
| `RtlRaiseException` | No-op stub; doesn't even fatal-stop on MSVC throws (0xE06D7363) | exports.rs:2218-2221 |
| TLS storage | Uses `Vec<u64>`; canary uses u32. Functionally OK | xboxkrnl_threading.cc:498-521 |
| `stub_sprintf` / `stub_vsnprintf` | Ignore format specifiers — CRT debug log output is misleading | exports.rs |
| Heap | Bump-only, no free | state.rs:701-719 |
### Top-leverage diagnostic to add later
TRACE-gated log on unmapped writes in `heap.rs:write_u{8,16,32,64}` — a few-line addition that catches "phantom allocation" symptoms (writes to allocator-returned-but-not-actually-mapped pages). Should be standing infrastructure given the silent-drop class of bugs.
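A rough sketch of what such a gated log could look like follows. `SketchHeap`, the 4 KiB page size and the word-granular backing store are all assumptions made for illustration; none of this mirrors the actual `heap.rs` layout.

```rust
use std::collections::{HashMap, HashSet};

/// Assumed page size for the sketch (4 KiB); the real heap may differ.
const PAGE_SHIFT: u32 = 12;

/// Illustrative heap model, not the project's `heap.rs`.
pub struct SketchHeap {
    mapped_pages: HashSet<u32>,
    words: HashMap<u32, u32>,
    trace_unmapped: bool,
    dropped_writes: u64,
}

impl SketchHeap {
    pub fn new(trace_unmapped: bool) -> Self {
        Self {
            mapped_pages: HashSet::new(),
            words: HashMap::new(),
            trace_unmapped,
            dropped_writes: 0,
        }
    }

    /// Mark the page containing `addr` as mapped.
    pub fn map_page(&mut self, addr: u32) {
        self.mapped_pages.insert(addr >> PAGE_SHIFT);
    }

    /// Returns false when the write is dropped; the drop is counted always
    /// and logged only when tracing is enabled.
    pub fn write_u32(&mut self, addr: u32, value: u32) -> bool {
        if !self.mapped_pages.contains(&(addr >> PAGE_SHIFT)) {
            self.dropped_writes += 1;
            if self.trace_unmapped {
                eprintln!("unmapped write_u32: addr={:#010x} value={:#010x}", addr, value);
            }
            return false;
        }
        self.words.insert(addr, value);
        true
    }

    pub fn read_u32(&self, addr: u32) -> Option<u32> {
        self.words.get(&addr).copied()
    }

    pub fn dropped_writes(&self) -> u64 {
        self.dropped_writes
    }
}
```

Counting drops even when tracing is off gives an end-of-run figure at no logging cost, so a nonzero counter can be the cue to re-run with the trace enabled.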
### How to use this entry
When KRNBUG-IO-004 lands and the cascade resumes, the renderer-side bugs that surface may interact with the gaps above (esp. memory ordering / `sync` semantics for cross-thread GPU-CPU). Treat as a checklist for "first things to suspect" once draws > 0 lands. NOT urgent for the swap=2 / draws=0 plateau.
Master HEAD `50a4887` unchanged. No commits. No code modified.
---
## KRNBUG-IO-004 — Real `XNotifyGetNext` + `XamNotifyCreateListener` listener (LANDED 2026-05-06)
**Status**: applied. Branch `xnotify-listener/p0-startup-enqueue` merged no-ff.
### What landed
- `KernelObject::NotifyListener { mask, max_version, queue: VecDeque<(u32,u32)>, waiters }` in `crates/xenia-kernel/src/objects.rs`.
- `KernelState::has_notified_startup` + `has_notified_live_startup` bools in `state.rs`.
- Real `xam_notify_create_listener` in `xam.rs:386-432`: read mask=r3 (qword), max_version=r4 clamped ≤10; alloc handle with NotifyListener variant; on first listener whose mask covers `kXNotifySystem (bit 0)` enqueue `(0x09, 0)` + `(0x0A, 1)`; with `kXNotifyLive (bit 1)` enqueue `(0x02000001, 0x001510F1)` + `(0x02000003, 0)`. Mirrors `xenia-canary/src/xenia/kernel/kernel_state.cc:1013-1033` byte-for-byte.
- Real `xnotify_get_next` in `xam.rs:434-466`: handle=r3, match_id=r4, id_ptr=r5, param_ptr=r6. Pop front (or scan-by-id when match_id != 0). Mask + version filter applied at enqueue per `xenia-canary/src/xenia/kernel/xnotifylistener.cc:38-51`. Returns 1 on dequeue, 0 otherwise.
- 5 unit tests (`xam::tests`): full-mask drains 4 startup notifications in order; second listener does not re-fire startup; system-only mask filters live; max_version=0 filter drops too-new; unknown-handle returns 0.
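The enqueue-on-first-listener and dequeue rules described above can be sketched in isolation as follows. This is a simplified model, not the landed code: it omits `max_version` filtering, waiters and handle allocation, and `Listener` / `StartupFlags` are illustrative names. The notification IDs and data values are the ones listed under AUDIT-010.

```rust
use std::collections::VecDeque;

pub const NOTIFY_SYSTEM: u64 = 1 << 0; // kXNotifySystem
pub const NOTIFY_LIVE: u64 = 1 << 1; // kXNotifyLive

/// Stand-in for the two "already notified" bools kept on the kernel state.
#[derive(Default)]
pub struct StartupFlags {
    system_sent: bool,
    live_sent: bool,
}

/// Simplified listener: mask plus a FIFO of (id, data) pairs.
pub struct Listener {
    mask: u64,
    queue: VecDeque<(u32, u32)>,
}

impl Listener {
    /// Create a listener; the first one covering each area gets the startup set.
    pub fn register(mask: u64, flags: &mut StartupFlags) -> Self {
        let mut queue = VecDeque::new();
        if (mask & NOTIFY_SYSTEM) != 0 && !flags.system_sent {
            flags.system_sent = true;
            queue.push_back((0x0000_0009, 0)); // SystemUI
            queue.push_back((0x0000_000A, 1)); // SystemSignInChanged
        }
        if (mask & NOTIFY_LIVE) != 0 && !flags.live_sent {
            flags.live_sent = true;
            queue.push_back((0x0200_0001, 0x0015_10F1)); // LiveConnectionChanged
            queue.push_back((0x0200_0003, 0)); // LiveLinkStateChanged
        }
        Self { mask, queue }
    }

    pub fn mask(&self) -> u64 {
        self.mask
    }

    /// match_id == 0 pops the front; otherwise remove the first entry with that id.
    pub fn get_next(&mut self, match_id: u32) -> Option<(u32, u32)> {
        if match_id == 0 {
            return self.queue.pop_front();
        }
        let pos = self.queue.iter().position(|&(id, _)| id == match_id)?;
        self.queue.remove(pos)
    }
}
```

With mask `0x2F` the first listener drains all four startup notifications in order, and a second listener registered with the same flags starts empty, matching two of the unit-test cases listed above.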
### LOC budget
119 (97 impl + 22 scaffolding pattern matches in main.rs/objects.rs/state.rs) ≤ 120.
### Cascade-prediction scorecard (each dimension)
| Dimension | Pre-fix | Post-fix | Result |
|---|---|---|---|
| (a) `cargo test --workspace --release` | 594 | 599 | PASS |
| (b) Lockstep `-n 100M` instructions | 100000019 | 100000012 stable across 2 reruns; bit-identical diff | PASS |
| (c) AUDIT-009 21-PC + AUDIT-005 9-PC probe set newly reachable | 0 | 3 (`0x822c6870` ×2 workers, `0x824563e0`, `0x823ddb50`) in `sub_82173DC8` ancestry | PASS (predicted 1-3) |
| (d) Canary-only export delta | 7 | 3 (KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule fell off; ExTerminateThread + KeReleaseSemaphore + XamUserReadProfileSettings still missing) | PASS (set shrank as predicted; specific predictions partial) |
| (e) signal_attempts on parked handles | 0/0/0 | 0/uncreated/1 (handle 0x15e0 primary=1) | PASS (predicted >0 on at least one) |
| (f) Worker thread count | 18 | 20 | PASS (delta confirmed) |
| (g) draws delta | 0 | 0 | PASS (acknowledged plateau) |
### Phase 1.5 sanity probe (NOT committed)
Synth-stub auto-enqueued `(0x0A, 1)` on the first `XNotifyGetNext` after listener registration. Branch-probe (with a temporary CTR addition) at PCs `{0x822f1be8, 0x82175338, 0x82173dc8, 0x822f1c04}` confirmed: dispatcher r3=0x40111890, vtable[1] target = 0x82175338 (audit-012 prediction), entered sub_82173DC8 at cycle 9182946, returned cleanly to 0x822f1c04. Stub + probe-CTR addition reverted; tests green at 594 before Phase 2.
### Still-canary-only (post-fix)
1. `ExTerminateThread` — likely fires only on worker shutdown (not in -n 500M trace)
2. `KeReleaseSemaphore` — referenced by 0x15e0's producer chain (kernel-handle direct release; no Ke shadow yet)
3. `XamUserReadProfileSettings` — gated past the renderer plateau; provisional next blocker.
### Trace artifacts
`audit-runs/audit-013-io-004-phase1.5/dispatch.{log,err}` (no-fire baseline at non-block PCs), `dispatch2.log` (block-entry probes — 1 fire on dispatch arm), `dispatch3.log` (full dispatch chain confirmed), `post-cascade.{log,err}` (focus + canary export delta + cascade probes).
## KRNBUG-AUDIT-014 — 0x15e0 wake-eligibility hypothesis FALSIFIED; tid=17 actually parks on 0x15e4 (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic. No fix landed. Master HEAD `d736a1d` unchanged.
### Phase 1 finding (decisive)
Goal was to investigate why handle 0x15e0 records `signal_attempts=1 (primary=1)` post-IO-004 BUT tid=17 (the "0x15e0 worker") still parks. **The premise is wrong.**
Trace at `-n 500M --trace-handles-focus=0x15e0` shows:
1. **Handle 0x15e0 is a Semaphore**, not an Event/Manual. Created from `lr=0x824ab110` (NtCreateSemaphore) on tid=1, with creator-frame chain `lr=0x82456a94 → 0x82456bac → 0x822f1b60 → 0x8216ee14 → 0x824ab8e0`. This is a **different** wrapper than the Event creator chain `lr=0x824a9f6c` shared by 0x1004 / 0x100c / 0x1020 / 0x15e4.
2. **0x15e0 is healthy**: `signal_attempts=1 (primary=1) waits=1 wakes=1`. End-of-run DIAGNOSIS reports "not stuck — signals consumed correctly". Timeline: tid=1 waited at `lr=0x824ac578`, then tid=16 `NtReleaseSemaphore` at `lr=0x824ab168` woke it. Handshake completed.
3. **tid=17 parks on 0x15e4**, NOT 0x15e0. State at end-of-run: `Blocked(WaitAny { handles: [5604] })` where `5604 == 0x15e4`. Worker entry context `r12=0x8217057c` (front of `sub_82170430`) matches the audit-009 / audit-008 / audit-002 stage-3 attribution of tid=17 to the 0x82170430 worker cluster.
4. **0x15e4 is the actual stuck handle**: `kind=Event/Manual waiters=1 signals=0 waits=1 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`, created by tid=1 at `lr=0x824a9f6c` (same wrapper as 0x1004 / 0x100c / 0x1020). This is the same producer-missing class as the other Event/Manual handles tracked across audit-001 → IO-004.
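The healthy/stuck distinction drawn in items 2 and 4 reduces to a check on the per-handle counters. The sketch below is one possible reading of that check, plus a helper that prints wait lists in hex so that `5604` and `0x15e4` are not confused again. The type names and the exact thresholds are assumptions, not the focus dump's real logic.

```rust
/// Per-handle counters as they appear in the focus dump.
pub struct HandleStats {
    pub signals: u32,
    pub waits: u32,
    pub wakes: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum HandleHealth {
    /// Every wait was matched by a wake.
    Healthy,
    /// Something waited but no producer ever signalled.
    NoSignalsDespiteWaits,
    /// Anything else (e.g. signalled but waiter not yet woken).
    Other,
}

/// Assumed classification rule; thresholds are illustrative.
pub fn classify(s: &HandleStats) -> HandleHealth {
    if s.waits > 0 && s.signals == 0 {
        HandleHealth::NoSignalsDespiteWaits
    } else if s.wakes >= s.waits {
        HandleHealth::Healthy
    } else {
        HandleHealth::Other
    }
}

/// Render a decimal handle list (as in `WaitAny { handles: [5604] }`) in hex.
pub fn format_wait_handles(handles: &[u32]) -> String {
    handles
        .iter()
        .map(|h| format!("{:#x}", h))
        .collect::<Vec<_>>()
        .join(", ")
}
```

Printing blocked-thread wait lists through a helper like `format_wait_handles` would have made the 0x15e0 / 0x15e4 mix-up visible directly in the thread diagnostic.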
The IO-004 cascade-prediction scorecard's claim "(e) signal_attempts on parked handles: 0x15e0 = 1 (primary=1, ghost=0)" was technically correct (the semaphore did get one signal), but the inference that this represented forward progress toward waking tid=17 was a misattribution. The label "0x15e0 worker" used in the audit-009 / audit-002 / audit-008 stage-3 mappings is a long-standing transcription error: the actual handle is 0x15e4 (Event/Manual), and 0x15e0 is an unrelated Semaphore. `project_xenia_rs_producer_stack_trace_2026_05_03.md` already noted "third handle is **0x15e0**, not 0x15e4 (transcription typo)", but that correction was itself mistaken; the original audit-002 label 0x15e4 was correct.
### Bug class evaluation (α-ζ from prompt)
- α (PKEVENT vs handle mismatch): N/A — no Set call ever targets 0x15e4; the producer is genuinely missing.
- β (refresh_pkevent_shadow_from_guest miss): N/A — same.
- γ (wake-eligibility filter wrong): N/A — wake_eligible_waiters fires correctly elsewhere (0x10F0 handshake demonstrates healthy manual-reset wake; 0x15e0 demonstrates healthy semaphore wake).
- δ (memory ordering): N/A — no producer side observed.
- ε (race scheduler.resume vs signal): N/A.
- ζ (audit recorded but not propagated): N/A — DIAGNOSIS print-out matches state.objects waiter list.
**Conclusion**: 0x15e4 belongs to the same "producer never reaches the Set call" class as 0x1004 / 0x100c / 0x1020. Renderer cluster work (audit-008 / audit-009) and AUDIT-014's parallel Fork B probing of newly-reached L1 entries (`sub_82173DC8`, `0x822c6870`, `0x824563e0`, `0x823ddb50`) are the correct line of attack; there is no wake-eligibility bug to fix.
### Discipline gate
- Box 1 (named bug class with concrete evidence): FAIL — premise refuted, no bug class applies.
- Box 2 (narrow fix ~30-80 LOC): N/A.
- Box 3 (sharp 4-dim cascade prediction): N/A.
- Box 4 (no renderer/GPU changes): N/A.
- Box 5 (lockstep determinism preserved): N/A.
Stop conditions met: hand back as Phase 1 only.
### Cascade snapshot (unchanged from IO-004 baseline)
- swaps=2 (`VdSwap` kernel-direct frames 1 + 2)
- draws=0
- 18 → 20 worker threads (consistent with IO-004)
- Canary-only exports: ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings still missing.
### Recommended next session
Track Fork B's branch-probe results for `sub_82173DC8` (the first L1 entry in the renderer cluster reached after IO-004). The producer for handles 0x1004 / 0x100c / 0x1020 / 0x15e4 lives somewhere along the dispatch arm at `0x822f1be8 → 0x82175338 → 0x82173dc8 → ...`. If Fork B identifies a sub-function that gates the Set call (e.g. `sub_82173DC8` returns early on a stub kernel call), that becomes KRNBUG-AUDIT-015 / next IO-NNN candidate.
The misattribution label "0x15e0 worker" should be corrected to "0x15e4 worker" in the index entries for AUDIT-002, AUDIT-008, AUDIT-009 — left for the next session to update if relevant.
### Trace artifacts
`audit-runs/audit-014-0x15e0-wake/probe.log` (focus dump + 19-thread diagnostic), `probe.err` (kernel.calls counters confirming swaps=2 unchanged).
## KRNBUG-AUDIT-015 — L1 propagation probe; next gate is silph::Semaphore on handle 0x1308 (workitem submitter unreached) (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic, Fork B parallel session. No fix landed. Master HEAD `d736a1d` unchanged.
### Probe set (112 PCs)
sub_82173DC8 dispatcher case-arms (25), worker 0x822c6878 body (12), worker sub_824563E0 body (17), worker sub_823DDB50 body (11), L1 callees (26), audit-009 unfired baseline (21).
### Decisive findings
1. **sub_82173DC8 dispatches all 4 IO-004 startup notifications then idles.** Every fire takes the early-exit at `0x82173ed8` because `[r31+44] == 0` (callback-table pointer in the listener struct never populated). The post-merge dispatch helper `0x82174040` (which would call the renderer producers `sub_822C2A80`, `sub_8216F088`, etc.) is never invoked from the dispatcher path.
2. **Worker 0x822c6870 (= 0x822c6878 thunk; tids 14, 15) parks immediately on Semaphore handle 0x1308.** The semaphore is `Semaphore(0/INT_MAX) signals=0 waits=2 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`, created by tid=13 inside `sub_822C66B4` (worker-pool initializer in `sub_822C6630`). Producer chain that releases it: `sub_822AE1F0 / sub_822F55F0 → sub_822C8B50 → sub_822C6808 → bl 0x824AB158 (silph::Semaphore::Release at NtReleaseSemaphore)`. Neither `sub_822AE1F0` nor `sub_822F55F0` was probed; both are statically reachable from main but unexercised at -n 500M — they're the renderer's frame-update / scene-graph-mutate path that never runs.
3. **Worker sub_824563E0 (tid=16) is healthy** — runs an XAM inactivity / timer poll loop (NtSetTimerEx handle 0x15d0, period=2; loops `XamEnableInactivityProcessing ↔ CS+bcctrl dispatch` 865k times). Not the gate.
4. **Worker sub_823DDB50 (tid=19) parks at entry** with body PCs unfired; final state `Blocked(WaitAny { handles: [0x160C, 0x01000000] })`. Handle 0x160C is `Event/Auto signals=0 waits=1 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`. The wait callsite is unprobed (likely an early branch before 0x823ddb68); needs follow-up probe inside `sub_823DD838` (parent).
5. All 21 audit-009 PCs (renderer cluster `0x82287xxx-0x82294xxx` + audit-005 producer-callsites) remain UNFIRED, consistent with audit-009 baseline — they sit downstream of the unreached workitem-submitter chain.
### Bug class
**δ (pure-guest renderer state-read)**, NOT a kernel-boundary stub. There is no missing `xboxkrnl`/`xam` import at the gate; main fails to advance past a state predicate that gates `sub_822AE1F0` / `sub_822F55F0` invocation.
### Discipline gate
- Box 1 (named import α / narrow internal-sub bug): **NO** — δ-class, no kernel boundary.
- Box 2 (canary impl small): N/A.
- Box 3 (sharp 4-dim cascade prediction): **NO** — needs dump-addr triage of listener struct first.
- Box 4 (no new ABI plumbing): N/A.
- Box 5 (lockstep determinism preserved): N/A.
Boxes 1 + 3 fail. Hand back per stop condition 1.
### Recommended next session
Phase 1: probe `sub_822AE1F0`, `sub_822F55F0`, `sub_822C8B50`, `sub_822C6808` entries + `sub_82174040` post-merge dispatch helper (the 6 fall-through arms inside sub_82173DC8). Add `--dump-addr=0x40ba9a80` to capture the listener-struct fields on each dispatcher fire. The struct's `[+44]` field is the gate predicate; once we know what populates it, the actual fix point becomes nameable.
### Trace artifacts
`audit-runs/audit-015-l1-propagation/probe.log` (493 MB; 5.05M BRANCH-PROBE lines), `probe.err` (188 KB), `pc-fire-counts.txt` (28 fired PCs sorted).
## KRNBUG-AUDIT-016 — submitter-caller probe; gate is γ (deeper-indirection / vtable registry not populated) (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic. No fix landed. Master HEAD `d736a1d` unchanged.
### Probe set
Run #1 (30 PCs): workitem-submitter chain entries + bl call-sites (`sub_822AE1F0`, `sub_822F55F0`, `sub_822C8B50`, `sub_822C6808`, `0x822B16E0`, `0x822F5728`), parents (`sub_822ADD70`, `sub_821A9920`, `sub_822ACAB8`, `sub_821A8578`), grandparents (`sub_82299250`, `sub_822A4460`, `sub_821A82A0`), dispatcher post-merge helper + early-exit. Run #2 (18 PCs): refined dispatcher arm coverage + `--dump-addr=0x40ba9a80,0x4024AC00,0x4024B3E0,0x40111890,0x4024A380`.
### Decisive findings
1. **0/16 submitter-chain PCs fire**, including all 4 levels of caller walk-up. Both static caller chains bottom out in the audit-009 unreached renderer cluster: A-side `sub_822AE1F0 ← sub_822ADD70 ← sub_822ACAB8 ← sub_82299250 / sub_822A4460 ← sub_8229AB50 ← sub_8229A700 ← sub_82294F30 (renderer cluster)`; B-side `sub_822F55F0 ← sub_821A9920 ← sub_821A8578 ← sub_821A82A0 ← (cycle with sub_821A9920) and ← sub_821ABEA8 ← sub_821AC700 ← sub_821A6470 (renderer cluster)`.
2. **Listener struct dump at `0x40ba9a80`**: `[+0x00]` vtable=0x40111890; `[+0x04]` dispatch state bits=**0 (NEVER set)**; `[+0x08]` counter=0; `[+0x0C]`=1000 (set by case 0xA); `[+0x2C]` callback-table A=**0x4024AC00 (POPULATED)**; `[+0x3C]` callback-table B=**0x4024B3E0 (POPULATED)**. **Audit-015's claim that `[r31+44]==0` was wrong** — `[+0x2C]` IS populated. The real gate is `[base+0x04]` (dispatch state bits) read by `sub_821737F0` (case-9 helper) bit 14 / bit 15.
3. **Dispatcher arm fires (run #2 confirmed)**: case-9 r5==0 path (`0x82173e6c`, 1 fire) → `sub_821737F0` returned 0 → early-exit; default-high arm (`0x82173f48`, 2 fires) → both early-exit at `0x82174030`. **Case 0xA's write `oris 0x1; stw [r31+4]` should set bit 16, but the end-of-run dump shows `[+0x04]=0`**: either the case-0xA fire and dispatch-r3 don't always target `0x40ba9a80`, or the write is overwritten back to 0 by another path.
4. **0x4024AC00 (callback table A) contains real renderer config** including string `"game:\\dat\\GP_TITLE.pak+eng\\\0"` and pointers `0x401119A0 / 0x40111990` — confirming the listener IS subscribed to the renderer's profile loader, but its dispatch-state bits are never advanced.
5. **Probe-machinery anomaly**: `sub_82174040` entry-PC never fires across both runs, yet `sub_821737F0` fires once at cycle 9183539 with `lr=0x821741f4` — meaning `0x821741F0 (bl sub_821737F0 inside sub_82174040 +0x1B0)` was executed. Either `sub_82174040` was reached via a jump-into-mid-function (highly unusual) or the probe missed an entry fire. **Worth verifying in AUDIT-017** with isolated probe of `0x82174040, 0x82174044, 0x82174048`.
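The caller walk-up in finding 1 can be mechanized. The following is a minimal sketch, assuming the xrefs are available as `(callee, caller)` address pairs; `caller_roots` is a hypothetical helper name, not something in the xenia-rs tree. It walks upward from a start function and reports every function reached that has no recorded caller, using a visited set so that a cycle like the B-side `sub_821A82A0`/`sub_821A9920` loop cannot stall the walk.

```rust
use std::collections::{BTreeSet, HashMap};

/// Walk caller edges upward from `start` and return, in ascending order,
/// every function reached that has no recorded caller.
/// `edges` holds (callee, caller) pairs, e.g. exported from an xrefs table.
fn caller_roots(edges: &[(u32, u32)], start: u32) -> Vec<u32> {
    let mut callers: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(callee, caller) in edges {
        callers.entry(callee).or_default().push(caller);
    }
    let mut roots = BTreeSet::new();
    let mut seen = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(f) = stack.pop() {
        if !seen.insert(f) {
            continue; // already expanded: a cycle closes here
        }
        match callers.get(&f) {
            Some(cs) => stack.extend(cs.iter().copied()),
            None => {
                roots.insert(f);
            }
        }
    }
    roots.into_iter().collect()
}
```

A root returned here is only "unreached by a static `bl`"; vtable-dispatched entries still need a runtime probe to confirm.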
### Bug class
**γ (deeper indirection)**, refining audit-015's δ classification. The submitter chain bottoms out in a vtable-dispatched renderer cluster registry that's never populated. Chicken-and-egg: the listener can't advance state because the workitem-submitter never fires; the workitem-submitter never fires because the registry is never populated; the registry is populated by something the listener was supposed to drive. Only an external bootstrap can break it.
### Discipline gate
- Box 1 (named α-class import / narrow internal sub): **NO** — γ-class, no kernel boundary; gate is structural.
- Box 2 (canary impl small): N/A.
- Box 3 (sharp 4-dim cascade prediction): **NO** — needs further state-write triage.
- Box 4 (no new ABI plumbing): N/A.
- Box 5: N/A.
Boxes 1 + 3 fail. Hand back per stop condition 1.
### Recommended next session (AUDIT-017)
1. Probe dispatcher caller layer: `0x822f1be8`, `0x822f1c04`, `sub_822F1AA8` (main's frame-poll loop — where main parks per AUDIT-009), `sub_821752C0` (jumps to `sub_82173DC8`).
2. Find writers of `[0x40ba9a80+4]` — byte-scan `.text` for `addi r?, ?, 4; stw r?, 0(r?)` patterns OR probe ALL functions that touch r3+4 with a stw (potentially via offset-write tracking). Identify the function that's supposed to set bit 14 / bit 15 of that field.
3. Probe inside `sub_82181D48` (default-high arm's secondary predicate): the `rlwinm r11, r11, 0, 30, 30` at `0x82181D74` reads `[[r3+0]+60]` bit 30 — find what writes this bit. If we can make `sub_82181D48` return 1, the default-high arm's `bctrl` fires → renderer cascade.
4. Verify probe-machinery anomaly (entry of `sub_82174040`).
### Trace artifacts
`audit-runs/audit-016-submitter-callers/probe.log` (run #1, 9 KB), `probe.err` (187 KB), `probe2.log` (run #2, 12 KB; +4 dump-addrs), `probe2.err` (187 KB).
## KRNBUG-AUDIT-017 — bit-14/15 writer triage; gate is β (`[0x828F4070+64]==-1`) with α tail (`XamUserGetSigninState=stub_return_zero`) (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic. No fix landed. Master HEAD `d736a1d` unchanged.
### Probe set
Static scan: `oris rN, rN, 0x1` or `oris rN, rN, 0x2` followed within 8 instructions by `stw rN, 4(rY)`. 5 candidates flagged. Runtime confirmation via `--branch-probe` at -n 500M + `--dump-addr=0x40ba9a80,0x828F48B0,0x828F4070`.
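The static scan described above could look roughly like the sketch below. It assumes the `.text` words have already been decoded from big-endian bytes into `u32`s; `scan_state_bit_writers` and `Candidate` are hypothetical names. The primary opcodes (25 for `oris`, 36 for `stw`) are the standard PowerPC D-form values. The sketch does not track whether `rN` is clobbered between the two instructions, so it over-reports in the same way the audit's false positive at `0x82769b84` suggests.

```rust
/// One flagged site: index of the `oris` word and of the matching `stw` word.
#[derive(Debug, PartialEq)]
struct Candidate {
    oris_idx: usize,
    stw_idx: usize,
}

const OP_ORIS: u32 = 25; // PowerPC primary opcode for `oris`
const OP_STW: u32 = 36; // PowerPC primary opcode for `stw`

/// Scan instruction words for `oris rN, rN, 0x1|0x2` followed within
/// `window` instructions by `stw rN, 4(rY)`.
fn scan_state_bit_writers(words: &[u32], window: usize) -> Vec<Candidate> {
    let mut out = Vec::new();
    for (i, &w) in words.iter().enumerate() {
        // D-form fields: opcode | RS | RA | 16-bit immediate.
        let (op, rs, ra, imm) = (w >> 26, (w >> 21) & 31, (w >> 16) & 31, w & 0xFFFF);
        if op != OP_ORIS || rs != ra || !(imm == 1 || imm == 2) {
            continue;
        }
        let end = (i + 1 + window).min(words.len());
        for j in (i + 1)..end {
            let s = words[j];
            // stw: source register in the RS field, displacement in the low 16 bits.
            if (s >> 26) == OP_STW && ((s >> 21) & 31) == ra && (s & 0xFFFF) == 4 {
                out.push(Candidate { oris_idx: i, stw_idx: j });
                break;
            }
        }
    }
    out
}
```

Multiplying the returned indices by 4 and adding the section base gives the candidate PCs to confirm with `--branch-probe`.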
### Decisive findings
1. **Static writer candidates** (5):
- `0x82173950` (sub_821737F0:bit-14, gated by `[r30+64]!=-1` AND XamUserGetSigninState ret-check)
- `0x82173e04` (sub_82173DC8 case-0xA:bit-15)
- `0x824d3ce8` (sub_824d3c78:bit-15, struct via `[parent+184]`)
- `0x824d3f24` (sub_824d3dc0:bit-14, struct via `[parent+184]`)
- `0x82769b84` (sub_82766db0:bit-15, struct stride 8 — false positive)
2. **Runtime: case-0xA fires once** at cycle 9183060 (PC 0x82173dfc), sets bit-15 of `[0x40ba9a80+4]`. Confirmed by EOR dump `[+0x0C]=0x000003E8` (case-0xA's subfic).
3. **sub_821737F0 work-path entered** at cycle 9183561 (lr=0x821737f8). Bit-15 cleared at 0x82173884. Bit-14 setter at 0x82173950 NEVER fires because at 0x821738E0, `cmpwi r3, -1; beq → 0x82173938` short-circuits (`r3=[r30+64]=0xFFFFFFFF`).
4. **r30 = `[0x828F48B0+0]` = `0x828F4070`** (singleton sub-object). EOR dump confirms `[0x828F4070+64]=0xFFFFFFFF`, initialized to -1 by `sub_821701c8` at 0x82170234. The only non-(-1) writer is `sub_82184318:0x82184374` (`bl 0x82456B58 (kernel handle creator); stw r3, 64(r30)`). Caller chain `sub_82184318 ← sub_82187768:0x821877bc ← sub_82187dd0:0x82187e78 ← sub_82183ca8:0x82183cd8 ← {sub_822919c8, sub_82186760, sub_821c88d0}`. **`sub_822919c8` is one of the audit-009 renderer-cluster L1 entry points that has zero non-call xrefs** — same γ-cluster blocked at audit-009/-016.
5. **bit-28 of `[0x828F4070+60]` IS set** at cycle 9224352 by `sub_821c4988:0x821c5450`, but that is roughly 41,000 cycles AFTER the case-9 helper ran (cycle 9183561). Also: bit-28 is a NEGATIVE gate at 0x821738F0 (`bne cr6, 0x82173938`): bit-28 SET means NO bit-14. The positive gate is `[+64]!=-1`.
6. **One stub breaking two independent guest paths (α tail)**:
- `XamUserGetSigninState` (xam.rs:48) is `stub_return_zero`. Even with β fixed, sub_821737F0's bit-14 deep-eval at 0x82173904-0x82173938 takes the no-bit-14 path in 2 of 3 sub-branches when ret==0. Separately, sub_822C2A80 at 0x822c2aac loops `XamUserGetSigninState(0..3)` searching for any signed-in user and so never finds one. Canary `xam_user.cc:90-101` returns `SignedInLocally=1` for the default profile.
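The bit names in this list line up with the `oris` immediates if they are read in PowerPC MSB-0 numbering (bit 0 is the most significant bit of the 32-bit word). That reading is an assumption inferred from the `oris 0x1` / bit-15 pairing for case 0xA; the helper names below are hypothetical. A small sketch of the conversion:

```rust
/// Mask that `oris rA, rS, imm` ORs into the register: the 16-bit
/// immediate shifted into the upper halfword.
fn oris_mask(imm: u16) -> u32 {
    (imm as u32) << 16
}

/// Index of a single-bit mask in PowerPC MSB-0 numbering (bit 0 = 0x8000_0000).
fn msb0_index(mask: u32) -> u32 {
    assert!(mask.is_power_of_two(), "expected exactly one bit set");
    mask.leading_zeros()
}

/// Index of the same bit counted from the least significant end.
fn lsb0_index(mask: u32) -> u32 {
    assert!(mask.is_power_of_two(), "expected exactly one bit set");
    mask.trailing_zeros()
}
```

Under this reading `oris 0x1` is bit-15 and `oris 0x2` is bit-14; counted from the LSB the same bits are 16 and 17.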
### Bug class
**β-dominant + α-tail.** Primary β is structural — `[0x828F4070+64]==-1` because the ctor that fills it (`sub_82184318`) is in the same audit-009 renderer cluster that audit-016 also identified. Secondary α is XamUserGetSigninState=stub_return_zero (2 separate guest paths broken).
### Discipline gate
- Box 1: PARTIAL — α component named (XamUserGetSigninState) but not the dominant gate.
- Box 2: YES for α (5 LOC, mirroring canary `xam_user.cc:90-101`).
- Box 3: NO — β dominant, structural.
- Box 4-5: N/A.
Boxes 1+3 fail. Hand back per stop condition 1.
### Recommended next session (AUDIT-018)
- **Option A**: probe `sub_82184318, sub_82187768, sub_82187dd0, sub_82183ca8, sub_82186760, sub_821c88d0, sub_822919c8, sub_82456B58` at -n 500M to confirm the entire chain to `[singleton+64]` ctor is unreached. If all 8 fail to fire, this re-confirms γ-class structural blocker for the THIRD time (audit-009, -016, -017). Time to pivot strategy.
- **Option B**: canary-log diff during boot window 9.0M-9.3M cycles for any kernel call that writes a real handle to `0x828F4070+64`. Re-run `lutris lutris:rungameid/4` with kernel-call logging.
- **Option C** (cheap α): implement `XamUserGetSigninState` per canary (5 LOC). Will not fire cascade alone (β dominant) but is correct and unblocks sub_822C2A80.
- **Sharp 4-dim cascade prediction**: NEEDS FURTHER TRIAGE.
### Trace artifacts
`audit-runs/audit-017-state-bits-writer/probe{1..5}.log` + `.err` (probe.log: 13 lines, probe3.log: 133 lines incl. dumps, probe4.log: 7 lines, probe5.log: 3 lines).
---
### XamUserGetSigninState follow-up (post-AUDIT-017, master 7ed6192)
Landed inline as a small canary-mirror correctness fix. Branch `xam-user-signin-state/p0-canary-mirror`, no-ff merged.
- Impl returns `1` for user_index=0 (SignedInLocally), `0` otherwise. Mirrors canary `xam_user.cc:90-101`.
- Tests 599 → 600. Lockstep `instructions=100000012 → 100000006`, deterministic across 2 runs.
- **Cascade ripple**: `XamUserReadProfileSettings` now fires 2× (was canary-only). Per-AUDIT-017 prediction (α-tail correctness fix; β still dominant).
- Remaining canary-only kernel exports: `ExTerminateThread`, `KeReleaseSemaphore`. Down from 3 to 2.
- Renderer L1 reachability + parked-handle signal_attempts unchanged — β-class blocker `[0x828F4070+64]==-1` unmoved (audit-017's structural finding).
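For reference, the behavior described above reduces to the following sketch. It is a standalone model, not the landed export: the function names and the plain `u32` signature are assumptions, and only the two state values mentioned in these notes are modeled. The second function mirrors the shape of the guest loop in sub_822C2A80 that scans user indices 0 through 3.

```rust
/// Sign-in states used in this sketch (only the two values named in the notes).
const NOT_SIGNED_IN: u32 = 0;
const SIGNED_IN_LOCALLY: u32 = 1;

/// User 0 is reported as signed in locally; every other index is not signed in.
fn signin_state(user_index: u32) -> u32 {
    if user_index == 0 {
        SIGNED_IN_LOCALLY
    } else {
        NOT_SIGNED_IN
    }
}

/// Scan user indices 0..=3 and return the first one that is signed in.
fn first_signed_in_user() -> Option<u32> {
    (0..4).find(|&i| signin_state(i) != NOT_SIGNED_IN)
}
```

With the old `stub_return_zero` behavior the scan would return `None` for every index, which is why the guest loop never found a user.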
## KRNBUG-AUDIT-018 — canary-log diff identifies α-class stub `KeResumeThread` (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic. No fix landed. Master HEAD `7ed6192` unchanged. Tests 600. Lockstep `instructions=100000006`.
### Method
Set-diff of kernel-call function names: ours (`audit-runs/audit-018-canary-diff/ours.log`, -n 500M) vs canary (`/home/fabi/xenia_canary_windows/xenia.log`, full boot to active rendering with `XamInputGetCapabilities` polling).
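A minimal sketch of such a set-diff, assuming each call is logged as `Name(args...)` somewhere on a line. The real formats of `ours.log` and `xenia.log` may differ, and `call_names` / `canary_only` are hypothetical helper names.

```rust
use std::collections::BTreeSet;

/// Collect distinct call names from a kernel-call log.
/// ASSUMPTION: each call is logged as `Name(args...)`; the identifier
/// immediately before the first `(` on a line is taken as the name.
fn call_names(log: &str) -> BTreeSet<String> {
    let is_ident = |c: char| c.is_ascii_alphanumeric() || c == '_';
    let mut names = BTreeSet::new();
    for line in log.lines() {
        let head = match line.find('(') {
            Some(paren) => &line[..paren],
            None => continue,
        };
        // Byte offset just past the last non-identifier character.
        let start = head
            .char_indices()
            .rev()
            .find(|&(_, c)| !is_ident(c))
            .map_or(0, |(i, c)| i + c.len_utf8());
        if start < head.len() {
            names.insert(head[start..].to_string());
        }
    }
    names
}

/// Names that appear in the canary log but never in ours, sorted.
fn canary_only(ours: &str, canary: &str) -> Vec<String> {
    let ours = call_names(ours);
    call_names(canary)
        .into_iter()
        .filter(|n| !ours.contains(n))
        .collect()
}
```

Because the diff is on names only, it cannot see calls that both runtimes make with different arguments or counts; those need the counter comparison.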
### Decisive findings
1. Function-name diff: only 2 calls present in canary, absent in ours: `ExTerminateThread`, `KeReleaseSemaphore` — both already on the audit-006 canary-only export queue.
2. **`KeReleaseSemaphore(828A3230, 1, 1, 0)`** is hammered by canary tid `F800006C` repeatedly (audio-render ticker). That thread is created via `ExCreateThread(..., entry=0x824D2878, ctx=0, flags=0x10000001)` and immediately followed by `ObReferenceObjectByHandle / KeSetBasePriorityThread / KeResumeThread / ObDereferenceObject`. Same pattern for entry `0x824D2940`.
3. In our run, both these threads are `Blocked(Suspended)` at end-of-run. Counters `KeResumeThread = 2` and `NtResumeThread = 6` match canary's call pattern.
4. **Root cause**: `crates/xenia-kernel/src/exports.rs:3658-3664` — `ke_resume_thread` is a no-op cookie-returner that ignores r3 and sets r3=0. Comment claims "real `NtResumeThread` below handles the handle-based path properly", but `KeResumeThread` is a separate export that takes a KTHREAD pointer (which our `ObReferenceObjectByHandle` cookies as the handle itself per `exports.rs:3787-3807`). The fix is to mirror `nt_resume_thread`: `find_by_handle(handle).resume_ref(r)`.
5. Cross-reference: tid=17 (entry=0x82170430, ctx=0x828F4070, the audit-017 listener struct) IS spawned and parks on event handle 0x15E4 — same long-known parked dispatcher waiter. Worker body reads `[r29+56] (=[0x828F40A8])` as its loop predicate (clarification of audit-017's "+64" claim). Until tids 9/10 actually run, the audio-side cascade never starts.
### Bug class
**α (named import stub_success on a load-bearing export)**. `KeResumeThread` is registered (canary `kImplemented`) but our impl is a stub_success no-op that fails to actually unsuspend.
### Discipline gate
- Box 1 (named bug class with concrete evidence): YES.
- Box 2 (narrow fix ~5 LOC): YES.
- Box 3 (sharp 4-dim cascade prediction): YES (see memory file).
- Box 4 (no renderer/GPU changes): YES.
- Box 5 (lockstep determinism preserved): expected — same pattern as XamUserGetSigninState landing.
**All 5 boxes pass — first time since IO-004.**
### Sharp 4-dim cascade prediction
- **A (thread liveness)**: tids 9, 10 leave Suspended; XAudio voice-render workers run.
- **B (kernel counters)**: `KeReleaseSemaphore` non-zero for first time. `NtSetEvent` rises. Likely new `XAudioSubmitRenderDriverFrame`.
- **C (canary-only exports)**: 2→1 (`KeReleaseSemaphore` resolved). Possibly new audio-path exports.
- **D (listener `[+64]`)**: hypothesis-only — IF audit-017's β-class blocker is downstream of audio init, `[0x828F4070+64]` becomes non-(-1) and renderer cascade unblocks. If not, γ-cluster is independent → pivot to memory-watch instrumentation on `[+64]`.
### Recommended next session (KRNBUG-IO-005 or KRNBUG-α-005)
Implement 5-LOC fix on branch `ke-resume-thread/p0-canary-mirror`:
```rust
fn ke_resume_thread(ctx: &mut PpcContext, _mem: &GuestMemory, state: &mut KernelState) {
    // r3 carries the KTHREAD pointer, which doubles as the handle cookie.
    let handle = resolve_pseudo_handle(state, ctx.gpr[3] as u32);
    // Resume if the handle resolves; fall back to 0 otherwise.
    let prev = state.scheduler.find_by_handle(handle).map(|r| state.scheduler.resume_ref(r)).unwrap_or(0);
    ctx.gpr[3] = prev;
}
```
Lockstep ×2. Evaluate cascade. Tests 600→601 (add a `ke_resume_thread` unit test mirroring `nt_resume_thread`).
### Trace artifacts
- `audit-runs/audit-018-canary-diff/ours.log` (full kernel trace + final-state thread diagnostics)
- `audit-runs/audit-018-canary-diff/ours.stdout.log` (counters)
- Canary: `/home/fabi/xenia_canary_windows/xenia.log` (untouched)
## KRNBUG-KE-001 — Real `KeResumeThread` (LANDED 2026-05-06)
`crates/xenia-kernel/src/exports.rs:3658-3669` — replaced the no-op cookie-returner with a canary-mirror real impl per `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227` (`XObject::GetNativeObject<XThread>(...)->Resume()` → `STATUS_SUCCESS`, else `STATUS_INVALID_HANDLE`). Routes the KTHREAD-pointer-as-handle through `resolve_pseudo_handle` + `scheduler.find_by_handle` + `scheduler.resume_ref`, mirroring `nt_resume_thread`'s plumbing two functions below.
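The success and invalid-handle branches can be illustrated with a toy model. This is a sketch only: `ToyScheduler` and `ThreadState` are stand-ins invented for the example and do not reflect the real scheduler types; the two status values are the standard NTSTATUS codes.

```rust
use std::collections::HashMap;

const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_INVALID_HANDLE: u32 = 0xC000_0008;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ThreadState {
    Ready,
    Suspended,
}

/// Toy stand-in for the scheduler: handle -> (state, suspend count).
#[derive(Default)]
struct ToyScheduler {
    threads: HashMap<u32, (ThreadState, u32)>,
}

impl ToyScheduler {
    /// Register a thread that starts suspended with a suspend count of 1.
    fn spawn_suspended(&mut self, handle: u32) {
        self.threads.insert(handle, (ThreadState::Suspended, 1));
    }

    fn state(&self, handle: u32) -> Option<ThreadState> {
        self.threads.get(&handle).map(|t| t.0)
    }

    /// Decrement the suspend count; the thread becomes Ready when it hits 0.
    /// Unknown handles report STATUS_INVALID_HANDLE.
    fn resume(&mut self, handle: u32) -> u32 {
        match self.threads.get_mut(&handle) {
            Some((state, count)) => {
                *count = count.saturating_sub(1);
                if *count == 0 {
                    *state = ThreadState::Ready;
                }
                STATUS_SUCCESS
            }
            None => STATUS_INVALID_HANDLE,
        }
    }
}
```

The pre-fix no-op corresponds to returning 0 without touching the map, which is why tids 9 and 10 stayed `Suspended`.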
### Cascade-prediction scorecard (audit-018 → post-fix)
- **A — thread liveness (PASS)**: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) transition from `Blocked(Suspended)` → ran → now `Blocked(WaitAny)` on audio buffer-completion semaphores `0x828A3254` (handle 2190094932) / `0x828A3230` (handle 2190094896). Pre-fix they were Suspended at end-of-run; post-fix they execute their bodies and park on a downstream consumer wait.
- **B — counters (PARTIAL FAIL)**: `NtSetEvent 667→3334` (rises ~5×, audio frame-complete signaling). `KeResumeThread = 2` (now real). `NtResumeThread = 6`. **`KeReleaseSemaphore` still 0** (not in counters at all). **`XAudioSubmitRenderDriverFrame` still 0**. Workers ran prologue + parked on a downstream gate before reaching `KeReleaseSemaphore`.
- **C — canary-only delta (FAIL — predicted 2→1, actual 2→2)**: `ExTerminateThread` and `KeReleaseSemaphore` both still canary-only. The audio render-tick semaphore-release loop is gated by something downstream of the audio worker prologue.
- **D — γ-cluster blocker (FAIL)**: `--pc-probe=0x82184318,0x82184374` armed, neither fires. `--dump-addr=0x828F4070` armed, no DUMP lines emitted. Listener struct `[0x828F4070+64]` unchanged. `--trace-handles-focus` shows handles 0x1004/0x100c/0x1020/0x15e4 all still `signal_attempts=0`.
### Milestone status
- Renderer cluster cascade collapsed? **NO**.
- signal_attempts > 0 on parked handles? **NO**.
- `draws > 0`? **NO** (still 0; `swaps` still 2).
### Verification
- 600 → 601 tests (`cargo test --workspace --release` clean; new `ke_resume_thread_unblocks_suspended_worker` covers Suspended→Ready transition + INVALID_HANDLE branch).
- Lockstep determinism: `instructions=100000003 imports=987516` × 2 reruns identical.
- `swaps=2 draws=0` plateau intact.
- Goldens re-baselined: `sylpheed_n50m.json instructions 50000003→50000011, imports 407255→407247`. n2m unchanged. Oracle test passes.
### Bug class (post-fact)
α (load-bearing stub_success). The fix unsticks two threads but those threads then park on a downstream gate that's part of a separate bug class — the audio voice-render dispatch never reaches `KeReleaseSemaphore`/`XAudioSubmitRenderDriverFrame` because the consumer-side semaphore producer is itself gated by something else (likely the same γ-cluster that audit-009/-016/-017 narrowed: `[0x828F4070+64]==-1`).
### Recommended next session
Audit-019: memory-watch instrumentation on `[0x828F4070+64]` (the fallback named in audit-018 prediction D). With KE-001 landed, the discipline gate cleanly attributes the renderer plateau to the listener-struct field rather than to a stub upstream, which narrows the search for the producer to whoever writes offset `+64` of the audit-017 dispatcher struct.
### Trace artifacts
- `audit-runs/post-ke-resume/lockstep_run{1,2}.json` (lockstep determinism)
- `audit-runs/post-ke-resume/run.{log,err}` (full 500M cascade verification)
- `audit-runs/post-ke-resume/probe.{log,err}` (γ-cluster pc-probe + dump-addr)
- `audit-runs/post-ke-resume/handles.{log,err}` (--trace-handles-focus)
## KRNBUG-AUDIT-023 — Canary memory-dump diff (READ-ONLY, 2026-05-06)
Path B per AUDIT-022 prep: temporarily patched canary (`xam_notify.cc` + `cpu_flags.{h,cc}`)
to add `DEFINE_string(memory_dump_path,...)` flag. On first `XamNotifyCreateListener_entry`
(mask=0x2F), pre-size file to 2 GiB then `Memory::Save` the entire 5-heap state into a
mmap'd file. 44 LOC, rebuilt Linux Debug clang++14 (~6 min), captured 216 MB dump. Patch
reverted post-capture (`git status` clean).
### Findings vs ours @ -n 50M
1. **0x828F4070 family (audit-017 hypothesized populator target)**: canary-at-first-listener
is ALL ZEROS; ours has dispatcher data. **Cannot resolve audit-017** — canary's dump
fired too early in init for [+64]≠-1 to have happened.
2. **0x828E1F08**: ours stores listener pointer (`0x40111890`); canary stores 0. Mechanism
difference (canary uses host-side `KernelState::notify_listeners_` vector; ours stuffs
guest-memory). Not an obvious bug.
3. **0x828F4838 +0x08**: canary has `"XEN\0" + handle 0xF8000034`; ours has zeros.
New populator-effect lead — canary's xboxkrnl writes "XEN" magic + a kernel handle
to this struct slot during init. Address sits inside the audit-016/017 cluster
(`[0x828F48B0+0]=0x828F4070` chain).
4. **0x82124xxx area (audit-009 cluster L1 PCs as data)**: REFUTED as populator target.
This is the static `.pdata` exception-handler table in the XEX image; ours has byte-identical
contents. NOT a dynamic populator.
### Pre-existing canary bugs encountered
- `PosixMappedMemory::WrapFileDescriptor` mmaps existing file size without extending —
v1 patch SIGBUS'd on first qword write; fixed with `std::filesystem::resize_file` pre-step.
- `XexInfoCache::Init` SIGBUS at line 1406 reading `GetHeader()->version` from mmap'd
infocache. Worked around with `--disable_instruction_infocache=true`.
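The pre-sizing workaround for the first bug generalizes: a file must be at least as long as the region that will be written through the mapping, or the first store past end-of-file faults. A sketch of the pre-step in Rust rather than canary's C++ (`presize_for_mapping` is a hypothetical helper; the mapping call itself is omitted):

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Create (or open) `path` and make sure it is at least `len` bytes long
/// before it is handed to a memory-mapping call. Returns the final length.
fn presize_for_mapping(path: &Path, len: u64) -> io::Result<u64> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(path)?;
    if file.metadata()?.len() < len {
        // Extends the file with zeros; existing contents are kept.
        file.set_len(len)?;
    }
    Ok(file.metadata()?.len())
}
```

The helper never shrinks an existing file, so re-running a capture against an earlier, larger dump does not truncate it.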
### Bug-class refinement
The audit-017 β-class hypothesis remains unresolved. Need a LATER trigger point in
canary to capture state when populator has run. New independent lead: `"XEN" + handle`
at 0x828F4840 in canary; missing in ours.
### Recommended next session
**AUDIT-024**: re-apply canary patch with delayed trigger (e.g., on XamNotifyCreateListener
call N≥5, or on first XAudioSubmitRenderDriverFrame, or on first NtSetEvent on a specific
guest event). Capture canary's STATE post-populator. Diff at 0x828F4070+64 directly.
Alternative: static-search canary's xboxkrnl source for the writer of "XEN\0" + handle
at 0x828F4840 — if found, that names the populator's CODE, not just its effect.
### Trace artifacts
- `audit-runs/audit-023-canary-diff/canary-memory.dump` (216 MB)
- `audit-runs/audit-023-canary-diff/canary.log` (canary stdout)
- `audit-runs/audit-023-canary-diff/canary-patch.diff` (re-applyable)
- `audit-runs/audit-023-canary-diff/parse_dump.py` (Memory::Save format walker)
- `audit-runs/audit-023-canary-diff/diff_canary_ours.py` (side-by-side diff)
- `audit-runs/audit-023-canary-diff/diff.txt` (concrete byte-level diffs)
- `audit-runs/audit-023-canary-diff/ours-{dump,extra,pdata}.{log,err}` (ours' --dump-addr)
## KRNBUG-AUDIT-024A — Canary memory-dump diff at delayed trigger (READ-ONLY, 2026-05-07)
Re-applied audit-023's pattern but moved the dump trigger to **first
`XAudioSubmitRenderDriverFrame_entry`** call (much later than first listener).
Patch: 39 LOC (cpu_flags hunk reused + new hook in `xboxkrnl_audio.cc`).
Build: incremental Debug, ~10 s after CMake-cache symlink fix.
Required preexisting workaround: `--disable_instruction_infocache=true`. Captured
260,659,200 byte dump (248.6 MiB) — slightly larger than audit-023's 216 MB,
consistent with deeper boot.
Canary log telemetry pre-dump confirms post-populator state:
`KeReleaseSemaphore(0x828A3230, 1, 1, 0)` firing repeatedly (the audio
buffer-completion semaphore — audit-018 prediction: producer is the audio render thread).
`VdSwap`, `VdRetrainEDRAM`, `XamInputGetCapabilities`, multiple texture loads firing.
### Findings — `[0x828F4070+64]` HYPOTHESIS FALSIFIED
`[0x828F40B0]` (=0x828F4070+64) at first `XAudioSubmitRenderDriverFrame`:
- **CANARY**: ALL ZEROS for at least 0x40 bytes
- **OURS @ -n 500M**: `ff ff ff ff` at offset 0 (audit-017's `-1` sentinel from sub_821701c8)
The audit-017 β-class hypothesis (`[0x828F4070+64]==-1` blocking bit-14 setter)
is now **directly falsified by canary observation**: in canary, this slot is
zero, NOT a non-(-1) handle. AUDIT-017's claim "only non-(-1) writer is
sub_82184318:0x82184374" was structurally correct *for our build*; in canary
the equivalent location remains untouched at the moment audio is already running.
The bit-14 gate at 0x821738E0 must therefore admit `[+64]==0` OR canary takes a
different control path entirely (likely the latter — different submitter chain
populates a different guest dispatcher slot, leading to the renderer-state-bits
write through a different path).
### Findings — `0x828F4838+0x08` "XEN\0 + 0xF8000034" divergence stable
Canary still has `"XEN\0"` magic + kernel handle `0xF8000034` at +0x08.
Ours still has zeros at +0x08-0x0F. **Stable across audit-023 (early)
and audit-024A (late) trigger points** — populator wrote this field
during early init, before listener-creation in audit-023. Confirms the
audit-022/023 lead is real, not transient.
Heap pointers and counts at `0x828F4838 +0x20..+0x60` populated in BOTH
canary (`0xBC36xxxx` heap) and ours (`0x4024xxxx` heap) — different
allocator state but structural equivalence.
### Findings — `0x828A3230` audio semaphore (canary only)
State quad `05 00 00 00 00 00 00 00`, `"XEN\0"` + handle `0xF8000070` at +0x08,
release-count = `01000000` at +0x14, plus chain at +0x18 / +0x28 with handles
`0xF8000080` / `0xF800007C` and a 64-bit value `0xBE628EDC1FCA7000` at +0x38
(callback ptr or last-completed timestamp).
In ours: `KeReleaseSemaphore=0` (still in canary-only export queue). Producer
(audit chain → `XAudioSubmitRenderDriverFrame` → audio system → this semaphore)
unreached at -n 500M.
### Bug-class re-classification
Drop β-class (`[+64]` poison) hypothesis. Reclassify as **γ-deep**: the gate
between audit-013's IO-004 reach (sub_82173DC8 dispatching) and the audio
producer chain firing is a multi-step renderer/audio init that fires
`XAudioSubmitRenderDriverFrame` in canary but never reaches it in ours.
### Sharp next-session prediction
(1) Per Sister-Session AUDIT-024B (parallel canary-source `"XEN\0"`-writer
static search): if 024B identifies the writer of `"XEN\0" + 0xF8000034`,
cross-reference with our canary-only kernel exports. The `"XEN" + handle`
pattern is the canonical type-tag signature emitted by `kernel/util/object_table.cc`
when a kernel object is committed to guest memory.
(2) Independent track: name the kernel call that fires
`XAudioSubmitRenderDriverFrame` in canary but not in ours. The chain we know
runs in canary post-IO-004 is roughly:
`XamNotifyCreateListener → renderer init → XAudio register → audio thread spawn → submit frames`.
Counters in our run: `XAudioRegisterRenderDriverClient=1` so registration ran,
`KeInitializeSemaphore=1` (likely the buffer-completion semaphore allocated),
but the audio thread that calls `XAudioSubmitRenderDriverFrame` never starts
feeding frames. Probe target: who reads the audio-system register-result and
starts feeding.
### Cascade prediction sharpness — 4 dim
If next-session lands a fix for the audio-thread-start gate:
- A: `XAudioSubmitRenderDriverFrame` count > 0
- B: `KeReleaseSemaphore` count > 0 (now non-canary-only)
- C: `[0x828A3230+0x14]` becomes 1 (release count)
- D: VdSwap > 2 expected ONLY if audio drives renderer pacing (unknown — open).
### Trace artifacts
- `audit-runs/audit-024a-canary-diff/canary-memory.dump` (260,659,200 bytes)
- `audit-runs/audit-024a-canary-diff/canary.log` (canary stdout)
- `audit-runs/audit-024a-canary-diff/canary-patch.diff` (re-applyable)
- `audit-runs/audit-024a-canary-diff/canary-state.txt` (parsed canary state at probe addrs)
- `audit-runs/audit-024a-canary-diff/canary-extra.txt` (extra addrs: 0x828A3230 etc.)
- `audit-runs/audit-024a-canary-diff/ours-dump.{log,err}` (ours --dump-addr at -n 500M)
- `audit-runs/audit-024a-canary-diff/diff.txt` (side-by-side comparison)
### Cleanup
Canary patch reverted (`git status` clean). Master xenia-rs HEAD `d9e40d3`
unchanged. `/home/fabi/xenia-canary` symlink retained for future CMake regen.
## KRNBUG-α-006 — `ensure_dispatcher_object` writes XObj signature + handle (LANDED, 2026-05-07)
Mirror of canary `XObject::StashHandle` (xobject.h:253-256). On first guest-
dispatcher adoption, stamp `+0x08` with `kXObjSignature` (`'X','E','N','\0'` =
`0x58454E00`) and `+0x0C` with the stash handle. Our shadow table is keyed
by guest pointer, so handle-to-stash = `ptr` itself. 7 LOC in impl, 27 LOC
in tests.
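The stamp itself is two big-endian dword writes. The following sketch operates on a plain byte slice standing in for guest memory; `stash_handle` and `XOBJ_SIGNATURE` are illustrative names, not the identifiers used in the landed change.

```rust
/// 'X','E','N','\0' packed big-endian, as quoted above.
const XOBJ_SIGNATURE: u32 = 0x5845_4E00;

/// Stamp the dispatcher object that starts at byte offset `base` in `mem`:
/// signature at +0x08, stash handle at +0x0C, both big-endian.
fn stash_handle(mem: &mut [u8], base: usize, handle: u32) {
    mem[base + 0x08..base + 0x0C].copy_from_slice(&XOBJ_SIGNATURE.to_be_bytes());
    mem[base + 0x0C..base + 0x10].copy_from_slice(&handle.to_be_bytes());
}
```

Since the shadow table is keyed by guest pointer, a caller following the scheme above would pass the object's own guest address as `handle`.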
Branch `xobj-stashhandle/p0-canary-mirror` merged --no-ff into master `de5a15e`.
Tests 604 → 605 (`ensure_dispatcher_object_stamps_xen_signature_and_handle`).
Lockstep deterministic across 2 reruns: `instructions=100000003 imports=987516`
(identical to pre-fix d9e40d3 — writeback is host-side, no guest-instruction
cost). `sylpheed_n50m` golden unchanged.
Cascade @ -n 500M halt-on-deadlock: NIL ripple. Worker count 20; KeReleaseSemaphore=0;
ExTerminateThread=0; XAudioSubmitRenderDriverFrame=0; NtSetEvent=3334; VdSwap=2 —
all match post-ke-resume baseline. At target address 0x828F4838 itself, +0x08
remains 00000000 because guest never invokes a Ke* function with that pointer
(adoption in canary at this address likely uses `SetNativePointer` lifecycle
which we don't traverse via `ensure_dispatcher_object`).
Per task brief: lands as canary-correctness restoration without sharp cascade
hypothesis. Audit-024A's hypothesis that the StashHandle stamp at 0x828F4838
gates audio init is **observationally falsified** post-fix. Trace
`audit-runs/post-stashhandle/dump-500m.log`.
## KRNBUG-AUDIT-025 — Audio thread-start gate identified (READ-ONLY, 2026-05-07)
Master HEAD at session start: `de5a15e` (post-Path-2 StashHandle merge).
### Question
Audit-024A established that `XAudioSubmitRenderDriverFrame=0` and
`KeReleaseSemaphore(0x828A3230)=0` in our run while canary fires both
repeatedly. Goal: identify the exact gate between successful
`XAudioRegisterRenderDriverClient` (both runtimes call it once with
identical return `0x41550000`) and the audio worker submitting frames.
### Static + canary-log decomposition
**Audio init in Sylpheed (sub_824D2C08, called once from sub_824D2FA8):**
1. `bl 0x824D6070` — alloc audio_system object on heap.
2. Inline DISPATCHER_HEADER writes at `+0x150..+0x18A`: type byte `0x01` to `0x828A3254`
(auto-reset Event), type byte `0x01` to `0x828A3244` (auto-reset Event), and type byte `0x05` (via
`bl KeInitializeSemaphore` at +0x1A4 = 0x824D2DAC) to `0x828A3230`
(Semaphore, count=0, limit=6).
3. `bl ExRegisterTitleTerminateNotification(0x828A3210, 1)` at +0x1F0 = 0x824D2DF8.
4. `bl ExCreateThread(entry=0x824D2878, ctx=0, flags=0x10000001)` — audio worker.
5. `KeSetBasePriorityThread(15)` + `KeResumeThread` on the worker.
6. `bl ExCreateThread(entry=0x824D2940, ctx=0, flags=0x20000001)` — second audio thread.
**Audio worker loop (entry 0x824D2878 — disassembled):**
```
LOOP_HEAD:
    r3 = 0x828A3254                                  # event handle
    bl KeWaitForSingleObject(r3, 3, 1, 0, NULL)      # 0x824D28CC
    r3 = mem[0x828A3264]                             # = audio_system_obj ptr (heap)
    r11 = mem[r3+300]                                # audio_active flag
    if r11 != 0:
        bl sub_824D2108                              # process job
        bl sub_824D21F0
    else:                                            # shutdown
        r5 = mem[r3+304] - 1
        if r5 != 0:
            bl KeReleaseSemaphore(0x828A3230, r5, 1) # 0x824D2904
        bl KeSetEvent(0x828A3244, 1, 0)
    if r11 != 0: goto LOOP_HEAD
    return
```
Wake source for `0x828A3254`: only **`sub_824D23B0`** (KeSetEvent at +0x54,
+0x4FC, +0x688 = 0x824D2404 / 0x824D28AC / 0x824D2A40). `sub_824D23B0` is the
audio job-submit method. **It also writes `[+300]=current_thread_handle`**
(at sub_824D23B0+0x678 = 0x824D2A28) so that the worker takes the job-process
branch instead of shutdown.
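The `base+offset = address` pairs quoted in this section are easy to sanity-check mechanically. A small sketch (`site` and `mismatches` are hypothetical helper names):

```rust
/// Resolve `base + offset` in the 32-bit guest address space.
fn site(base: u32, offset: u32) -> u32 {
    base.wrapping_add(offset)
}

/// Return the (offset, claimed address) pairs whose arithmetic does not hold.
fn mismatches(base: u32, pairs: &[(u32, u32)]) -> Vec<(u32, u32)> {
    pairs
        .iter()
        .copied()
        .filter(|&(off, addr)| site(base, off) != addr)
        .collect()
}
```

Applied to the wake-source list above with base `0x824D23B0`, the `+0x54`, `+0x4FC` and `+0x678` pairs check out, but `+0x688` resolves to `0x824D2A38` rather than `0x824D2A40`, so one of those two values is worth re-reading from the disassembly.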
### Caller chain of sub_824D23B0
From `xrefs` table: only ONE static caller — `sub_824D2B08+0xE4 = 0x824D2BEC`.
But `sub_824D2B08` is the lightweight constructor (entry at 0x824D2B08, returns
at 0x824D2BD4 BEFORE 0x824D2BEC). The body containing the
`bl sub_824D23B0` at 0x824D2BEC is a SEPARATE function entry at `0x824D2BD8`
that the static analyzer didn't carve out — there are NO static call xrefs to
0x824D2BD8. **It is a virtual method invoked via the audio_system vtable**
(set in sub_824D2B08 at offset 0 of the audio object: `[r31+0] = 0x82006CF4`).
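To make the indirection concrete, here is a sketch of how a virtual call target is resolved from guest memory: load the vtable pointer from `[obj+0]`, then the function pointer from `[vtable + 4*slot]`. The flat-image model and the helper names are assumptions for illustration, and which slot holds `0x824D2BD8` is not established by this audit.

```rust
/// Read a big-endian u32 at guest address `addr` from a flat memory image
/// whose first byte corresponds to guest address `image_base`.
fn read_be32(mem: &[u8], image_base: u32, addr: u32) -> Option<u32> {
    let off = addr.checked_sub(image_base)? as usize;
    let b = mem.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Resolve virtual slot `slot` of the object at `obj`:
/// vtable = [obj+0], target = [vtable + 4*slot].
fn resolve_virtual(mem: &[u8], image_base: u32, obj: u32, slot: u32) -> Option<u32> {
    let vtable = read_be32(mem, image_base, obj)?;
    let entry = vtable.checked_add(slot.checked_mul(4)?)?;
    read_be32(mem, image_base, entry)
}
```

Running this over a memory dump for every slot of the table at `0x82006CF4` would list the candidate virtual entry points that static xrefs cannot see.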
### Runtime probe (audit-025-audio-thread-start)
`--pc-probe` at 12 audio PCs + `--dump-addr` at 5 audio dispatcher addresses,
`-n 500M`, `--halt-on-deadlock`, NO `--xaudio-tick`.
**Probe fires (1 of 12):**
- `0x824D2DF8` (sub_824D2C08+0x1F0, ExRegisterTitleTerminate) tid=1 cycle=7,470,631 ✓
**Probes that DID NOT fire:**
- `0x824D23B0` (sub_824D23B0 entry): never reached
- `0x824D2404` (KeSetEvent on 0x828A3254, the worker wakeup): never reached
- `0x824D28CC, 0x824D28D0` (worker wait): never reached. Probes fire on a PC visit;
tid 9 has been BLOCKED at 0x824D28D0 since it was first queued and is never scheduled back in.
- `0x824D290C, 0x824D291C, 0x824D2928, 0x824D2930` (worker shutdown/exit/loop): never reached
- `0x824D2DAC` (KeInitializeSemaphore in init): never observed *as a PC visit*
even though the counter shows the call fired. The probe check runs on the prologue tick, and the guest
PC moves past 0x824D2DAC during the `bl` in the same prologue cycle, so
the check never matches. This is a probe limitation, not a behavior bug.
**Dispatcher dump shows correct DISPATCHER_HEADER structure:**
- `0x828A3254` Event sync: type=0x01, sig=0, +0x08="XEN\0", +0x0C=0x828A3254 (Path 2's stamp)
- `0x828A3230` Semaphore: type=0x05, count=0, limit=6, +0x08="XEN\0", +0x0C=0x828A3230
- `0x828A3244` Event sync: type=0x01, sig=0
- `mem[0x828A3264]=0x4250DEDC` — audio_system heap object pointer (set during init)
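The fields quoted in this dump can be decoded from the first 16 bytes of each object. The sketch below assumes the usual DISPATCHER_HEADER layout (type byte at `+0x00`, signal state at `+0x04`) plus the `"XEN\0"` stamp at `+0x08` and stash handle at `+0x0C` described in these notes; `DispatcherView` and `view_dispatcher` are hypothetical names.

```rust
/// Decoded view of the first 16 bytes of a guest dispatcher object.
#[derive(Debug, PartialEq)]
struct DispatcherView {
    kind: u8,          // +0x00: 0x01 = auto-reset event, 0x05 = semaphore
    signal_state: i32, // +0x04: signaled flag or semaphore count
    stamped: bool,     // +0x08: true when the "XEN\0" signature is present
    stash: u32,        // +0x0C: stashed handle
}

/// Decode a big-endian dispatcher header; returns None if fewer than 16 bytes.
fn view_dispatcher(bytes: &[u8]) -> Option<DispatcherView> {
    if bytes.len() < 0x10 {
        return None;
    }
    let be32 = |o: usize| u32::from_be_bytes([bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]]);
    Some(DispatcherView {
        kind: bytes[0],
        signal_state: be32(0x04) as i32,
        stamped: &bytes[0x08..0x0C] == &b"XEN\0"[..],
        stash: be32(0x0C),
    })
}
```

Feeding the `--dump-addr` bytes for `0x828A3230` through this should report `kind = 5`, `signal_state = 0` and a stash equal to the object's own address, matching the list above.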
**Thread states at deadlock:**
- tid 9 (entry 0x824D2878, the audio worker) — `Blocked(WaitAny [0x828A3254])` at pc=0x824D28D0, lr=0x824D28D0
- tid 10 (entry 0x824D2940) — Blocked similarly at pc=0x824d29X0 region
- 0x828A3254 has tid 9 in `waiters=[9]` but `signaled=false` and no signal_attempts
### Bug-class classification: γ-DEEP (vtable-driven indirection)
The audio init runs to completion: heap object allocated, dispatchers
initialized, worker spawned + resumed, ExRegisterTitleTerminate registered.
Worker is correctly parked on `0x828A3254` waiting for a job-submit signal.
**The job-submit method `sub_824D23B0` is reachable only via vtable lookup
on the audio_system object**, in the style of `lwz r11, 0(r30)` followed by `mtctr r11; bctrl`.
The caller of the vtable method must be a periodic frame-loop (per-frame audio
update). Static analysis shows it would be from the renderer/scenegraph — i.e.,
the same `0x82287000-0x82294000` cluster identified by AUDIT-009 as
**unreached**. AUDIT-016/017 already classified this cluster as γ-deep
(chicken-and-egg vtable-registry-not-populated).
**Conclusion**: the audio thread-start gate is *not* a missing kernel call.
It is the same γ-cluster blocker that has gated the renderer since AUDIT-009.
There is no β-class memory predicate to patch here: the indirection runs through a vtable
slot in `[audio_obj+0]` whose containing dispatcher table never gets registered
because the renderer's listener-init path never executes.
### Discipline gate
- Box 1 (canary citation): PASSES — canary `xenia/apu/audio_system.cc:202-237`
+ `xenia/kernel/xboxkrnl/xboxkrnl_audio.cc:56-82`. But canary's host
audio worker is a *replacement* for the guest worker; the gate is purely
guest-side here.
- Box 3 (probe-confirmed reachability): FAILS — sub_824D23B0 never fires.
- This is a diagnostic, no fix to apply.
### Sharp next-session direction
This audit closes the audio fork. The ledger has 3 paths forward:
(A) **Strategic pivot (recommended)**: stop chasing audio. The audio gate IS
the renderer gate. Concentrate on AUDIT-009's `0x82287000-0x82294000`
cluster's L1 callers and the listener-vtable registration that never
happens. AUDIT-017 hypothesized that the bit-14 setter at
0x82173950 is the gate; since AUDIT-024A falsified `[+64]==-1`
as the blocker, redirect to: **find what canary writes into the
`0x40ba9a80` listener struct's vtable-pointer slot (`[+0]` in audit-016
parlance) and identify the writer in canary kernel source**. Path 2's
StashHandle fix landing means the dispatcher-side stamp is now done; the
next missing piece is which kernel call materializes the LISTENER's
vtable so the dispatch routine can actually run.
(B) **Audio-side workaround**: extend `try_inject_audio_callback` to fire
independently of the worker thread (i.e., bypass guest worker entirely
and call the registered XAudio callback PC directly from the kernel,
canary-style). Already explored under `--xaudio-tick`; regresses
swaps 2→1 (memory entry on KRNBUG-XAUDIO-PRODUCER-001). Not recommended.
(C) **Complete audio worker host-thread emulation**: mirror canary's host
`AudioSystem::WorkerThreadMain` in our kernel (semaphore.Release
`queued_frames` times on RegisterClient + drive callbacks from a host
thread). Larger refactor; risks breaking lockstep determinism unless
quantized to instruction-count.
### Trace artifacts
- `audit-runs/audit-025-audio-thread-start/probe.log` (CTOR-PROBE results + dispatcher dump)
- `audit-runs/audit-025-audio-thread-start/probe.err` (counters + thread states)
### Cleanup
No source modified. Master xenia-rs HEAD `de5a15e` unchanged.
---
## KRNBUG-AUDIT-027 — v40 heap memory diff vs canary (READ-ONLY, 2026-05-08)
Master HEAD at start/end: `e061e21`. NO source modified.
### Goal
Continuation of audit-026 (v80 elimination). Comprehensive dword-level
diff of canary's existing 248.6 MiB memory dump (audit-024A) vs
ours at v40000000 (1008 MiB span, 64 KiB pages). Looking for cluster L1
dispatch-table addresses.
### Method
- `--dump-section=0x40000000:0x3F000000:ours-v40.bin` -n 500M -> 60119
committed pages, 1008 MiB.
- `extract_v40.py` (adapted from audit-026's extract_v80.py): canary
v40 page count 16128, **committed = 90**.
- `diff_v40.py`: dword-level scan, A-list = canary 0x82xxxxxx-PC where
ours differs, B-list inverse.
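
The A-list/B-list scan described above can be sketched as follows. This is a minimal illustration, not the contents of `diff_v40.py`: the function names, the tuple layout, and the assumption that both dumps are equally sized big-endian images are all hypothetical.

```python
import struct

PC_LO, PC_HI = 0x82000000, 0x82FFFFFF  # 0x82xxxxxx guest PC range


def dwords(buf):
    """Yield (offset, value) for each aligned big-endian 32-bit word."""
    for off in range(0, len(buf) - len(buf) % 4, 4):
        yield off, struct.unpack_from(">I", buf, off)[0]


def is_pc(value):
    """True if the dword looks like a guest code address."""
    return PC_LO <= value <= PC_HI


def diff_lists(canary, ours, base=0x40000000):
    """Return (a_list, b_list) of (guest_addr, canary_dword, ours_dword).

    A-list: canary holds a PC where ours differs. B-list: the inverse.
    """
    a_list, b_list = [], []
    for (off, c), (_, o) in zip(dwords(canary), dwords(ours)):
        if c == o:
            continue
        if is_pc(c):
            a_list.append((base + off, c, o))
        if is_pc(o):
            b_list.append((base + off, c, o))
    return a_list, b_list
```

A dword that is a PC on both sides but with different values lands in both lists, which is why the two counts are not expected to be disjoint.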
### Results
- A-list (canary-PC, ours differs): **536 entries**
- B-list (ours-PC, canary differs): **31947 entries**
- **Cluster L1 PC hits in A-list: 0** (broad 116-fn 0x82285000-0x82294000),
**0** (narrow 6-fn `sub_822919C8`/`sub_82293448`/etc).
- Histogram top: `0x828f3xxx`(90), `0x8284dxxx`(78), `0x8284cxxx`(64),
`0x82150xxx`(30), `0x828f4xxx`(23), `0x82882xxx`(20). All in
.text/.data, NOT renderer cluster.
- Three vtable-shaped runs detected:
- `0x40000770` length 32 — header `00 09 00 0e | 00 01 10 00 | 40 00 01 c8 | 40 00 01 c8`
- `0x400015a0` length 110 — header `00 21 00 81 | 00 01 10 00 | 40 00 01 80 | 40 00 01 80`
- `0x40000d90` length 20 — `0x82882910`+0x20 stride
All target `.text` heap-allocator handler thunks (`0x8284cxxx`/
`0x8284dxxx`), not renderer dispatch.
- Listener struct at `0x40BA9A80`: canary page **uncommitted** in this
dump; ours has the audit-016 listener content (`+0x2C=0x4024AC00`,
`+0x3C=0x4024B3E0`, etc). This confirms that canary's listener lives at a
different heap address (heap-pointer divergence), not at `0x40BA9A80`.
- B-list tail discovery: `0x40211900..0x40211B50` in ours has 23
consecutive function entries spaced 0x20 apart (`0x82183ae8,
0x82187e38, 0x8218cf10, ...`) — **a function-pointer table our impl
builds in v40 that canary builds elsewhere (likely physical heap)**.
### Bug-class classification
**Outcome (iii) per task brief: v40 ELIMINATED as dispatch-table
source.** Combined with audit-026 (v80 elim), two of four guest-virt
heap regions ruled out. Remaining surface = physical heap (0x20000000
span, 58458 commits in canary's dump = 228 MiB), v00 (256 MiB, 468
commits), or register-only constructed.
### Discipline gate
- Box 1: N/A (pure data audit).
- Box 3: N/A (no fix).
### Sharp next-session direction
- **Recommended: AUDIT-029 = extract canary PHYSICAL heap and diff**
(same script, change selected heap to `physical`, 228 MiB surface).
This is the largest non-static region and the most likely dispatch-
table home given the two virt-heap eliminations.
- Alternative: **vtable-write-tap** instrumentation logging every
`0x82xxxxxx` value our memory path writes to v40/physical heap.
Side-steps the heap-pointer namespace divergence problem entirely.
- Or: **CPPBUG-AUDIT-001 backlog** —
`nt_allocate_virtual_memory` silent-success + `mm_allocate_physical_memory_ex`
alignment/range/protect ignored could be masking the dispatch-table
writes upstream.
### Trace artifacts
- `audit-runs/audit-027-v40-mem-diff/canary-v40.bin` (1056964608 bytes)
- `audit-runs/audit-027-v40-mem-diff/ours-v40.bin` (1056964608 bytes)
- `audit-runs/audit-027-v40-mem-diff/extract_v40.py`, `diff_v40.py`
- `diff.txt` (536), `diff-b.txt` (31947), `histogram.txt`,
`l1-hits.txt`, `tables.txt`, `anchors.txt`, `pages.txt`,
`cluster_l1_pcs.txt` (116 fns from sylpheed.db), `ours.log`,
`diff_run.log`.
### Cleanup
No source modified. Master xenia-rs HEAD `e061e21` unchanged.
Sister session 028 untouched.
---
## KRNBUG-AUDIT-028 — XNotify steady-state publisher audit (READ-ONLY, 2026-05-08)
### Goal
Determine whether canary delivers steady-state XNotify notifications
beyond the 4 startup IDs IO-004 wired, which would explain why our
main thread polls `XNotifyGetNext` 1.49M times without exit.
### Sources
- canary log: `audit-runs/audit-024a-canary-diff/canary.log` (17245 lines).
- canary source: `xenia-canary/src/xenia/`.
### Findings
- Canary log shows ONLY `XamNotifyCreateListener(0x2F)` at line 1347
and `XNotifyPositionUI(0x0A)` at line 2018 in the entire 17245-line run.
- `XNotifyGetNext` is `kHighFrequency` (xam_notify.cc:96) so its
per-call logging is suppressed; absence in log is expected, not
evidence of zero calls.
- Of 34 `BroadcastNotification` publisher sites in canary across 11
files, NONE fires every frame, every audio buffer, or in any
implicit boot-time periodic. All are event-driven from host UI,
profile/XMP menu actions, or hardware hotplug edges.
- Canary's host-side controller-hotplug log message is NOT present
in this run — so no `kXNotificationSystemInputDevicesChanged`
fired (Sylpheed launched with controllers pre-connected).
- Canary's `VdSwap` count = 1 in the entire log = ZERO actual swap
calls (the 1 line is just the export-table TOC at line 769).
Our impl's swaps=2 is actually AHEAD of canary's frame counter.
- Canary IS in steady-state (audio-sema released 2224 times, GPU
loading textures, `XamInputGetCapabilities` polled to log end).
### Outcome: β — XNotify queue is NOT the gate
Our impl's notification timeline matches canary byte-for-byte. The
1.49M `XNotifyGetNext` polls are dutiful idle polling, not a
missing-publisher symptom.
### Strategic pivot
The audio/render gate is still the γ-cluster from AUDIT-009/016/017/025:
the renderer's per-frame audio-update path (sub_824D23B0 invoked via
vtable on audio_system object at `[r31+0]=0x82006CF4`) is unreached
because the renderer cluster `0x82287000-0x82294000` is itself unreached.
### Recommended next session — AUDIT-029
Pivot to "what kernel call materializes the listener-dispatch table
so renderer can route per-frame audio":
1. Probe-set L1 callers of unreached cluster (AUDIT-009 PCs).
2. Static-grep canary for code that populates the `0x82006CF4`
audio_system vtable at runtime — likely
`XAudioRegisterRenderDriverClient` / `AudioSystem` init shim.
3. Diff that population path vs our impl.
Sharp 4-dim cascade prediction (provisional):
- A: one audit-009 cluster L1 PC fires.
- B: `KeReleaseSemaphore(0x828A3230)` 0 → many.
- C: `XAudioSubmitRenderDriverFrame` 0 → many.
- D: `VdSwap` count climbs.
### Trace artifacts
- Memory file: `project_xenia_rs_audit_028_steady_state_notify_2026_05_06.md`
- Audit dir: `audit-runs/audit-028-steady-state-notify/`
### Cleanup
No source modified. No commit. Master xenia-rs HEAD `e061e21` unchanged.
---
## KRNBUG-AUDIT-029 — physical-heap memory diff vs canary (READ-ONLY, 2026-05-08)
### Goal
Comprehensive byte-level diff between canary's physical heap (extracted
from audit-024A's `canary-memory.dump`) and our impl's putative physical
region. This is the LAST major guest-memory surface unaccounted for after
v00 (audit-024A), v40 (audit-027), v80 (audit-026), v90 (zero pages
committed).
### Method
1. Tried dumping our `0xA0000000:0x20000000` (uncached alias).
2. Tried dumping our `0xE0000000:0x20000000` (cached alias).
3. Tried dumping our `0x00000000:0x20000000` (raw physical addr).
4. Extracted canary's physical heap from dump via `extract_physical.py`
(5th heap, 4096-byte pages, state at qword bits 60-61).
5. Walked all 0x82xxxxxx PC dwords on canary's physical heap and
cross-referenced.
### Architectural finding (NEW)
**Our impl has no physically separate physical heap.** All three of our
alias dumps (`0xA0000000`, `0xE0000000`, `0x00000000`) returned
`0 committed pages`. `MmAllocatePhysicalMemoryEx` (exports.rs:644-676)
calls `state.heap_alloc()` (state.rs:702-720), which is a single bump
allocator at `heap_cursor` starting at `0x40000000` shared with
`NtAllocateVirtualMemory`. Canary, by contrast, has a dedicated
512 MiB physical pool (memory.cc:222-242) accessible via the
0xA0/0xC0/0xE0 aliases, each mapped by `addr & 0x1FFF_FFFF` to host
membase offset 0..0x20000000.
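
A minimal sketch of the alias mapping as described above. `phys_offset` is a hypothetical helper, and it deliberately ignores any per-alias adjustment the real emulator may apply on top of the mask.

```python
PHYS_MASK = 0x1FFF_FFFF  # 512 MiB window, per the description above
ALIAS_BASES = (0xA0000000, 0xC0000000, 0xE0000000)


def phys_offset(guest_addr):
    """Map an aliased guest address to a physical-pool offset, else None."""
    if (guest_addr & 0xE0000000) in ALIAS_BASES:
        return guest_addr & PHYS_MASK
    return None
```

Under this model the same physical byte is visible at three guest addresses, while an address in the v40 range (where our unified allocator places everything) maps to no physical offset at all.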
### Canary physical heap stats (extracted)
- File size: 0x20000000 (512 MiB), all-zero except 24.5 MiB of payload.
- Committed pages: **58458** (×4096 = ~228 MiB) — much larger than
audit-024A's `physical=48105` summary; trust this concrete value.
- Total parsed = 0xf895800 == file size (clean walk).
- 0x82xxxxxx PC dword density: **28851** entries in 4467 4K pages
spanning 536 64K-aligned regions.
### Diff results
- A-list (canary has PC, ours has zero): **28851 entries** (every PC
dword is automatically a divergence since our region is empty).
- L1 PC hits — narrow (audit-009 hand-picked 6): **0 / 6**.
- L1 PC hits — broad (116-fn cluster): **2 / 116** (`sub_8228CC18` at
phys 0x1330d620; `sub_8228A220` at phys 0x1351ef2c — both scalar,
not part of any table).
- Audit-017 chain hits (`sub_82184318`, `sub_82184374`, `sub_82187768`,
`sub_82187dd0`, `sub_82183ca8`, `sub_822919c8`, `sub_82186760`,
`sub_821c88d0`): **0 / 8**.
- Top PC bucket: `0x82026000` × 12655 occurrences (likely a vtable
pointer for a per-instance object array; `0x144x0000` regions show
stride-0x38 entries with `0x820266a4` vtable slot).
- Consecutive PC-dword runs (≥4): **5 runs** total.
- 232-dword run at phys `0x1e568f38` — XAM/UI dispatch table family
(`0x824b0xxx-0x824b2xxx`, ~220 PCs in that family).
- 9-dword run at `0x1e6290f0`.
- Three 4-dword runs at `0x1c22c9b0`, `0x1ce24bc0`, `0x1ce254c0`.
- 64K-region PC density top: `0x144x0000` family (1300-1400 PCs each).
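
The "consecutive PC-dword runs" metric can be sketched like this. It is an assumed reconstruction, not the code in `diff_physical.py`; `pc_runs` and its parameters are hypothetical names, and the buffer is assumed to be a big-endian guest image.

```python
import struct


def pc_runs(buf, min_len=4, base=0):
    """Return [(start_addr, length)] for runs of >= min_len adjacent PC dwords."""
    runs, start, length = [], None, 0
    n = len(buf) // 4
    for i in range(n + 1):  # one extra pass flushes a trailing run
        value = struct.unpack_from(">I", buf, i * 4)[0] if i < n else 0
        if 0x82000000 <= value <= 0x82FFFFFF:
            if length == 0:
                start = base + i * 4
            length += 1
        else:
            if length >= min_len:
                runs.append((start, length))
            length = 0
    return runs
```

Note that a run detector of this shape only sees back-to-back dwords. A table with one PC per 0x20-byte slot has to be found by a separate stride-aware pass.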
### CONFIRMATION of audit-027 misplacement hypothesis
Our v40 table at `0x40211900..0x40211B50` (18 unique PCs, 0x20 stride,
`sub_82183ae8 ... sub_821c09d8` — audit-017 chain family) appears
verbatim on canary's physical heap at `0x1c32c910..0x1c32cb50`,
**identical 0x20 stride, identical 18 PCs, even the trailing dup of
`0x821c09d8`**. This proves the table is allocated via
MmAllocatePhysicalMemoryEx in canary; our impl correctly builds the
same table but at a different virtual address (because our allocator
is unified). The table location difference is benign; the table contents
are correct.
### Outcome: ζ — all four guest heaps eliminated
**No L1 PCs are stored as data on any heap.** Cluster L1 functions
(`sub_822919C8` etc.) are invoked exclusively via static `bl`
instructions in unreached parent code — they are NOT routed through
a runtime-built dispatch table. Audit-017 chain PCs are likewise
absent from all heap data.
This rules out the entire family of "kernel call materializes a
function-pointer table" hypotheses. The renderer cluster
0x82287000-0x82294000 is unreached because **its static caller
chain is not entered**, not because its dispatch table is not built.
Discipline gate: fails box 1 (no fix candidate this session).
### Strategic pivot — AUDIT-030 recommendation
All vtable/dispatch-table hypotheses across audits 010, 011, 012,
015, 016, 017, 026, 027, 029 are exhausted. The gate is **upstream
of any heap data structure** — it's a control-flow gate, not a
data-population gate.
Two viable next-step approaches, plus one background backlog item:
**Option A (preferred): comparative-execution divergence trace.**
Instrument both runtimes to log a deterministic event stream
(e.g., `tid:pc:lr:opcode-class` per-N-instructions) and `diff` to
find the first divergent guest instruction. With lockstep
determinism on our side and `--memory_dump_path` already
patched into canary (audit-023/024), one more canary patch to
emit a periodic execution sample is feasible. Once the first
divergence is located, the kernel call (or guest computation)
that immediately preceded it names the bug class.
**Option B: focused canary trace of the audio-thread wake-source.**
Per audit-025, `sub_824D23B0` (the only `KeSetEvent(0x828A3254)`
caller) has zero static call-xrefs and is invoked only via
`[r31+0]=0x82006CF4` audio_system vtable. That vtable IS
populated in our impl (audit-026 confirmed byte-identical).
The caller must therefore be a per-frame renderer routine
already in our binary. A targeted canary log dump of the LR
on every entry to `sub_824D23B0` would name the caller.
Cross-reference with our PC trace to find which renderer-cluster
function fires in canary but not ours.
**Option C (background backlog only):** CPPBUG-AUDIT-001 items
(CRT abort, alignment-ignoring physical alloc, sync/eieio no-ops).
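
For Option A, the comparison step itself is small once both runtimes emit the same event format. A minimal sketch, assuming each stream is a list of `tid:pc:lr:opcode-class` strings already sampled at the same instruction interval (the function name and the sample events in the usage note are hypothetical):

```python
from itertools import zip_longest


def first_divergence(ours, canary):
    """Return (index, ours_event, canary_event) at the first mismatch, else None."""
    for i, (a, b) in enumerate(zip_longest(ours, canary)):
        if a != b:
            return i, a, b
    return None
```

A stream that ends early shows up as a mismatch against `None`, so a hang on one side is reported at the index where its events stop rather than being silently ignored.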
### Sharp prediction (provisional, low confidence)
The first divergence will be a control-flow branch in the
0x82200000-0x82290000 range whose predicate reads from a
guest memory location populated by an unreached or stub-success
kernel export. Most-likely candidates:
- A field on the audio_system object (vtable `0x82006CF4`) not yet
initialized by us (audit-026 verified the vtable; field bytes
beyond it may differ).
- A hardware-state poll that we stub out (e.g., GPU EDRAM-ready,
DMA-channel-idle).
- A frame counter / vsync flag that canary advances differently.
### Trace artifacts
- Audit dir: `audit-runs/audit-029-physical-mem-diff/`
- `canary-physical.bin` — 512 MiB extracted heap (24.5 MiB non-zero)
- `ours-physical-A.bin` — 512 MiB, all zero (alias not mapped)
- `ours-physical-E.bin` — 512 MiB, all zero (alias not mapped)
- `ours-physical-flat.bin` — 512 MiB, all zero (no commits in 0..0x20000000)
- `extract_physical.py` — heap extractor
- `diff_physical.py` — one-sided PC enumeration script
- `diff.txt`, `histogram.txt`, `l1-hits.txt`, `audit017-hits.txt`,
`v40table-hits.txt`, `tables.txt`, `pages.txt`, `pc-summary.txt`
- Memory file: `project_xenia_rs_audit_029_physical_mem_diff_2026_05_08.md`
### Cleanup
No source modified. No commit. Master xenia-rs HEAD `e061e21` unchanged.
## KRNBUG-AUDIT-031 — Audio worker wait-site canary trace (2026-05-08)
**READ-ONLY**. Re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC,
4 files); 4 sequential probe runs; canary patch reverted at session close.
Master HEAD `e061e21` unchanged.
### Method
- Probe `0x824D2878` (audio worker entry, sub_824D2878): 1 fire, lr=0xBCBCBCBC.
- Probe `0x824D28D0` (post-wait PC where ours parks): **54,128 fires** in
~5 min — canary's wait IS being woken on a hot loop.
- Probe `0x8284DDDC` (KeSetEvent guest thunk): 8906 fires; **wake source
captured**: `tid=0100001C lr=0x824D2A44 r3=0x828A3254 r4=1` —
`KeSetEvent(0x828A3254, 1, 0)` from PC `0x824D2A40`.
- Probe `0x824D23B0` (sub_824D23B0 entry per IDA): **0 fires**.
### Key finding — function-boundary mis-attribution corrected
AUDIT-025/-030's claim "sub_824D23B0 is the only wake-source and is never
entered" is half-correct. The IDA-DB function-record `sub_824D23B0`
(claimed `0x824D23B0..0x824D2878`) actually contains a SECOND function
prologue at `0x824D29F0` (`mfspr r12, LR; bl 0x825F0F88; stwu r1, -192(r1)`).
This second function `sub_824D29F0` is the real wake-source, not
sub_824D23B0. They share IDA's broken boundary inference.
### Static reachability of sub_824D29F0
- `0x824D6648 b 0x824D29F0` (kind=`j`, tail-jump from a 12-byte thunk at
`0x824D6640` that loads `r3 = [0x828A3264]`).
- `0x824D6640` is referenced as DATA at `sub_824D2C08+0x374`
(kind=`ref`, instruction=`addi`). PC `0x824D2F7C: addi r4, r10, 26176`
loads `r4 = 0x824D6640`; the next instructions deref `[r31][68]`,
load `vtable[7]` at `[[r3]+28]`, `bcctrl 20,lt` to register the
thunk as a callback on the audio-engine object.
So in canary: after `sub_824D2C08` registers the callback at +0x374,
some scheduler/dispatcher periodically invokes the thunk at `0x824D6640`,
which tail-jumps into `sub_824D29F0`, which sets event 0x828A3254 at
`+0x50`, waking the audio worker.
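
The address arithmetic behind the `addi` can be checked with a short sketch. It assumes `r10` holds `0x824D0000` from an earlier `lis`, which the listing above does not show; the helper names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed immediate."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def addi(base, simm):
    """32-bit result of `addi rD, rA, simm` given rA's value."""
    return (base + sign_extend16(simm)) & 0xFFFFFFFF
```

With the assumed base, `addi(0x824D0000, 26176)` gives `0x824D6640`, and `0x824D2C08 + 0x374` gives the callsite `0x824D2F7C`. The sign extension matters only for immediates of `0x8000` and above, where the paired `lis` value must be one higher than the target's upper half.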
### Our impl behavior (matches AUDIT-025 exactly)
`hw=4 idx=0 tid=9 state=Blocked(WaitAny { handles: [2190094932], deadline: None }) pc=0x824d28d0 lr=0x824d28d0`
where `2190094932 = 0x828A3254`. `sub_824D2C08` runs to completion in
ours (per AUDIT-025), so the registration step fires. The host-side
dispatch loop that should periodically invoke `0x824D6640` is the
unreached gate.
### Bug class
γ-deep, vtable-driven (refines AUDIT-025 with the correct downstream
witness). The dispatch loop is a per-frame audio update — most likely
in the unreached `0x82287000-0x82294000` cluster (AUDIT-009).
### Sharp prediction — AUDIT-032
1. Probe `0x824D6640` directly in canary (`--log_lr_on_pc=0x824D6640`).
Capture lr — names the dispatcher PC.
2. Probe `0x824D2F90` (the `bcctrl` callsite) to capture `r3` (the
audio-engine "this") and `[r3+0]+28` (the vtable[7] entry being
invoked). Static disasm of vtable[7] target identifies the
register-callback implementation.
3. Walk the dispatcher PC's caller chain in our IDA DB; if it bottoms
in unreached audit-009 cluster, the dispatch loop IS the renderer
gate (audio gate IS renderer gate, named).
4. Cross-check: a fix that makes the dispatcher fire should make
`sub_824D29F0` reachable in our impl, ending the deadlock.
### Trace artifacts
- Audit dir: `audit-runs/audit-031-wait-site/`
- `canary-0x824D2878.log`, `canary-0x824D28D0.log`,
`canary-KeSetEvent.log`, `canary-sub23B0.log`
- Memory file: `project_xenia_rs_audit_031_audio_wait_site_2026_05_08.md`
### Cleanup
Canary patch reverted (`git status` clean in canary repo). Master
xenia-rs HEAD `e061e21` unchanged. No commit.
## KRNBUG-AUDIT-032 — Audio dispatcher LR capture at thunk 0x824D6640 (2026-05-08)
**READ-ONLY**. Re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC,
4 files); single 40-sec capture of `--log_lr_on_pc=0x824D6640`; canary patch
reverted at session close. Master HEAD `e061e21` unchanged.
### Capture
**7,875 fires** of `pc=0x824D6640`, all from a single host-flagged kernel
thread named **"Audio Worker"** (handle=`0100001C`, native=`467FC6C0`),
stack `700D0000-700F0000`. **LR is invariant `0xBCBCBCBC`** — canary's host
stack-fill sentinel value, NOT a guest PC. r3=`0x30063000` (driver context),
r4=0 first call / =1 thereafter, r5=`0x1800` (frame size 6144 bytes / 1536
stereo s16 samples), r6=`0xBDFBA600` (registered callback_arg).
Canary log line:
```
d> F8000008 XAudioRegisterRenderDriverClient(701CF210(824D6640), BDFBA658(00000000))
K> 0100001C XThread::Execute thid 4 (handle=0100001C, 'Audio Worker (0100001C)', native=467FC6C0, <host>)
i> 0100001C TRACE-PC-LR pc=824D6640 lr=BCBCBCBC r3=30063000 r4=00000001 r5=00001800 r6=BDFBA600
```
### Mechanism — host-side, not guest
Per canary source `src/xenia/apu/audio_system.cc:84-159`:
1. `AudioSystem::Setup()` spawns an `XHostThread` named "Audio Worker"
running `WorkerThreadMain()`.
2. Loop: `WaitAny(client_semaphores_)` → on wake, read
`clients_[index].callback` and `wrapped_callback_arg` → call
`processor_->Execute(worker_thread_state, client_callback, args)`.
3. The audio backend driver releases the per-client semaphore each time
it consumes a frame of audio output.
The thunk `0x824D6640` is **invoked directly by the canary host emulator's
processor** — there is no guest call site. The PPC LR remains the host
stack canary because the function is entered without a guest `bl`.
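
The pump mechanism above can be modeled in a few lines. This is a toy model under stated assumptions, not canary's code and not a proposed patch: `run_pump`, the `done` semaphore, and the stop handshake are hypothetical, and a Python list append stands in for executing the guest callback.

```python
import threading


def run_pump(queued_frames, consumed_frames, callback_arg=0xBDFBA600):
    """Model the host worker: one guest-callback call per semaphore release."""
    fired = []                       # records each callback invocation
    sem = threading.Semaphore(0)     # per-client semaphore
    done = threading.Semaphore(0)    # model-only: signals a finished call
    stop = threading.Event()

    def guest_callback(arg):         # stands in for the thunk at 0x824D6640
        fired.append(arg)
        done.release()

    def worker():
        while True:
            sem.acquire()            # models WaitAny(client_semaphores_)
            if stop.is_set():
                return
            guest_callback(callback_arg)

    thread = threading.Thread(target=worker)
    thread.start()
    for _ in range(queued_frames):   # released once at registration
        sem.release()
    for _ in range(consumed_frames): # backend consumed a frame of output
        sem.release()
    for _ in range(queued_frames + consumed_frames):
        done.acquire()               # wait for every expected callback
    stop.set()
    sem.release()                    # wake the worker so it can exit
    thread.join()
    return fired
```

The point of the model is that the callback count is driven entirely by semaphore releases on the host side. With no host thread and no releases, the guest callback never runs, which matches the 0-fire probe result on our side.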
### Falsifies AUDIT-031 hypothesis
Audit-031 inferred that `0x824D6640` is registered as a vtable[7] callback
on the audio_system object and dispatched via per-frame guest bcctrl. This
is wrong. The `addi r4, r10, 26176` at `sub_824D2C08+0x374` (PC `0x824D2F7C`)
loads the PC `0x824D6640` as the **callback_ptr argument to
XAudioRegisterRenderDriverClient** — caller-side parameter setup, not vtable
registration. `XAudioRegisterRenderDriverClient` records the (callback, arg)
pair into the host-side `AudioSystem::clients_[]` table; the host worker
thread is what subsequently invokes the callback.
### Outcome
**δ + α composite** per task brief outcomes:
- δ confirmed: audit-031's "vtable[7] callback" inference is wrong.
- α partial: the "caller PC" we sought to walk up is canary's HOST C++,
not guest code. There is no guest LR to walk; the divergence is entirely
on the kernel-host boundary at `XAudioRegisterRenderDriverClient`.
### Our impl gap (probe-confirmed)
`crates/xenia-kernel/src/exports.rs:2705-2745`: registers the client into
our `state.xaudio` table (correct callback_pc=`0x824D6640`,
arg=`0x41E9DD5C`, returns driver=`0x41550000`) but **does not spawn a
host-side worker thread** to pump the callback. No semaphore-release loop
mirrors canary's `client_semaphore->Release(queued_frames_, ...)`.
Probe fires at -n 500M (`--pc-probe=0x824D6640,0x824D29F0,...` AND
`--branch-probe=...`): **0 fires for both PCs**. tid=9 parks at
`pc=0x824D28D0` waiting on event `0x828A3254`; tid=10 parks at
`pc=0x824D2990` waiting on semaphore `0x828A3230` (count=0/limit=6).
### Bug class & sharp prediction
**Class**: δ-α composite — host-side AudioSystem worker thread missing
entirely.
**Sharp cascade prediction** for fix session (audio-host-pump):
- A: tid=9 leaves `Blocked(WaitAny [0x828A3254])` on the FIRST callback
invocation (sub_824D29F0 calls `KeSetEvent(0x828A3254, 1, 0)`).
- B: tid=10 leaves `Blocked(WaitAny [0x828A3230])` on next sema release
inside sub_824D29F0.
- C: `XAudioSubmitRenderDriverFrame` count rises from 0.
- D: `KeReleaseSemaphore` becomes non-zero (canary-only export landed).
- E: open — does this unblock a non-audio consumer? Tid=10's parking on
`limit=6` semaphore (canary's `queued_frames_=6`) suggests audio frame
queue is **isolated**. So fix likely resolves audio path but **NOT**
the audit-009 renderer cluster.
The audio gate is **NOT** the renderer gate (revising audit-025's "audio
gate IS the renderer gate" claim). Separate stalls sharing only the
"host pump missing" symptom.
### Trace artifacts
- Audit dir: `audit-runs/audit-032-dispatcher-lr/`
- `canary-patch.diff` (saved before revert)
- `probe.{log,err}` (our impl, -n 500M)
- `probe-sanity.{log,err}` (-n 50M)
- `branchprobe.{log,err}` (branch-probe verification)
- `/tmp/audit-032-canary.log` (canary capture, 35,942 lines, 7,875 LR fires)
- Memory file: `project_xenia_rs_audit_032_dispatcher_lr_2026_05_08.md`
### Recommended next session
Implement host-side audio worker per canary `apu/audio_system.cc`. Est.
60-120 LOC. Predicted to unblock audio path (tids 9, 10) and add
canary-only kernel exports (KeReleaseSemaphore, possibly
XAudioSubmitRenderDriverFrame). **Won't fix the audit-009 renderer cluster
(separate γ-class blocker)**. Audit-025's strategic-pivot to renderer
cluster L1 callers REMAINS priority for swaps=2→draws>0 progression; the
audio fix is necessary cleanup of canary-only exports.
### Cleanup
Canary patch reverted (`git status` clean in canary repo). Master
xenia-rs HEAD `e061e21` unchanged. No commit.
## VERIFY-A — Static-reachability soundness check via canary PC trace (2026-05-08)
**READ-ONLY**. Re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC,
4 files). Probed 12 distinct PCs from the audit-009 unreachable cluster
(`0x82285000-0x82294000`) sequentially in canary; canary patch reverted at
session close. Master HEAD `e061e21` unchanged.
### Hypothesis being tested
Static reachability via `xrefs.kind='call'` BFS from `entry_point=0x824AB748`
in `sylpheed.db` claims 112/116 functions in cluster `0x82285000-0x82294000`
are unreachable. `xrefs.kind='call'` does NOT capture indirect dispatch
(vtables, function pointers). If canary reaches these PCs via indirect
dispatch, the audit-009/-016/-017/-020/-021/-029 framing is wrong.
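
For reference, the call-only BFS under test has this shape. The sketch takes a plain list of `(caller, callee)` pairs because the `sylpheed.db` schema is not shown here; `reachable` is a hypothetical name.

```python
from collections import deque


def reachable(entry, call_edges):
    """Return the set of functions reachable from entry via direct call edges."""
    graph = {}
    for caller, callee in call_edges:
        graph.setdefault(caller, []).append(callee)
    seen, queue = {entry}, deque([entry])
    while queue:
        for callee in graph.get(queue.popleft(), ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen
```

Any function whose only inbound edges are indirect (vtable slots, stored function pointers) is invisible to this walk, which is exactly the blind spot the canary probes are meant to check.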
### Method
- Build: Debug variant, `xenia-canary/build/bin/Linux/Debug/xenia_canary`
- Args: `--log_level=3 --disable_instruction_infocache=true
--log_lr_on_pc=PC --headless=true`
- Per probe: ~35 sec runtime, then SIGTERM/SIGKILL.
- Sanity check: `--log_lr_on_pc=0x824D28D0` produced 5683 fires (matches
audit-031's 54128/5min ratio) — trace mechanism functional in this build.
- Per probe: also recorded `KeReleaseSemaphore` count (audio loop liveness
proxy); each probe ran with 5,600-5,800 KeRelSem calls during the window.
### Probe results (PC → fires → cluster region)
| PC | fires | source | reachable via call-BFS? |
|-------------|-------|-------------------|-------------------------|
| 0x822919C8 | 0 | audit-009 narrow | no |
| 0x82293448 | 0 | audit-009 narrow | no |
| 0x82288028 | 0 | audit-009 narrow | no |
| 0x82292D80 | 0 | audit-009 narrow | no |
| 0x822851E0 | 0 | audit-009 narrow | no |
| 0x82286BC8 | 0 | audit-009 narrow | no |
| 0x82285C78 | 0 | broader cluster | no |
| 0x82285DD0 | 0 | broader cluster | no |
| 0x82286118 | 0 | broader cluster | no |
| 0x8228A140 | 0 | broader cluster | no |
| 0x8228CAF8 | 0 | broader cluster | no |
| 0x8228E688 | 0 | broader cluster | no |
| 0x824D28D0 | 5683 | sanity-check | reached (audit-031) |
### Cross-validation against sylpheed.db
- 116 functions live in `0x82285000-0x82294000` per `functions` table.
- 4/116 reached via call-BFS from entry; 112/116 unreached.
- 12 of those 112 unreached PCs probed; 0 fires in canary across ~7 min
cumulative wall-clock probe time (12 probes × ~35 sec).
### Bug-class implication
Outcome (i) — **static reachability claim is sound**. The 112-function
"unreachable" cluster IS unreachable in canary too; the BFS conclusion is
not an artifact of the call-only edge set. Indirect-dispatch reachability misses (the
hypothesized failure mode) are NOT observed in this cluster sample.
### What this rules out / does not rule out
- Rules out: "indirect dispatch through audio vtables reaches this cluster
in canary, but our static analysis missed it." Would have manifested as
>=1 PC firing.
- Rules in (consistent): the audit-031 finding that the audio dispatch
loop registers `0x824D6640` as a callback but the dispatcher itself
lives in unreached territory. Both canary and ours fail to reach the
cluster via the static-call graph; canary reaches it via a DIFFERENT
vtable/dispatch entry that this 12-PC sample didn't catch.
- Does not rule out: that SOME parts of the 42-function broader closed
island could be reached in canary (sample size 12/112 = ~10.7%
coverage). A full sweep would harden the claim, but costs ~75 min
cumulative at ~35 sec per probe.
### Cumulative-coverage caveat
Probes are independent — running sequentially does NOT prove
non-reachability across the whole 5-min audit-031 envelope. Each probe
ran ~35 sec. Audit-031's 5-min run captured 54128 fires of 0x824D28D0
(rate ≈180/sec). Over a 35-sec window, expected fires for a similar
hot-loop entry would be ≈6300. Zero fires is decisive for hot-loops; a
genuinely cold-but-reachable PC (e.g. fires once at boot) might not have
been captured if it fires in a window outside our trigger envelope.
Mitigation: each probe was started fresh at canary launch, so any
boot-time fire would be captured.
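
The expected-fire estimate is a straight rate scaling. A small sketch, treating audit-031's "~5 min" as exactly 300 seconds (an assumption):

```python
def expected_fires(ref_fires, ref_seconds, window_seconds):
    """Scale a reference fire count to a shorter probe window."""
    return ref_fires / ref_seconds * window_seconds
```

`expected_fires(54128, 300, 35)` comes out a little above 6300, so a hot-loop PC would be expected to fire thousands of times in one 35-sec probe. The estimate says nothing about PCs that fire only a handful of times.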
### Reading-error impact
This verification PASSES. The 10-error reading-error ledger does not
include the audit-009 reachability claim. No reattribution required.
### Recommendation
- Outcome (i) per task brief: no immediate action required on the audit
campaign; static reachability is sound for this cluster sample.
- The reading-error ledger separately motivates the analysis-toolset
overhaul (per user's earlier instruction) but that is a separate
planning track.
- Follow-up if desired: full 112-PC sweep (~75 min cumulative). Optional
hardening; the 12-PC sample with 0/12 hits supports, but does not prove,
that the cluster is genuinely cold in canary at this boot phase.
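
How much a 0/12 result says about the other 100 functions can be bounded with a hypergeometric miss probability. This is a sketch under a strong assumption that does not hold here: it treats the 12 probes as a uniform random sample, whereas they were hand-picked. `miss_probability` is a hypothetical helper.

```python
from math import comb


def miss_probability(total, live, probes):
    """P(no probe hits a live function), sampling without replacement."""
    if probes > total - live:
        return 0.0
    return comb(total - live, probes) / comb(total, probes)
```

Calling `miss_probability(112, k, 12)` for a range of `k` shows how quickly a zero-hit sample becomes unlikely as the number of live functions grows. It cannot exclude a small handful of live functions, which is the case the full sweep would cover.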
### Trace artifacts
- Audit dir: `audit-runs/verify-A-static-reachability/`
- 13 probe-*.log files (12 cluster + 1 sanity)
- Memory file: `project_xenia_rs_verify_A_canary_pc_trace_2026_05_08.md`
### Cleanup
Canary patch reverted (`git status` clean in canary repo). Master
xenia-rs HEAD `e061e21` unchanged. No commit.
## KRNBUG-AUDIT-033 — UI/save-game subsystem entry-chain divergence probe (READ-ONLY, 2026-05-08)
### Setup
- Re-applied 30-LOC `--log_lr_on_pc` canary patch (4 files, see audit-030
diff). Built `xenia_canary` Debug variant explicitly via
`ninja -f build-Debug.ninja` (Checked variant has runtime code-cache
allocation issues that block boot).
- Probed 8 PCs in canary (50s wall, `--disable_instruction_infocache=true`):
Tier 1 cluster externals — `0x8228A628`, `0x8228E138`, `0x8228E498`;
Tier 2 callers — `0x82172524`, `0x82175810`, `0x8217EB78`;
Tier 3 CMessageBridge sites — `0x821A6CF0`, `0x821A8578`.
- xenia-rs `--pc-probe` of same 8 PCs at -n 500_000_000 (master HEAD
`9028021`).
### Canary fire counts
| PC | Tier | Canary fires | LRs |
|----|------|--------------|-----|
| 0x8228A628 | T1 | 0 | — |
| 0x8228E138 | T1 | 2 | 0x82172BF8 (in sub_82172BA0) |
| 0x8228E498 | T1 | 28 | 0x82451E78, 0x82174730 |
| 0x82172524 | T2 | 0 | — |
| 0x82175810 | T2 | 0 | — |
| 0x8217EB78 | T2 | 0 | — |
| 0x821A6CF0 | T3 | 0 | — |
| 0x821A8578 | T3 | 0 | — |
### xenia-rs fire counts (CTOR-PROBE)
| PC | Ours fires | LR |
|----|------------|-----|
| 0x8228E138 | 1 | 0x82172BF8 (in sub_82172BA0) |
| 0x8228E498 | 62 | 0x82451E78 (in sub_82451E20) |
| (others) | 0 | — |
### Convergence finding
**Both implementations enter the same 2 cluster externals via the same
LRs.** sub_82172BA0 → sub_8228E138 (boot init), sub_82451E20 →
sub_8228E498 (init array, 28 fires canary / 62 fires ours). Tier 2 +
Tier 3 functions (`sub_82172524`, `sub_82175810`, `sub_8217EB78`,
`sub_821A6CF0`, `sub_821A8578`) are 0-fires in canary at the 50s boot
horizon — they are NOT activated in canary either. The audit-prompt
hypothesis that these caller paths fire in canary is FALSIFIED for
Tier 2+3 within the 50s envelope.
Frame walk from our impl's CTOR-PROBE for 0x8228E498 yields a
call chain: sub_82451E20 ← sub_82450720 ← sub_82450638 ←
sub_821CB968 ← sub_821CD458 ← sub_821CBEA8 ← sub_821CECF0 ←
sub_821C4988 — all reached.
### Bug-class classification
**Outcome (γ)** per task brief: "Both reach the same PCs up to bcctrl
through cluster vtable; the divergence is at the indirect-dispatch
level." Specifically: at the 50s boot horizon, canary itself doesn't
penetrate deeper into the UI/save-game cluster than our impl does.
Tier 1 entries `sub_8228E138` and `sub_8228E498` are reached by both;
the cluster's full activation (mission select, save-game UI) requires
a boot-phase further than this probe envelope captures.
### Per-PC quantitative divergence
- `0x8228E138`: ours fires 1× at cycle 9191803 (very late), canary fires
2× — minor frequency divergence, both via sub_82172BA0. Cause likely
a duplicate post-boot reentry that ours misses.
- `0x8228E498`: ours fires 62× across cycles 104K-249K, canary fires 28×
across 50s wall — ours busy-loops sub_82451E20 more aggressively
(likely an array ctor dispatch). May indicate canary breaks out of the
loop early via a state ours doesn't reach.
### Discipline gate
- Box 1: probe data captured both sides — PASS.
- Box 2: canary fires Tier 1 entries (2 of 3) — PARTIAL.
- Box 3: cross-impl LR mirror — PASS (LRs match).
- Box 4: bug class = γ — does not gate to fix; M5.5 prerequisite.
- Box 5: no fix this session per task brief — PASS.
### Recommended next session
- **(γ) M5.5 prerequisite**: schedule "this-flow vptr resolution" as
next analyzer milestone — without it, indirect-dispatch reachability
cannot be modeled. Until M5.5 lands, top-down probing inside the
cluster is blind.
- **Alternative pivot**: probe the 62-fires-vs-28-fires divergence at
`sub_82451E20` more deeply. Probe `sub_82450720` / `sub_82450638` /
`sub_821CB968` (frame chain captured). One of these exits the loop
early in canary; that exit gate IS the divergence.
- **Alternative pivot 2**: longer canary trace (5-10 min Lutris-launched
Windows build) to confirm Tier 2+3 PCs activate post-boot. The 50s
Linux probe envelope is too short for "press-A-to-continue" / intro
video boundary.
### Trace artifacts
- Audit dir: `audit-runs/audit-033-ui-entry-chain/`
- 8 canary-0x*.log probe files (Tier 1+2+3)
- ours.log (CTOR-PROBE captures), ours.err (kernel-call counters)
### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs
master HEAD `9028021` unchanged. No commit.
## KRNBUG-AUDIT-034 — Frame-chain divergence + Tier 2/3 horizon (READ-ONLY, 2026-05-09)
**Status**: open. Sister of AUDIT-033. Master `9028021` unchanged. Tests 640.
Lockstep instructions=100000003. Subsystem: front-end UI / save-game /
mission-select / HUD (NOT renderer).
### Phase A — frame-chain firing-rate matrix
Canary patch (audit-030 30-LOC) re-applied; reverted at session close.
Probed 8 PCs in canary 50s wall + ours -n 500M (~8s guest):
| PC | canary 50s | ours -n 500M | divergence (per-second rate, ours ÷ canary) |
|----|---:|---:|---:|
| sub_821C4988 | 1 | 1 | 6.3× |
| sub_821CECF0 | 2 | 2 | 6.3× |
| sub_821CBEA8 | 7 | 7 | 6.3× |
| sub_821CD458 | 7 | 7 | 6.3× |
| sub_821CB968 | 14 | 14 | 6.3× |
| sub_82450638 | 14 | 14 | 6.3× |
| sub_82450720 | 24 | 16 | 4.2× |
| sub_82451E20 | 90 | 80 | 5.5× |
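The divergence column is consistent with a time-normalized rate ratio rather than a raw count ratio. A minimal sketch, assuming the two windows are exactly 50 s (canary) and 8 s (ours; the text says "~8s guest") and that the column is ours-per-second divided by canary-per-second:

```python
# Fire counts from the table above: (canary over 50 s wall, ours over ~8 s guest).
CANARY_SECONDS = 50.0
OURS_SECONDS = 8.0  # "~8s guest" in the notes; treated as exact here (assumption)

fires = {
    "sub_821C4988": (1, 1),
    "sub_821CB968": (14, 14),
    "sub_82450720": (24, 16),
    "sub_82451E20": (90, 80),
}

def rate_ratio(canary_fires, ours_fires):
    """Per-second firing rate of ours divided by canary's."""
    return (ours_fires / OURS_SECONDS) / (canary_fires / CANARY_SECONDS)

ratios = {pc: rate_ratio(c, o) for pc, (c, o) in fires.items()}
for pc, r in ratios.items():
    print(f"{pc}: {r:.2f}x")
```

Under these assumptions equal raw counts give 6.25×, which matches the uniform 6.3× rows; the last two rows come out near 4.2× and 5.6×.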
**Loop-exit-divergence located**: sub_82450720+0x160..+0x1F4
(PC 0x82450880..0x82450914). 5-iteration loop bounded by `r25 < 5`.
- Ours: 5/5 iterations (80/16=5.00) — never early-exits.
- Canary: avg 3.75/5 (90/24=3.75) — exits via 0x82450904 `bne 0x8245092C`.
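The per-call iteration averages follow directly from the fire counts. A small sketch, assuming each loop iteration calls sub_82451E20 exactly once (the notes imply this but do not state it):

```python
LOOP_BOUND = 5  # loop is bounded by r25 < 5

# (sub_82451E20 fires, sub_82450720 fires) per runtime, from the matrix above
counts = {"ours": (80, 16), "canary": (90, 24)}

# Average inner-loop iterations per sub_82450720 call
avg_iters = {k: inner / outer for k, (inner, outer) in counts.items()}

# Iterations that a full 5/5 run would have executed but did not
skipped = {k: outer * LOOP_BOUND - inner for k, (inner, outer) in counts.items()}

for name in counts:
    print(f"{name}: {avg_iters[name]:.2f} iterations/call, {skipped[name]} skipped")
```

On these numbers canary skips 30 iterations across its 24 calls, while ours skips none.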
**Exit predicate**: `[sub_82451E20_out+0] == r30-12 AND [+4] == [r30+0]+[r30+4]`.
Data source = 5×20-byte slot table at `r26+108..207` (r26 = sub_82450720
arg1 = container struct). The predicate is fed by sub_82451E20's inner
loop, which calls Tier-1 cluster sub_8228E498 to dereference
`[working_key->vptr][32]`.
**Bug class**: β-class (data-state divergence) with γ-deep entry
(sub_821C4988 = 0 static call xrefs → vtable-driven). The 6.3× upstream
amplification is uniform from L0..L5 (entry frequency), and the L7 5-loop
shows ours never triggers the early-exit data-match.
### Phase B — Tier 2/3 horizon (300s canary)
Probe set: 0x82172524, 0x82175810, 0x8217EB78, 0x821A6CF0, 0x821A8578.
**ALL 5 PCs = 0 fires at 300s in canary**. Cluster activation is even
deeper than this 5-min Linux Debug horizon. Linux Debug canary trajectory
matches Lutris Windows up to frame 42 (per RECONCILE-A); 300s ≈ early-boot
pre-intro only. May need Lutris Windows trace OR upstream probing OR
non-time-based trigger to reach Tier 2/3 activation.
### Recommended next session
**Option 1 (preferred)**: AUDIT-035 = mem-watch r26+108..207 for one
captured r26 value (capture via extended pc-probe of sub_82450720) →
identify writer in canary that ours misses. The slot table populator
is the gate to the early-exit path.
**Option 2**: schedule M5.5 (alias-aware vtable dispatch resolver) as next
analyzer milestone — sub_821C4988 has 0 static call xrefs and is the
chain entry; M5.5 would name the trigger.
**Option 3**: probe sub_8228E498's output `[r3+0][32]` value directly via
extended `--pc-probe` (capture vptr-at-+32 dereferenced value) — name what
the predicate compares against, then mem-watch its source.
### Trace artifacts
- `audit-runs/audit-034-frame-chain/canary-0x*.log` — 8 50s logs + 1 300s
preserved log + 5 Phase B 300s logs
- `audit-runs/audit-034-frame-chain/ours.log` (8-PC pc-probe at -n 500M)
- `audit-runs/audit-034-frame-chain/scripts/probe-canary*.sh`
### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs
master HEAD `9028021` unchanged. No commit.
## KRNBUG-AUDIT-035 — Slot table byte-level diff at sub_82450720 (READ-ONLY, 2026-05-09)
### Background
Continuation of AUDIT-034. Disasm verified slot table at r26+108, 5×20=100
bytes (loop body PC 0x82450880..0x82450914). Goal: byte-level diff of the
5-slot table contents between canary and ours at the same call site.
### Canary patch (extended)
Re-applied audit-030 30-LOC patch + extended TrapLogLR helper (+19 LOC) to
also log r26 and dump the 5×20-byte slot table from r3+108. At the entry PC
0x82450720 the `mr r26,r3` prologue has not yet run, so r3 still holds the
value that later becomes r26.
Total +49 LOC across 4 files; under the 80-LOC budget. Build succeeded. Patch
reverted at session close; canary `git status` clean.
### Captured slot tables (final state)
Both runtimes converge on r3=r26=0x828F3B68 at sub_82450720 entry; slot table
base = 0x828F3BD4. 22 canary entries captured ~30s wall.
| Slot | addr | Canary (last entry) | Ours (-n 500M) |
|------|------|---------------------|----------------|
| 0 | 0x828F3BD4 | `00000000 00000000 00000000 00000000 00000000` | (same — all zero) |
| 1 | 0x828F3BE8 | `00000000 00000000 00000000 BC3654C0 00000008` | `00000000 00000000 00000000 4024A240 00000008` |
| 2 | 0x828F3BFC | `00000000 00000000 00000000 BC366080 00000008` | `00000000 00000000 00000000 4024AEE0 00000008` |
| 3 | 0x828F3C10 | `00000002 00000005 00000000 00000000 00000000` | `00000000 00000000 00000000 00000000 00000000` |
| 4 | 0x828F3C24 | `00000000 00000000 00000000 BC365520 00000008` | `00000000 00000000 00000000 4024A300 00000008` |
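The slot addresses in the table can be re-derived from the captured r26 and the layout stated above (table at +108, five 20-byte slots). A quick check:

```python
R26 = 0x828F3B68          # captured at sub_82450720 entry
SLOT_TABLE_OFFSET = 108   # table starts at r26+108
SLOT_SIZE = 20
SLOT_COUNT = 5

base = R26 + SLOT_TABLE_OFFSET
slot_addrs = [base + i * SLOT_SIZE for i in range(SLOT_COUNT)]
last_byte_offset = SLOT_TABLE_OFFSET + SLOT_COUNT * SLOT_SIZE - 1

for i, addr in enumerate(slot_addrs):
    print(f"slot {i}: 0x{addr:08X}")
print(f"table spans r26+{SLOT_TABLE_OFFSET}..{last_byte_offset}")
```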
### Diff summary
- Slots 1, 2, 4: same shape (zeros + heap-pointer + size 8) but pointers
diverge by **heap region** — canary `BC3xxxxx` (physical heap), ours
`4024xxxx` (v40 bump heap). Same divergence noted in audit-027/029.
- Slot 3: canary [+0]=2, [+4]=5 (counter pair); ours [+0]=0, [+4]=0. Slot 3
is dynamic — push/pop counter; ours's writers fire at higher rate.
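The "same shape, different region" reading can be made mechanical. A rough sketch; the region bounds below are assumptions chosen to cover the observed `BC3xxxxx` and `4024xxxx` values, not bounds taken from either runtime's source:

```python
def region(addr):
    """Coarse address classification (bounds are assumptions)."""
    if 0x40000000 <= addr < 0x80000000:
        return "v40"
    if 0xA0000000 <= addr < 0xC0000000:
        return "physical"
    return "other"

def shape(slot_words):
    """Replace anything pointer-like with a tag so only the layout is compared."""
    return tuple("ptr" if region(w) != "other" else w for w in slot_words)

canary_slot1 = [0, 0, 0, 0xBC3654C0, 8]
ours_slot1 = [0, 0, 0, 0x4024A240, 8]
canary_slot3 = [2, 5, 0, 0, 0]
ours_slot3 = [0, 0, 0, 0, 0]

print(shape(canary_slot1) == shape(ours_slot1))   # slot 1: same shape
print(shape(canary_slot3) == shape(ours_slot3))   # slot 3: counters differ
```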
### Writer identification (1066 ours mem-watch hits on slot 3)
PCs: 0x82450c08, 0x82450c40, 0x82450c4c, 0x82450c3c (sub_82450bc4 chain),
0x822f8b20 (counter inc), 0x82323364 (index update), 0x8231eee8 (init).
Slot 3 [+4] cycles 0..0xB in ours vs 0..5 in canary's window. Ours over-pushes.
### Reading — ε-class heap-region mismatch
The slot table populates IDENTICALLY in shape across both runtimes. The
predicate at PC 0x82450904 fails because the **lookup table** sub_82451E20
walks (via Tier-1 cluster external sub_8228E498's `[r3+0][32]`) is populated
with canary-physical-heap pointers on canary, v40 pointers on ours — but the
slot-table writers on the **other** side push pointers from a different
allocator state. Per-element cross-reference inconsistency causes the
predicate to never match in ours's iter 1-2; it falls through to slot 4
(self-referential default) only. Bug class **ε — heap-region-mismatch
propagating through dual-data-structure consistency check**.
### Sharp 4-dim cascade prediction
A: implement physical-heap separation (CPPBUG-AUDIT-001) so
mm_allocate_physical_memory_ex / nt_allocate_virtual_memory return distinct
0xBC3xxxxx region.
B: sub_8228E498's vptr-table contains 0xBC3xxxxx, slot-table writers push
0xBC3xxxxx — same heap region.
C: predicate at 0x82450904 matches at iter 1-2, sub_82450720 returns 1,
sub_82450638 second-call frequency normalizes (~10× per L5 entry).
D: cluster activation MAY clear (`draws > 0` cascade UNKNOWN until B-C
observed).
### Falsification of audit-034
"Different positions in the 5-slot table" — falsified. Matching slot indices
(1, 2, 4) are populated identically in shape. Mismatch is in the VALUE of
the heap pointer, not its slot position.
### Trace artifacts
- `audit-runs/audit-035-slot-table/canary-0x82450720-fix.log` (132 lines, 22 entries)
- `audit-runs/audit-035-slot-table/ours-lrtrace.jsonl` (16 entries)
- `audit-runs/audit-035-slot-table/ours-dump-stdout.log` (slot table at end-of-run)
- `audit-runs/audit-035-slot-table/ours-memwatch-slot3.log` (1066 writers)
### Recommended AUDIT-036
1. Land physical-heap separation; re-run AUDIT-035 trace to verify slot
pointers shift to 0xBC3xxxxx and predicate early-exits.
2. Or probe sub_8228E498 in both runtimes to capture `[r3+0][32]` value
and confirm cross-table heap divergence.
### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs
master HEAD `9028021` unchanged. No commit.
## KRNBUG-AUDIT-036 — `[[r3+0]+32]` predicate hypothesis test (READ-ONLY, 2026-05-09)
### Validation goal
Direct hypothesis test of audit-035's heap-region narrative. Capture
`[[r3+0]+32]` at sub_8228E498 in both canary and ours; CONFIRMED if both are
heap-region-divergent pointers (0xBC3xxxxx vs 0x4024xxxx); REFUTED otherwise.
### Disasm correction
sub_8228E498 is NOT a vtable[8] dispatcher. It's a deque/segmented-array
iterator deref returning element_address in r3:
- `[r3+0]` = header*; `[r3+4]` = packed (chunk_idx, sub_idx)
- `[header+4]` = segment_table; `[header+8]` = chunk_count
- `r3 = segment_table[chunk_idx] + sub_offset` ; `blr`
The `[+32]` deref happens in the CALLER `sub_82451E20` at PC 0x82451E78
(LR), reading the returned element's `[+0]` and then `[+32]` as predicate
target compared against r28 (= caller's r6, 3rd arg).
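The corrected reading of sub_8228E498 plus the caller's two dereferences can be modelled on a toy memory map. This is only a sketch: the notes do not say how `(chunk_idx, sub_idx)` is packed or how wide an element is, so `SUB_BITS` and `ELEM_SIZE` are hypothetical, and every address in `toy` is synthetic.

```python
ELEM_SIZE = 4   # assumption: element width is not stated in the notes
SUB_BITS = 16   # assumption: packing of (chunk_idx, sub_idx) is not stated

def iterator_deref(mem, it):
    """Model of sub_8228E498: return the element address for iterator `it`."""
    header = mem[it + 0]
    packed = mem[it + 4]
    chunk_idx = packed >> SUB_BITS
    sub_idx = packed & ((1 << SUB_BITS) - 1)
    segment_table = mem[header + 4]
    chunk_count = mem[header + 8]
    assert chunk_idx < chunk_count, "iterator points past the last chunk"
    return mem[segment_table + 4 * chunk_idx] + ELEM_SIZE * sub_idx

def predicate_target(mem, element_addr):
    """Model of the caller: read the element's key, then the key's +32 field."""
    key = mem[element_addr + 0]
    return mem[key + 32]

toy = {
    0x1000: 0x2000,                 # iterator+0 -> header
    0x1004: (1 << SUB_BITS) | 2,    # iterator+4 -> chunk 1, sub-index 2
    0x2004: 0x3000,                 # header+4 -> segment_table
    0x2008: 2,                      # header+8 -> chunk_count
    0x3000: 0x4000,                 # segment_table[0]
    0x3004: 0x5000,                 # segment_table[1]
    0x5008: 0x6000,                 # element -> key
    0x6020: 0x7000,                 # key+32 -> value compared against r28
}

elem = iterator_deref(toy, 0x1000)
target = predicate_target(toy, elem)
print(hex(elem), hex(target))
```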
### Canary patch — 49 LOC, reverted
Re-applied audit-030 base + extended TrapLogLR to log r3, r28, dereference
`[r3+0]` (key), and dump 64 bytes (16 u32 lanes + ASCII) at the key.
Build via ninja Debug; reverted via `git checkout -- src/` at session
close; canary `git status` clean.
### Captured values
**Canary** (PC=0x82451E78, ~36 fires at 30s):
- r3 (returned element) = 0xBC22CA20 / 0xBC22CA24 (physical heap)
- `[r3+0]` (key) = 0xBC65D018 / 0xBC65D140 / 0xBC65D1C0 / 0xBC65D240 / 0xBC65D340 / 0xBC65D400 / 0xBC65D540
- Key struct (key=0xBC65D1C0): `F80000B8 0 0 3 0 0 0 0 BC65D018 BC65D140 0 BC65D034 0 0 1 0`
- ASCII: `'.................................e...e.@.....e.4................'`
- **`[[r3+0]+32]` = 0xBC65D018 / 0xBC65D2D8 / 0xBC65CFD8 / 0xBC65D118 / 0xBC65D198 / 0xBC65D398** — phys-heap pointers, range 0xBC65xxxx
**Ours** (PC=0x8228E498 + dump-addr at returned r3, ~62 fires at 500M):
- r3 (returned element) = 0x401119B0..0x401119BC (v40 bump heap)
- `[r3+0]` (key) = 0x40542300 / 0x40542340 / 0x40542400 / 0x405424C0
- Key bytes at 0x40542300:
```
+0x00: "game:\hidden\Res"
+0x10: "ource3D\Common.x"
+0x20: 70 72 00 5c (= "pr\0\\")
```
- **`[[r3+0]+32]` = 0x7072005C** (mid-string text "pr\0\\")
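Both captured `[+32]` values can be reproduced from the dumps above. The sketch rebuilds the canary key struct from its 16 dwords and the start of the ours record from the filename text, then reads the big-endian dword at offset 32 of each:

```python
def be32_at(buf, off):
    """Big-endian 32-bit read, as the guest would load it."""
    return int.from_bytes(buf[off:off + 4], "big")

# Canary key struct at 0xBC65D1C0 (16 dwords from the capture above)
canary_dwords = [0xF80000B8, 0, 0, 3, 0, 0, 0, 0,
                 0xBC65D018, 0xBC65D140, 0, 0xBC65D034, 0, 0, 1, 0]
canary_bytes = b"".join(d.to_bytes(4, "big") for d in canary_dwords)

# Ours: the record at 0x40542300 begins with the inline filename text
ours_bytes = b"game:\\hidden\\Resource3D\\Common.xpr\x00\\"

canary_target = be32_at(canary_bytes, 32)
ours_target = be32_at(ours_bytes, 32)

# Offsets in the canary struct that hold 0xBCxxxxxx values
pointer_offsets = [4 * i for i, d in enumerate(canary_dwords) if d >> 24 == 0xBC]

print(f"canary [+32] = 0x{canary_target:08X}")
print(f"ours   [+32] = 0x{ours_target:08X}")
print("canary pointer offsets:", pointer_offsets)
```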
### Verdict — REFUTED-AS-STATED, stronger η-class divergence found
Audit-035's strict prediction "ours's `[[r3+0]+32]` is in 0x4024xxxx" is
REFUTED. Ours's value is `0x7072005C` — literal filename text bytes, not
a heap pointer.
But the deeper divergence is even worse than the heap-region narrative
suggested: the records held by the container have **fundamentally
different layouts**. Canary's `[r3+0]` points to a 16-dword pointer-bearing
struct with phys-heap sub-pointers at offsets 32/36/44. Ours's `[r3+0]`
points to a struct that begins with the inline filename string, so offset
32 falls inside the string text. The predicate
`r28 == [[r3+0]+32]` therefore COMPARES STACK POINTERS (r28) against
INLINE STRING TEXT in ours — a comparison that can never succeed.
Bug class **η — record-layout divergence** (NEW class). Distinct from
audit-035's "heap region" axis; the populator for these records writes
DIFFERENT struct shapes in ours vs canary.
### Cascade implication
The `swaps>2 / draws>0` plateau is gated by THIS predicate failing on
EVERY iteration in ours's main loop body. Even if physical-heap
separation (CPPBUG-AUDIT-001) landed, the records would still hold inline
strings, so the predicate would still fail.
### Recommendation — DO NOT proceed with physical-heap separation as audit-037
Audit-037 should NOT be the heap-split fix. Instead:
**Audit-037 = identify the record populator(s)** that build the container
elements at `0x401119B0+` (ours) vs `0xBC22CA20+` (canary). The populator
writes the struct at `[r3+0]`. Likely path:
1. mem-watch on `0x40542300+0x20` (the predicate target offset) to find
the writer PC and LR in ours.
2. Disasm the writer's caller chain.
3. Re-apply audit-030 patch in canary, probe the equivalent PC, compare
the populator's ctor / load path.
4. The two populators should diverge at a static-init or resource-loader
function — that divergence is the audit-037 root cause.
### Sharp 4-dim cascade prediction (post-fix at populator)
A: ours's `[0x40542300+0x20]` becomes a phys-style pointer (matches
canary's record layout)
B: predicate `r28 == [[r3+0]+32]` matches at least once during boot
C: sub_82451E20 inner loop exits via the `bne` branch, not via end-iter
D: cluster `0x82285000-0x82294000` external-entry probes (audit-033)
show new fires — front-end UI activation begins
### Falsification of audit-035
"`[[r3+0]+32]` is a heap-region-divergent pointer" — REFUTED. Ours's value
is mid-string text bytes (0x7072005C). Heap-region divergence is real for
the container element pointers themselves (0xBC22CA20 vs 0x401119B0) but
the predicate failure mechanism is record-layout, not heap-region.
### Trace artifacts
- `audit-runs/audit-036-vptr-deref/canary.log` — initial 30s canary at PC=0x8228E498
- `audit-runs/audit-036-vptr-deref/canary-callsite.log` — extended canary at PC=0x82451E78
- `audit-runs/audit-036-vptr-deref/ours.log` — pc-probe at 0x8228E498 (62 fires)
- `audit-runs/audit-036-vptr-deref/ours-exit.log` — branch-probe at 0x82451E78 (returned r3)
- `audit-runs/audit-036-vptr-deref/ours-final.log` — dump-addr at element + key targets
### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs
master HEAD `9028021` unchanged. No commit. Tests 640.
### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested with sharp pre-prediction
2. Canary patch reverted at session close, git clean
3. xenia-rs source unmodified, no commit
4. Single-step (validation only, no fix attempt)
5. Trace files saved per audit dir convention
## TRACK-1-VERIFY — Cache-fix record-layout verification (READ-ONLY, 2026-05-09)
### Validation goal
Direct verification of cascade dimension A from audit-038. Audit-038 landed
the cache fix (cache:/* paths persist via /tmp/xenia-rs-cache-<pid>-<seq>/);
sub_82459D18, sub_8245D230, 0x82450904 were silenced from "many fires" to
zero. The unmeasured dimension was record-layout: did the fix flip the
record at 0x40542300 from inline-string (audit-037 pre-fix shape) to
canary-shape pointer-bearing (handle@+0=0xF80000B8, sub-pointers
@+32/+36/+44)?
### Method (read-only, no source mods, no commit)
1. Probe sub_8228E498 (deque iterator deref returning element_address)
at -n 500M to find current record-base addresses. **Result: 0 fires**.
The cache fix silenced the cache-miss path; sub_8228E498 is downstream
of that path and now never executes.
2. Fallback: dump audit-037 record bases via
`--dump-addr=0x40542300,0x40542340,0x40542400,0x405424C0` (master
d8766c6, post-fix). Plus extended-range dump
0x40542100..0x40542800 to look for any pointer-shaped records nearby.
3. Cross-reference canary record shape from audit-037's canary probe of
0x82450b68 — canary populates filenames via
`RtlInitAnsiString(BC365xxx, "game:\\hidden\\Resource3D\\…")` separately
from the per-file struct at 0xBC65xxxx (struct holds pointers).
### Captured values (post-fix, master d8766c6)
**0x40542300** — IDENTICAL to audit-037 pre-fix:
```
+0x00: "game:\hidden\Res"
+0x10: "ource3D\Common.x"
+0x20: 70 72 00 5c 93 9a 9d cc ... (be32=7072005c)
+0x30: ...69 d8 e4 5c c2 95 ea d8...
```
+0x20 dword = **0x7072005C** ("pr\0\\" text bytes), unchanged.
**0x40542340** — descriptor-shape, header pointers + inline filename text:
```
+0x00: 40 54 28 80 ... | be32=40542880 (next-record ptr)
+0x40: "...dden@T#." (continuation of inline filename)
+0x50: "ource3D\Comm..."
```
**0x40542400** — descriptor-shape with offsets at +0x40 ("@T&.@T..@T%@_TIT"):
```
+0x00: 40 54 24 80 (be32=40542480 ptr)
+0x40: 40 54 26 00 40 54 1e c0 40 54 25 40 5f 54 49 54
```
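The ASCII rendering quoted for 0x40542400 and the be32 reading of the same row can be cross-checked with a few lines. A small helper sketch (function names are illustrative, not part of the tooling):

```python
def be32_words(data):
    """Split a byte row into big-endian 32-bit words."""
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]

def ascii_render(data):
    """Printable ASCII as-is, everything else as '.' (usual hex-dump convention)."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)

row = bytes.fromhex("40 54 26 00 40 54 1e c0 40 54 25 40 5f 54 49 54")
words = be32_words(row)
text = ascii_render(row)

print([f"0x{w:08X}" for w in words])
print(text)
```

The first three words land in the 0x4054xxxx neighbourhood of the other record bases; the fourth is plain text.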
**0x405424c0** — pointer-bearing PARTIAL but filename still inlined at +0x44:
```
+0x00: 40 54 25 80 (be32=40542580 ptr)
+0x20: 40 54 1e d8 ... 40 54 1e f4 (be32=40541ed8, 40541ef4 — pointers)
+0x40: 40 54 23 40 ":\hidden\Res"
+0x50: "ource3D\ptc_pack"
+0x60: ".xpr\0..."
```
+0x20 dword = **0x40541ED8** (pointer in v40 range). Filename "ptc_pack.xpr"
still inlined at +0x44.
### Verdict — Cascade Dimension A: FAIL
Cache fix (audit-038) DID NOT flip record layout to canary-shape:
- 0x40542300: inline-string layout fully unchanged. +0x20 = 0x7072005C
(text), IDENTICAL to audit-037 pre-fix.
- 0x405424c0 has descriptor-shape pointers at +0x20 / +0x2C
(0x40541ED8 / 0x40541EF4) but **the filename is still inlined at +0x44**
rather than externalized to a separate `RtlInitAnsiString`-allocated
ANSI-string heap.
- No record begins with the canary 0xF80000B8 handle. No record contains
BC65xxxx-equivalent sub-pointers. The transformation step that should
externalize filenames into ANSI-string heap before the pointer-bearing
record stage is NOT running in our impl.
### Mechanism
Canary's record-population path:
1. `RtlInitAnsiString(heap_alloc, "game:\\hidden\\Resource3D\\Common.xpr")`
allocates the literal on a separate heap (BC365xxx range).
2. The per-file record at BC65xxxx receives a POINTER to that string.
3. `[[r3+0]+32]` then dereferences cleanly to BC65xxxx neighbours
(handle/sub-pointer fields).
Our impl's record-population path:
1. The literal "game:\\hidden\\Resource3D\\Common.xpr" is written DIRECTLY
into the per-file record at +0x00 (or +0x44 for some records).
2. There is no separate ANSI-string allocation. No pointer indirection.
3. `[[r3+0]+32]` reads inline filename text bytes (0x7072005C "pr\0\\")
instead of a pointer.
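For orientation when tracing the string-init path: under standard NT semantics `RtlInitAnsiString` fills a small descriptor (Length, MaximumLength, Buffer) that points at existing bytes rather than copying them. Whether the title's path also allocates the backing buffer, as the canary description above suggests, is not established here. The sketch below packs such a descriptor big-endian with a 32-bit Buffer field; the buffer address is a hypothetical placeholder.

```python
import struct

HYPOTHETICAL_BUFFER_ADDR = 0x10000000  # placeholder, not a captured address

def init_ansi_string(buffer_addr, text):
    """Pack an ANSI_STRING-style descriptor: u16 Length, u16 MaximumLength, u32 Buffer."""
    length = len(text)                  # bytes, excluding the terminating NUL
    return struct.pack(">HHI", length, length + 1, buffer_addr)

path = b"game:\\hidden\\Resource3D\\Common.xpr"
desc = init_ansi_string(HYPOTHETICAL_BUFFER_ADDR, path)
length, max_length, buffer_addr = struct.unpack(">HHI", desc)

print(len(desc), length, max_length, hex(buffer_addr))
```

A record that stores this 8-byte descriptor, or just the Buffer pointer, keeps the filename bytes out of the record body, which is the difference the two population paths above describe.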
The audit-038 cache fix made `cache:/*` paths persist on real disk, which
silenced the cache-miss restore loop. But the populator that turns a
filename literal into either an ANSI-heap pointer (canary) or an
inline-record-prefix (ours) is a DIFFERENT mechanism — sibling to or
upstream of cache machinery.
### Cascade implication
The `swaps>2 / draws>0` plateau and the cluster L1 unreached state are
both still gated by this layout divergence. Even with the cache fix
landed, the predicate `r28 == [[r3+0]+32]` STILL compares stack pointers
against inline filename text bytes — a comparison that cannot succeed.
Sister Track 2's extended-horizon canary trace becomes the load-bearing
diagnostic: if cluster L1 fires in canary at e.g. T+30s, then this
transformation-step fix is the next concrete target.
### Recommendation — Track 1 next moves
- **Option A (preferred)**: trace `RtlInitAnsiString` callers in our impl
vs canary on the `game:/dat:/cache:` prefix family; find which path
doesn't fire in our impl. The missing path is the populator divergence.
- **Option B**: mem-watch 0x40542320 (= 0x40542300+0x20, the predicate
target) to capture the writer's PC + LR in our impl; the writer's function
should diverge from canary's equivalent at a static-init / resource-loader
site.
- **Option C**: wait for sister Track 2's findings before declaring
transformation-step missing; rule out timing/horizon as a confound.
- **Option D**: KRNBUG entry: audit `RtlInitAnsiString` (and adjacent
string-init paths) for prefix branching. If our impl folds all prefixes
into the same handler but canary branches, that's the bug.
### Lockstep determinism preserved
`instructions=500000019, imports=5629636, swaps=2, VdSwap=2`. Stable.
### Trace artifacts
- `audit-runs/audit-039-track-1-verify/probe-element.{out,log}` —
pc-probe sub_8228E498 (0 fires) + 4 record dumps
- `audit-runs/audit-039-track-1-verify/dump-extended.{out,log}` —
extended-range dump 0x40542100..0x40542800
### Cleanup
xenia-rs source unmodified. No commit. No canary touch. Sister Track 2
running parallel against xenia-canary; not touched. Master HEAD
`d8766c6`. Tests 645.
### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested with sharp pre-prediction (cascade dim A)
2. No canary patch (read-only on our side only)
3. xenia-rs source unmodified, no commit
4. Single-step (verification only, no fix attempt)
5. Trace files saved per audit dir convention
## TRACK-2-EXTENDED — Extended-horizon canary trace for cluster activation (READ-ONLY, 2026-05-09)
### Question
At 10-15 min wallclock (2-3× longer than audit-034 Phase B's 5 min), does
Linux Debug canary EVER reach the audit-009 cluster's Tier-2 callers
(`sub_82172524`, `sub_82175810`, `sub_8217EB78`) — and through them the
cluster's L1 entries? If YES → capture LR (caller PC) → name the
activation gate. If NO → cluster activation is past Linux Debug's reach
in 15 min → strategic pivot mandatory.
### Method (canary patch + revert; no xenia-rs touch)
1. Re-applied audit-030 `--log_lr_on_pc` patch (30 LOC across 4 files)
to xenia-canary HEAD `6de80dffe`. Build via `ninja -f build-Debug.ninja
xenia_canary`. Mandatory `--disable_instruction_infocache=true`.
2. Probed 3 Tier-2 PCs serially (single PC at a time per audit-031
constraint), 15-min wallclock each:
- `0x82172524` — actual run 22 min (timeout(1) didn't enforce 900s
cleanly until force-kill)
- `0x82175810` — 15 min
- `0x8217EB78` — 15 min (force-killed at +3s post-timeout)
3. Compressed plan per task brief: skip Tier-1 (3 PCs) + L1 (6 PCs) when
Tier-2 = 0× — they are downstream consequences of Tier-2 firing.
4. Trace marker `TRACE-PC-LR pc=… lr=… r3..r6,r31`.
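Hit counting over the large canary logs can be scripted. A minimal sketch, assuming the marker line carries the PC as a hex value directly after `pc=` (with or without a `0x` prefix); the exact field formatting is not shown in these notes, and the sample lines below are synthetic, not taken from any captured log:

```python
import re
from collections import Counter

# Assumption: "TRACE-PC-LR pc=<hex>" with an optional 0x prefix
MARKER = re.compile(r"TRACE-PC-LR pc=(?:0x)?([0-9A-Fa-f]+)")

def count_hits(lines):
    """Count marker lines per probed PC."""
    hits = Counter()
    for line in lines:
        m = MARKER.search(line)
        if m:
            hits[int(m.group(1), 16)] += 1
    return hits

# Synthetic input for illustration only
idle_only = ["KeReleaseSemaphore(828A3230, 1, 1, 0)", "VdRetrainEDRAM"]
with_marker = idle_only + ["TRACE-PC-LR pc=0x82172524 lr=0x0"]

print(count_hits(idle_only)[0x82172524])
print(count_hits(with_marker)[0x82172524])
```

`Counter` returns 0 for a PC that never appears, so a 0-fire run needs no special casing.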
### Result Table
| Tier | PC | Horizon | Hits | LR | Notes |
|------|-------------|---------|------|----|-------|
| T2-A | 0x82172524 | 22 min | **0** | — | Steady-state idle: 240k KeReleaseSemaphore / 75k texture-load / VdRetrainEDRAM loop |
| T2-B | 0x82175810 | 15 min | **0** | — | Steady-state idle (same kernel-call mix) |
| T2-C | 0x8217EB78 | 15 min | **0** | — | Steady-state idle (same kernel-call mix) |
Total ~52 min canary CPU. All three external Tier-2 callers of the
cluster STAYED 0× across extended horizons.
### Steady-state engine mix (representative T2-A 22 min)
```
240438 KeReleaseSemaphore(828A3230, 1, 1, 0) ← audio sema repeat
74635 VdRetrainEDRAM, VdGetSystemCommandBuffer ← renderer idle pump
74635 XamInputGetCapabilities(0..3) ← input poll
432 Removed; 396 Added; 381 NtStatusToDosError
```
Identical mix in T2-B, T2-C. Engine is alive at the kernel-call level
but does not advance through the front-end-UI / save-game state
machine across 3× the previously-tested wallclock.
### Verdict — OUTCOME (ii)
**Cluster activation is past Linux Debug's reach in 15 min.** Per task
brief Step 3 outcome (ii). Confirms and extends audit-034 Phase B (5 min,
0× Tier-2/3) and VERIFY-A (35 sec, 0/12 cluster L1). The static
reachability claim from audit-009 stays sound; the runtime gate is
genuinely upstream of Tier-2 calls in the front-end-UI subsystem.
### Strategic implication
RECONCILE-B's host-presenter caveat dominates: Vulkan/XCB on Linux fails
to display intro video; user confirmed Weston also shows black; the
front-end-UI state machine never advances past the post-intro
state-transition that Tier-2 callers gate on. Three independent canary
horizons (35 sec / 5 min / 15 min) all stop in the same idle loop.
**15-min Linux Debug canary cannot witness the cluster activation event
on this host.** Continued probing at higher horizons on Linux is unlikely
to yield. Two pivots open:
- **Pivot A — Lutris Windows canary instrumentation.** Re-port
`--log_lr_on_pc` to a Windows build and probe Tier-2 there. Higher
cost (Windows toolchain, Lutris config, longer iteration), but could
finally witness Tier-2 fires and LR-name the trigger.
- **Pivot B — Static-only.** Drop runtime probing on this side; lean on
M5.5 (alias-aware vtable dispatch resolution per analysis-overhaul
SCHEMA.md) to statically name the gate function in xenia-rs's IDA DB,
then probe THAT function in our impl + canary-Linux at 5-min horizon.
**Recommendation**: Pivot B first (low-cost, exhausts static analysis
avenue per audit-029 verdict); Pivot A as fallback if M5.5 doesn't reach
a probeable witness.
### Sister-session coordination (Track 1)
Track 1 (cache-fix record-layout verification) verdict on cascade
dimension A: **FAIL** — audit-038 cache fix did NOT flip record layout
to canary-shape. Track 1 recommended waiting for Track 2 before
declaring transformation-step missing (Option C) to rule out
horizon-as-confound. Track 2 now rules that out: 15-min horizon does
not move the needle. **Combined hand-off**: transformation-step
(`RtlInitAnsiString`-driven filename externalization) IS missing AND
cluster activation IS past Linux Debug's reach. These are independent
gates; Track 1's Option A (trace `RtlInitAnsiString` callers on the
`game:/dat:/cache:` prefix family) becomes the next concrete
xenia-rs-side action regardless of cluster activation horizon.
### Falsifications
- Audit-034 Phase B's "5 min may be too short" caveat is closed: 15 min
doesn't reach Tier-2 either.
- Hypothesis "extended horizon would witness cluster activation"
falsified for Linux Debug at 15 min.
### Trace
`audit-runs/audit-039-track-2-extended-canary/`:
- `canary-0x82172524.{log,err}` — 77 MB log, 0 fires, 22-min wall
- `canary-0x82175810.{log,err}` — 52 MB log, 0 fires, 15-min wall
- `canary-0x8217EB78.{log,err}` — 55 MB log, 0 fires, 15-min wall
### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean,
HEAD `6de80dffe` unchanged). xenia-rs source unmodified, no commit,
no push. Sister Track 1's territory untouched.
### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested with sharp pre-prediction (Tier-2 fires
→ LR-names gate; 0 fires → outcome ii).
2. Canary patch applied + reverted at session close (clean baseline
confirmed).
3. xenia-rs source unmodified, no commit.
4. Single-step (verification only, no fix attempt).
5. Trace files saved per audit dir convention.
## KRNBUG-AUDIT-040 — record ctor input divergence at sub_8244FC90 (READ-ONLY, 2026-05-09)
### Goal
Per audit-037 sub_8244FC90 fires identically in canary + ours but produces
different record layouts. Identify the divergent INPUT (which arg register
holds different content). Trace the upstream caller that supplies it.
### Canary patch — 56 LOC, reverted
Re-applied audit-030 base + extended TrapLogLR to log r3..r10 + r28..r31 +
LR + 32-byte hex dump from `*r4` and `*r5`. Build via
`ninja -f build-Debug.ninja xenia_canary`. Reverted via
`git checkout -- src/` at session close; canary `git status` clean
(HEAD `6de80dffe` unchanged).
### Calling convention (sub_8244FC90)
- r3 = dest record (alloc'd by caller via `operator new`)
- **r4 = source struct ptr (28 bytes; memcpy'd to dest+0x3C via 7-dword loop)**
- r5 = secondary "this" (vtable in canary)
- r6/r7 = scalar args
### Concrete register values (representative fire 2 of 33 canary / 8 ours)
| reg | canary | ours |
|-----|--------|------|
| r3 | `BC65D440` | `405420C0` |
| **r4** | **`BC79C9EC`** | **`406819EC`** |
| r5 | `BC65D2C0` | `40542100` |
| LR | `82450440` | `82450440` (= `sub_824503A0+0xA0`) |
### Source-struct content at `*r4` (the load-bearing memcpy region)
| word | canary | ours | diff |
|------|--------|------|------|
| +0 | **`F80000DC`** | **`00001454`** | **HANDLE-NAMESPACE** |
| +4 | `0` | `0` | same |
| +8 | `0` | `2` | DIFFERENT |
| +12 | `3` | `3` | same |
| +16 | `0` | `0xC` | DIFFERENT |
| +20 | `0xC` | `0xC` | same |
| +24 | `0` | `0` | same |
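The table can be re-derived mechanically from the two 7-dword captures, which also confirms the destination range of the copy (dest+0x3C, 28 bytes):

```python
# Source struct at *r4, 7 dwords each (from the table above)
canary = [0xF80000DC, 0, 0, 3, 0, 0xC, 0]
ours = [0x00001454, 0, 2, 3, 0xC, 0xC, 0]

# Byte offsets at which the two captures disagree
diff_offsets = [4 * i for i, (c, o) in enumerate(zip(canary, ours)) if c != o]

DEST_OFFSET = 0x3C
copy_len = 4 * len(canary)
dest_last = DEST_OFFSET + copy_len - 1

print("differing offsets:", diff_offsets)
print(f"copied to dest+0x{DEST_OFFSET:X}..+0x{dest_last:X} ({copy_len} bytes)")
```

Three dwords differ: the handle at +0 and the scalars at +8 and +16.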
### Upstream caller — divergent dword origin
Backtrace: sub_8244FC90 ← sub_824503A0 ← sub_824528A8 ← sub_822DFBC8 ←
sub_822DFC74 (the producer).
In **sub_822DFC74**:
```
0x822DFC8C-90 bl 0x824A9F18 ; r3=r4=r5=r6=0 — calls NtCreateEvent
0x822DFC94 r4 = r3 (event handle returned)
0x822DFC98-9C bl 0x821820B0 ; stw r4, 0(r1+80)
0x822DFCA0 lwz r11, 80(r1) ; r11 = handle
0x822DFCB8 stw r11, 44(r31) ; *** [this+44] = NtCreateEvent handle ***
0x822DFCC4 bl sub_822DFBC8 ; vtable[7] dispatcher reads [this+44]
```
`sub_824A9F18` is a wrapper around **`NtCreateEvent`** (xboxkrnl.exe ord
209, thunk `0x8284DF1C`). The OUT handle is what diverges:
- canary: `NtCreateEvent` → `0xF80000DC` (kernel-region pseudo-handle,
XObject namespace)
- ours: `NtCreateEvent` → `0x00001454` (small-int handle ID,
KernelState::handle_table namespace)
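Both runtimes call NtCreateEvent 395× during boot; both succeed. The
divergence in the handle dword (+0) is purely **handle-value namespace
cosmetics**; the +8 and +16 differences in the table above are not traced
in this audit.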
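When scanning traces for this kind of divergence it helps to tag each handle by namespace. A rough sketch; the mask and the small-integer threshold are assumptions inferred only from the values observed in these audits (`0xF80000xx` on canary, `0x1454` on ours):

```python
def handle_namespace(handle):
    """Tag a handle value by apparent namespace (heuristic, see assumptions above)."""
    if handle & 0xFF000000 == 0xF8000000:
        return "kernel-region pseudo-handle"
    if handle < 0x10000:
        return "small-int handle ID"
    return "unknown"

for h in (0xF80000DC, 0xF80000B8, 0x00001454):
    print(f"0x{h:08X}: {handle_namespace(h)}")
```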
### Bug-class refinement
**δ-namespace** (handle representation divergence; benign unless
downstream code interprets handle bits semantically). NOT a logic bug
in our code path — both impls correctly route the handle through
`WaitForSingleObject(handle, INFINITE)` at the wait site PC `0x822DFC34`.
The audit-037 framing of "canary records hold pointer-bearing structs
while ours holds inline-string structs" is partially incorrect:
- The 28 bytes copied at sub_8244FC90 (record `+0x3C..+0x57`) ARE
different in handle slot, but only by namespace.
- The "filename text starting at +0" lives at a DIFFERENT region of the
dest record (+0x40+ in our `0x40542100` dump shows
`40541F80 40542000 745c4750 ... LE.pak\0eng\p`) — written by
`bl 0x822F8A70` / `bl 0x82150030` AFTER sub_8244FC90 returns.
### Recommended audit-041 (sharp prediction)
**Two parallel options:**
1. **DOWNSTREAM-USE PROBE (preferred)**: probe PC `0x822DFC34`
(`bl 0x824AA330` waitsite) in BOTH runtimes. Capture r3 (handle being
waited on) and verify wait completes. If canary's wait completes but
ours doesn't, audit-041 is signaler-missing (trace which kernel call
signals canary's `0xF80000DC`). If canary's wait ALSO doesn't
complete, the namespace finding is benign and the gate is upstream
of the wait (RDX search-criteria producer).
2. **AUDIT-037 RE-VERIFICATION**: dump 128 bytes from canary's r3 and
ours's r3 AT THE EXIT of sub_8244FC90 (not at session-end). If the
filename text is written by sub_824503A0+0x478 callees
(sub_822F8A70 / sub_82150030), those are the real audit-041 targets.
### Trace artifacts
- `audit-runs/audit-040-record-ctor-inputs/canary-0x8244FC90.log` (33 fires)
- `audit-runs/audit-040-record-ctor-inputs/ours-lrtrace.jsonl` (8 fires)
- `audit-runs/audit-040-record-ctor-inputs/ours-dump.log` (10 dump-addr)
- `audit-runs/audit-040-record-ctor-inputs/canary-patch.diff` (notes)
### Cleanup
Canary patch reverted (clean baseline confirmed; HEAD `6de80dffe`
unchanged). xenia-rs source unmodified, no commit, master HEAD
`d8766c6` unchanged. Tests 645.
### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested (extracted divergent input arg + named
upstream producer NtCreateEvent).
2. Canary patch applied + reverted at session close.
3. xenia-rs source unmodified, no commit.
4. Single-step (data-gathering only, no fix attempt).
5. Trace files saved per audit dir convention.
## KRNBUG-AUDIT-041 — wait-site signaler determination (READ-ONLY, 2026-05-09)
Re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC, 4 files);
reverted at session close (canary HEAD `6de80dffe` clean).
**Wait site**: PC `0x822DFC34` `bl 0x824AA330` (KeWaitForSingleObject
wrapper, INFINITE timeout) inside sub_822DFBC8. Wait loops on r3=0x102
(STATUS_TIMEOUT) and on `[r31+52]==3`. The containing function is called
directly by sub_822DFC74, which performs audit-040's NtCreateEvent; the
handle flowing into r3 is the OUT handle from that create.
**Wait completion ratio (30s canary trace; 500M-instr ours)**:
| Runtime | bl/pre-bl | post-bl | completes |
|---------|-----------|---------|-----------|
| canary | 9 | 9 | 100% |
| ours | 7 | 6 | **6/7 ≈ 86%** |
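The completion ratio is a straight post/pre quotient per runtime:

```python
# (pre-bl fires, post-bl fires) per runtime, from the table above
waits = {"canary": (9, 9), "ours": (7, 6)}

completion = {k: post / pre for k, (pre, post) in waits.items()}
stalled = {k: pre - post for k, (pre, post) in waits.items()}

for name in waits:
    print(f"{name}: {completion[name]:.1%} complete, {stalled[name]} stalled")
```

For ours this gives 85.7% with exactly one wait left outstanding.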
The 7th wait in ours stalls. **Stalled handle = `0x00001454`**
(audit-040 family). A probe on the bl PC 0x822DFC34 itself records 0 fires
in ours because `bl` terminates the HIR block and the probe is elided, so
the pre-bl instruction `0x822DFC30 addi r4,r0,-1` (7 fires) serves as the
comparable count. The 7th pre-bl fire (cycle 48,849) has no matching
post-bl.
**Outcome (i) confirmed**: handle-namespace divergence is
**load-bearing**.
**Signaler identified**: probed canary KeSetEvent (0x8284DDDC, 20588
fires, 0 on F80000CC/C0 — takes KEVENT*, not handle) and NtSetEvent
(0x8284DF5C, 9245 fires, **2 on F80000CC/C0**). Both fires LR=0x824AA304
inside wrapper sub_824AA2F0 (89 static callers). **Signaler =
NtSetEvent** (xboxkrnl ord 246).
**Cross-check ours**: NtSetEvent at 0x8284DF5C fires 3334 times in ours;
**1 fire on `r3=0x00001454`** at cycle 3,519,453 (after the stall at
cycle 48,849). So signaler IS firing — bug is NOT pure
signaler-missing.
**Bug class refinement (provisional)**: δ-namespace AND δ-wakeup. The
signal exists but doesn't wake the waiter. Candidate causes:
- Handle table recycles slot 0x1454 between create-epochs in our impl
(so signal hits a *different* KEVENT than wait registered for).
- KeSetEvent / wait-queue machinery has a missed-wake (signal-before-
wait race ruled out: signal at 3.5M is AFTER wait at 48,849).
**Recommended audit-042** (autonomous, two-track):
1. Probe sub_824AA2F0 entry; capture LR + r31 per fire on r3=0x1454.
Names the actual signaler caller chain.
2. Dump handle table state for slot 0x1454 at cycles 48,849 (wait) and
3,519,453 (signal). If different KEVENT pointers → handle aliasing
bug in our `xenia_kernel::handle_table` (slab recycle between
NtCreate/NtClose). If same → bug in `KeSetEvent` / wait-queue.
Both fixes ≤60 LOC. xenia-rs source unmodified, no commit, master
HEAD `d8766c6` unchanged. Tests 645. Trace
`audit-runs/audit-041-wait-site/`.
### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested (wait-completion-ratio canary vs ours).
2. Canary patch applied + reverted at session close.
3. xenia-rs source unmodified, no commit.
4. Single-step (data-gathering only, no fix attempt).
5. Trace files saved per audit dir convention.
## KRNBUG-AUDIT-042 — handle 0x1454 lifecycle disambiguation (READ-ONLY, 2026-05-09)
Re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC, 4 files);
reverted at session close (canary HEAD `6de80dffe` clean). xenia-rs
master `d8766c6` unchanged. Tests 645.
**Goal**: disambiguate audit-041 root cause (A) handle-recycling vs
(B) wakeup-plumbing for handle 0x1454's missed wakeup.
**Method**: ours via `--trace-handles-focus=0x1454` (existing
audit.rs infrastructure); canary via `--log_lr_on_pc=0x8284DF1C`
(NtCreateEvent thunk, ord 209) + cross-reference to audit-041's
existing `canary-bl-0x822DFC34.log` containing canary's
`Added handle:/Removed handle:` lifecycle markers.
### Allocator architecture (decisive structural finding)
`KernelState::alloc_handle` (state.rs:588-593) is a **monotonic
atomic counter** initialized to `0x1000`, advanced via
`fetch_add(4)`. **Bump-only — no recycling, ever.** `nt_close`
(exports.rs:1869) decrements refcount and removes the object from
`state.objects`, but **NEVER returns the handle ID to the pool**.
This makes root cause (A) — handle-recycling — **structurally
impossible in ours**.
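
A minimal sketch of a bump-only allocator of this shape, to show why an ID can never be handed out twice. The type name `BumpHandles`, the use of `AtomicU32`, and the `Relaxed` ordering are assumptions for illustration; only the `0x1000` start and `fetch_add(4)` step come from the finding above.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for the handle counter described above.
struct BumpHandles {
    next: AtomicU32,
}

impl BumpHandles {
    const FIRST: u32 = 0x1000;
    const STEP: u32 = 4;

    fn new() -> Self {
        Self { next: AtomicU32::new(Self::FIRST) }
    }

    /// Returns the current counter value and advances it by 4.
    /// Nothing ever moves the counter backwards, so values never repeat
    /// (ignoring u32 wraparound).
    fn alloc(&self) -> u32 {
        self.next.fetch_add(Self::STEP, Ordering::Relaxed)
    }

    /// Closing drops the object elsewhere; the ID itself is not recycled,
    /// so this is intentionally a no-op for the counter.
    fn close(&self, _handle: u32) {}
}
```

Under this scheme, and assuming every handle goes through the same counter, `0x1454` would be the 278th ID handed out (`0x1000 + 4 * 277`).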
### Handle 0x1454 lifecycle in ours (`-n 500M`, two reruns identical)
```
created: cycle=0 tid=13 lr=0x824a9f6c src=NtCreateEvent kind=Event/Manual
stack: lr=0x822dfc94 (caller — audit-041's sub_822DFC74)
← 0x822e0344 ← 0x822d2ca4 ← 0x822de768 ← 0x821c4b1c
wait: cycle=0 tid=13 lr=0x824ac578 src=do_wait_single
signal: cycle=0 tid=5 lr=0x824aa304 src=NtSetEvent
wake: cycle=0 tid=5 src=wake_eligible_waiters/auto
final: waiters=0 signaled=true signal_attempts=1 waits=1 wakes=1
```
(`cycle=0` is a separate, pre-existing audit-instrumentation gap
— `KernelState::audit_entry` reads `scheduler.ctx(0).timebase`
which is 0 in this build. Counts/ordering still authoritative
because rings are append-only.)
**Single create, single wait, single signal, single wake — fully
consumed.** Handle 0x1454 is **NOT stuck** at end-of-run in this
audit. The end-of-run "Handle waiter lists" section names the
actually-stalled handles: `0x1004 0x1020 0x1544 0x1578 0x10a0
0x12ac 0x1040 ...` — all `<NO_SIGNALS_DESPITE_WAITS>`. **0x1454 is
not among them.**
### Handle 0xF80000CC family lifecycle in canary
From audit-041's `canary-bl-0x822DFC34.log` (debug-helper output
around `ObjectTable::Add/Release`):
```
Added handle:F80000CC for XObject (ctor — fresh KEVENT slot)
NtDuplicateObject(F80000CC, ...) × 3 (handle-table dup)
TRACE-PC-LR pc=822DFC34 r3=F80000CC (wait fires on live KEVENT)
NtClose(F80000CC) (after wait completes)
Removed handle:F80000CC for XEvent (slot freed)
Added handle:F80000CC for XEvent (NEW KEVENT, SAME SLOT VALUE)
NtClose(F80000CC) → Removed → Added × 4 more iterations
```
**Canary RECYCLES handle slots heavily**: `F8000098` reused 130×,
`F80000D0` 95×, `F80000DC` 71×, `F80000C0` 10×, `F80000CC` 5× in a
single 30s window. Canary's `ObjectTable::AllocateHandle` (per
`xobject.cc/object_table.cc`) is a slab/free-list allocator; ours
is bump-only.
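
For contrast, a minimal sketch of a free-list handle table that reuses released slots, which is the behaviour the canary log shows. This is not canary's code: `RecyclingHandles` is a hypothetical name, and the stride of 4 and LIFO reuse order are assumptions made for illustration.

```rust
/// Hypothetical free-list handle table; illustrates slot reuse only.
struct RecyclingHandles {
    base: u32,
    next_slot: u32,
    /// Released slot indices, reused most-recently-freed first.
    free: Vec<u32>,
}

impl RecyclingHandles {
    fn new(base: u32) -> Self {
        Self { base, next_slot: 0, free: Vec::new() }
    }

    /// Prefers a released slot; only grows the table when none is free.
    fn alloc(&mut self) -> u32 {
        let slot = match self.free.pop() {
            Some(s) => s,
            None => {
                let s = self.next_slot;
                self.next_slot += 1;
                s
            }
        };
        self.base + slot * 4
    }

    /// Returns the slot to the free list. Assumes `handle` came from `alloc`.
    fn release(&mut self, handle: u32) {
        self.free.push((handle - self.base) / 4);
    }
}
```

With a table like this, a close followed by a create can yield the same handle value for a different object, which matches the `Removed handle:` / `Added handle:` pairs in the log.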
### Decisive disambiguation
| | ours | canary |
|---|---|---|
| handle 0x1454 NtCreateEvent fires | **1** | N/A (different namespace) |
| handle 0xF80000CC `Added handle:` | N/A | **5+** within 30s |
| recycling? | **NO** (structurally impossible) | **YES** (slab) |
| audit-041 stall handle 0x1454 | wait+signal+wake recorded in `--trace-handles-focus` rerun | — |
**Verdict: ROOT CAUSE IS NOT (A) HANDLE-RECYCLING.**
Sub-conclusion on audit-041's premise: under
`--trace-handles-focus=0x1454 -n 500M`, handle 0x1454's wait+signal
DO complete (1 wake recorded). audit-041's "wait NEVER returns"
inference came from `--lr-trace`-only data (post-bl missing for the
7th iteration); but `--quiet` suppressed the end-of-run audit dump
in audit-041, so the wait-completion was never directly verified.
The lr-trace miss can be explained by: lr-trace records the
**guest-side resume PC after `bl`**; if KeWaitForSingleObject's
return path bypasses that PC (e.g., direct context-restore on
wake), the post-bl trace doesn't fire even though the wait
completes. **audit-041's load-bearing premise is provisionally
falsified for handle 0x1454 specifically.**
### Real wedge points
The actual stalled handles per this run's end-of-run dump:
- `0x1004` Event/Manual (tid=11 parked, 0 signals)
- `0x1020` Event/Manual (tid=3 parked, 0 signals)
- `0x1040` Event/Auto (tid=5 parked via WaitMultiple, 0 signals)
- `0x1544` Event/Manual (tid=17 parked, 0 signals)
- `0x1578` Event/Auto (tid=19 parked, 0 signals)
- `0x12ac` Semaphore (tid=14, 15 parked, 0 signals)
- `0x10a0` Event/Auto (tid=6, 0 signals) + paired `0x10a4` Semaphore
All carry `<NO_SIGNALS_DESPITE_WAITS>`. These are γ-class
missing-signaler candidates — distinct from 0x1454.
### Bug-class refinement
**δ-wakeup ruled out** for 0x1454 (wake DID fire). **δ-namespace
ruled out** (single create, no aliasing). **The wedge is on a
different handle set** — needs re-pivot.
### Sharp 4-dim cascade prediction (for any audit-043 fix targeting the *real* stalled handles, e.g. `0x1004` or `0x10a0`)
- **A**: handle 0x1004's `signal_attempts` goes 0 → ≥1 (signaler
  named; KeSetEvent/NtSetEvent or KeReleaseSemaphore reaches it).
- **B**: tid=11 transitions out of `Blocked(WaitAny [4100])` to
Ready/Exited; thread-list shrinks by ≥1 stalled thread.
- **C**: dependent waiters (any handle whose creator/signaler is
gated by tid=11) start firing — measurable as `<NO_SIGNALS>`
count drops by ≥2 across the trail set.
- **D**: `swaps` advances past 2 OR `draws` flips from 0 to >0.
*Probability*: lower (γ-cluster activation is the audit-009
plateau; multiple gates must fall, only one is being addressed).
### Recommended audit-043
**Pivot**: re-target audit on the **actually-stalled** handles per
this session's end-of-run dump. Ranked by likely impact:
1. **`0x10a0` Event/Auto + `0x10a4` Semaphore on tid=6** —
Event+Semaphore pair is canonical "worker waits for job;
producer hasn't run." Trace tid=6's entry PC and producer chain.
2. **`0x12ac` Semaphore (2 waiters: tid=14, tid=15)** — semaphore
never released; `KeReleaseSemaphore` source is the target.
3. **`0x1004` Event/Manual on tid=11** — earliest-created stalled
handle. Its non-signaling caller chain is the bottom-up gate.
For each: run with `--trace-handles-focus=<handle>` to capture the
created stack and identify the producer-side function. Canary
cross-trace via `--log_lr_on_pc=0x8284DF5C` (NtSetEvent) or
`0x8284DDDC` (KeSetEvent) filtering for the equivalent canary
handle at that PC + LR signature. Patch budget unchanged (≤60 LOC).
**Bug class for audit-043**: **γ (missing signaler)** — primary
candidate. **NOT δ-namespace, NOT δ-wakeup.** The handle-namespace
divergence (audit-040) appears to be benign per this audit's
finding that 0x1454 actually completes. The real stalled handles
are γ-class (signaler-missing on a *different* event/semaphore).
### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested (recycling vs plumbing for 0x1454).
2. Canary patch applied (30 LOC) + reverted at session close.
3. xenia-rs source unmodified, no commit.
4. Single-step (data-gathering only, no fix attempt).
5. Trace files saved: `audit-runs/audit-042-handle-lifecycle/
{probe.log, probe-run2.log, canary-create-0x8284DF1C.log}`
(~11.5 MB) + cross-ref of audit-041's existing
`canary-bl-0x822DFC34.log`.
### Status
- Tests: 645 (unchanged).
- Lockstep: instructions=100000004 unchanged (no source mods).
- Master HEAD: `d8766c6` (unchanged).
- Canary HEAD: `6de80dffe` (clean, post-revert).
---
## KRNBUG-AUDIT-043 — record +0x00 writer, allocator-VA divergence (READ-ONLY, 2026-05-09)
**Status**: READ-ONLY single-step. Master `d8766c6` unchanged, canary patch reverted. Tests 645 unchanged.
### Goal
Identify the writer of `+0x00` at records `0x40542300/0x40542340/0x40542400/0x405424c0` in our impl. Audit-039 reported ours has `0x67616D65` ("game" inline) while canary has `0xF80000B8` (kernel handle) — claimed to be the most fundamental layout divergence.
### Method
Mem-watch on `+0x00,+0x04` of all 4 records (`-n 500_000_000`). Group writers by (PC, LR). Look up containing functions in `sylpheed.db`. Disasm + caller chain. Apply audit-030 LR-trace patch to canary; probe writer PC `0x825F1080` (memcpy) and pool-init PC `0x82152728`.
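
The grouping step can be sketched as follows. The `WriteHit` record and its fields are assumptions about what a parsed mem-watch line contains, not the actual log schema:

```rust
use std::collections::BTreeMap;

/// One memory-watch hit; the field set is an assumption for illustration.
#[derive(Debug, Clone, Copy)]
struct WriteHit {
    pc: u32,
    lr: u32,
    addr: u32,
}

/// Counts hits per (pc, lr) writer signature. A BTreeMap keeps the
/// output ordered by PC, which makes runs easy to diff.
fn group_by_writer(hits: &[WriteHit]) -> BTreeMap<(u32, u32), usize> {
    let mut counts = BTreeMap::new();
    for hit in hits {
        *counts.entry((hit.pc, hit.lr)).or_insert(0) += 1;
    }
    counts
}
```

Keying on the pair rather than on PC alone separates a shared helper such as `memcpy` by its caller.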
### Findings
**The writer of `0x67616D65` is `memcpy` at `pc=0x825F1080`, called from `memcpy_s` (`sub_825ED588`, return = `lr=0x825ED608`)**, invoked from `std::basic_string::reserve_then_assign` (`sub_8216E138+0xC8`). 16 fires across 4 records.
**The records are NOT layout records** — they are 64-byte slots in a Sylpheed-managed pool allocator:
- `sub_821505D8` (called from `sub_8280C42C`) allocates ~58 MB via `sub_824A8858` (size `0x03A723D0`, type `0x20000004`).
- `sub_82152570` builds 4 free-list buckets; `sub_82152728` chains 64-byte slots over a 1.25 MB span.
- Slot-size table at `sub_821505D8+0x10`: 4, 16, 32, 64, 96, 128, 160, 192, 256.
- The "filenames" land in 64-byte slots when a Sylpheed `std::string` is heap-promoted from SSO (capacity ≥ 0x10).
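
A size-class table of this shape is usually consulted by picking the smallest slot that fits the request. The sketch below assumes that rule; whether Sylpheed's allocator rounds this way was not verified in this audit, and `slot_class` is a hypothetical name.

```rust
/// Slot sizes as listed in the table at `sub_821505D8+0x10`.
const SLOT_SIZES: [u32; 9] = [4, 16, 32, 64, 96, 128, 160, 192, 256];

/// Smallest slot size that can hold `request` bytes, or `None` if the
/// request is larger than the biggest class (assumed rounding rule).
fn slot_class(request: u32) -> Option<u32> {
    SLOT_SIZES.iter().copied().find(|&size| size >= request)
}
```

Under that assumption, any heap string buffer needing 33 to 64 bytes would land in a 64-byte slot.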
### Canary cross-trace
Probed `pc=0x825F1080` in canary (audit-030 `--log_lr_on_pc` patch reapplied):
- 94,945 fires in 25s. **Zero hits to `0x40542xxx`**. Destinations are spread over `0x705Dxxxx` (76,674), `0x7033xxxx` (6,642), `0xBC36xxxx` (1,211), etc.
- Top LR `0x824AB1D4` (84,400×, an alloc-related path absent in our trace).
- Canary's matching `LR=0x825ED608` (memcpy_s caller) fires 1,782× — **none target `0x40542xxx`**.
Pool-init `pc=0x82152728` in canary fires once with `r3=0xBC32C880`, so **canary's 58 MB pool BASE = `0xBC32C880`**; ours is at `~0x40541xxx`.
### Bug-class refinement
**Audit-039's "0xF80000B8 vs 'game'" is a VA-equality fallacy.** The same guest VA `0x40542300` backs *different live data* in the two emulators because their host-side allocators return different VAs for the same `sub_824A8858` call. Ours: 64-byte std::string heap buffer. Canary: a kernel-handle / NotifyListener slot at *its* unrelated VA.
**Class = ε (host-allocator address-space divergence)**, not a guest-write bug. There is no missing/wrong write at `+0x00` in our impl.
**Reading-error ledger update**: 12th entry — *VA-equality fallacy across emulators*. Comparing memory contents at identical guest VAs assumes both allocators return the same VA for the same logical allocation; Sylpheed's pool factory makes this assumption false in general. Future audits comparing two emulators' guest memory must compare *logical allocations* (resolved through the producing allocator), not raw VAs.
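
One way to compare logical allocations is to rebase each VA onto its own emulator's pool before comparing. A minimal sketch, assuming each run's pool base and length are known; `Pool` and `same_logical_slot` are hypothetical names, and any base value used for ours is a placeholder because only `~0x40541xxx` is known:

```rust
/// A pool as seen by one emulator: guest base VA and length in bytes.
#[derive(Clone, Copy)]
struct Pool {
    base: u32,
    len: u32,
}

impl Pool {
    /// Offset of `va` inside this pool, or `None` if it lies outside.
    fn offset_of(&self, va: u32) -> Option<u32> {
        let off = va.checked_sub(self.base)?;
        (off < self.len).then_some(off)
    }
}

/// True when two VAs, one per emulator, sit at the same offset inside
/// their respective pools. Equal raw VAs are neither required nor enough.
fn same_logical_slot(a: Pool, va_a: u32, b: Pool, va_b: u32) -> bool {
    match (a.offset_of(va_a), b.offset_of(va_b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}
```

Matching offsets are only meaningful if both runs perform the same allocation sequence inside the pool; otherwise the comparison has to be keyed on the allocation call itself (for example caller LR plus allocation ordinal).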
### Recommended audit-044
Drop the "record at 0x40542300+" line of investigation entirely.
Re-pivot to audit-042's actually-stalled-handle plan:
1. `0x10A0` Event/Auto + `0x10A4` Semaphore on tid=6 — producer chain.
2. `0x12AC` Semaphore (tid=14, tid=15 waiters) — `KeReleaseSemaphore` source.
3. `0x1004` Event/Manual on tid=11 — earliest-created stalled handle.
Each: `--trace-handles-focus=<H>` for create/wait stack; canary cross-trace via `--log_lr_on_pc=0x8284DF5C` (NtSetEvent) or `0x8284DDDC` (KeSetEvent) on equivalent handle.
**Bug class for audit-044**: **γ (missing signaler)** — same target as audit-042's recommendation; audit-043 did not move the cluster, but eliminated a false-positive line of investigation.
### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested (writer-of-+0x00 isolated; canary equivalence checked).
2. Canary patch applied (30 LOC audit-030 base) + reverted at session close (`git status` clean, config TOML restored to `log_lr_on_pc = 0`).
3. xenia-rs source unmodified, no commit.
4. Single-step (data-gathering only).
5. Trace files saved: `audit-runs/audit-043-record-zero-offset/{mem-watch.log, mem-watch.stdout, canary-825f1080-traces.txt.gz (95k LR records), audit-043-canary-poolinit.log}`.
### Status
- Tests: 645 (unchanged).
- Lockstep: instructions=100000004 (unchanged).
- Master HEAD: `d8766c6` (unchanged).
- Canary HEAD: `6de80dffe` post-revert clean.