Accumulated diagnostic notes from prior sessions that had stayed in the working tree without being committed. Spans 20 audit entries (KRNBUG-AUDIT-023 through KRNBUG-AUDIT-057) plus VERIFY-A and TRACK-1/TRACK-2 sub-audits, all read-only investigations dated 2026-05-06 through 2026-05-10. No code or schema changes. Pure documentation backfill so future sessions can cross-reference the full chain without depending on the auto-memory directory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8015 lines
462 KiB
Markdown
# PPC Instruction Audit — Findings Tracker

**Started**: 2026-04-29 (single session, audit-only)
**Trigger**: `addis` 32-bit-ABI sign-extension fix surfaced a likely systemic class of bugs.
**Status**: in flight. Per-group reports live in `audit-out/`. This file is the consolidated, stable-ID index.
**Workflow**: audit only this session; fix session(s) reference these IDs.

## Conventions

- Every finding has an ID `PPCBUG-NNN` for cross-referencing.
- **Status**: `open` (audit found it, not yet fixed) | `applied` (fix landed) | `wontfix` (intentional) | `dup-of:NNN` (collapsed into another finding).
- **Severity**:
  - **HIGH** = wrong arithmetic / control flow on plausible Xbox 360 user code.
  - **MEDIUM** = wrong status flag / latent under broken upstream invariants / edge case.
  - **LOW** = test gap / cosmetic / dead-code-only.
- All file:line refs are `xenia-rs/crates/xenia-cpu/src/interpreter.rs` unless otherwise noted.
- Suggested fixes are written as one-line patches where possible; see the per-group report for full context.

## Cross-cutting recommendation

The single recurring root cause is **violating the 32-bit ABI invariant that all GPR writes truncate to 32 bits**. The cleanest fix is to systematically apply `as u32 as u64` at every GPR writeback in every integer ALU op. The existing CA/CR0/OE helpers will then be correct without further changes (because their inputs become guaranteed-clean). The audit reports list each fix individually; the fix session may choose to apply them as one sweep or one-at-a-time.

A defensive secondary recommendation: even after the writeback truncation, instructions whose CA computation does its own internal arithmetic on 64-bit operands (`subfcx`, `subfex`, `addic`, `addicx`, `subficx`) should additionally truncate their compare operands. This guards against any future regression that re-pollutes the GPR file.

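
The two recommendations can be sketched in isolation as follows. This is a minimal illustration, not code from the interpreter: `write_gpr32` and `add_carry32` are hypothetical names, and the GPR file is assumed to be a plain `[u64; 32]` array.

```rust
/// Hypothetical helper (not in the source tree): a single choke point
/// through which a 32-bit-ABI integer op would write its result.
fn write_gpr32(gpr: &mut [u64; 32], rd: usize, value: u64) {
    // `as u32` drops bits 32..63; `as u64` zero-extends the low word back.
    gpr[rd] = value as u32 as u64;
}

/// Hypothetical helper: carry-out of a 32-bit add, computed on truncated
/// operands so stale upper bits in either input cannot change the answer.
fn add_carry32(a: u64, b: u64) -> bool {
    let (a32, b32) = (a as u32, b as u32);
    // A wrapped 32-bit sum is smaller than an operand exactly when it carried.
    a32.wrapping_add(b32) < a32
}
```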
---

## Batch 1 — integer ALU (groups 1-5)

Per-group reports: `audit-out/group-01-add-imm.md`, `group-02-add-reg.md`, `group-03-sub-reg.md`, `group-04-multiply.md`, `group-05-divide.md`.

### PPCBUG-001 — addi sign-extension, no truncation

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:114-118
- **Symptom**: `addi rT, 0, -1` (= `li rT, -1`) writes `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`. Identical shape to addis.
- **Fix**:
```rust
ctx.gpr[instr.rd()] = ra_val.wrapping_add(instr.simm16() as i64 as u64) as u32 as u64;
```
- **Test gap**: existing `test_addi` only covers positive simm16. Add a test for `li rT, -1` and verify the upper 32 bits are zero.

### PPCBUG-002 — addic untruncated writeback + 64-bit CA compare

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:133-140
- **Symptom**: (a) GPR writeback not truncated (same shape as addi). (b) CA computed via 64-bit `result < ra` — Canary's `AddDidCarry` explicitly truncates both operands to int32 first.
- **Fix**:
```rust
let ra32 = ra as u32;
let imm = instr.simm16() as i32 as u32;
let result32 = ra32.wrapping_add(imm);
ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
- **Test gap**: zero unit tests for addic.

### PPCBUG-003 — addicx untruncated writeback + 64-bit CA + CR0 regression

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:141-150
- **Symptom**: same as PPCBUG-002 plus a CR0 regression: live code uses `update_cr_signed(0, result as i64)` (64-bit signed). The frozen snapshot in `ppc-manual/alu/addicx.md` shows the previously-correct `result as i32 as i64` form. Live code has drifted.
- **Fix**: PPCBUG-002 fix plus `update_cr_signed(0, result32 as i32 as i64)`.
- **Test gap**: zero unit tests.
- **Note**: confirms the manual's frozen snapshots are useful drift detectors — see if other opcodes have similarly regressed.

### PPCBUG-004 — mulli untruncated 64-bit signed product

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:159-164
- **Symptom**: RA read as full `i64`, product stored as `u64` without truncation. Per ISA in 32-bit ABI, both factors should be i32 and product should fit in 32 bits (overflow silently wraps per ISA).
- **Fix**:
```rust
let ra = ctx.gpr[instr.ra()] as i32 as i64;
let imm = instr.simm16() as i64;
ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64;
```
- **Test gap**: zero unit tests.

### PPCBUG-005 — subficx untruncated writeback + 64-bit CA compare

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:151-158
- **Symptom**: (a) `imm.wrapping_sub(ra)` on 64-bit values writes poisoned upper bits; sign-extended `imm` for negative SIMM has bits 32-63 set. (b) CA `imm >= ra` is 64-bit unsigned compare; wrong relative to Canary's 32-bit form.
- **Fix**:
```rust
let ra32 = ra as u32;
let imm32 = instr.simm16() as i32 as u32;
let result32 = imm32.wrapping_sub(ra32);
ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
- **Test gap**: zero unit tests.

### PPCBUG-006 — negx active GPR poisoning + 64-bit OE overflow check

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:319-330
- **Symptom**: (a) `(!ra).wrapping_add(1)` unconditionally sets upper 32 bits to all-ones because `!ra` flips them. Even a clean `r3 = 5` produces `0xFFFFFFFF_FFFFFFFB` instead of `0x00000000_FFFFFFFB`. **This is active, not latent — every neg in 32-bit-ABI code poisons the GPR.** (b) `neg_ov_64` overflow predicate tests `ra == 0x8000_0000_0000_0000` (64-bit INT_MIN) instead of `ra == 0x0000_0000_8000_0000` (32-bit INT_MIN).
- **Fix**:
```rust
let result = (!(ra as u32)).wrapping_add(1);
ctx.gpr[instr.rd()] = result as u64;
if instr.oe() {
    overflow::apply(ctx, (ra as u32) == 0x8000_0000);
}
if instr.rc_bit() { ctx.update_cr_signed(0, result as i32 as i64); }
```
- **Test gap**: existing `nego_sets_ov_only_on_int_min` tests 64-bit INT_MIN — add a 32-bit INT_MIN case.

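
The difference between the two negate shapes is easy to see on plain integers. The sketch below is standalone and uses hypothetical function names; it does not touch `ctx` or the real `overflow` module.

```rust
/// Buggy shape: two's-complement negate on the full 64-bit register.
/// `!ra` flips bits 32..63 as well, so a clean input yields a dirty output.
fn neg_64(ra: u64) -> u64 {
    (!ra).wrapping_add(1)
}

/// Fixed shape: negate the low word only, then zero-extend.
fn neg_32(ra: u64) -> u64 {
    let result = (!(ra as u32)).wrapping_add(1);
    result as u64
}

/// OE predicate for a 32-bit negate: only the 32-bit INT_MIN overflows.
fn neg_overflows_32(ra: u64) -> bool {
    (ra as u32) == 0x8000_0000
}
```

With `ra = 5`, `neg_64` returns `0xFFFFFFFF_FFFFFFFB` while `neg_32` returns `0x00000000_FFFFFFFB`, matching the symptom described above.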
### PPCBUG-007 — subfcx CA via 64-bit unsigned compare

- **Severity**: HIGH (defensive — same shape as the compare that broke addis)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:258
- **Symptom**: `if rb >= ra { 1 } else { 0 }` is the exact 64-bit unsigned compare that the addis bug exploited. Wrong CA when either operand has poisoned upper 32 bits. Apply defensively even if all upstream sources are cleaned, because a wrong CA bit is unrecoverable downstream.
- **Fix**:
```rust
let ra32 = ra as u32;
let rb32 = rb as u32;
let result32 = rb32.wrapping_sub(ra32);
ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```
- **Test gap**: zero dedicated unit tests for subfcx — the most critical opcode in Group 3 had no coverage. Add 6+ tests including the exact 0x828F3F98 / 0x828F3F68 case from the addis incident.

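
A possible starting point for the missing subfcx tests is sketched below. The function names are hypothetical. The two low words come from the test-gap note, but which operand carried stale upper bits in the real incident is not stated here, so poisoning `ra` is an assumption made only for illustration.

```rust
/// CA for `subfc` in the buggy shape: 64-bit unsigned compare.
fn subfc_ca_64(ra: u64, rb: u64) -> u32 {
    if rb >= ra { 1 } else { 0 }
}

/// CA for `subfc` in the fixed shape: compare the low 32 bits only.
fn subfc_ca_32(ra: u64, rb: u64) -> u32 {
    if (rb as u32) >= (ra as u32) { 1 } else { 0 }
}

/// Assumed scenario: `ra` has sign-extension debris in its upper half.
const POISONED_RA: u64 = 0xFFFF_FFFF_828F_3F68;
const CLEAN_RB: u64 = 0x0000_0000_828F_3F98;
```

Under that assumption the 64-bit compare reports CA=0, while the 32-bit compare reports CA=1 because `0x828F3F98 >= 0x828F3F68`.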
### PPCBUG-008 — subfex CA via 64-bit unsigned compare + `!ra` poisons writeback

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:268-284
- **Symptom**: (a) CA `if rb > ra || (rb == ra && ca != 0)` is 64-bit; same shape as PPCBUG-007. (b) Writeback uses `(!ra).wrapping_add(rb).wrapping_add(ca)` — `!ra` always sets upper 32 bits, guaranteed GPR poison even with clean inputs (same shape as PPCBUG-006).
- **Fix**:
```rust
let ra32 = ra as u32;
let rb32 = rb as u32;
let ca = ctx.xer_ca as u32;
let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca);
ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64;
```

### PPCBUG-009 — mullwx untruncated 64-bit signed product

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:331-344
- **Symptom**: 32x32 multiply produces 64-bit signed `i64` product, written to GPR via `as u64` without truncation. When product overflows i32 (which `mullw_ov` correctly detects), upper 32 bits are non-zero and corrupt downstream 64-bit unsigned compares — same class as addis.
- **Fix** (one line; OE handler unchanged):
```rust
ctx.gpr[instr.rd()] = product as u32 as u64;
```

### PPCBUG-010 — divwx quotient sign-extended to 64 bits

- **Severity**: HIGH
- **Status**: open (must be applied in same commit as PPCBUG-011)
- **Location**: interpreter.rs:373
- **Symptom**: `(ra / rb) as i64 as u64` sign-extends a negative i32 quotient. `-10 / 3 = -3` writes `0xFFFFFFFF_FFFFFFFD` instead of `0x00000000_FFFFFFFD`. Canary's `InstrEmit_divwx` uses `f.ZeroExtend(v, INT64_TYPE)` — explicit zero-extension.
- **Fix**: `ctx.gpr[instr.rd()] = (ra / rb) as u32 as u64;`

### PPCBUG-011 — divwx CR0 update breaks after PPCBUG-010 fix

- **Severity**: MEDIUM (coupled to PPCBUG-010 — must land together)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:379
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.rd()] as i64)` accidentally works today because the sign-extended GPR has consistent sign in i64 view. After PPCBUG-010, GPR holds `0x00000000_FFFFFFFD` for `-3` and `as i64` reads positive — CR0.LT will be wrong for negative quotients.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64);`

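
The coupling between PPCBUG-010 and PPCBUG-011 can be reproduced with a few lines of standalone Rust. The names below are hypothetical, and the sketch returns 0 for divide-by-zero and for `i32::MIN / -1` purely to stay panic-free; it makes no claim about what the interpreter does in those cases.

```rust
/// Quotient writeback in the PPCBUG-010 shape: zero-extended low word.
fn divw_writeback(ra: u64, rb: u64) -> u64 {
    let (a, b) = (ra as u32 as i32, rb as u32 as i32);
    // `checked_div` returns None for b == 0 and for i32::MIN / -1.
    let q = a.checked_div(b).unwrap_or(0);
    q as u32 as u64
}

/// CR0.LT read through the 64-bit signed view (breaks after truncation).
fn cr0_lt_64(gpr: u64) -> bool {
    (gpr as i64) < 0
}

/// CR0.LT read through the 32-bit signed view (the PPCBUG-011 shape).
fn cr0_lt_32(gpr: u64) -> bool {
    (gpr as u32 as i32) < 0
}
```

For `-10 / 3` the writeback is `0x00000000_FFFFFFFD`; the 64-bit view sees a positive number, while the 32-bit view correctly sees `-3`.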
### PPCBUG-012 — addx writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:167-179
- **Symptom**: 64-bit `wrapping_add` result written to GPR untruncated. Latent: only triggers if upstream operands have poisoned upper 32 bits. With PPCBUG-001 etc. unfixed, that invariant is broken — addx amplifies the poison.
- **Fix**: `ctx.gpr[instr.rd()] = result as u32 as u64;`

### PPCBUG-013 — addcx writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:180-193
- **Fix**: same shape as PPCBUG-012.

### PPCBUG-014 — addex writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:194-209
- **Fix**: same shape as PPCBUG-012.

### PPCBUG-015 — addzex writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:210-224
- **Fix**: same shape as PPCBUG-012.

### PPCBUG-016 — addmex writeback not truncated (latent + edge case)

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:225-240
- **Symptom**: same writeback issue plus the `wrapping_sub(1)` produces all-ones upper 32 bits when low 32 bits underflow — guaranteed poison even if inputs are clean (same shape as PPCBUG-006/008).
- **Fix**: truncate operands and result to 32 bits.

### PPCBUG-017 — subfx writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:241-253
- **Fix**: same shape as PPCBUG-012.

### PPCBUG-018 — subfzex writeback not truncated + `!ra` poisons

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:285-302
- **Symptom**: `(!ra).wrapping_add(ca)` flips upper 32 bits — guaranteed poison.
- **Fix**: truncate ra to u32, do arithmetic on u32, write `as u64`.

### PPCBUG-019 — subfmex writeback poisoning + always-true CA edge

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:303-318
- **Symptom**: (a) writeback poisoned via `(!ra)`. (b) CA predicate `(!ra) != 0` is always true when ra has clean upper 32 bits (because `!ra` flips them) — so CA is always 1, even in the documented edge case where 32-bit `ra == 0xFFFFFFFF && ca == 0` should yield CA=0.
- **Fix**: operate on u32, then `xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 }`.

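
The CA edge case can be checked against a wide-arithmetic reference. The sketch below assumes the ISA definition of `subfme` as `¬RA + CA + 0xFFFFFFFF` on 32-bit words; the function names are hypothetical and only the `(!ra) != 0` sub-expression of the buggy predicate is modelled.

```rust
/// 32-bit `subfme` sketch: returns (zero-extended result, carry-out).
fn subfme_32(ra: u64, ca_in: u32) -> (u64, u32) {
    let not_ra = !(ra as u32);
    let result = not_ra.wrapping_add(ca_in).wrapping_add(0xFFFF_FFFF);
    // Adding 0xFFFF_FFFF carries unless both other addends are zero.
    let ca_out = if not_ra != 0 || ca_in != 0 { 1 } else { 0 };
    (result as u64, ca_out)
}

/// Reference carry, computed in 64 bits and read from bit 32.
fn subfme_carry_ref(ra: u64, ca_in: u32) -> u32 {
    let wide = (!(ra as u32)) as u64 + ca_in as u64 + 0xFFFF_FFFF;
    (wide >> 32) as u32
}

/// Buggy sub-expression: on a 64-bit value this is 1 for every clean `ra`.
fn subfme_not_ra_nonzero_64(ra: u64) -> u32 {
    if (!ra) != 0 { 1 } else { 0 }
}
```

For `ra = 0xFFFFFFFF` with `ca = 0`, the 32-bit form yields CA=0 as the ISA edge case requires, whereas the 64-bit sub-expression still evaluates to 1.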
### PPCBUG-020 — CR0 update uses 64-bit signed compare in all sub-register ops

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:250, 264, 281, 299, 315, 327, 341, 379, 396, 410, 419, 428, 445, 462 (every Rc=1 path in groups 2-5)
- **Symptom**: `update_cr_signed(0, result as i64)` views result as 64-bit signed. In 32-bit ABI, bit 31 determines LT/GT, not bit 63. A result like `0x00000000_80000000` is negative in 32-bit but positive in 64-bit — CR0.LT inverted.
- **Fix (catch-all)**: change to `result as u32 as i32 as i64` everywhere. Once PPCBUG-001..-019 truncate writebacks, the upper 32 bits of `result` are zero and this distinction becomes moot — but applying both is cheap and provides defense in depth.
- **Note**: this is one logical fix duplicated across all rc paths; the fix session should grep `update_cr_signed(0, .* as i64)` to find them all.

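
The two views of the same register value can be compared directly. The helpers below are hypothetical stand-ins that return `(LT, GT, EQ)` as a tuple; they are not the signature of `update_cr_signed`.

```rust
/// CR0 (LT, GT, EQ) derived from the 64-bit signed view of the result.
fn cr0_from_64(result: u64) -> (bool, bool, bool) {
    let v = result as i64;
    (v < 0, v > 0, v == 0)
}

/// CR0 (LT, GT, EQ) derived from the 32-bit signed view: bit 31 is the sign.
fn cr0_from_32(result: u64) -> (bool, bool, bool) {
    let v = result as u32 as i32;
    (v < 0, v > 0, v == 0)
}
```

For `0x00000000_80000000` the first helper reports GT and the second reports LT; for values with bit 31 clear the two agree.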
### PPCBUG-021 — OE overflow checks at bit 63 in all sub-register ops

- **Severity**: LOW
- **Status**: open
- **Locations**: throughout — `add_ov_64`, `sub_ov_64`, `sum_overflow_64`, `mullw_ov`, etc. (defined in `xenia-cpu/src/overflow.rs`)
- **Symptom**: signed-overflow check operates on 64-bit boundary. For 32-bit-ABI ops (`addo`, `subfo`, `subfco`, etc.), should check at bit 31. With PPCBUG-006 a tighter form was given for `negx`. The pattern probably needs systematic review across overflow.rs.
- **Fix**: open a follow-up audit of overflow.rs after batch B completes.

### PPCBUG-022 — mulld_ov missing INT_MIN * -1 edge case

- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-cpu/src/overflow.rs` (`mulld_ov` helper)
- **Symptom**: 64-bit signed multiply overflow check doesn't handle `i64::MIN * -1`.
- **Fix**: add the special case to the helper.

### PPCBUG-023 — andisx CR0 update uses 64-bit signed compare; should use 32-bit

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:475
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` interprets the result as 64-bit signed. The `andisx` result is bounded by `0x0000_0000_FFFF_0000`, which is always non-negative in 64-bit view. In 32-bit ABI, bit 31 is the sign bit — results with bit 31 set (e.g. `andis. rA, rS, 0x8000` with rS=0x80000000 → result=0x80000000) should yield CR0.LT=1, but xenia-rs gives CR0.GT=1. The ppc-manual frozen snapshot for `andisx` shows the correct `as i32 as i64` form; the live code has drifted. Common trigger: `andis. rA, rS, 0x8000` to test the sign bit of a 32-bit word.
- **Fix**:
```rust
ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);
```
- **Test gap**: zero tests for `andisx`. Add at minimum: result with bit 31 set (expect LT=1), non-zero result with bit 31 clear (expect GT=1), result=0 (expect EQ=1).

---

## Batch 2 — logical immediate (group 6)

Per-group report: `audit-out/group-06-logic-imm.md`.

Group 6 summary: only 1 new bug found. The `simm16` sign-extension pattern does not apply (all ops use `uimm16`). `ori`, `oris`, `xori`, `xoris`, and `andix` are ISA-correct; `andisx` has a CR0 interpretation bug (PPCBUG-023). All 6 opcodes have inadequate test coverage (LOW gaps for 5 of them, MEDIUM gap for `andisx` tied to the bug).

---

## Batch 3 — word rotate-and-mask (group 9)

Per-group report: `audit-out/group-09-word-rotate.md`.

Group 9 summary: core arithmetic is clean — `rlw_mask`, rotate logic, and result write are all ISA-correct. The single recurring defect is the Rc=1 CR0 path using `as i64` instead of `as u32 as i32 as i64` (instances of PPCBUG-020 specific to these three opcodes). `rlwimix` zeroes the upper 32 bits of RA instead of preserving them per ISA, but this is safe under 32-bit ABI invariant and classified LOW. Test coverage is poor: 1 partial test for `rlwinmx`, zero for the other two.

### PPCBUG-024 — rlwinmx CR0 update uses 64-bit signed compare; should use 32-bit

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:667
- **Symptom**: `update_cr_signed(0, ctx.gpr[instr.ra()] as i64)` — result is a zero-extended u32, so bit 31 set yields +2147483648 in 64-bit signed view but -2147483648 in 32-bit ABI. CR0.LT/GT inverted for results with bit 31 set. `rlwinm.` is the most common dot-form instruction in compiler output (all `slwi.`, `srwi.`, `clrlwi.`, bitfield-test-and-branch idioms).
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: `test_rlwinm` exists but non-Rc only, result has bit 31 clear. Add Rc=1 tests with bit 31 set in result.

### PPCBUG-025 — rlwimix CR0 update uses 64-bit signed compare; should use 32-bit

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:679
- **Symptom**: same class as PPCBUG-024. `rlwimi.` is compiler-generated for struct bitfield writes; when the inserted value occupies or sets bit 31 of RA, CR0.LT is wrong.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: zero tests for `rlwimix`. Add basic insert (non-Rc) + Rc=1 with bit-31-set case.

### PPCBUG-026 — rlwnmx CR0 update uses 64-bit signed compare; should use 32-bit

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:690
- **Symptom**: same class as PPCBUG-024. `rlwnm.` is less frequent but used in variable-shift normalisation patterns.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);`
- **Test gap**: zero tests for `rlwnmx`.

### PPCBUG-027 — rlwimix zeroes upper 32 bits of RA instead of preserving them (ISA deviation, LOW)

- **Severity**: LOW
- **Status**: open (no fix action required for 32-bit ABI emulation)
- **Location**: interpreter.rs:677-678
- **Symptom**: `let ra = ctx.gpr[instr.ra()] as u32` discards upper 32 bits; result written as `as u64` zero-extends. Per ISA, `(RA) & ¬MASK(MB+32, ME+32)` preserves upper 32 bits of RA. Canary confirms: `f.And(f.LoadGPR(i.M.RA), f.LoadConstantUint64(~m))` with `~m` non-zero in upper half.
- **Impact**: under 32-bit ABI, if the 32-bit GPR invariant holds, upper 32 bits of RA are already zero before `rlwimix`, so both behaviours are identical. The deviation is only observable if an upstream bug (PPCBUG-001..023) has leaked non-zero upper bits into RA — in which case `rlwimix` would silently clean them (beneficial side-effect). No isolated fix needed; resolves automatically when upstream bugs are fixed.
- **Note**: if 64-bit mode support is ever added, this will become a HIGH bug.

---

## Batch 2 — logical register (group 7) [renumbered from collision]

Per-group report: `audit-out/group-07-logic-reg.md` (note: report uses original IDs PPCBUG-023..029 from the subagent's local numbering; tracker uses PPCBUG-028..033 here to avoid collision with groups 6 and 9).

The group 7 subagent also flagged a CR0 regression across all 8 opcodes — that is an extension of PPCBUG-020 (catch-all for CR0 64-bit-signed regressions). Adding andx, andcx, orx, orcx, xorx, norx, nandx, eqvx Rc=1 paths to PPCBUG-020's scope rather than creating a new ID.

### PPCBUG-028 — orcx active GPR poisoning

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:509-513
- **Symptom**: writes `rs | !rb`. Rust's `!` on `u64` flips all 64 bits — the upper 32 bits of `!rb` are unconditionally all-ones, OR'd into the result. With clean inputs `orc r5, r3, r4` writes `0xFFFFFFFF_xxxxxxxx`. Active poisoning, same shape as PPCBUG-006/008.
- **Fix**: operate on u32, write `as u64`:
```rust
let result = (ctx.gpr[instr.rs()] as u32) | !(ctx.gpr[instr.rb()] as u32);
ctx.gpr[instr.ra()] = result as u64;
```
- **Test gap**: zero tests.

### PPCBUG-029 — norx active GPR poisoning (the `not` simplified mnemonic)

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:519-523
- **Symptom**: writes `!(rs | rb)` — outer `!` flips upper 32 bits unconditionally. **`nor rA, rS, rS` is the canonical `not` simplified mnemonic** used pervasively in PPC code; every `not` in 32-bit-ABI Xbox 360 binaries actively poisons the GPR.
- **Fix**: u32 arithmetic, write `as u64`.

### PPCBUG-030 — nandx active GPR poisoning

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:524-528
- **Symptom**: writes `!(rs & rb)`, the same shape as norx. `nand rA, rS, rS` also computes a bitwise NOT (the same result as `nor rA, rS, rS`), so that idiom is poisoned in the same way.
- **Fix**: u32 arithmetic.

### PPCBUG-031 — eqvx active GPR poisoning

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:529-533
- **Symptom**: writes `!(rs ^ rb)` — same shape. The idiom `eqv rA, rS, rS` "set rA to all-ones (i.e. -1 in 32-bit ABI)" produces `0xFFFFFFFF_FFFFFFFF` instead of `0x00000000_FFFFFFFF`.
- **Fix**: u32 arithmetic.

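
The four complement-based ops (`orc`, `nor`, `nand`, `eqv`) share one fix shape: complement on `u32`, then zero-extend. A minimal sketch follows, with hypothetical free functions standing in for the interpreter arms; `nor_64` reproduces the buggy shape for comparison.

```rust
/// Buggy shape for `nor`: complement on the full 64-bit value.
fn nor_64(rs: u64, rb: u64) -> u64 {
    !(rs | rb)
}

/// Fixed shapes: do the logic on the low words, then zero-extend.
fn orc_32(rs: u64, rb: u64) -> u64 {
    let result = (rs as u32) | !(rb as u32);
    result as u64
}

fn nor_32(rs: u64, rb: u64) -> u64 {
    let result = !((rs as u32) | (rb as u32));
    result as u64
}

fn nand_32(rs: u64, rb: u64) -> u64 {
    let result = !((rs as u32) & (rb as u32));
    result as u64
}

fn eqv_32(rs: u64, rb: u64) -> u64 {
    let result = !((rs as u32) ^ (rb as u32));
    result as u64
}
```

With these shapes, `not` (that is, `nor rA, rS, rS`) of 5 gives `0x00000000_FFFFFFFA`, and `eqv rA, rS, rS` gives `0x00000000_FFFFFFFF`.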
### PPCBUG-032 — andx / orx / xorx writeback not truncated (latent)

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:494-498 (andx), 504-508 (orx), 514-518 (xorx)
- **Symptom**: 64-bit bitwise on full GPR values. Latent — clean if both operands are clean; pollutes if either is poisoned upstream.
- **Fix**: `as u32 as u64` truncation at writeback. Once all upstream poison sources are fixed, these become unnecessary; until then, defensive truncation.

### PPCBUG-033 — andcx active poisoning via `!rb` sub-expression

- **Severity**: MEDIUM (the `!rb` always poisons; outer `&` masks it away when rs is clean — fully active when rs is poisoned)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:499-503
- **Symptom**: writes `rs & !rb`. When rb is clean, the upper 32 bits of `!rb` are all-ones, so the `&` passes the upper 32 bits of rs through unchanged: a clean rs gives a clean result, while a poisoned rs keeps all of its poison. This is closer to active than latent.
- **Fix**: `(rs as u32) & !(rb as u32)` then `as u64`.

## Batch 2 — sign-extend / count-leading-zeros (group 8) [renumbered]

Per-group report: `audit-out/group-08-extend-clz.md` (report uses local IDs PPCBUG-023..030; tracker uses PPCBUG-034..039).

### PPCBUG-034 — extsbx writeback sign-extends to 64 bits

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:537
- **Symptom**: `as i8 as i64 as u64` — a byte with high bit set (0x80) writes `0xFFFFFFFF_FFFFFF80` instead of `0x00000000_FFFFFF80`. Active poisoning on every negative byte. `extsb` is emitted by compilers to canonicalize signed-byte arguments — common code path.
- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i8 as i32 as u32 as u64;`
- **Test gap**: zero unit tests.
- **Note**: Canary's JIT does the same sign-extension but is rescued by x86's 32-bit-write zeroing the upper 32 of host registers. Pure interpreter has no such escape.

### PPCBUG-035 — extshx writeback sign-extends to 64 bits

- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:542
- **Symptom**: `as i16 as i64 as u64` — same shape as PPCBUG-034 for halfwords.
- **Fix**: `ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i16 as i32 as u32 as u64;`

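
The cast chains for `extsb` and `extsh` differ only in where sign-extension stops. The sketch below isolates them as hypothetical free functions so the two chains can be compared on the same input.

```rust
/// Buggy shape: sign-extend the byte all the way to 64 bits.
fn extsb_64(rs: u64) -> u64 {
    rs as i8 as i64 as u64
}

/// Fixed shape: sign-extend to 32 bits, then zero-extend to 64.
fn extsb_32(rs: u64) -> u64 {
    rs as i8 as i32 as u32 as u64
}

/// Same fixed shape for halfwords.
fn extsh_32(rs: u64) -> u64 {
    rs as i16 as i32 as u32 as u64
}
```

Only the low byte (or halfword) of the source matters, so stale upper bits in `rs` do not leak into the fixed result.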
### PPCBUG-036 — extsbx CR0 coupling

- **Severity**: MEDIUM (must land in same commit as PPCBUG-034)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:538
- **Symptom**: `update_cr_signed(0, ra as i64)` — currently latent because the unfixed sign-extended value's i64 sign matches bit 7 of the byte. After PPCBUG-034 lands, the truncated value's i64 view becomes always non-negative — CR0.LT will never fire for negative byte results.
- **Fix**: `ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64);` — must land with PPCBUG-034.

### PPCBUG-037 — extshx CR0 coupling

- **Severity**: MEDIUM (must land with PPCBUG-035)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:543
- **Symptom**: same coupling shape as PPCBUG-036 for halfwords.

### PPCBUG-038 — extswx ISA-correct, document asymmetry

- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- **Location**: interpreter.rs:547
- **Symptom**: `as i32 as i64 as u64` produces full 64-bit sign-extension. This IS the documented purpose of extsw — argument-register canonicalization in 64-bit mode. Behavior is intentional. After PPCBUG-034/035 land, document the asymmetry with extsb/extsh in a comment.

### PPCBUG-039 — cntlzdx counts upper 32 always-zero bits in 32-bit ABI

- **Severity**: LOW
- **Status**: open (probably dead code in Xbox 360 binaries)
- **Location**: interpreter.rs:556-562
- **Symptom**: counts leading zeros in full 64. If a 32-bit-ABI binary emits cntlzd, the result is `32 + cntlzw(low32)` not `cntlzw(low32)`. ISA-correct for 64-bit mode; only matters if the binary actually emits it.
- **Test gap**: zero tests.

#### Clean opcodes from group 8

- `cntlzwx` (interpreter.rs:551-555) — `(rs as u32).leading_zeros()` reads only low 32 bits, result range 0..=32, upper 32 zero. CR0 path benign because result is small. **Test gap only**, LOW.
- `extswx` CR0 path is correct per ISA (PPCBUG-038 wontfix).

## Batch 2 — shift (group 11) [renumbered]

Per-group report: `audit-out/group-11-shift.md` (uses local IDs PPCBUG-050..055; tracker uses PPCBUG-040..045).

### PPCBUG-040 — DECODER BUG: `sh64()` wrong bit order for sradi (HIGH)

- **Severity**: HIGH (this is a decoder-level bug, file:line is in `decoder.rs` not `interpreter.rs`)
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `xenia-rs/crates/xenia-cpu/src/decoder.rs:91-93` (the `sh64()` accessor method on `DecodedInstr`)
- **Symptom**: the XS-form `sradix` (sradi) shift amount is assembled as `SH[4:0] << 1 | SH[5]` instead of the correct `SH[5] << 5 | SH[4:0]`. **Every `sradi rA, rS, N` instruction where N is not 0 or 63 executes with a completely wrong shift count.** Example: `sradi rA, rS, 32` shifts by 1 instead. This is a silent, structural mis-decoding — none of the interpreter changes can paper over it.
- **Cross-reference**: Canary's `(i.XS.SH5 << 5) | i.XS.SH` pattern is the correct ISA encoding.
- **Fix**: in `decoder.rs:sh64()` body, swap the bit order:
```rust
pub fn sh64(&self) -> u32 {
    // SH5 is at bit 30 of the encoded word; SH[4:0] is at bits 16-20.
    let sh_lo = extract_bits(self.raw, 16, 20);
    let sh_hi = extract_bits(self.raw, 30, 30);
    (sh_hi << 5) | sh_lo
}
```
- **Impact**: `sradi` is used by compilers for arithmetic right shifts on 64-bit values. In Xbox 360 32-bit-ABI binaries it should not be common, but it's emitted by some compilers for sign-magnitude conversions and 64-bit fixed-point arithmetic. **This is the kind of silent decoder bug the user explicitly wanted the audit to catch.**
- **Test gap**: no decoder unit test pins `sh64()` for non-trivial SH values. Add fixture cases in `disasm_goldens.rs` for `sradi rA, rS, 1`, `sradi rA, rS, 32`, `sradi rA, rS, 63`.
- **Note**: any other instruction that uses the same XS-form SH split-encoding is suspect. Phase C decoder audit must verify `sradi` and `sradix` are the only consumers of `sh64()`.

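
The claim that only N = 0 and N = 63 survive the swapped bit order can be checked exhaustively. The sketch below is self-contained: `ppc_bits` is a hypothetical stand-in for the decoder's `extract_bits` (assumed to use PPC big-endian bit numbering, where PPC bit `n` is host bit `31 - n`), and `encode_sh` is a test-only helper that places a shift amount into an otherwise zero instruction word.

```rust
/// Hypothetical stand-in for `extract_bits`: inclusive PPC bit range.
fn ppc_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & ((1u32 << width) - 1)
}

/// Buggy assembly order: low five bits shifted up, split bit at the bottom.
fn sh64_buggy(raw: u32) -> u32 {
    (ppc_bits(raw, 16, 20) << 1) | ppc_bits(raw, 30, 30)
}

/// Fixed assembly order: split bit is the high bit of the 6-bit amount.
fn sh64_fixed(raw: u32) -> u32 {
    (ppc_bits(raw, 30, 30) << 5) | ppc_bits(raw, 16, 20)
}

/// Test-only helper: encode shift amount `n` (0..=63) into the SH fields.
fn encode_sh(n: u32) -> u32 {
    let lo = n & 0x1F;
    let hi = (n >> 5) & 1;
    (lo << (31 - 20)) | (hi << (31 - 30))
}
```

Iterating `n` over `0..64` and comparing `sh64_buggy(encode_sh(n))` against `n` leaves exactly 0 and 63 as fixed points, and `n = 32` decodes as 1, in line with the example in the symptom.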
### PPCBUG-041 — srawx writeback sign-extends to 64 bits

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:583, 588 (two writeback paths for the count<32 and count>=32 branches)
- **Symptom**: `result as i64 as u64` violates the 32-bit-ABI zero-extension convention. A negative shifted value writes `0xFFFFFFFF_xxxxxxxx` instead of `0x00000000_xxxxxxxx`.
- **Fix**: `result as u32 as u64` in both writeback paths.
- **Note**: subagent verified the CA computation is **independently correct** — uses `(rs as u32) << (32 - sh) != 0` which is the canonical ISA shifted-out-bits test on 32-bit operands. **Do not change CA logic.**

### PPCBUG-042 — srawix writeback sign-extends to 64 bits

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:600, 605 (same shape as PPCBUG-041 for srawi)
- **Fix**: `result as u32 as u64`.

### PPCBUG-043 — srawx / srawix CR0 coupling

- **Severity**: MEDIUM (must land with PPCBUG-041 and PPCBUG-042)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:593, 607
- **Symptom**: currently masked by the sign-extended writeback (sign-extension makes the 64-bit and 32-bit sign agree). After truncating the writeback, `as i64` will misread the sign for negative results.
- **Fix**: `as u32 as i32 as i64` in both Rc=1 paths, applied with PPCBUG-041/042.

### PPCBUG-044 — slwx / srwx CR0 misclassifies negative 32-bit results

- **Severity**: LOW (zero-extended results have bit 31 set in low 32, but always positive in i64 view → CR0.LT never fires for slw/srw with bit-31-set results)
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:568, 576
- **Fix**: `as u32 as i32 as i64`.

### PPCBUG-045 — Zero unit tests for any shift opcode

- **Severity**: LOW (test gap only)
- **Status**: open
- **Locations**: interpreter.rs:563-658 (entire shift group: slwx, srwx, srawx, srawix, sldx, srdx, sradx, sradix)
- **Recommendation**: add at least one functional test per opcode. Especially: `srawix r3, r3, 1` with rs=0xFFFFFFFE (CA should be 0), `srawix r3, r3, 1` with rs=0x80000001 (CA should be 1, result=0xC0000000); `sradix r3, r3, 32` (currently wrong per PPCBUG-040).

#### Clean opcodes from group 11

- `slwx` writeback at line 568 (zero-ext 32-bit result via `(rs as u32 << count) as u64`) — clean.
- `srwx` writeback at line 576 — clean.
- `sldx`, `srdx`, `sradx` — 64-bit ops, ISA-correct (probably dead in 32-bit-ABI binaries).
- `sradix` body logic is structurally correct; failure is solely from PPCBUG-040 giving it a wrong shift count.

## Batch 2 — doubleword rotate (group 10) [renumbered]

Per-group report: `audit-out/group-10-dword-rotate.md` (uses local IDs PPCBUG-027/028; tracker uses PPCBUG-046/047).

### PPCBUG-046 — DECODER BUG: wrong bit position for MB[5] in all 6 doubleword-rotate opcodes (HIGH)
- **Severity**: HIGH (decoder-level; impacts the canonical zero-extend-to-32 idiom)
- **Status**: applied (52b05b1, 2026-05-01)
- **Locations**: interpreter.rs — every arm of `rldiclx`, `rldicrx`, `rldicx`, `rldimix`, `rldclx`, `rldcrx` (lines 693-754)
- **Symptom**: each arm computes `let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1)`. The bit at `(instr.raw >> 1) & 1` is **PPC bit 30**, which in MD form is `sh[5]` (the split-off high bit of the 6-bit shift amount), NOT `mb[5]`. The high bit of the 6-bit MB field lives at PPC bit 26 = `(instr.raw >> 5) & 1`.

  As written, the code computes `(mb[4:0] << 1) | sh[5]`. Ironically `disasm.rs:1256` (the `mb_md()` helper) has the correct formula. The interpreter was written independently with the wrong bit position, probably a copy error from `sh64()`, where bit 30 really is the split bit.
- **Concrete impact**:
  - `clrldi r3, r4, 32` is the canonical "zero-extend low 32 bits" idiom emitted constantly in 32-bit-ABI PPC code. Encoded as `rldicl r3, r4, 0, mb=32`. With mb=32, `mb[5]=1, mb[4:0]=0`. The interpreter decodes mb=0 → mask is all-ones → instruction becomes a no-op. Any downstream 64-bit compare (subfcx CA, cmpld) on that register sees a polluted 64-bit value instead of a clean 32-bit zero-extended one. **This is the same class of bug that caused the addis/BST incident.**
  - For `rldcr` (MDS form), the XO field's LSB at bit 30 is always 1 (Rc=0 opcode), so `me[5]` is forcibly set to 1 for every non-record-form invocation — effectively adding 32 to all me values.
- **Fix** (one line per opcode):
  ```rust
  // Replace in all 6 arms:
  let mb = (instr.mb() << 1) | ((instr.raw >> 1) & 1);
  // With:
  let mb = instr.mb() | (((instr.raw >> 5) & 1) << 5);
  ```
  Or, cleaner: expose `mb_md()` (currently in disasm.rs:1256) as a method on `DecodedInstr` in `decoder.rs` and have the interpreter call `instr.mb_md()` — single source of truth for MD-form mb extraction.
- **Test gap**: zero execution tests for any of the 6 opcodes; only disasm-golden string-output tests.
- **Note**: this is the second decoder bug found by the audit (PPCBUG-040 / `sh64()` for `sradi` is the first). Phase C decoder audit must verify whether other MD/MDS/XS form accessors have similar bit-position errors.

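The `clrldi` failure mode can be reproduced outside the interpreter with a small sketch that works directly on the raw instruction word. The helper names `mb_md_from_raw`, `mb_buggy_from_raw`, and `mask_from` are hypothetical stand-ins, and the sketch assumes `instr.mb()` returns the five bits at PPC bits 21-25, as the fix above implies.

```rust
/// Correct MD-form MB: low five bits at PPC bits 21-25, high bit at PPC bit 26.
fn mb_md_from_raw(raw: u32) -> u32 {
    let low5 = (raw >> 6) & 0x1F; // PPC bits 21-25
    let high = (raw >> 5) & 1;    // PPC bit 26
    low5 | (high << 5)
}

/// The pre-fix formula: shifts the low five bits up and ORs in PPC bit 30,
/// which belongs to the shift amount rather than to MB.
fn mb_buggy_from_raw(raw: u32) -> u32 {
    let low5 = (raw >> 6) & 0x1F;
    (low5 << 1) | ((raw >> 1) & 1)
}

/// Mask with ones from MSB-0 bit `mb` through bit 63 (valid for mb in 0..=63).
fn mask_from(mb: u32) -> u64 {
    u64::MAX >> mb
}

/// `clrldi r3, r4, 32`, i.e. `rldicl r3, r4, 0, 32`.
const CLRLDI_R3_R4_32: u32 = 0x7883_0020;
```

Applying `mask_from(mb_md_from_raw(CLRLDI_R3_R4_32))` to a polluted value such as `0xFFFF_FFFF_8000_0000` keeps only the low word, while the buggy decode yields an all-ones mask and leaves the value untouched.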
### PPCBUG-047 — Zero execution tests for any doubleword-rotate opcode
- **Severity**: LOW (test gap)
- **Status**: open
- **Locations**: interpreter.rs:693-754 (all 6 opcodes)
- **Recommendation**: at minimum, a `clrldi r3, r4, 32` test verifying the result is exactly the low 32 bits of r4. Had such a test existed before PPCBUG-046 was fixed, it would have caught the MB-reconstruction bug.

#### What's correct in group 10

- `sh64()` accessor — correctly reconstructs 6-bit shift from MD split encoding (cross-check: `disasm.rs` agrees).
- `rld_mask_left()` / `rld_mask_right()` mask helpers — verified against Canary's XEMASK.
- `rldicx`/`rldimix` mask formulas (`63 - sh` for right edge) — correct.
- `rldimix` read-modify-write merge — correct 64-bit mask-insert.
- CR0 `as i64` — correct here because these ARE genuine 64-bit ops (unlike word rotate).
- `rldcl`/`rldcr` register-shift extraction (`gpr[rb] & 0x3F`) — correct.
- No 32-bit writeback truncation needed: these are intentionally 64-bit; 32-bit-ABI compilers only emit them with masks that yield 32-bit-clean results.

## Batch 3 — branch (group 13)

Per-group report: `audit-out/group-13-branch.md`.

Group 13 summary: the branch implementation is substantively correct. All BO/BI bit masks,
CTR decrement-before-test ordering, AA absolute vs relative dispatch, LK unconditional write
(including not-taken path in `bcx`), LR-read-before-LR-write atomicity in `bclrx`, and
`get_cr_bit()` field indexing are all ISA-correct and match Canary. The only execution bugs
are a latent 64-bit CTR zero-test (PPCBUG-053/054, active under current GPR-pollution environment)
and severely thin test coverage (PPCBUG-055).

### PPCBUG-053 — CTR zero-test uses 64-bit compare; should use 32-bit in `bcx`/`bclrx`
- **Severity**: MEDIUM (effectively HIGH given unfixed PPCBUG-001..031 GPR pollution)
- **Status**: applied (3d8e2ce, 2026-05-02)
- **Locations**: `interpreter.rs:849` (`bcx` `ctr_ok`), `interpreter.rs:879` (`bclrx` `ctr_ok`)
- **Symptom**: `ctx.ctr != 0` compares all 64 bits. In 32-bit ABI the CTR is logically 32-bit.
  Canary explicitly truncates to 32 bits: `ctr = f.Truncate(ctr, INT32_TYPE)`. When CTR upper
  32 bits are non-zero (due to upstream GPR pollution flowing through `mtspr CTR, rN`), the
  64-bit test disagrees with the 32-bit ISA semantic. Most dangerous with `neg; mtctr; bdnz`:
  `negx` (PPCBUG-006) always sets upper 32 bits, so the 32-bit CTR counter can reach zero
  while the 64-bit CTR is still non-zero → infinite loop.
- **Fix**:
  ```rust
  // Replace in both bcx and bclrx:
  let ctr_ok = (bo & 0b00100) != 0
      || (((ctx.ctr as u32) != 0) ^ ((bo & 0b00010) != 0));
  ```
  Or, alternatively, truncate at decrement:
  ```rust
  if bo & 0b00100 == 0 {
      ctx.ctr = ctx.ctr.wrapping_sub(1) as u32 as u64;
  }
  ```
- **Test gap**: zero tests for CTR-decrement branches (bdnz, bdz, bdnzt, bdnzf, bdzt, bdzf).

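The runaway-loop scenario described for PPCBUG-053 can be modelled in isolation. This is a sketch under stated assumptions: `ctr_ok_64`, `ctr_ok_32`, and `count_bdnz_iterations` are hypothetical names, the loop models only a `bdnz` (BO = `0b10000`) branching back to itself, and the iteration cap exists solely so the 64-bit variant terminates.

```rust
/// Pre-fix predicate: zero-test on all 64 bits of CTR.
fn ctr_ok_64(ctr: u64, bo: u32) -> bool {
    (bo & 0b00100) != 0 || ((ctr != 0) ^ ((bo & 0b00010) != 0))
}

/// Post-fix predicate: zero-test on the low 32 bits only.
fn ctr_ok_32(ctr: u64, bo: u32) -> bool {
    (bo & 0b00100) != 0 || (((ctr as u32) != 0) ^ ((bo & 0b00010) != 0))
}

/// Counts how many times a `bdnz` loop body runs, giving up after `limit`.
fn count_bdnz_iterations(mut ctr: u64, limit: u32, ctr_ok: fn(u64, u32) -> bool) -> u32 {
    let bo = 0b10000; // bdnz: decrement CTR, branch while CTR != 0
    let mut iters = 0;
    loop {
        iters += 1;
        ctr = ctr.wrapping_sub(1); // decrement happens before the test
        if !ctr_ok(ctr, bo) || iters >= limit {
            return iters;
        }
    }
}
```

With a clean CTR of 3 both predicates stop after three iterations. With a polluted CTR of `0xFFFF_FFFF_0000_0003`, only the 32-bit predicate stops at three; the 64-bit one runs until the cap.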
### PPCBUG-054 — `mtspr CTR` writeback not truncated to 32 bits
- **Severity**: MEDIUM
- **Status**: applied (3d8e2ce, 2026-05-02)
- **Location**: `interpreter.rs:1411`
- **Symptom**: `crate::context::spr::CTR => ctx.ctr = val` writes the full 64-bit GPR to CTR.
  Acts as a firewall gap: any upstream 64-bit GPR pollution flows directly into CTR, where it
  will be tested by PPCBUG-053's 64-bit comparison. Defensive fix prevents CTR from ever
  acquiring non-zero upper 32 bits independently of the GPR-pollution fix.
- **Note**: the `bcctrx` branch-target read (`(ctx.ctr as u32) & !3`) already truncates
  correctly; the bug is confined to the `ctr != 0` zero-test in `bcx`/`bclrx`.
- **Fix**: `crate::context::spr::CTR => ctx.ctr = val as u32 as u64,`
- **Cross-reference**: Group 16 (SPR/MSR) subagent should verify this write-point.

### PPCBUG-055 — Severely inadequate test coverage for all four branch opcodes
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Locations**: `interpreter.rs` test module (lines 4455–4491)
- **Current coverage**: `bx` forward (1 test), `bl` LR update (1 test), `bcx` taken beq (1 test via `test_cmp_and_bc`). Zero tests for: `bclrx`, `bcctrx`, any CTR-decrement variant, not-taken path, backward branch, AA=1 absolute, `bcl` LR-write-on-not-taken.
- **Recommended minimum**: blr, bctr, bdnz (taken and not-taken at boundary CTR=1), bclrl old-LR-as-target, bcl LK-write-on-not-taken. See per-group report for concrete encoding patterns.

---

## Batch 3 — trap + system call (group 14)

Per-group report: `audit-out/group-14-trap-sc.md`.

Group 14 summary: the core trap evaluation (`trap.rs`) is correct — TO bit constants, signed/unsigned
comparison dispatch, and word-vs-doubleword width handling are all ISA-conformant. The live interpreter
arm properly evaluates the TO field (replacing the old unconditional-trap stub). Three MEDIUM issues
found: PC ordering on trap return, missing LEV dispatch for `sc`, and the Xbox 360 typed-trap
convention (`twi 31, r0, IMM`) not handled. Two LOW findings for stale manual snapshots and test gaps.

### PPCBUG-063 — `ctx.pc` already at CIA+4 when `StepResult::Trap` returns
- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: interpreter.rs:1543 (`ctx.pc += 4`) before interpreter.rs:1549 (`return StepResult::Trap`)
- **Symptom**: any trap handler that reads `ctx.pc` to find the faulting instruction sees CIA+4 instead
  of CIA. The existing `tracing::warn!` compensates with `.wrapping_sub(4)`, confirming the asymmetry.
  On real hardware, SRR0 = CIA (trapping instruction address). Current risk LOW (no handler inspects
  pc), but HIGH if any SEH/exception-delivery path is added (critical for the C++ throw investigation).
- **Fix**: save CIA before incrementing, restore it when firing the trap:
  ```rust
  let trap_pc = ctx.pc;
  ctx.pc += 4;
  if fired { ctx.pc = trap_pc; return StepResult::Trap; }
  ```
  Alternatively store CIA in a separate `ctx.srr0`-equivalent field and leave `ctx.pc` at NIA.
- **Note**: `sc` correctly leaves `ctx.pc` at NIA (the return address) — that is a different and
  correct design choice. The inconsistency between sc and trap is the bug.

### PPCBUG-064 — `sc` ignores `LEV` field; `sc 2` (HVcall) silently misdispatched
- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: interpreter.rs:915-918
- **Symptom**: `sc 2` (Xbox 360 hypervisor call) returns `StepResult::SystemCall` identically to
  `sc 0`. Canary dispatches LEV=0 to `syscall_handler` and LEV=2 to `f.function()` (the HVcall
  path). For pure game-title code (LEV=0 only) this is invisible; XDK kernel-mode components and
  some HV-aware titles may use `sc 2`.
- **Fix**: decode the 7-bit LEV field (bits 20-26 of SC-form encoding), add a `HypervisorCall`
  variant to `StepResult`, and dispatch accordingly.

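A minimal sketch of the LEV decode described in the fix, operating on the raw instruction word. The `ScKind` enum and both function names are hypothetical; the real change would extend `StepResult` instead. The `Other` arm is an assumption about how unexpected LEV values might be surfaced.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ScKind {
    SystemCall,     // LEV = 0
    HypervisorCall, // LEV = 2
    Other(u32),     // any other LEV value, reported rather than guessed at
}

/// Extracts the 7-bit LEV field (PPC bits 20-26) from an SC-form word.
fn sc_lev(raw: u32) -> u32 {
    (raw >> 5) & 0x7F
}

fn classify_sc(raw: u32) -> ScKind {
    match sc_lev(raw) {
        0 => ScKind::SystemCall,
        2 => ScKind::HypervisorCall,
        n => ScKind::Other(n),
    }
}
```

For example, the plain `sc` word `0x4400_0002` classifies as `SystemCall`, while `sc 2` (`0x4400_0042`) classifies as `HypervisorCall`.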
### PPCBUG-065 — `twi 31, r0, IMM` typed-trap not handled; SIMM type code discarded
- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: interpreter.rs:1532-1551 (trap arm)
- **Symptom**: `twi 31, r0, IMM` (TO=31=unconditional, RA=r0) is used by the Xbox 360 CRT/kernel
  to encode typed C++ exceptions — the 16-bit SIMM carries the exception type discriminator. xenia-rs
  fires the trap correctly but discards SIMM. The caller sees a generic `StepResult::Trap` with no
  type information, preventing correct C++ SEH dispatch.
- **Canary reference**: `ppc_emit_control.cc:611-616` special-cases `RA==0 && TO==31` and calls
  `f.Trap(type)` with the SIMM as the type code.
- **Fix**: add a `trap_type: Option<u16>` payload to `StepResult::Trap`. Detect `twi` with `to()==31`
  and `ra()==0` and populate it with `instr.simm16() as u16`.
- **Note**: directly relevant to the Sylpheed `std::runtime_error` throw investigation
  (project_xenia_rs_sylpheed_throw_2026_04_28.md) — the typed-trap SIMM carries the CRT exception
  class that the kernel uses to route to the correct handler.

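The detection rule in the fix can be sketched directly on the raw word. `typed_trap_code` is a hypothetical helper, not an existing function; it assumes the standard D-form layout for `twi` (primary opcode 3, TO at PPC bits 6-10, RA at bits 11-15, SIMM in the low 16 bits).

```rust
/// Returns the 16-bit type code when `raw` is `twi 31, r0, IMM`, else `None`.
fn typed_trap_code(raw: u32) -> Option<u16> {
    let opcode = raw >> 26;      // primary opcode; 3 = twi
    let to = (raw >> 21) & 0x1F; // TO field
    let ra = (raw >> 16) & 0x1F; // RA field
    if opcode == 3 && to == 31 && ra == 0 {
        Some(raw as u16)         // low 16 bits carry the type code
    } else {
        None
    }
}
```

`typed_trap_code(0x0FE0_0016)` (that is, `twi 31, r0, 22`) yields `Some(22)`, whereas the same word with RA=r3 or with the `tdi` primary opcode yields `None`.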
### PPCBUG-066 — Stale frozen snapshots in ppc-manual for td/tdi/tw/twi
- **Severity**: LOW
- **Status**: applied (P7 manual regen, 2026-05-02)
- **Location**: `ppc-manual/branch/td.md`, `tdi.md`, `tw.md`, `twi.md`
- **Symptom**: all four show the old unconditional-trap stub (`// For now, just trace and continue`)
  instead of the current TO-field-evaluating implementation.
- **Fix**: regenerate after PPCBUG-063 and PPCBUG-065 are resolved.

### PPCBUG-067 — Test gaps for trap and sc
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs `#[cfg(test)] mod tests`
- **Missing coverage**: `sc` smoke test (fires SystemCall, advances PC); `td` vs `tw` on 64-bit-clean
  operands (width discrimination); `tdi`/`td` signed/unsigned LT/GT conditions; `tw 31, r0, r0`
  unconditional `trap` encoding; `twi 31, r0, N` typed-trap; negative simm16 in `twi`.

---

## Batch 3 — SPR / MSR / TB / FPSCR / VSCR moves (group 16)

Per-group report: `audit-out/group-16-spr-msr.md`.

Group 16 summary: the core paths are clean — `mfcr`, `mtcrf`, `mfspr`, `mtspr`, `mftb`, `mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`, `mfvscr`, `mtvscr` are all functionally ISA-correct. The `spr()` decoder accessor correctly inverts the PPC XFX half-swap encoding. The one MEDIUM finding is `mtmsrd` silently ignoring the `L=1` partial-MSR-write semantics. Five LOW test-gap findings cover near-total absence of unit tests for this entire group.

### PPCBUG-078 — `mtmsrd` L=1 partial-MSR-write not modelled

- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:1458-1461`
- **Symptom**: xenia-rs merges `mtmsr` and `mtmsrd` into a single body that unconditionally writes `ctx.msr = ctx.gpr[instr.rs()]`. PowerISA specifies that `mtmsrd` with instruction bit 15 (`L`) = 1 performs a partial update: only `MSR[EE]` (ISA bit 48, u64 bit 15) and `MSR[RI]` (ISA bit 62, u64 bit 1) are modified; all other MSR bits are preserved. Kernel code using `mtmsrd L=1` to re-enable external interrupts silently corrupts the entire MSR in xenia-rs. Canary acknowledges the same TODO.
- **Fix**:
  ```rust
  PpcOpcode::mtmsrd => {
      let l = (instr.raw >> (31 - 15)) & 1;
      if l == 1 {
          // MSR[EE] (u64 bit 15) | MSR[RI] (u64 bit 1)
          let mask: u64 = (1u64 << 15) | (1u64 << 1);
          let rs = ctx.gpr[instr.rs()];
          ctx.msr = (ctx.msr & !mask) | (rs & mask);
      } else {
          ctx.msr = ctx.gpr[instr.rs()];
      }
      ctx.pc += 4;
  }
  ```
- **Test gap**: zero tests for `mtmsr` or `mtmsrd`.

### PPCBUG-079 — `mtspr` drops unknown-SPR writes without logging the value

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1430-1433`
- **Symptom**: writes to unknown SPRs are discarded with only a `tracing::warn!()` that omits the value being written. This reduces debuggability; there is no correctness impact for known Xbox 360 titles.
- **Fix** (optional): `tracing::warn!("mtspr: unimplemented SPR {} <= 0x{:016x}", spr, val)`.

### PPCBUG-080 — `mfvscr` does not zero the upper 96 bits of VD per ISA

- **Severity**: LOW
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:2198-2201`
- **Symptom**: ISA requires `mfvscr VD` to place VSCR in the rightmost word of VD and zero bytes 0-11. xenia-rs copies the full 128-bit `ctx.vscr` into `ctx.vr[VD]`, leaving stale data in bytes 0-11 if `ctx.vscr` was populated from a non-zeroed vector. Canary explicitly zero-extends.
- **Fix**:
  ```rust
  PpcOpcode::mfvscr => {
      let vscr_word = ctx.vscr.as_u32x4()[3];
      ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array([0, 0, 0, vscr_word]);
      ctx.pc += 4;
  }
  ```

### PPCBUG-081 — Zero unit tests for `mfcr` / `mtcrf`

- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:1436-1453`
- **Recommended additions**: full mfcr round-trip; `mtcrf 0xFF`; `mtcrf 0x80` (CR0 only); `mtcrf 0x38` (ABI CR2|CR3|CR4 restore).

### PPCBUG-082 — Minimal unit tests for `mfspr` / `mtspr`

- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:1376-1435`
- **Note**: only DEC and TBL_WRITE covered; add LR, CTR, XER, TBL/TBU, VRSAVE.

### PPCBUG-083 — Zero unit tests for `mftb`

- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:1462-1470`

### PPCBUG-084 — Zero interpreter-level round-trip tests for FPSCR move instructions

- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:2678-2720`
- **Note**: `fpscr.rs` helper-level tests exist; interpreter dispatch (`mffsx`, `mtfsfx`, `mtfsb0x`, `mtfsb1x`, `mtfsfix`) is untested end-to-end.

### PPCBUG-085 — Zero unit tests for `mfvscr` / `mtvscr`

- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:2198-2205`

IDs PPCBUG-086 and PPCBUG-087 are unallocated — reserved for group 16 follow-up findings.

---

## Batch 3 — cache + sync (group 17)

Per-group report: `audit-out/group-17-cache-sync.md`.

Group 17 summary: the cleanest group audited so far. Both `dcbz` and `dcbz128` have correct EA computation (ra=0 special case, 64-bit→u32 truncation, alignment masks `& !31` / `& !127`, byte counts 32/128). The nine no-op opcodes (dcbf, dcbi, dcbst, dcbt, dcbtst, icbi, sync, eieio, isync) are all listed in one arm and complete. The `dcbz128` Xbox 360 specific opcode (RT=1 bit distinguishes from dcbz) dispatches correctly. **0 HIGH, 0 MEDIUM, 2 LOW** findings.

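The EA rules the summary lists for `dcbz` / `dcbz128` can be written down as a short sketch. `dcbz_block` is a hypothetical helper that takes the register index and values explicitly instead of the interpreter's context; it returns the aligned base address and the number of bytes to zero.

```rust
/// Computes (aligned base, byte count) for a `dcbz`-style block zero.
/// `block_size` is 32 for `dcbz` and 128 for `dcbz128`.
fn dcbz_block(ra_index: usize, ra_val: u64, rb_val: u64, block_size: u32) -> (u32, u32) {
    debug_assert!(block_size.is_power_of_two());
    // rA = 0 means "use literal zero", not the contents of r0.
    let base = if ra_index == 0 { 0 } else { ra_val };
    // 64-bit sum truncated to a 32-bit effective address.
    let ea = base.wrapping_add(rb_val) as u32;
    // Align down to the start of the containing block.
    (ea & !(block_size - 1), block_size)
}
```

A test built on the same rules would pick a misaligned address, zero the block, and check that exactly `block_size` bytes starting at the aligned base were cleared.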
### PPCBUG-088 — sync disasm ignores L field; `lwsync` (L=1) shows as "sync"
- **Severity**: LOW
- **Status**: open
- **Location**: `xenia-rs/crates/xenia-cpu/src/disasm.rs:364`
- **Symptom**: The `PpcOpcode::sync` disasm arm outputs `"sync"` unconditionally regardless of the L field (PPC bit 10). When L=1 (word `0x7C2004AC`), the instruction should disassemble as `"lwsync"`. The `extended_mnemonics.json` golden already accepts `"sync"` as output for the lwsync case, meaning the test currently passes with the wrong string.
- **Impact**: Disassembly output for `lwsync` (very common in Xbox 360 acquire-barrier idioms) shows as `sync`. No interpreter impact; both L=0 and L=1 are correctly treated as no-op PC advance.
- **Fix**:
  ```rust
  PpcOpcode::sync => {
      // L field at PPC bit 10
      if extract_bits(instr.raw, 10, 10) == 1 {
          base("lwsync", String::new(), 0)
      } else {
          base("sync", String::new(), 0)
      }
  }
  ```
  Update `extended_mnemonics.json` golden to add `"ext_mnemonic": "lwsync"` for that entry.

### PPCBUG-089 — Zero interpreter execution tests for group 17
- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `xenia-rs/crates/xenia-cpu/src/interpreter.rs` (test module)
- **Symptom**: No `#[test]` covers `dcbz`, `dcbz128`, or any no-op (sync/isync/eieio/dcbf/icbi). A regression in dcbz byte count or alignment would go undetected.
- **Recommended additions**: `dcbz` with misaligned address (verifies 32-byte aligned zero), `dcbz128` with misaligned address (verifies 128-byte aligned zero), both ra=0 and ra!=0 cases, `sync`/`isync`/`dcbf` no-op PC-advance smoke tests.

---

## Batch 3 — CR logical + CR moves (group 15)

Per-group report: `audit-out/group-15-cr-logical.md`.

Group 15 summary: **cleanest group audited to date**. All 8 CR logical ops (`crand`, `crandc`,
`creqv`, `crnand`, `crnor`, `cror`, `crorc`, `crxor`), `mcrf`, and `mcrxr` are ISA-correct.
The `cr_logical` helper's use of `fn(bool, bool) -> bool` prevents the `!u64` bit-pollution class
(PPCBUG-028–031 in group 7). CR bit indexing in `get_cr_bit`/`set_cr_bit` is correct (bit/4 =
field, bit%4 = within-field sub-index matching PPC MSB-0 numbering, with sub `{0=LT, 1=GT, 2=EQ,
3=SO}`). `mcrxr` correctly maps XER{SO,OV,CA} to CR{LT,GT,EQ} with SO=false and unconditionally
clears the XER bits. `mcrfs` nibble extraction, field shift formula (`28 - crfs*4`), and
CLEARABLE_MASK (all 14 ISA-clearable exception bits, no FEX/VX) are all correct. One MEDIUM ISA
violation: `mcrfs` omits VX summary recomputation. Two LOW findings: a misleading test comment and
zero coverage for all 8 CR logical ops + `mcrf`.

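The indexing scheme and the `fn(bool, bool) -> bool` pattern credited in the summary can be sketched as follows. The `Cr` type and its methods are hypothetical and only mirror the description above (bit/4 selects the field, bit%4 selects LT/GT/EQ/SO); they are not the actual `get_cr_bit`/`set_cr_bit`/`cr_logical` code.

```rust
/// CR modelled as eight fields, each stored as [LT, GT, EQ, SO].
struct Cr {
    fields: [[bool; 4]; 8],
}

impl Cr {
    fn new() -> Self {
        Cr { fields: [[false; 4]; 8] }
    }

    /// `bit` uses PPC MSB-0 numbering: 0 = CR0.LT, 31 = CR7.SO.
    fn get_bit(&self, bit: usize) -> bool {
        self.fields[bit / 4][bit % 4]
    }

    fn set_bit(&mut self, bit: usize, value: bool) {
        self.fields[bit / 4][bit % 4] = value;
    }

    /// Applies a boolean operator to two CR bits and stores the result.
    /// Working on `bool` rules out stray high bits from integer negation.
    fn logical(&mut self, bt: usize, ba: usize, bb: usize, op: fn(bool, bool) -> bool) {
        let result = op(self.get_bit(ba), self.get_bit(bb));
        self.set_bit(bt, result);
    }
}
```

The extended mnemonics then fall out directly: `crset 6` is `logical(6, 6, 6, |a, b| !(a ^ b))` and `crclr 6` is `logical(6, 6, 6, |a, b| a ^ b)`.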
### PPCBUG-068 — `mcrfs` does not recompute VX summary bit after clearing VX* exception bits

- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:4250` (`ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK)`)
- **Symptom**: When `mcrfs` clears VX* exception bits (VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ,
  VXVC, VXSOFT, VXSQRT, VXCVI) from any source field, the VX summary bit (FPSCR[2], `fpscr::VX
  = 1<<29`) is left stale. If those VX* bits were the only contributors to VX, it should become
  0 but remains 1. A subsequent `mcrfs cr0, 0` will then report VX=1 in CR0.EQ, misleading the
  caller into thinking an invalid-operation exception is still active.
- **Fix**:
  ```rust
  // After ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK); add:
  if (ctx.fpscr & fpscr::VX_ALL) != 0 {
      ctx.fpscr |= fpscr::VX;
  } else {
      ctx.fpscr &= !fpscr::VX;
  }
  // FEX recomputation omitted — xenia doesn't model enabled-exception dispatch.
  ```
- **Test gap**: existing test only covers crfS=0 (FX+OX) — no VX* bits involved. Add a test
  that sets only VXSNAN, runs `mcrfs cr0, 1`, then verifies VX is now 0.

### PPCBUG-069 — `mcrfs` test comment claims OX(so)=0 but OX is set in the test

- **Severity**: LOW (cosmetic; the assert is correct, only the comment is wrong)
- **Status**: open
- **Location**: `interpreter.rs:5402`
- **Symptom**: Comment reads `"FX(lt)=1 and OX(so)=0"`. FPSCR was set to `(1<<31)|(1<<28)`,
  which sets both FX and OX. The nibble is `0b1001`, so `so=true`. The assert `cr[2].as_u8()
  == 0b1001` is correct; only the comment is wrong.
- **Fix**: `// FX(lt)=1, FEX(gt)=0, VX(eq)=0, OX(so)=1 → 0b1001 = 9`

### PPCBUG-070 — Zero execution tests for all 8 CR logical ops and `mcrf`

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Locations**: `interpreter.rs:1473–1484`
- **Missing minimum**: `crclr` idiom (`crxor BT,BT,BT`, BT=1 → 0), `crset` idiom
  (`creqv BT,BT,BT`, BT=0 → 1), `crmove` idiom (`cror BT,BA,BA`), `crnot` idiom
  (`crnor BT,BA,BA`, BA=1 → 0), cross-field `crand`/`crandc`, and a full `mcrf
  cr0, cr3` field-copy + source-field-intact test.

---

## Pre-pass hints REFUTED by audit

These were flagged by the orchestrator's regex scan but the subagents found them to be safe:

- **`divwux` writeback** (interpreter.rs:390) — both operands cast to `u32` before division, `as u64` zero-extends correctly. **Clean.**
- **`mulhwx` intermediate cast** (interpreter.rs:349) — in `((result >> 32) as i32 as i64 as u64) & 0xFFFF_FFFF` the intermediate casts are redundant, but the trailing mask preserves correctness. Cosmetic only.
- **`mulhwux` writeback** (interpreter.rs:359) — `(result >> 32) & 0xFFFF_FFFF` is a clean unsigned result. **Clean.**
- **CR0 stale-prepass-claim**: the pre-pass document quoted `result as i32 as i64`, but the live code uses `result as i64`. The claim that the live form is i64 is therefore **correct**; the pre-pass merely implied that an i32 form was already in place. PPCBUG-020 is the real finding.

---

## Batch 4 — load float (group 23)

Per-group report: `audit-out/group-23-load-float.md`.

Group 23 summary: the double-precision load family (`lfd`, `lfdu`, `lfdux`, `lfdx`) is fully
ISA-correct — EA computation, endianness, update-form writeback, and bit-pattern fidelity are
all clean. The single-precision family (`lfs`, `lfsu`, `lfsux`, `lfsx`) has one HIGH bug:
Rust's `as f64` float cast compiles to x86 `CVTSS2SD` which unconditionally sets the IEEE quiet
bit in the output, silently converting f32 SNaN loads to f64 QNaN. The ISA requires the SNaN
to pass through unchanged. FPSCR.NI does not apply to loads (correct by omission). One LOW
test-gap finding. **2 IDs used (PPCBUG-128, PPCBUG-129). 8 IDs unallocated (PPCBUG-130..137).**

### PPCBUG-128 — lfs/lfsu/lfsx/lfsux silently quieten SNaN via `as f64` Rust float cast

- **Severity**: HIGH
- **Status**: open
- **Locations**: interpreter.rs:1064 (lfs), 1070 (lfsx), 1087 (lfsu), 1093 (lfsux)
- **Symptom**: All four single-precision load arms use `mem.read_f32(ea) as f64` where
  `read_f32` = `f32::from_bits(read_u32(ea))`. The `as f64` Rust float cast compiles to x86
  `CVTSS2SD`, which unconditionally sets bit 51 of the f64 mantissa (the IEEE quiet/signalling
  discriminator bit) for any NaN input. An f32 SNaN (e.g. `0x7F800001`) is loaded and written
  to the FPR as the f64 QNaN `0x7FF8000020000000` instead of the SNaN `0x7FF0000020000000`.

  **ISA requirement**: "A signalling NaN passes through unchanged into the FPR — it will signal
  at the next FP arithmetic instruction." (lfs.md Special Cases). The FPR must hold the SNaN;
  VXSNAN fires at the consuming arithmetic op, not at the load.

  **Impact**: (a) Game code storing f32 SNaN sentinels (physics engines mark unset float slots
  with SNaN) and then loading+inspecting them: `fpscr::is_snan(ctx.fpr[rd])` returns false
  after the load, breaking sentinel detection. (b) Arithmetic ops consuming the loaded value
  see a QNaN rather than SNaN, so VXSNAN is never set; games relying on VXSNAN to detect
  uninitialized-read bugs get false negatives.

- **Canary parity**: Canary's JIT also uses CVTSS2SD via `f.Convert()`. Both emulators share
  this deviation. The bug is a structural consequence of using semantic float widening rather
  than a bit-pattern-preserving widening routine.
- **Fix**: replace the float cast with a bit-manipulation widening that preserves the SNaN bit:
  ```rust
  fn widen_f32_bits_to_f64(raw32: u32) -> u64 {
      let sign = ((raw32 >> 31) as u64) << 63;
      let exp32 = ((raw32 >> 23) & 0xFF) as i32;
      let mant32 = (raw32 & 0x007F_FFFF) as u64;
      if exp32 == 0xFF {
          // NaN or Infinity — propagate mantissa left-shifted by 29 bits.
          // SNaN (bit22=0) stays SNaN (bit51=0); QNaN (bit22=1) stays QNaN (bit51=1).
          sign | (0x7FFu64 << 52) | (mant32 << 29)
      } else if exp32 == 0 {
          // ±Zero or subnormal f32.
          if mant32 == 0 { return sign; } // ±zero
          // Subnormal: normalize by finding leading bit, then adjust exponent.
          // `shift` is how far the leading 1 sits below bit 22.
          let shift = mant32.leading_zeros() - (64 - 23);
          // A leading 1 at bit p = 22 - shift has weight 2^(p - 149), so the
          // biased f64 exponent is p - 149 + 1023 = (1023 - 127) - shift.
          let exp64 = (1023u64 - 127) - shift as u64;
          let mant64 = (mant32 << (shift + 1 + 29)) & 0x000F_FFFF_FFFF_FFFF;
          sign | (exp64 << 52) | mant64
      } else {
          // Normal f32 → normal f64. Add before subtracting so the unsigned
          // arithmetic cannot underflow for exponents below the f32 bias.
          let exp64 = (exp32 as u64) + 1023 - 127;
          sign | (exp64 << 52) | (mant32 << 29)
      }
  }
  // In each lfs* arm:
  ctx.fpr[instr.rd()] = f64::from_bits(widen_f32_bits_to_f64(mem.read_u32(ea)));
  ```
  This function also correctly handles subnormal f32 → normal f64 widening (which the `as f64`
  cast already gets right numerically, but now goes through a consistent code path).
- **Test gap**: add a test loading an f32 SNaN (`0x7F800001`) via `lfs` and asserting
  `fpscr::is_snan(ctx.fpr[rd])` is `true` and bit 51 of `ctx.fpr[rd].to_bits()` is 0.

### PPCBUG-129 — Zero interpreter execution tests for all 8 float-load opcodes

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Locations**: interpreter.rs test module; `tests/disasm_goldens.rs:249-250` (disasm-only)
- **Symptom**: No `#[test]`-decorated function exercises any float-load interpreter arm.
  A regression in EA computation, endianness, f32→f64 widening, or update-form writeback
  would go undetected. The SNaN bug (PPCBUG-128) was undetected partly due to this gap.
- **Recommended minimum**:
  1. `lfs` normal: `0x3F800000` (1.0f32) → assert `fpr[rd] == 1.0f64` exact.
  2. `lfs` negative displacement: base minus 4.
  3. `lfs` ra=0 path (absolute addressing).
  4. `lfd` normal: store PI bits, assert exact bit equality via `.to_bits()`.
  5. `lfd` SNaN: store `0x7FF0_0000_0000_0001u64`, assert exact bit equality after load.
  6. `lfsu` / `lfsux` / `lfdu` / `lfdux`: verify loaded FPR value AND rA update address.
  7. After PPCBUG-128 fix: `lfs` SNaN round-trip test.

IDs PPCBUG-130 through PPCBUG-137 are unallocated — no further bugs found in group 23.

---

## Files modified by the audit

- `xenia-rs/audit-prepass-findings.md` — Phase A pre-pass red flags (orchestrator regex output).
- `xenia-rs/audit-out/group-01-add-imm.md` — Group 1 report (Sonnet subagent).
- `xenia-rs/audit-out/group-02-add-reg.md` — Group 2 report.
- `xenia-rs/audit-out/group-03-sub-reg.md` — Group 3 report.
- `xenia-rs/audit-out/group-04-multiply.md` — Group 4 report.
- `xenia-rs/audit-out/group-05-divide.md` — Group 5 report.
- `xenia-rs/audit-out/group-06-logic-imm.md` — Group 6 report.
- `xenia-rs/audit-out/group-09-word-rotate.md` — Group 9 report.
- `xenia-rs/audit-out/group-13-branch.md` — Group 13 report.
- `xenia-rs/audit-out/group-14-trap-sc.md` — Group 14 report.
- `xenia-rs/audit-out/group-15-cr-logical.md` — Group 15 report.
- `xenia-rs/audit-out/group-16-spr-msr.md` — Group 16 report.
- `xenia-rs/audit-out/group-17-cache-sync.md` — Group 17 report.
- `xenia-rs/audit-out/group-18-load-byte.md` — Group 18 report.
- `xenia-rs/audit-out/group-19-load-halfword.md` — Group 19 report.
- `xenia-rs/audit-out/group-21-load-doubleword.md` — Group 21 report.
- `xenia-rs/audit-out/group-22-load-mlsr.md` — Group 22 report.
- `xenia-rs/audit-out/group-23-load-float.md` — Group 23 report.
- `xenia-rs/audit-out/group-24-store-byte-half.md` — Group 24 report.
- `xenia-rs/audit-out/group-26-store-doubleword.md` — Group 26 report.
- `xenia-rs/audit-findings.md` — this consolidated tracker.

**No source code under `xenia-rs/crates/` has been modified.**

---

## Batch 4 — load byte (group 18)

Per-group report: `audit-out/group-18-load-byte.md`.

Group 18 summary: **cleanest group audited to date — zero HIGH or MEDIUM bugs.** All four opcodes
(`lbz`, `lbzu`, `lbzx`, `lbzux`) are ISA-correct: EA computation (rA=0 special case, D-field
sign-extension, 32-bit EA truncation), zero-extension of the byte result to 64 bits, and
update-form writeback all match the ISA spec and Canary cross-reference. Two LOW findings only.

### PPCBUG-090 — lbzu/lbzux: rD==rA "invalid form" silently misloads rD

- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
- **Status**: open
- **Location**: interpreter.rs:951-956 (lbzu), 963-968 (lbzux)
- **Symptom**: When `rD == rA` (invalid form, UISA undefined), the byte load into `gpr[rD]` at
  line 953/965 is immediately overwritten by the EA writeback at line 954/966. Net result:
  `gpr[rD]` holds the EA, not the loaded byte. Canary has the same behaviour. No practical impact
  under normal compiler output.
- **Recommendation**: add `debug_assert!(instr.rd() != instr.ra())` in debug builds.

### PPCBUG-091 — Zero interpreter execution tests for all four lbz* opcodes

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module; disasm_goldens.rs:247 (disasm-only, no execution)
- **Symptom**: No `#[test]` exercises lines 945-968. A regression in EA computation,
  zero-extension, or the update writeback would go undetected.
- **Recommended minimum**: `lbz` with ra=0 + negative displacement; `lbzu` normal case (verify
  both byte result and rA update); `lbzx` with ra=0; `lbzux` normal case. Each test should
  assert `gpr[rD] <= 0xFF` to catch any future accidental sign-extension.

IDs PPCBUG-092, PPCBUG-093, PPCBUG-094 are unallocated — no further bugs found in group 18.

---

## Batch 4 — load halfword (group 19)

Per-group report: `audit-out/group-19-load-halfword.md`.

Group 19 summary: **4 HIGH bugs confirmed — all pre-pass flags validated.** The four `lha*` opcodes
(`lha`, `lhax`, `lhau`, `lhaux`) all use `as i16 as i64 as u64`, sign-extending a negative halfword
to 64 bits in violation of the 32-bit ABI. Every negative halfword load (common for `int16_t` PCM
samples, packed vertex deltas, `short[]` arrays) actively poisons the upper 32 bits of the
destination GPR — identical shape to the `addis` bug. The four `lhz*` opcodes and `lhbrx` are all
clean (`as u64` zero-extension; `swap_bytes() as u64` byte-reversal; correct endian handling; correct
EA computation and update writebacks). Two LOW findings: rD==rA invalid-form in update variants,
and zero unit tests for all nine opcodes.

### PPCBUG-095 — `lha`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:990
- **Symptom**: `mem.read_u16(ea) as i16 as i64 as u64` — memory `0x8000` writes
  `0xFFFFFFFF_FFFF8000` instead of `0x00000000_FFFF8000`. Active GPR poisoning for every
  negative halfword. Common trigger: `int16_t` struct fields, PCM samples, packed vertex deltas.
- **Fix**:
  ```rust
  ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64;
  ```
- **Test gap**: zero unit tests. Add: memory `0x8000` → `gpr[rD] == 0x00000000_FFFF8000`;
  memory `0x7FFF` → `gpr[rD] == 0x00000000_00007FFF`.

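
The difference between the two cast chains in PPCBUG-095 can be checked in isolation. The helper names below are hypothetical; only the cast chains themselves come from the finding.

```rust
/// Cast chain found by the audit: sign-extends all the way to 64 bits.
fn lha_writeback_buggy(half: u16) -> u64 {
    half as i16 as i64 as u64
}

/// Cast chain from the fix: sign-extend to 32 bits, then zero-extend to 64.
fn lha_writeback_fixed(half: u16) -> u64 {
    half as i16 as i32 as u32 as u64
}
```

Both chains agree for non-negative halfwords such as `0x7FFF`; they differ only in the upper 32 bits when bit 15 of the loaded halfword is set.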
### PPCBUG-096 — `lhax`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:996
- **Symptom**: identical to PPCBUG-095. Indexed form emitted for array access with GPR index.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests.

### PPCBUG-097 — `lhau`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:1007
- **Symptom**: identical to PPCBUG-095. Update form emitted for auto-incrementing `short[]` loops;
  poison accumulates across all iterations.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests. Add: verify both `gpr[rD]` (upper-32 = 0) and `gpr[rA]` (EA update).

### PPCBUG-098 — `lhaux`: GPR writeback sign-extends to 64 bits
- **Severity**: HIGH
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Location**: interpreter.rs:1013
- **Symptom**: identical to PPCBUG-095, update+indexed form.
- **Fix**: `mem.read_u16(ea) as i16 as i32 as u32 as u64`
- **Test gap**: zero unit tests.
- **Note**: PPCBUG-095..098 are the same one-line fix at four sites. Fix session sweep:
  `rg -n 'as i16 as i64 as u64' interpreter.rs` finds exactly these four lines.

### PPCBUG-099 — `lhau`/`lhaux`: rD==rA invalid-form silently destroys load result
- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this encoding)
- **Status**: open
- **Location**: interpreter.rs:1005-1016
- **Symptom**: same as PPCBUG-090 (`lbzu`/`lbzux`) — EA writeback overwrites `gpr[rD]` when
  `rD == rA`. Net: `gpr[rD]` holds EA, not the loaded value.
- **Recommendation**: `debug_assert!(instr.rd() != instr.ra())` in both arms.

### PPCBUG-100 — Zero execution tests for all nine halfword-load opcodes
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module
- **Symptom**: No `#[test]` exercises any of the 9 opcodes. The HIGH sign-extension bug would
  have been caught by any test that checks `gpr[rD] <= 0x0000_0000_FFFF_FFFF`.
- **Recommended minimum**: `lha` with negative halfword (assert upper 32 zero), `lhz` same,
  `lhau` verify both rD and rA, `lhzux` verify both rD and rA, `lhbrx` verify byte-swap.

IDs PPCBUG-101, PPCBUG-102, PPCBUG-103, PPCBUG-104 are unallocated — no further bugs found in group 19.

---

## Batch 4 — load word (group 20)

Per-group report: `audit-out/group-20-load-word.md`.

Group 20 summary: **1 HIGH bug (reservation invalidation never called), 1 MEDIUM (cross-thread
reservation isolation), 1 MEDIUM (lwa 64-bit sign-extension hazard), 4 LOW (3 test gaps, 1 informational).** The
zero-extending family (`lwz`/`lwzu`/`lwzx`/`lwzux`) is entirely correct — `mem.read_u32(ea) as u64`
cleanly zero-extends; EA computation, update writebacks, and RA0 handling all match ISA and Canary.
`lwbrx` is correct: the double-swap (`from_be_bytes` then `swap_bytes()`) correctly produces a
little-endian word read, zero-extended. The sign-extending family (`lwa`/`lwax`/`lwaux`) is
ISA-correct for 64-bit mode but a 32-bit-ABI hazard — classified MEDIUM because `lwa` is a
64-bit-mode instruction unlikely to appear in Xbox 360 32-bit-ABI binaries. The HIGH finding is
that `ReservationTable::invalidate_for_write` is defined and unit-tested but **never called** from
any store instruction, breaking multi-threaded `lwarx`/`stwcx.` atomicity under `--parallel`.

### PPCBUG-105 — lwa / lwax / lwaux sign-extend to 64 bits; 32-bit-ABI hazard

- **Severity**: MEDIUM
- **Status**: applied (P4 d945aea, 2026-05-02)
- **Locations**: interpreter.rs:1032 (lwa), 1038 (lwax), 1043 (lwaux)
- **Symptom**: `mem.read_u32(ea) as i32 as i64 as u64` — a word with high bit set (e.g. `0x8000_0000`)
  writes `0xFFFF_FFFF_8000_0000` to rD. ISA-correct for 64-bit-mode `lwa`. In 32-bit ABI, the poisoned
  upper 32 bits produce wrong CA / CR results in downstream 64-bit unsigned compares — same shape as
  the `addis` bug.
- **Likelihood**: LOW on real Xbox 360 32-bit-ABI binaries (compilers use `lwz` for word loads; `lwa`
  is a 64-bit-mode instruction). Risk elevated if the binary contains 64-bit-mode kernel code.
- **Note**: Canary also uses `SignExtend(..., INT64_TYPE)` — both are ISA-correct. Pre-pass flagged
  HIGH; audit downgrades to MEDIUM because `lwa` is unlikely in 32-bit-ABI Xbox 360 code.

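
The downstream hazard described in PPCBUG-105 can be sketched with two hypothetical compare helpers, one operating on the full 64-bit register and one on the low 32 bits. Neither helper exists in the interpreter; they only illustrate why poisoned upper bits change the outcome of an unsigned compare.

```rust
/// ISA-correct 64-bit-mode `lwa` writeback (sign-extends to 64 bits).
fn lwa_writeback(word: u32) -> u64 {
    word as i32 as i64 as u64
}

/// Unsigned less-than over the full 64-bit register contents.
fn lt_unsigned_64(a: u64, b: u64) -> bool {
    a < b
}

/// Unsigned less-than as a 32-bit-ABI program expects it (low words only).
fn lt_unsigned_32(a: u64, b: u64) -> bool {
    (a as u32) < (b as u32)
}
```

With `a = lwa_writeback(0x8000_0000)` and `b = 0x9000_0000`, the 32-bit compare reports `a < b` while the 64-bit compare does not, because the sign-extended upper half makes `a` the larger 64-bit value.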
### PPCBUG-106 — lwa no-update-form undocumented (LOW / informational)

- **Severity**: LOW
- **Status**: open
- **Location**: interpreter.rs:1029-1034
- **Symptom**: `lwa` arm has no RA writeback. Correct per ISA (no `lwau` in PowerISA). Undocumented.
- **Fix**: add comment `// No lwau in PowerISA; lwa is DS-form non-update only.`

### PPCBUG-107 — `invalidate_for_write` never called from stores; lwarx/stwcx. atomicity broken under `--parallel` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `reservation.rs:234` (definition, never called from interpreter); `interpreter.rs:1182-1278` (all store arms, none call it)
- **Symptom**: `ReservationTable::invalidate_for_write(addr)` is defined and correctly unit-tested but
  no interpreter store arm calls it. Under M3 `--parallel` with the table enabled, a plain `stw` by
  thread B to a cache line reserved by thread A does NOT clear thread A's table slot. Thread A's
  subsequent `stwcx.` calls `t.try_commit()`, which succeeds — spurious success, violating
  store-conditional atomicity. All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic
  counters) built on `lwarx`/`stwcx.` are broken in multi-threaded mode.
- **Concrete scenario**: thread A: `lwarx r3, 0, r4` (reserves line). Thread B: `stw r5, 0(r4)`
  (same address; should invalidate). Thread A: `stwcx. r6, 0, r4` → should fail (CR0.EQ=0) but
  succeeds (CR0.EQ=1). Thread A's store silently overwrites thread B's store.
- **Fix**: in every store arm, before `mem.write_*`, add:
  ```rust
  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```
  `has_active_reservers()` is a single `Relaxed` atomic load — negligible cost for non-atomic code
  (common case returns false immediately). Alternative: inject the table into the memory layer so
  `write_u32`/`write_u64` call it automatically.
- **Test gap**: add interpreter-level test: `lwarx` reserve a line, intervening `stw` to the same
  line, `stwcx.` must fail (CR0.EQ=0).

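
The concrete scenario in PPCBUG-107 can be modelled with a toy, single-threaded reservation table. This is a sketch only: `ToyReservations`, its two-slot layout, and the 128-byte `LINE_MASK` are assumptions for illustration and do not reflect the real `ReservationTable` in `reservation.rs`. As a simplification, a plain store here clears every slot on the written line, including the writer's own.

```rust
/// Assumed reservation granule of 128 bytes (hypothetical constant).
const LINE_MASK: u32 = 127;

/// Toy table: one optional reserved line per guest thread (two threads).
#[derive(Default)]
struct ToyReservations {
    slots: [Option<u32>; 2],
}

impl ToyReservations {
    /// Models `lwarx`: record the line containing `ea` for `thread`.
    fn reserve(&mut self, thread: usize, ea: u32) {
        self.slots[thread] = Some(ea & !LINE_MASK);
    }

    /// Models the call every plain store should make before writing.
    fn invalidate_for_write(&mut self, ea: u32) {
        let line = ea & !LINE_MASK;
        for slot in self.slots.iter_mut() {
            if *slot == Some(line) {
                *slot = None;
            }
        }
    }

    /// Models `stwcx.`: succeeds only if the reservation is still held,
    /// and always clears it afterwards.
    fn try_commit(&mut self, thread: usize, ea: u32) -> bool {
        let ok = self.slots[thread] == Some(ea & !LINE_MASK);
        self.slots[thread] = None;
        ok
    }
}
```

In this model, omitting the `invalidate_for_write` call between `reserve` and `try_commit` reproduces the spurious success described above; including it makes the commit fail as the ISA requires.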
### PPCBUG-108 — Legacy per-ctx reservation path: cross-thread invalidation impossible (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: interpreter.rs:1148-1153 (stwcx legacy path)
- **Symptom**: When table is None/disabled, reservation state lives in per-thread `PpcContext` fields.
  A store by thread B cannot clear `ctx_A.has_reservation`. Safe in strict lockstep (one host thread).
  Broken under real parallelism with the table inadvertently disabled.
- **Fix**: add a `debug_assert!` in `lwarx`/`stwcx.` that table is enabled when multiple host threads
  are active. The M3 scheduler should always enable the table before spawning a second host thread.

### PPCBUG-109 — Zero unit tests for lwa / lwax / lwaux

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module
- **Recommended minimum**:
  - `lwa` with `0x8000_0000` → `gpr[rD] == 0xFFFF_FFFF_8000_0000`.
  - `lwa` with `0x7FFF_FFFF` → `gpr[rD] == 0x0000_0000_7FFF_FFFF`.
  - `lwax` with ra=0.
  - `lwaux`: verify loaded value and rA update.

### PPCBUG-110 — Zero unit tests for lwbrx

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs test module
- **Recommended minimum**: memory `[0x11, 0x22, 0x33, 0x44]` at EA → `gpr[rD] == 0x4433_2211`; ra=0;
  assert `gpr[rD] <= 0xFFFF_FFFF`.

### PPCBUG-111 — lwarx / stwcx test suite missing key cases

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:5167-5207 (two tests exist)
- **Missing**: `lwarx` ra=0; `stwcx.` without prior `lwarx` → CR0.EQ=0; second `lwarx` displaces
  first; post-PPCBUG-107-fix store-invalidation test; `lwarx` zero-extension assertion.

IDs PPCBUG-112, PPCBUG-113, PPCBUG-114 are unallocated — reserved for group 20 follow-up.

---

## Batch 4 — load doubleword (group 21)

Per-group report: `audit-out/group-21-load-doubleword.md`.

Group 21 summary: **one of the cleanest load groups audited, with zero HIGH bugs.** All six instructions (`ld`,
`ldu`, `ldux`, `ldx`, `ldbrx`, `ldarx`) are ISA-correct: 64-bit load, big-endian byte order,
EA computation (RA=0, DS-form, u32 truncation), update-form writebacks, and reservation tracking
all pass scrutiny against Canary and the ISA spec. `ldbrx`'s double-swap pattern was investigated
and confirmed correct (PPCBUG-115 informational). One MEDIUM documentation finding, two LOW findings.

### PPCBUG-115 — `ldbrx` byte-swap confirmed correct (informational)

- **Severity**: LOW (confirmed clean, informational only)
- **Status**: wontfix
- **Location**: `interpreter.rs:4157-4159`
- **Analysis**: `mem.read_u64` uses `u64::from_be_bytes` internally (confirmed in `heap.rs:404`
  and interpreter's `TestMem`), so it returns the BE-decoded value. Calling `.swap_bytes()`
  re-reverses to give the LE interpretation, which is exactly what `ldbrx` specifies. Canary
  achieves the same result by skipping `ByteSwap` at the HIR level. Both approaches are correct.
  See per-group report for full byte-level worked example.

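
The double-swap argument in PPCBUG-115 can be checked with the standard library alone. The function name `ldbrx_value` is hypothetical; it stands in for "BE-decoding read followed by `swap_bytes()`" as described in the analysis.

```rust
/// Models `ldbrx`: a big-endian decode (what a BE `read_u64` would return)
/// followed by a byte swap, which yields the little-endian interpretation.
fn ldbrx_value(bytes: [u8; 8]) -> u64 {
    let be_decoded = u64::from_be_bytes(bytes);
    be_decoded.swap_bytes()
}
```

For any input, the result equals `u64::from_le_bytes(bytes)`. Asymmetric test data such as `0x11..0x88` distinguishes it from a plain big-endian load.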
### PPCBUG-116 — `ld`/`ldx`/`ldu`/`ldux` as 32-bit-ABI poison sources (documentation)

- **Severity**: MEDIUM (awareness/documentation; no change to load instructions themselves)
- **Status**: open
- **Location**: `interpreter.rs:1017-1058`
- **Symptom**: These instructions correctly write full 64-bit values to the destination GPR.
  Xbox 360 32-bit-ABI binaries legitimately emit them for TOC loads, vtable loads, and kernel
  structure accesses — all of which may have non-zero upper 32 bits. Until PPCBUG-001..089
  arithmetic truncation fixes land, such values can flow into 64-bit compares and corrupt CA
  bits and CR fields — the inverse of the `addis` bug (pollution from memory side vs. sign-ext).
- **Key guard already in place**: PPCBUG-007's `subfcx` CA fix truncates operands to u32 before
  the compare, correctly handling `ld`-originated 64-bit values. This is the most critical
  downstream consumer and the fix is already specified.

### PPCBUG-117 — Stale frozen snapshot in `ppc-manual/memory/ldarx.md`

- **Severity**: LOW
- **Status**: applied (P7 manual regen, 2026-05-02)
- **Location**: `ppc-manual/memory/ldarx.md` (frozen snapshot section)
- **Symptom**: Snapshot uses old field name `ctx.reserved_addr`; live code uses
  `ctx.reserved_line = ea & !RESERVATION_MASK` (M3 refactor). Cosmetic only.
- **Fix**: Regenerate snapshot after M3 field names settle.

### PPCBUG-118 — Zero functional tests for `ld`, `ldx`, `ldu`, `ldux`, `ldbrx`

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: `test_ldarx_stdcx_pair` covers `ldarx`/`stdcx` only. Five doubleword load
  variants are untested. Recommended minimum: `ld` with positive DS, negative DS, and RA=0;
  `ldx` basic; `ldu` with RA writeback check; `ldux` with RA writeback check; `ldbrx` with
  asymmetric data to distinguish output from plain `ldx`.

IDs PPCBUG-119 through PPCBUG-122 are unallocated — reserved for group 21 follow-up.

---

## Batch 4 — load multiple/string (group 22)

Per-group report: `audit-out/group-22-load-mlsr.md`.

Group 22 summary: one structural HIGH bug (`lswx` is always a no-op due to missing XER TBC field),
one MEDIUM coupling bug (the write path discards TBC on `mtspr XER`), one MEDIUM ISA-form deviation
(`lmw` does not skip the register write to an in-range RA, unlike Canary), and two LOW findings. The `lswi` body itself
is correct; `lmw` core logic (loop bound, zero-extension, byte-packing, register wraparound) is clean.
Zero unit tests across all three opcodes.

### PPCBUG-123 — `lswx` XER TBC field not modeled; always loads 0 bytes

- **Severity**: HIGH
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `context.rs:235-237` (`xer()` method) + `interpreter.rs:4172`
- **Symptom**: `ctx.xer()` assembles only SO[31], OV[30], CA[29] — bits 0–28 are always zero.
  `lswx` reads `ctx.xer() & 0x7F` expecting the XER TBC byte-count field at bits 0–6, but always
  gets 0. The `while bytes_left > 0` loop never executes; **`lswx` is permanently a no-op** —
  no bytes are loaded, no destination registers are written. The companion `stswx` at
  `interpreter.rs:4191` has the identical pattern and is equally broken.
- **Root cause**: `PpcContext` has no `xer_tbc` field. Neither `xer()` nor `set_xer()` models
  XER[25:31] (IBM bit numbering; the same field as LSB-0 bits 0–6 above). Any `mtspr XER, rN`
  that sets a non-zero byte count silently discards it (PPCBUG-124).
- **Cross-reference**: Canary marks `lswx` as `XEINSTRNOTIMPLEMENTED()` — xenia-rs implemented the
  body but left the XER infrastructure incomplete.
- **Fix**:
  1. Add `pub xer_tbc: u8` to `PpcContext`.
  2. In `xer()`: `| (self.xer_tbc as u32)` for bits 0–6.
  3. In `set_xer()`: `self.xer_tbc = (val & 0x7F) as u8`.

  The `lswx` body is then correct as-is.
- **Test gap**: zero unit tests. After fix: `mtspr XER, r3` (r3=4) then `lswx r5, 0, r4` should
  write exactly 4 bytes into r5 (high byte = first byte at EA).

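
The three-step fix in PPCBUG-123 can be sketched on a stand-alone register model. `ToyXer` and its method names are hypothetical and do not mirror the real `PpcContext`; only the bit positions (SO at 31, OV at 30, CA at 29, TBC in bits 0–6, LSB-0 numbering) follow the finding.

```rust
/// Toy XER model that keeps the 7-bit byte-count field alongside the flags.
#[derive(Default)]
struct ToyXer {
    so: bool,
    ov: bool,
    ca: bool,
    tbc: u8, // 7-bit transfer byte count used by lswx/stswx
}

impl ToyXer {
    /// Assemble the 32-bit XER image, including TBC in bits 0..=6.
    fn get(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | ((self.tbc as u32) & 0x7F)
    }

    /// Split a 32-bit XER image back into fields, keeping TBC.
    fn set(&mut self, val: u32) {
        self.so = ((val >> 31) & 1) == 1;
        self.ov = ((val >> 30) & 1) == 1;
        self.ca = ((val >> 29) & 1) == 1;
        self.tbc = (val & 0x7F) as u8;
    }
}
```

With this shape, a value written through `set` round-trips through `get`, so an `lswx`-style read of `get() & 0x7F` sees the byte count that a preceding `mtspr XER` stored.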
### PPCBUG-124 — `set_xer()` discards TBC on `mtspr XER` (structural coupling to PPCBUG-123)

- **Severity**: MEDIUM (must land with PPCBUG-123)
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `context.rs:239-244`
- **Symptom**: `set_xer()` writes only SO/OV/CA from the 32-bit value, silently discarding bits 0–28
  (including the 7-bit TBC field). Any guest `mtspr XER, rN` with a non-zero byte count loses that
  count; subsequent `lswx`/`stswx` see TBC=0. Fix is the same three-line change as PPCBUG-123.

### PPCBUG-125 — `lmw` missing RA-in-destination-range skip

- **Severity**: MEDIUM
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:1515`
- **Symptom**: PowerISA declares `lmw rT, D(rA)` invalid when `rA` is in `[rT..31]`. Canary skips
  the store to `rA` in that case (`if (i.D.RT + j == i.D.RA) continue`). xenia-rs pre-computes EA
  before the loop (so EA values remain correct), but overwrites `rA` with the loaded word instead of
  preserving it. Result differs from Canary for this invalid encoding. Any program that relies on RA
  surviving a nominally invalid `lmw` will see the wrong value.
- **Fix**:
  ```rust
  for r in instr.rd()..32 {
      if r == instr.ra() { ea = ea.wrapping_add(4); continue; }
      ctx.gpr[r] = mem.read_u32(ea as u32) as u64;
      ea = ea.wrapping_add(4);
  }
  ```
- **Test gap**: zero tests. Add: `lmw r28, 0(r28)` (RA=RT=28) — after fix, gpr[28] unchanged.

### PPCBUG-126 — `lswi` uses `instr.rb()` instead of `instr.nb()` for the NB field

- **Severity**: LOW (maintenance hazard, not a correctness bug)
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `interpreter.rs:1340`
- **Symptom**: `instr.rb()` and `instr.nb()` both extract bits 16–20 and return identical values.
  Using `rb()` misrepresents the operand as a register reference rather than a 5-bit immediate count.
  The companion `stswi` at line 1359 has the same pattern. A future `rb()` type-system refactor
  could break `lswi`/`stswi` silently.
- **Fix**: `instr.nb()` at both sites.

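
The role of the NB field as a byte count (with NB=0 meaning 32), together with the byte packing and register wraparound that this group's tests need to cover, can be sketched as follows. The free function `lswi` below is a hypothetical model that takes the source bytes as a slice; it is not the interpreter arm.

```rust
/// Toy model of `lswi rD, rA, NB`: pack up to `nb` bytes from `src` into
/// successive registers, four bytes per register, most significant first.
fn lswi(gpr: &mut [u64; 32], rd: usize, nb_field: u32, src: &[u8]) {
    // NB is a 5-bit immediate byte count; an encoded 0 means 32 bytes.
    let nb = if nb_field == 0 { 32 } else { nb_field as usize };
    let mut reg = rd;
    for (i, &byte) in src.iter().take(nb).enumerate() {
        let pos = i % 4;
        if pos == 0 {
            // Starting a new register: unfilled low-order bytes stay zero.
            gpr[reg] = 0;
        }
        gpr[reg] |= (byte as u64) << (24 - 8 * pos);
        if pos == 3 {
            // Register numbers wrap from r31 back to r0.
            reg = (reg + 1) % 32;
        }
    }
}
```

Starting at `rd = 31` with eight bytes exercises the wraparound into `r0`, and a five-byte transfer shows the partially filled final register.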
### PPCBUG-127 — Zero execution tests for lmw, lswi, lswx

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: No `#[test]` exists for any of the three opcodes. A regression in loop bounds,
  byte-packing, EA computation, or the NB=0 special case would go undetected.
- **Recommended minimum**: `lmw r30, 0(r1)` (2-word load); `lswi r3, r4, 8` (2-word byte pack);
  `lswi r31, r4, 8` (register wraparound → r31 and r0); `lswi r3, r4, 0` (NB=0→32 special case);
  post-PPCBUG-123 fix: `lswx` with XER TBC=4 (1-word load), TBC=0 (no-op), TBC=5 (partial word).

---

## Batch 5 — store byte/halfword (group 24)

Per-group report: `audit-out/group-24-store-byte-half.md`.

Group 24 summary: **3 findings: 1 HIGH (cross-cutting reservation invalidation), 1 LOW/informational
(update-form zero-extension correct but undocumented), 1 LOW (zero test coverage).** EA computation,
value truncation (`as u8`, `as u16`), RA=0 special cases, update-form writeback zero-extension,
big-endian `mem.write_u16` path, and `sthbrx` byte-reverse logic are all ISA-correct. The single
HIGH finding is the systemic absence of `invalidate_for_write` calls — same class as PPCBUG-107,
now documented for all 9 byte/halfword store opcodes.

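
The value truncation and byte order that the summary describes for `sth` and `sthbrx` can be shown with two hypothetical helpers that return the bytes as they would land in memory. These names are illustrative and do not exist in the interpreter.

```rust
/// Bytes written by `sth`: low 16 bits of rS, big-endian order.
fn sth_bytes(rs: u64) -> [u8; 2] {
    (rs as u16).to_be_bytes()
}

/// Bytes written by `sthbrx`: low 16 bits of rS, byte-reversed.
fn sthbrx_bytes(rs: u64) -> [u8; 2] {
    (rs as u16).swap_bytes().to_be_bytes()
}
```

Upper register bits are discarded by the `as u16` truncation, so a register holding `0xFFFF_FFFF_1234_DEAD` stores the same two bytes as one holding `0xDEAD`.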
### PPCBUG-130 — All 9 store-byte/halfword opcodes missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1207` (stb), `1213` (stbu), `1219` (stbx), `1225` (stbux),
  `1231` (sth), `1237` (sthu), `1243` (sthx), `1249` (sthux), `1337` (sthbrx)
- **Class**: same root cause as PPCBUG-107 (stw/stdcx family — `invalidate_for_write` never called
  from any store arm).
- **Symptom**: Under `--parallel`, a `stb`, `sth`, or `sthbrx` (or any variant in this group) to a
  cache line reserved by another thread via `lwarx`/`ldarx` does NOT clear the table slot.
  The reserving thread's subsequent `stwcx.`/`stdcx.` spuriously succeeds even though an
  intervening sub-word store has modified the line — violating store-conditional atomicity. Affects
  any lock-free protocol that uses byte or halfword stores adjacent to or inside a `lwarx`/`stwcx.`
  loop (e.g. byte-level spinlocks, tagged-pointer updates, audio ring-buffer flags).
- **Fix** (per PPCBUG-107 pattern): before each `mem.write_u8/u16`, add:
  ```rust
  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```
- **Note**: PPCBUG-107 is the canonical parent finding. PPCBUG-130 documents that the byte/halfword
  group must be included in the same fix sweep.

### PPCBUG-131 — Update-form rA zero-extension correct but undocumented (LOW / informational)

- **Severity**: LOW (informational — behavior is correct)
- **Status**: open (documentation gap)
- **Locations**: `interpreter.rs:1216` (stbu), `1228` (stbux), `1240` (sthu), `1252` (sthux)
- **Symptom**: Each update-form arm writes `ctx.gpr[instr.ra()] = ea as u64` where `ea: u32`.
  This zero-extends to 64 bits — correct in the 32-bit ABI (addresses are 32-bit; upper half must
  be zero). No bug, but there is no comment explaining the deliberate zero-extension. A maintainer
  who computes EA as `u64` throughout and drops the `as u32` intermediate could silently leave
  non-zero upper bits in rA (for example when a negative displacement wraps the sum below zero, or
  when the base register is already poisoned), mirroring the `addis` bug shape.
- **Fix**: add comment `// EA is u32; zero-extend into rA (32-bit ABI: upper 32 bits must be 0).`
  at each update-form writeback line.

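
The maintenance hazard in PPCBUG-131 can be made concrete by comparing the current `u32` writeback with a hypothetical all-`u64` variant. Both function names are illustrative; the second one is the refactor the finding warns against, not code that exists today.

```rust
/// Current shape: EA is truncated to u32, then zero-extended into rA.
fn update_ra_via_u32(base: u64, d: i16) -> u64 {
    let ea: u32 = base.wrapping_add(d as i64 as u64) as u32;
    ea as u64
}

/// Hypothetical refactor that drops the u32 intermediate.
fn update_ra_via_u64(base: u64, d: i16) -> u64 {
    base.wrapping_add(d as i64 as u64)
}
```

The two agree for a clean base and a displacement that does not wrap (for example `0x1000` and `-8`). They diverge when the sum wraps below zero, or when the base already carries non-zero upper bits, which is where the upper half of rA would become poisoned.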
### PPCBUG-132 — Zero unit tests for all 9 store-byte/halfword opcodes (LOW)

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: No `test_stb*` or `test_sth*` functions exist. Any regression in EA computation,
  value truncation, update-form writeback order, or `sthbrx` byte-swap logic would be invisible.
- **Recommended minimum**: `stb` basic + ra=0; `stbu`/`stbux` with rA writeback check; `stbx`
  ra=0; `sth` big-endian byte check (`0xDEAD` → `[0xDE, 0xAD]`); `sthu`/`sthux` writeback;
  `sthbrx` byte-reversed check (`0xDEAD` → `[0xAD, 0xDE]`); post-PPCBUG-130 fix: `lwarx` + `stb`
  to same line + `stwcx.` → CR0.EQ=0.

IDs PPCBUG-133 through PPCBUG-139 are unallocated — reserved for group 24 follow-up.

---

## Batch 5 — store word (group 25)

Per-group report: `audit-out/group-25-store-word.md`.

Group 25 summary: **8 findings: 5 HIGH (reservation invalidation, one per plain-store opcode), 0 MEDIUM, 3 LOW.**
Core arithmetic and semantics are entirely clean for all 6 opcodes. EA computation (RA=0 guards,
simm16 sign-extend, u32 truncation), value truncation (`as u32`), update-form writebacks
(`ea as u64` zero-extension), big-endian `mem.write_u32`, `stwbrx` byte-reversal, and `stwcx`
conditional-store logic (cache-line reservation check, CAS, CR0 update, reservation always
cleared) all match the ISA and Canary exactly. The `stwcx` manual snapshot is stale (uses old
`reserved_addr` field name; live code correctly uses `reserved_line` at cache-line granularity —
actually MORE correct than the snapshot). Dominant finding is the same systemic miss as PPCBUG-107
and PPCBUG-130: `invalidate_for_write` is never called from any plain store arm.

### PPCBUG-140 — stw: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1183-1188`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Under `--parallel` with the ReservationTable enabled, a plain `stw` by thread B
  to a cache line reserved by thread A does not clear thread A's table slot. Thread A's
  subsequent `stwcx.` spuriously succeeds (CR0.EQ=1) even though thread B has written the line.
  All lock-free sync primitives (`spin_lock`, `CompareExchange`, atomic counters) built on
  `lwarx`/`stwcx.` are broken in multi-threaded mode. `stw` is the most common store instruction —
  every stack write, pointer store, and integer field write is affected.
- **Fix**: Before `mem.write_u32(ea, ...)`:
  ```rust
  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```
  `has_active_reservers()` is a single `Relaxed` load, so the cost is negligible in the common
  non-atomic case.

### PPCBUG-141 — stwu: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1189-1194`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. `stwu r1, -N(r1)` is the canonical function-prologue
  stack-allocation idiom emitted by every compiled function. A thread holding a reservation on
  the stack region would see spurious `stwcx.` success after any prologue store.
- **Fix**: Same pattern as PPCBUG-140, inserted before `mem.write_u32`.

### PPCBUG-142 — stwx: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1195-1200`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. `stwx` is the indexed store used for array writes and
  indirect dereferences — common in loops that may run concurrently with reservation holders.
- **Fix**: Same pattern as PPCBUG-140.

### PPCBUG-143 — stwux: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1201-1206`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. Less common than stw/stwu but still a plain store
  that must participate in reservation invalidation.
- **Fix**: Same pattern as PPCBUG-140.

### PPCBUG-144 — stwbrx: missing `invalidate_for_write` call (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Location**: `interpreter.rs:1568-1573`
- **Systemic root cause**: PPCBUG-107
- **Symptom**: Same class as PPCBUG-140. Byte-reversed stores (used for LE-payload GPU command
  buffers, file format fields) are still plain stores with respect to the reservation protocol.
- **Fix**: Same pattern as PPCBUG-140. `ea` is already a `u32` at this point (line 1570).

### PPCBUG-145 — stwcx: stale manual snapshot uses `reserved_addr` (LOW)

- **Severity**: LOW (documentation only; live code is correct)
- **Status**: applied (P7 manual regen, 2026-05-02)
- **Location**: `ppc-manual/memory/stwcx.md` (frozen snapshot section)
- **Symptom**: The frozen snapshot shows `ctx.reserved_addr == ea` (exact-word comparison).
  The live code at `interpreter.rs:1137-1153` uses `ctx.reserved_line == line` where
  `line = ea & !RESERVATION_MASK` (cache-line comparison). The live code is MORE correct per
  ISA (PowerISA 2.07B defines reservation at cache-line granularity). Snapshot reflects an
  earlier implementation before M3 introduced `RESERVATION_MASK` and `reserved_line`.
  Tests confirm live behavior is correct (`stwcx_succeeds_within_same_cache_line`).
- **Fix**: Regenerate the `stwcx.md` snapshot to show current field names and add a note on
  the ISA cache-line granule.

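
The difference between the snapshot's exact-word comparison and the live cache-line comparison can be sketched with two predicates. The value of `RESERVATION_MASK` below (127, a 128-byte granule) is an assumption for illustration; the finding does not state the real constant, and the helper names are hypothetical.

```rust
/// Assumed mask for a 128-byte reservation granule (hypothetical value).
const RESERVATION_MASK: u32 = 127;

/// Snapshot-era check: the store address must equal the reserved address.
fn matches_exact_word(reserved_addr: u32, ea: u32) -> bool {
    reserved_addr == ea
}

/// Live check: the store address must fall in the reserved line.
fn matches_line(reserved_line: u32, ea: u32) -> bool {
    reserved_line == (ea & !RESERVATION_MASK)
}
```

A `stwcx.` to `0x1004` after a reservation taken at `0x1000` fails the exact-word check but passes the line check, which is the case the `stwcx_succeeds_within_same_cache_line` test exercises.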
### PPCBUG-146 — Zero unit tests for stwu / stwx / stwux / stwbrx (LOW)

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: Four of the six group-25 opcodes have zero dedicated unit tests.
- **Recommended minimum**:
  - `stwu r3, -8(r1)`: verify memory at `r1-8` and `gpr[1]` updated to `old_r1 - 8`.
  - `stwx ra=0`: store at `gpr[rb]`, verify memory and no RA writeback.
  - `stwux`: indexed update — verify store and RA writeback.
  - `stwbrx 0x11223344`: bytes at EA should be `[0x44, 0x33, 0x22, 0x11]`.

### PPCBUG-147 — stwcx test suite missing key cases (LOW)

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:5167-5208` (two existing tests)
- **Missing**:
  - `stwcx.` without prior `lwarx` → CR0.EQ=0, memory not written.
  - Post-PPCBUG-140-fix: `lwarx` then `stw` to same line then `stwcx.` → CR0.EQ=0.
  - RA=0 form: `stwcx. rS, 0, rB`.
  - Explicit memory check on failure path (assert memory unchanged).

IDs PPCBUG-148 and PPCBUG-149 are unallocated — reserved for group 25 follow-up.

---

## Batch 5 (continued) — store multiple/string (group 27)

Per-group report: `audit-out/group-27-store-mlsr.md`.

Group 27 summary: **5 findings: 2 HIGH, 1 MEDIUM, 2 LOW.** `stswx` is a permanent no-op (same
root cause as PPCBUG-123 for `lswx`: XER TBC field not modeled; fixed as a side effect of
PPCBUG-123/124). `stmw`, `stswi`, and `stswx` all omit `invalidate_for_write`, which is worse than for
single-word stores because a single `stmw` can dirty multiple cache lines. `stswi` uses `instr.rb()`
instead of `instr.nb()` (maintenance hazard, same shape as PPCBUG-126 for `lswi`). Zero unit tests
across all three opcodes.

### PPCBUG-160 — stmw, stswi, stswx missing `invalidate_for_write`; multi-line atomicity exposure (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1521` (stmw), `interpreter.rs:1357` (stswi), `interpreter.rs:4189` (stswx)
- **Extends**: PPCBUG-107. The prior stated range `1182-1278` does not cover these three arms.
  Multi-word instructions (stmw up to 128 bytes = 2 lines; stswx up to 127 bytes = ~2 lines) make
  the probability of missing a reservation invalidation much higher than single-word stores.
- **Symptom**: thread B's `stmw` saves 18+ non-volatile registers across two cache lines. Thread A's
  `lwarx` reservation on the second line is not cleared. Thread A's `stwcx.` spuriously succeeds.
  Because `stmw` is the ABI-standard non-volatile register save, this is triggered constantly in
  function prologues — any lock-free primitive inside a prologue/epilogue window is at risk.
- **Fix** (same pattern as PPCBUG-107): before each `mem.write_u32`/`mem.write_u8` call, add the
  `invalidate_for_write` guard. See group-27 report for per-opcode code snippets.
- **Test gap**: `lwarx` reserve a line, `stmw` across that line, `stwcx.` must return CR0.EQ=0.

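
The multi-line exposure in PPCBUG-160 comes down to which reservation lines a multi-byte store touches. The sketch below enumerates them under an assumed 128-byte line size; `LINE_BYTES`, `stmw_len`, and `lines_touched` are hypothetical names, and the real fix may invalidate per write rather than per line.

```rust
/// Assumed reservation line size in bytes (hypothetical constant).
const LINE_BYTES: u32 = 128;

/// Number of bytes stored by `stmw rS, d(rA)`: registers rS..=r31, 4 bytes each.
fn stmw_len(rs: u32) -> u32 {
    (32 - rs) * 4
}

/// Base addresses of every line covered by a store of `len` bytes at `ea`.
fn lines_touched(ea: u32, len: u32) -> Vec<u32> {
    let mut out = Vec::new();
    if len == 0 {
        return out;
    }
    let first = ea & !(LINE_BYTES - 1);
    let last = ea.wrapping_add(len - 1) & !(LINE_BYTES - 1);
    let mut line = first;
    loop {
        out.push(line);
        if line == last {
            break;
        }
        line = line.wrapping_add(LINE_BYTES);
    }
    out
}
```

For example, `stmw r14` stores 72 bytes; starting at `0x10C0` it covers two lines, so an invalidation that only considers the first written address would leave a reservation on the second line intact.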
### PPCBUG-161 — `stswx` is a permanent no-op: XER TBC not modeled (HIGH)
|
||
|
||
- **Severity**: HIGH
|
||
- **Status**: applied (P6 112202c, 2026-05-02)
|
||
- **Location**: `interpreter.rs:4189` (`stswx` arm) + `context.rs:235-243` (`xer()`/`set_xer()`)
|
||
- **Companion**: PPCBUG-123 (lswx), PPCBUG-124 (mtspr XER). This finding covers the store side.
|
||
- **Symptom**: `ctx.xer() & 0x7F` always returns 0 (no `xer_tbc` field). `stswx` unconditionally
|
||
stores zero bytes. The byte-loop body is otherwise correct and requires no further changes.
|
||
- **Fix**: same three-line fix as PPCBUG-123 (add `xer_tbc: u8` to `PpcContext`; update `xer()`
|
||
and `set_xer()`). The `stswx` body is correct once TBC is live.
|
||
- **Test gap**: `mtspr XER` (TBC=5) + `stswx r3, 0, r4` → 5 bytes written big-endian.
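
A minimal sketch of the TBC modelling, under the assumption that XER is exposed as a 32-bit value with SO/OV/CA in the top three bits and the transfer byte count in the low seven. `XerState` is a hypothetical stand-in for the relevant part of `PpcContext`; only the `xer_tbc` field name comes from the fix description.

```rust
/// Hypothetical, trimmed-down stand-in for the XER-related part of `PpcContext`.
#[derive(Default)]
struct XerState {
    so: bool,
    ov: bool,
    ca: bool,
    /// Transfer byte count: XER bits 25-31, i.e. the low 7 bits.
    xer_tbc: u8,
}

impl XerState {
    /// Reassembles the architected 32-bit XER image.
    fn xer(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | (self.xer_tbc as u32 & 0x7F)
    }

    /// Splits a written XER image into its modelled fields.
    fn set_xer(&mut self, value: u32) {
        self.so = value & (1 << 31) != 0;
        self.ov = value & (1 << 30) != 0;
        self.ca = value & (1 << 29) != 0;
        self.xer_tbc = (value & 0x7F) as u8;
    }

    /// Byte count that `lswx`/`stswx` would transfer.
    fn byte_count(&self) -> usize {
        (self.xer() & 0x7F) as usize
    }
}
```

With the field in place, `mtspr XER` with TBC=5 followed by a read yields a byte count of 5 rather than the constant 0 described in the symptom.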
|
||
|
||
### PPCBUG-162 — `stswi` uses `instr.rb()` instead of `instr.nb()` for NB field (MEDIUM)
|
||
|
||
- **Severity**: MEDIUM (maintenance hazard; not a runtime correctness bug today)
|
||
- **Status**: applied (P6 112202c, 2026-05-02)
|
||
- **Location**: `interpreter.rs:1359`
|
||
- **Companion**: PPCBUG-126 (`lswi` identical pattern at line 1340).
|
||
- **Symptom**: `instr.rb()` and `instr.nb()` extract the same bits 16-20, so values are equal now.
|
||
If `rb()` is ever given a newtype wrapper (e.g. `RegIdx`) to enforce register semantics, the cast
|
||
`instr.rb() as u32` will either fail or yield wrong semantics — silently treating a register index
|
||
as a byte count.
|
||
- **Fix**: `let nb = if instr.nb() == 0 { 32 } else { instr.nb() };`
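
For illustration, the decode that both accessors perform and the NB=0 rule can be sketched on a raw instruction word. The function names below are hypothetical; the real decoder exposes `rb()` and `nb()` as methods.

```rust
/// Extracts instruction bits 16-20 (big-endian bit numbering), the field
/// that both `rb()` and `nb()` decode.
fn field_16_20(instr: u32) -> u32 {
    (instr >> 11) & 0x1F
}

/// Byte count for `lswi`/`stswi`: an NB field of 0 encodes 32 bytes.
fn string_byte_count(instr: u32) -> u32 {
    let nb = field_16_20(instr);
    if nb == 0 { 32 } else { nb }
}
```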
|
||
|
||
### PPCBUG-163 — Zero unit tests for stmw, stswi, stswx (LOW)
|
||
|
||
- **Severity**: LOW (test gap)
|
||
- **Status**: applied (P8 4029041, 2026-05-02)
|
||
- **Location**: `interpreter.rs` test module
|
||
- **Symptom**: No `#[test]` exists for any of the three opcodes. Regressions in loop bounds, byte
|
||
order, EA computation, NB=0 handling, or register wraparound are invisible.
|
||
- **Recommended minimum**: stmw 2-word and 32-word cases; stswi 4-byte / NB=0 (→ 32 bytes) / register wraparound /
|
||
partial; stswx (post PPCBUG-123 fix) TBC=4, TBC=0, TBC=5. See group-27 report for full list.
|
||
|
||
ID PPCBUG-164 is unallocated — reserved for group 27 follow-up.
|
||
|
||
---
|
||
|
||
## Batch 5 (continued) — store doubleword (group 26)
|
||
|
||
Per-group report: `audit-out/group-26-store-doubleword.md`.
|
||
|
||
Group 26 summary: **0 HIGH, 2 MEDIUM, 2 LOW.** The core semantics of all six opcodes are
|
||
ISA-correct: `ds()` decoder extracts the DS-form displacement correctly; `mem.write_u64` handles
|
||
big-endian byte ordering; update-form writebacks are zero-extended and in the right order; `stdcx.`
|
||
CR0 encoding, reservation check, and table-path interaction all match the ISA. `stdbrx` correctly
|
||
applies `swap_bytes()`. No 32-bit writeback truncation issues (these are store ops, not ALU ops).
|
||
Two MEDIUM findings: (1) PPCBUG-150 extends PPCBUG-107 to the doubleword stores (same gap —
|
||
`invalidate_for_write` never called); (2) PPCBUG-151 identifies that `stwcx.` and `stdcx.` share
|
||
the same reservation slot without a width discriminator, allowing a `lwarx`+`stdcx.` or
|
||
`ldarx`+`stwcx.` cross-pair to succeed when it should fail. Four IDs used (PPCBUG-150..153).
|
||
|
||
### PPCBUG-150 — `std`/`stdu`/`stdx`/`stdux`/`stdbrx` do not call `invalidate_for_write` (scope extension of PPCBUG-107)
|
||
|
||
- **Severity**: MEDIUM (same classification as PPCBUG-107)
|
||
- **Status**: applied (ca5b90b, 2026-05-01)
|
||
- **Locations**:
|
||
- `interpreter.rs:1258` (`std`)
|
||
- `interpreter.rs:1264` (`stdx`)
|
||
- `interpreter.rs:1269` (`stdu`)
|
||
- `interpreter.rs:1275` (`stdux`)
|
||
- `interpreter.rs:4163` (`stdbrx`)
|
||
- **Symptom**: When `--parallel` is active and the `ReservationTable` is enabled, any of these
|
||
five stores to an address another HW thread has reserved via `ldarx` will NOT invalidate that
|
||
thread's reservation. The `ldarx`-holding thread's `stdcx.` can subsequently succeed even though
|
||
the memory was overwritten — a classic LL/SC ABA gap. Fix session for PPCBUG-107 must include
|
||
these five sites.
|
||
- **Fix**: in each arm, after `mem.write_u64(ea, ...)`, add:
|
||
  ```rust
  if let Some(t) = &ctx.reservation_table {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```
|
||
|
||
### PPCBUG-151 — `stdcx.`/`stwcx.` reservation width not discriminated: cross-width pair silently succeeds
|
||
|
||
- **Severity**: MEDIUM
|
||
- **Status**: applied (ca5b90b, 2026-05-01)
|
||
- **Location**: `interpreter.rs:4119-4155` (`stdcx`) vs `interpreter.rs:1134-1180` (`stwcx`)
|
||
- **Symptom**: Both `stwcx.` and `stdcx.` match reservations using only `(has_reservation,
|
||
reserved_line)`. A `lwarx` reservation can be spuriously committed by `stdcx.`, or a `ldarx`
|
||
reservation by `stwcx.`, as long as the cache line matches. The ISA requires pairing — `lwarx`
|
||
must be committed by `stwcx.`, and `ldarx` by `stdcx.`. Cross-width commit reads the wrong width
|
||
from memory and writes back the wrong width, with no failure indication (CR0.EQ=1).
|
||
- **Fix**: add a `reservation_width: u8` field (4 or 8) to `PpcContext`. `stwcx.` requires
|
||
`reservation_width==4`; `stdcx.` requires `reservation_width==8`. In the table path, pack the
|
||
1-bit width flag into one of the spare bits of the 64-bit slot (bits 39–32 are always zero for
|
||
line addresses in the 32-bit guest address space).
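
A minimal sketch of the width discriminator on the per-context path, assuming a 128-byte reservation granule. `ResWidth`, `Reservation`, `reserve`, and `try_commit` are hypothetical names; the real state lives in `PpcContext` as `has_reservation` / `reserved_line`.

```rust
/// Width recorded by the reserving load (`lwarx` = Word, `ldarx` = Doubleword).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ResWidth {
    Word,
    Doubleword,
}

/// Hypothetical per-thread reservation: line address plus the load width.
#[derive(Default)]
struct Reservation {
    active: Option<(u32, ResWidth)>,
}

impl Reservation {
    /// Called by `lwarx` / `ldarx`.
    fn reserve(&mut self, ea: u32, width: ResWidth) {
        self.active = Some((ea & !127, width));
    }

    /// Called by `stwcx.` / `stdcx.`. Returns the CR0.EQ value and always
    /// clears the reservation, whether or not the store succeeds.
    fn try_commit(&mut self, ea: u32, width: ResWidth) -> bool {
        let ok = self.active == Some((ea & !127, width));
        self.active = None;
        ok
    }
}
```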
|
||
|
||
### PPCBUG-152 — `stdu`/`stdux` no invalid-form guard for RS==RA (LOW)
|
||
|
||
- **Severity**: LOW (ISA-undefined; no Xbox 360 compiler emits this)
|
||
- **Status**: open
|
||
- **Location**: `interpreter.rs:1267-1278`
|
||
- **Symptom**: When `RA==RS`, the store writes the original RS value, then RA (==RS) is
|
||
overwritten with EA, destroying the source. ISA marks this invalid-form. Consistent with
|
||
policy of other update-form stores in groups 18-22.
|
||
- **Fix**: `debug_assert!(instr.ra() != 0 && instr.ra() != instr.rs())` in debug builds.
|
||
|
||
### PPCBUG-153 — Zero unit tests for std/stdu/stdx/stdux/stdbrx; stdcx. happy-path only (LOW)
|
||
|
||
- **Severity**: LOW (test gap)
|
||
- **Status**: applied (P8 4029041, 2026-05-02)
|
||
- **Location**: `interpreter.rs` test module (only `test_ldarx_stdcx_pair` at line 4629)
|
||
- **Missing coverage**: `std` with negative DS; `std` with RA=0; `stdu` update writeback; `stdx`
|
||
with RA=0; `stdux` indexed update; `stdbrx` byte-reversed output; `stdcx.` failure path (no
|
||
prior reservation or EA mismatch); `stdcx.` `has_reservation` cleared on failure.
|
||
- **Recommended minimum**: 6 tests — see per-group report for encodings.
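
As a starting point for the `stdbrx` byte-reversed case, the expected memory image can be computed independently of the interpreter. This is a sketch with hypothetical helper names; it models only the bytes that should land in guest memory.

```rust
/// Bytes `std` should write for register value `rs` (big-endian guest memory).
fn std_bytes(rs: u64) -> [u8; 8] {
    rs.to_be_bytes()
}

/// Bytes `stdbrx` should write: the register value byte-reversed, then
/// stored with the same big-endian write.
fn stdbrx_bytes(rs: u64) -> [u8; 8] {
    rs.swap_bytes().to_be_bytes()
}
```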
|
||
|
||
IDs PPCBUG-154 through PPCBUG-159 are unallocated — reserved for group 26 follow-up.
|
||
|
||
---
|
||
|
||
## Batch 5 (continued) — store float (group 28)
|
||
|
||
Per-group report: `audit-out/group-28-store-float.md`.
|
||
|
||
Group 28 summary: **7 findings: 3 HIGH, 1 MEDIUM, 3 LOW.** EA computation, endianness, update-form
|
||
writebacks, and `stfiwx` integer-word extraction are all correct. Critical bugs: (1) `stfs*` never
|
||
raises FPSCR exception bits (VXSNAN, XX, OX, UX) required by PowerISA for double→single narrowing;
|
||
(2) `stfs*` ignores FPSCR.RN rounding mode, always using round-to-nearest-even; (3) all 9 FP store
|
||
arms omit `invalidate_for_write` (same class as PPCBUG-107). The `stfd*` family and `stfiwx` are
|
||
clean (bit-pattern stores with no conversion). Zero unit tests across all 9 opcodes.
|
||
**7 IDs used (PPCBUG-165..171). 3 IDs unallocated (PPCBUG-172..174).**
|
||
|
||
### PPCBUG-165 — stfs* does not raise FPSCR exception bits (VXSNAN, XX, OX, UX)
|
||
|
||
- **Severity**: HIGH
|
||
- **Status**: open
|
||
- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux)
|
||
- **Symptom**: PowerISA requires that `stfs` double→single narrowing raises FPSCR[VXSNAN] for SNaN
|
||
input, FPSCR[OX] on overflow to ±∞, FPSCR[UX] on underflow to ±0/denormal, and FPSCR[XX] when the
|
||
result is inexact. None of these bits are ever set. The narrowing is done via `ctx.fpr[instr.rs()] as f32`
|
||
(x86 `CVTSD2SS`); no FPSCR inspection or update follows. Games that poll FPSCR[OX] to detect
|
||
overflow (physics engines clamping large velocities), or FPSCR[VXSNAN] after sentinel SNaN writes,
|
||
get false negatives.
|
||
- **Canary parity**: Canary also omits these FPSCR updates for `stfs*`. Both share the deviation.
|
||
- **Fix**: after the narrowing, check `fpscr::is_snan(src)` → set `VXSNAN`; compare source vs.
|
||
f64 round-trip of narrowed value for inexact; compare src.is_finite() && f32.is_infinite() for
|
||
overflow. See group-28 report for illustrative code sketch.
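
The checks listed in the fix can be sketched as a pure function. This is an illustration under stated assumptions, not the group-28 report's code: `NarrowFlags`, `narrow_with_flags`, and the local `is_snan` are hypothetical, and underflow is taken to mean "inexact and the narrowed value is zero or subnormal".

```rust
/// Flags a double→single narrowing would raise (hypothetical container).
#[derive(Debug, Default, PartialEq, Eq)]
struct NarrowFlags {
    vxsnan: bool,
    ox: bool,
    ux: bool,
    xx: bool,
}

/// An f64 is a signalling NaN when it is NaN and fraction bit 51 is clear.
fn is_snan(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & (1u64 << 51)) == 0
}

/// Narrows with round-to-nearest-even and reports the flags implied by the
/// source/result pair.
fn narrow_with_flags(src: f64) -> (f32, NarrowFlags) {
    let narrowed = src as f32;
    let mut flags = NarrowFlags::default();
    if src.is_nan() {
        flags.vxsnan = is_snan(src);
        return (narrowed, flags);
    }
    if src.is_infinite() {
        return (narrowed, flags); // infinities narrow exactly
    }
    let inexact = (narrowed as f64) != src;
    flags.ox = narrowed.is_infinite(); // finite source became ±∞
    flags.ux = inexact && (narrowed == 0.0 || narrowed.is_subnormal());
    flags.xx = inexact;
    (narrowed, flags)
}
```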
|
||
|
||
### PPCBUG-166 — stfs* ignores FPSCR.RN; always uses round-to-nearest-even
|
||
|
||
- **Severity**: HIGH
|
||
- **Status**: open
|
||
- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
|
||
- **Symptom**: `ctx.fpr[instr.rs()] as f32` always rounds to nearest-even (Rust `as` cast semantics), never consulting
|
||
`ctx.fpscr & fpscr::RN_MASK`. Any game that configures FPSCR.RN to truncate/ceil/floor and then
|
||
stores via `stfs` gets the wrong f32 in memory (wrong by at most 1 ULP). The stfs.md spec
|
||
explicitly acknowledges this gap.
|
||
- **Canary parity**: Canary also ignores FPSCR.RN for stfs. Both share the deviation.
|
||
- **Fix**: read `ctx.fpscr & fpscr::RN_MASK` and set host MXCSR before narrowing, then restore.
|
||
Minimum viable: `debug_assert_eq!(ctx.fpscr & fpscr::RN_MASK, 0)` for debug-build visibility.
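
An alternative to toggling MXCSR, sketched here as an assumption rather than the planned fix, is to narrow with the default nearest-even cast and then step the result by one ULP when the requested mode disagrees. The RN encodings (0 nearest, 1 toward zero, 2 toward +∞, 3 toward −∞) follow the PowerPC FPSCR definition; `Rn`, `rn_from_fpscr`, and `narrow_directed` are hypothetical names.

```rust
/// FPSCR.RN encodings.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Rn {
    Nearest,      // RN = 0b00
    TowardZero,   // RN = 0b01
    TowardPosInf, // RN = 0b10
    TowardNegInf, // RN = 0b11
}

fn rn_from_fpscr(fpscr: u32) -> Rn {
    match fpscr & 0b11 {
        0 => Rn::Nearest,
        1 => Rn::TowardZero,
        2 => Rn::TowardPosInf,
        _ => Rn::TowardNegInf,
    }
}

/// Next representable f32 above `r`. Callers never pass NaN or +infinity.
fn next_up(r: f32) -> f32 {
    if r == 0.0 {
        f32::from_bits(1) // smallest positive subnormal
    } else if r > 0.0 {
        f32::from_bits(r.to_bits() + 1)
    } else {
        f32::from_bits(r.to_bits() - 1)
    }
}

/// Next representable f32 below `r`. Callers never pass NaN or -infinity.
fn next_down(r: f32) -> f32 {
    -next_up(-r)
}

/// Narrows `x` to f32 under the given rounding mode.
fn narrow_directed(x: f64, rn: Rn) -> f32 {
    let r = x as f32; // Rust casts round to nearest, ties to even
    let back = r as f64;
    if x.is_nan() || back == x {
        return r; // NaN, or exactly representable (zeros, infinities included)
    }
    match rn {
        Rn::Nearest => r,
        Rn::TowardPosInf => if back < x { next_up(r) } else { r },
        Rn::TowardNegInf => if back > x { next_down(r) } else { r },
        Rn::TowardZero => {
            if back.abs() > x.abs() {
                if r > 0.0 { next_down(r) } else { next_up(r) }
            } else {
                r
            }
        }
    }
}
```

The sketch only produces the stored value; it does not update any FPSCR bits.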
|
||
|
||
### PPCBUG-167 — All 9 FP store arms missing `invalidate_for_write` (PPCBUG-107 class)
|
||
|
||
- **Severity**: HIGH
|
||
- **Status**: applied (ca5b90b, 2026-05-01)
|
||
- **Locations**: interpreter.rs:1284 (stfs), 1289 (stfsu), 1296 (stfsx), 1301 (stfsux),
|
||
1308 (stfd), 1313 (stfdu), 1320 (stfdx), 1325 (stfdux), 1333 (stfiwx)
|
||
- **Symptom**: Same class as PPCBUG-107. Under M3 `--parallel`, a FP store by thread B to a
|
||
cache line reserved by thread A via `lwarx` does not clear thread A's reservation table slot.
|
||
Thread A's subsequent `stwcx.` spuriously succeeds. Rendering workers using FP stores to shared
|
||
transform/particle buffers co-located with spinlock sites are at risk.
|
||
- **Fix**: before each `mem.write_f32`/`write_f64`/`write_u32` in every FP store arm:
|
||
  ```rust
  if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) {
      if t.has_active_reservers() { t.invalidate_for_write(ea); }
  }
  ```
|
||
  Recommend a single sweep of all store groups (PPCBUG-107, 130, 150, 160, 167) to avoid further drift.
|
||
|
||
### PPCBUG-168 — stfs* SNaN narrowing: `as f32` quietens SNaN without raising FPSCR.VXSNAN
|
||
|
||
- **Severity**: MEDIUM
|
||
- **Status**: open
|
||
- **Locations**: interpreter.rs:1284, 1289, 1296, 1301
|
||
- **Symptom**: When FRS holds an f64 SNaN (bit 51 = 0), `CVTSD2SS` sets the f32 quiet bit (bit 22),
|
||
producing a QNaN in memory, without raising FPSCR[VXSNAN]. The stored memory bytes are correct per
|
||
IEEE-754 (narrowing an SNaN produces a QNaN). The bug is the missing FPSCR signal, a subset of
|
||
PPCBUG-165. **Contrast with PPCBUG-128** (lfs stores wrong FPR bits — HIGH severity; here memory
|
||
bytes are right, only the flag is missing).
|
||
- **Note**: fixed as a side effect of the PPCBUG-165 fix. No independent code change needed.
|
||
|
||
### PPCBUG-169 — stfd* bit-pattern store: confirmed correct (informational)
|
||
|
||
- **Severity**: LOW (confirmed clean, informational)
|
||
- **Status**: wontfix
|
||
- **Locations**: interpreter.rs:1305, 1311, 1317, 1323
|
||
- **Analysis**: `write_f64(ea, fpr)` → `write_u64(ea, fpr.to_bits())` → `val.to_be_bytes()`. Pure
|
||
bit-pattern, correct big-endian. SNaN preserved. EA computation and update-form writebacks all
|
||
correct. Canary parity confirmed. No bugs.
|
||
|
||
### PPCBUG-170 — stfiwx: confirmed correct (informational)
|
||
|
||
- **Severity**: LOW (confirmed clean, informational)
|
||
- **Status**: wontfix
|
||
- **Location**: interpreter.rs:1329-1335
|
||
- **Analysis**: `write_u32(ea, fpr.to_bits() as u32)` correctly extracts the low 32 bits of the
|
||
64-bit FPR as a raw bit pattern (the integer word produced by `fctiw`/`fctiwz`) and stores
|
||
big-endian. RA=0 handled correctly. No FPSCR effects required. Canary parity confirmed. No bugs.
|
||
|
||
### PPCBUG-171 — Zero unit tests for all 9 store-float opcodes
|
||
|
||
- **Severity**: LOW (test gap)
|
||
- **Status**: applied (P8 4029041, 2026-05-02)
|
||
- **Location**: interpreter.rs test module
|
||
- **Symptom**: No `#[test]` covers any of the 9 FP store arms. Regressions in EA computation,
|
||
endianness, update-form writeback order, or double→single narrowing are invisible.
|
||
- **Recommended minimum** (10 tests): `stfd` normal + SNaN bit-exact; `stfdu` update writeback;
|
||
`stfs` round-trip (1.0); `stfs` overflow (→ ±∞); `stfsx` ra=0; `stfsux` update; `stfiwx` integer
|
||
word extract; post-PPCBUG-165 fix: SNaN → FPSCR.VXSNAN set; post-PPCBUG-166 fix: RN=truncate.
|
||
|
||
IDs PPCBUG-172 through PPCBUG-174 are unallocated — reserved for group 28 follow-up.
|
||
|
||
---
|
||
|
||
## Batch 6 — FPU single-precision (group 29)
|
||
|
||
Per-group report: `audit-out/group-29-fpu-single.md`.
|
||
|
||
**Context**: The live implementation is substantially more capable than the frozen ppc-manual
|
||
snapshots indicated. `to_single()` correctly dispatches on FPSCR.RN; `check_invalid_*` helpers
|
||
correctly set VXSNAN, VXISI, VXIMZ, VXZDZ, VXIDI, ZX; `update_after_op` sets OX, UX, and
|
||
FPRF. The remaining bugs are: (1) XX/FI/FR (inexact) never set anywhere; (2) fmadd/fmsub
|
||
`*sx` variants missing the VXISI check for the add-phase infinity collision (their `*x` double
siblings other than `fmaddx` have the same gap); (3) fnmadd/fnmsub NaN sign bit incorrectly flipped by Rust `-`;
|
||
(4) fresx produces a full IEEE 1/b instead of the ~12-bit hardware estimate; (5) FPSCR.NI
|
||
flush-to-zero not modelled; (6) SNaN→QNaN propagation relies on host SSE behavior rather than
|
||
the ISA-canonical derivation.
|
||
|
||
**8 IDs used (PPCBUG-180..187). 12 IDs unallocated (PPCBUG-188..199).**
|
||
|
||
### PPCBUG-180 — XX / FI / FR bits never set across all FPU *sx opcodes (and double siblings)
|
||
|
||
- **Severity**: MEDIUM
|
||
- **Status**: open
|
||
- **Locations**: `fpscr.rs:184-194` (`update_after_op`); affects interpreter.rs:2252-2494
|
||
- **Symptom**: `FPSCR[XX]` (inexact) should be set whenever the mathematical result of an
|
||
FP operation cannot be represented exactly in the destination format (single or double) and
|
||
a rounding step occurs. `FPSCR[FI]` (fraction inexact) and `FPSCR[FR]` (fraction rounded)
|
||
encode the direction. `update_after_op` sets `OX` (overflow to ±∞) and `UX` (subnormal
|
||
result) but has no inexact-detection logic. Since most `*sx` operations on arbitrary inputs
|
||
require rounding to single precision, XX is almost always wrong (false zero). Games using
|
||
FPSCR polling to check exactness receive false "exact" results.
|
||
- **Canary parity**: Canary's `UpdateFPSCR` also does not set XX/FI/FR. Both share this gap.
|
||
- **Fix**: In `update_after_op` (or a post-`to_single` helper), compare the pre-round f64
|
||
  result with the post-round f64 result. If they differ, set `FI` and OR it into the sticky `XX`;
  set `FR` only when rounding increased the result's magnitude.
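
A minimal sketch of the pre-round / post-round comparison. The masks follow the PowerPC FPSCR layout (XX is bit 6, FR bit 13, FI bit 14 in big-endian numbering); the constant names and `record_rounding` are hypothetical and may not match `fpscr.rs`. The sketch treats the f64 intermediate as the "exact" value, which is itself an approximation when the f64 operation already rounded.

```rust
const XX: u32 = 1 << 25; // FPSCR bit 6: inexact (sticky)
const FR: u32 = 1 << 18; // FPSCR bit 13: fraction rounded (magnitude increased)
const FI: u32 = 1 << 17; // FPSCR bit 14: fraction inexact (this operation)

/// Updates FR/FI for the current operation and accumulates XX.
fn record_rounding(fpscr: &mut u32, pre_round: f64, post_round: f64) {
    *fpscr &= !(FR | FI); // FR and FI describe only the latest operation
    if pre_round.is_nan() || post_round.is_nan() || pre_round == post_round {
        return; // exact result: XX keeps whatever earlier operations left
    }
    *fpscr |= FI | XX;
    if post_round.abs() > pre_round.abs() {
        *fpscr |= FR;
    }
}
```

A complete implementation would also raise the FX summary bit when XX transitions from 0 to 1; that is omitted here.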
|
||
|
||
### PPCBUG-181 — fmaddsx / fnmaddsx missing VXISI check for add-phase ±∞ collision
|
||
|
||
- **Severity**: MEDIUM
|
||
- **Status**: applied (P5 d39d0ba, 2026-05-02)
|
||
- **Locations**: interpreter.rs:2339-2348 (fmaddsx), 2383-2392 (fnmaddsx)
|
||
- **Symptom**: When `FRA × FRC = +∞` and `FRB = -∞` (or vice versa), PowerISA §4.3.4
|
||
requires `FPSCR[VXISI]` to be set and the result to be a QNaN. The double-precision sibling
|
||
`fmaddx` (line 2327) correctly calls `fpscr::check_invalid_add(ctx, a * c, b, false)` after
|
||
the multiply-check. `fmaddsx` omits this call entirely — only `check_invalid_mul` runs.
|
||
Games using fused-madd in dot-product accumulators that might overflow to ±∞ (e.g. lighting
|
||
accumulators with very large normals) lose the VXISI signal.
|
||
- **Fix**:
|
||
  ```rust
  // inside fmaddsx arm, after check_invalid_mul:
  fpscr::check_invalid_add(ctx, a * c, b, false);
  ```
|
||
Same for fnmaddsx (same operand pair, same `false` sense for the add).
|
||
|
||
### PPCBUG-182 — fmsubsx / fnmsubsx missing VXISI check for subtract-phase ±∞ collision
|
||
|
||
- **Severity**: MEDIUM
|
||
- **Status**: applied (P5 d39d0ba, 2026-05-02)
|
||
- **Locations**: interpreter.rs:2361-2370 (fmsubsx), 2405-2414 (fnmsubsx)
|
||
- **Symptom**: When `FRA × FRC = ±∞` and `FRB = ±∞` with the same sign, `(±∞) − (±∞)`
|
||
should fire `FPSCR[VXISI]`. Neither `fmsubsx` nor `fnmsubsx` calls `check_invalid_add`.
|
||
- **Fix**:
|
||
  ```rust
  // inside fmsubsx arm, after check_invalid_mul:
  fpscr::check_invalid_add(ctx, a * c, -b, false);
  ```
|
||
Same for fnmsubsx. The negated `b` turns the subtract into the add-form so that
|
||
`check_invalid_add(..., false)` uses the correct infinity-sign comparison.
|
||
|
||
### PPCBUG-183 — fnmaddsx / fnmsubsx NaN sign bit incorrectly flipped by Rust unary `-`
|
||
|
||
- **Severity**: MEDIUM
|
||
- **Status**: applied (P5 d39d0ba, 2026-05-02)
|
||
- **Locations**: interpreter.rs:2388 (fnmaddsx), 2410 (fnmsubsx)
|
||
- **Symptom**: `to_single(ctx, -(a.mul_add(c, b)))` — Rust's unary `-f64` always flips the
|
||
IEEE sign bit, including when the value is NaN. PowerISA §4.3.2 specifies that the final
|
||
negation in `fnmadd`/`fnmsub` is NOT applied to a QNaN result: if the fused computation
|
||
yields a NaN (due to SNaN input, VXIMZ, or VXISI), the negation is skipped and the NaN is
|
||
propagated with its canonical sign unchanged. xenia-rs flips the sign bit of any NaN result,
|
||
producing a QNaN with the wrong sign. Observable by storing via `stfd` and inspecting bits.
|
||
Games using sign-bit NaN tagging (e.g. `0xFFC00000` vs `0x7FC00000` as distinct sentinels)
|
||
are affected.
|
||
- **Fix**:
|
||
  ```rust
  // fnmaddsx arm:
  let inner = a.mul_add(c, b);
  let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
  // fnmsubsx arm:
  let inner = a.mul_add(c, -b);
  let result = to_single(ctx, if inner.is_nan() { inner } else { -inner });
  ```
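
The sign-bit behaviour is easy to pin down in isolation. The sketch below uses a hypothetical helper, `negate_unless_nan`, to show the guarded negation next to what plain unary `-` does to a NaN bit pattern.

```rust
/// Negates `x` unless it is a NaN, in which case it is returned unchanged.
fn negate_unless_nan(x: f64) -> f64 {
    if x.is_nan() { x } else { -x }
}

/// What unguarded unary `-` does: flips the sign bit even for NaN.
fn negate_always(x: f64) -> f64 {
    -x
}
```

Comparing `to_bits()` of both results for a positive QNaN input gives a compact regression check for the NaN-sign tests recommended in PPCBUG-187.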
|
||
|
||
### PPCBUG-184 — fresx produces full-precision IEEE 1/b instead of ~12-bit hardware estimate
|
||
|
||
- **Severity**: HIGH
|
||
- **Status**: applied (P5 d39d0ba, 2026-05-02)
|
||
- **Location**: interpreter.rs:2481-2494
|
||
- **Symptom**: `fres` on Xenon hardware produces a reciprocal approximation via a 256-entry
|
||
LUT with linear interpolation, accurate to roughly 1/4096 relative error (~12 mantissa
|
||
bits). xenia-rs computes `to_single(1.0 / b)` — the fully IEEE-754 correctly-rounded
|
||
single-precision reciprocal. The result is up to ~4096× more accurate than hardware.
|
||
Newton-Raphson refinement code `x = fres(d); x = x*(2 - d*x)` is not broken by this (NR
|
||
converges even from an accurate seed), but code that checks the seed's error magnitude for
|
||
convergence termination, or that relies on `fres(d)*d ≠ 1.0` to decide whether to refine,
|
||
  may take the wrong branch: the residual `fres(d)*d - 1.0` is far smaller on xenia-rs than on hardware.
|
||
- **Canary parity**: Canary uses `f.Recip(f.Convert(frB, FLOAT32_TYPE))` — approximates by
|
||
first converting to f32 (quantizing the input), then applying the host reciprocal. Still
|
||
produces a fully-accurate IEEE single reciprocal rather than the 12-bit table estimate.
|
||
Both emulators share the deviation. Canary's conversion-first approach is slightly closer to
|
||
hardware (the input is quantized before the reciprocal), so if a future fix is desired,
|
||
Canary's approach is the better reference.
|
||
- **Fix (minimal viable)**: Pre-convert input to f32 to match Canary's quantization:
|
||
`let b32 = b as f32; to_single(ctx, 1.0_f64 / b32 as f64)`. This matches Canary but still
|
||
does not emulate the 12-bit LUT. Full fix requires an `fres` LUT matching Xenon's hardware
|
||
table (documented in Xbox 360 SDK / GamePPCLisa docs).
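
The minimal-viable variant can be sketched without any interpreter state. This assumes plain nearest-even narrowing in place of `to_single(ctx, ...)`; `fres_quantized` and `newton_step` are hypothetical names, and the sketch does not emulate the hardware table.

```rust
/// Reciprocal with the input quantized to f32 first, result narrowed to f32.
fn fres_quantized(b: f64) -> f64 {
    let b32 = b as f32; // quantize the input
    let recip = (1.0_f64 / b32 as f64) as f32; // narrow the result
    recip as f64
}

/// One Newton-Raphson refinement step for 1/d from the seed `x`.
fn newton_step(d: f64, x: f64) -> f64 {
    x * (2.0 - d * x)
}
```

Because the seed is already accurate to single precision, one refinement step lands near full double precision, which is exactly the behaviour that differs from a ~12-bit hardware seed.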
|
||
|
||
### PPCBUG-185 — FPSCR.NI flush-to-zero not modelled; subnormal results propagate through *sx
|
||
|
||
- **Severity**: MEDIUM
|
||
- **Status**: open
|
||
- **Location**: all `*sx` arms in `interpreter.rs`; `fpscr.rs` defines no `NI` constant
|
||
- **Symptom**: Xenon firmware sets `FPSCR.NI = 1` at boot. With NI=1, the Xenon FPU flushes
|
||
subnormal inputs and results to the appropriate signed zero before and after every
|
||
floating-point operation. xenia-rs inherits the host x86 IEEE-754 default (NI=0), which
|
||
propagates subnormals. Subnormal differences: (a) subnormal FPR inputs are used as-is by
|
||
xenia vs. treated as ±0 by hardware; (b) subnormal results are stored by xenia vs. flushed
|
||
to ±0 by hardware. `update_after_op` sets `UX` when the result is subnormal, but does NOT
|
||
flush it. Games with NI-dependent behavior — most Xbox 360 titles compiled with default
|
||
Xenon ABI settings — may see different float results in subnormal-touching paths.
|
||
- **Canary parity**: Canary also inherits host IEEE NI=0 semantics. Both share this gap.
|
||
- **Fix**: After `to_single` (or the double-precision result), check `ctx.fpscr & fpscr::NI_BIT`
|
||
  (a constant that still has to be added) and, if set, flush subnormals:
  `if result.is_subnormal() { result = result.signum() * 0.0 }`. Apply to inputs as well for strict correctness.
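
A minimal sketch of the flush, assuming NI is FPSCR bit 29 in big-endian numbering (mask `1 << 2`). `NI` and `flush_if_ni` are hypothetical names. `copysign` is used here as an equivalent of the `signum() * 0.0` form.

```rust
const NI: u32 = 1 << 2; // FPSCR bit 29: non-IEEE mode

/// Replaces a subnormal value with a zero of the same sign when NI is set.
fn flush_if_ni(fpscr: u32, x: f64) -> f64 {
    if (fpscr & NI) != 0 && x.is_subnormal() {
        0.0_f64.copysign(x)
    } else {
        x
    }
}
```

For the `*sx` arms the relevant threshold is the single-precision subnormal range, so a real implementation would presumably test the narrowed f32 value rather than the f64 shown here.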
|
||
|
||
### PPCBUG-186 — SNaN → QNaN propagation relies on host SSE; not ISA-canonical for all *sx
|
||
|
||
- **Severity**: MEDIUM
|
||
- **Status**: open
|
||
- **Locations**: interpreter.rs:2252-2414 (all arithmetic *sx arms without explicit SNaN guard)
|
||
- **Symptom**: When an SNaN input reaches `faddsx`/`fsubsx`/`fmulsx`/`fdivsx`, the code calls
|
||
`check_invalid_add/mul/div` (correctly sets VXSNAN) but then performs the operation on the
|
||
raw SNaN value: `a + b`, `a * c`, etc. On x86-64 SSE2, the hardware `ADDSD`/`MULSD` ops
|
||
produce a QNaN from the first SNaN operand (bit 51 set, other mantissa bits preserved). This
|
||
matches ISA §4.3.2.2 for the common case. However, for `mul_add` (VFMADD231SD on AVX), the
|
||
SNaN propagation priority may differ: the ISA specifies FRA takes priority over FRB, but
|
||
hardware FMA may use a different priority for the three-operand form. The `fsqrtsx` and
|
||
`fresx` arms handle SNaN explicitly (via `is_snan` check) but do not synthesize the correct
|
||
QNaN result — they rely on `b.sqrt()` / `1.0/b` to produce a NaN, which the host does.
|
||
This is a latent risk; active wrong-result cases require bit-level NaN inspection.
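
One way to remove the host dependence is to synthesize the result NaN explicitly on raw bit patterns. The sketch below covers only the two-operand case and the FRA-over-FRB priority stated above; all names are hypothetical, and operands are passed as raw `u64` bits so the host FPU never touches the payload.

```rust
const EXP_MASK: u64 = 0x7FF0_0000_0000_0000;
const FRAC_MASK: u64 = 0x000F_FFFF_FFFF_FFFF;
const QUIET_BIT: u64 = 1 << 51;

/// True when the bit pattern encodes any NaN (quiet or signalling).
fn is_nan_bits(bits: u64) -> bool {
    (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0
}

/// Converts a NaN to its quiet form, preserving sign and payload.
fn quieted(bits: u64) -> u64 {
    bits | QUIET_BIT
}

/// Result NaN for a two-operand operation, or `None` when neither input is
/// a NaN and the normal arithmetic path should run.
fn propagate_nan(fra: u64, frb: u64) -> Option<u64> {
    if is_nan_bits(fra) {
        Some(quieted(fra))
    } else if is_nan_bits(frb) {
        Some(quieted(frb))
    } else {
        None
    }
}
```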
|
||
|
||
### PPCBUG-187 — Zero interpreter execution tests for all 10 group-29 opcodes
|
||
|
||
- **Severity**: LOW (test gap)
|
||
- **Status**: applied (P8 4029041, 2026-05-02)
|
||
- **Location**: interpreter.rs test module (no `#[test]` covers any *sx or fresx)
|
||
- **Symptom**: Regressions in rounding, FPSCR side effects, or operand-field decoding are
|
||
invisible to CI. The existing fpscr unit tests cover helper functions in isolation; no test
|
||
exercises the full `step()` path for any single-precision FPU opcode.
|
||
- **Recommended minimum** (12 tests — see group-29 report for encodings):
|
||
`fadds` exact; `fadds` VXISI; `fsubs` VXISI; `fmuls` 0×∞; `fdivs` ZX;
|
||
`fmadds` VXISI regression (PPCBUG-181); `fmsubs` VXISI regression (PPCBUG-182);
|
||
`fnmadds` NaN-sign (PPCBUG-183); `fnmsubs` NaN-sign (PPCBUG-183);
|
||
`fsqrts` negative input VXSQRT; `fsqrts` round-trip; `fres` basic reciprocal.
|
||
|
||
IDs PPCBUG-188 through PPCBUG-199 are unallocated — reserved for group 29 follow-up.
|
||
|
||
---
|
||
|
||
## Batch 6 (continued) — FPU arithmetic double (group 30)
|
||
|
||
Per-group report: `audit-out/group-30-fpu-double.md`.
|
||
|
||
Group 30 summary: **9 findings (PPCBUG-200..208). 2 MEDIUM cross-cutting, 3 MEDIUM opcode-specific, 4 LOW.** Result arithmetic is correct for all 10 opcodes. FPSCR infrastructure is partially wired: VXSNAN, OX, UX, ZX, VXISI (add/sub), VXIMZ, VXZDZ, VXIDI, VXSQRT all set correctly for the opcodes that need them. Critical gaps: (1) XX/FR/FI bits never set by any opcode — same gap as PPCBUG-180 but now confirmed on the double-precision path; (2) FPSCR.RN not honored for double arithmetic — single-precision has `round_to_single` but double has no equivalent; (3) fmsubx/fnmaddx/fnmsubx omit the VXISI check for ∞-collision in the add step; (4) fnmaddx/fnmsubx flip NaN sign bit via Rust `-` operator but ISA requires NaN sign preserved. frsqrtex uses full-precision 1/sqrt(b) instead of the hardware estimate — acceptable. All FMA forms use `f64::mul_add` for correct single-rounding semantics.
|
||
**9 IDs used (PPCBUG-200..208). 11 IDs unallocated (PPCBUG-209..219).**
|
||
|
||
### PPCBUG-200 — All group-30 opcodes: XX, FR, FI bits never set
|
||
- **Severity**: MEDIUM
|
||
- **Status**: open
|
||
- **Location**: `fpscr.rs:184-194` (`update_after_op`); `interpreter.rs:2248,2268,2289,2310,2335,2357,2379,2401,2463,2510`
|
||
- **Symptom**: Same gap as PPCBUG-180 but confirmed for the double-precision path. `update_after_op` only tracks OX (overflow to infinity) and UX (subnormal). FPSCR[XX] (inexact sticky), FPSCR[FR] (round direction), and FPSCR[FI] (inexact for current op) are never updated by any group-30 opcode. Every double-precision arithmetic operation that rounds a non-representable result silently omits these bits.
|
||
- **Fix**: Same goal as PPCBUG-180, but the mechanism differs. For double there is no `to_single` step to compare across, so inexactness must be detected either by reading MXCSR exception flags after each f64 operation and mapping them to FI/XX/FR, or by a post-op bit-level comparison of inputs vs. result.
|
||
- **Test gap**: Zero tests verify XX set after any inexact double-precision operation.
|
||
|
||
### PPCBUG-201 — All group-30 opcodes: FPSCR.RN not honored for double arithmetic
|
||
- **Severity**: MEDIUM
|
||
- **Status**: open
|
||
- **Location**: `interpreter.rs:2242-2512` (all 10 arms)
|
||
- **Symptom**: Host f64 operators always use nearest-even (host MXCSR default). `fpscr.rs` has a complete `rounding_mode(ctx)` helper and directed rounding helpers for single-precision (`round_to_single`), but no equivalent for double arithmetic. Guest `mtfsfi` RN changes have no effect on faddx/fsubx/fdivx/fsqrtx etc.
|
||
- **Fix**: Wrap each double-precision arithmetic arm with an MXCSR round-mode set/restore when `ctx.fpscr & fpscr::RN_MASK != 0`. Fast path (RN=0) stays zero-cost.
|
||
- **Test gap**: No test changes RN and verifies directed rounding on any double arithmetic opcode.
|
||
|
||
### PPCBUG-202 — fmaddx: non-FMA `a * c` used in check_invalid_add can spuriously raise/miss VXISI
|
||
- **Severity**: MEDIUM
|
||
- **Status**: applied (P5 d39d0ba, 2026-05-02)
|
||
- **Location**: `interpreter.rs:2332`
|
||
- **Symptom**: `check_invalid_add(ctx, a * c, b, false)` uses a separate two-rounding multiply to approximate the FMA intermediate product. When the true FMA intermediate is finite but the standalone product overflows to ±∞, VXISI fires spuriously. When the true intermediate is ±∞ but the standalone product is finite (extreme cancellation), VXISI is missed.
|
||
- **Fix**: Derive VXISI from input-value properties directly: if `(a.is_infinite() || c.is_infinite())` (product is mathematically infinite) and `b.is_infinite()` with opposing sign → VXISI.
|
||
- **Test gap**: No test covers the large-value cancellation case in fmaddx.
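
The input-derived check can be sketched as a pure predicate. It assumes that a 0×∞ product belongs to VXIMZ rather than VXISI and is therefore excluded here; `fma_add_is_invalid` is a hypothetical name, not the existing `check_invalid_add` helper.

```rust
/// True when `a * c + b` is an (±∞) + (∓∞) collision, judged from the
/// inputs alone so that a rounded `a * c` never influences the answer.
fn fma_add_is_invalid(a: f64, c: f64, b: f64) -> bool {
    if a.is_nan() || c.is_nan() || b.is_nan() {
        return false; // NaN inputs are handled by the SNaN/QNaN path
    }
    // The exact product is infinite only if an operand is infinite and the
    // other is nonzero (0 × ∞ is the separate VXIMZ case).
    let product_infinite =
        (a.is_infinite() && c != 0.0) || (c.is_infinite() && a != 0.0);
    if !product_infinite || !b.is_infinite() {
        return false;
    }
    let product_negative = a.is_sign_negative() != c.is_sign_negative();
    product_negative != b.is_sign_negative()
}
```

For the subtract forms the same predicate applies with the addend negated, for example `fma_add_is_invalid(a, c, -b)`.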
|
||
|
||
### PPCBUG-203 — fmsubx, fnmaddx, fnmsubx: VXISI never raised for ∞-collision in add/sub step
|
||
- **Severity**: MEDIUM
|
||
- **Status**: applied (P5 d39d0ba, 2026-05-02)
|
||
- **Locations**: `interpreter.rs:2354` (fmsubx), `2376` (fnmaddx), `2398` (fnmsubx)
|
||
- **Symptom**: Same pattern as PPCBUG-181/182 for the double-precision variants. These three arms call only `check_invalid_mul` and omit `check_invalid_add`. Per ISA, all four FMA variants must raise VXISI when the add step is `(±∞) + (∓∞)`. Example for fmsub: `A×C = +∞`, `B = +∞` → `+∞ − +∞` → VXISI. Currently the result NaN propagates silently with no FPSCR update. The fnmsub pattern is the canonical Newton-Raphson step and the most common FPU path in Xbox 360 graphics code.
|
||
- **Fix**: Add `fpscr::check_invalid_add(ctx, a * c, b, true)` for `fmsubx`/`fnmsubx` and `fpscr::check_invalid_add(ctx, a * c, b, false)` for `fnmaddx` (apply PPCBUG-202 sign-fix simultaneously).
|
||
- **Test gap**: Zero tests for VXISI on any of the three opcodes.
|
||
|
||
### PPCBUG-204 — fmaddx check_invalid_add sub-issue (sign logic reliant on imprecise product)
|
||
- **Severity**: LOW (sub-issue of PPCBUG-202)
|
||
- **Status**: open
|
||
- **Location**: `interpreter.rs:2332`
|
||
- **Symptom**: VXISI logic is internally consistent with the passed `a * c` value, but that value can have the wrong sign in extreme overflow/underflow cases. Resolve as part of PPCBUG-202.
|
||
|
||
### PPCBUG-205 — fnmaddx / fnmsubx: Rust unary `-` flips NaN sign bit; ISA requires NaN sign preserved
|
||
- **Severity**: MEDIUM
|
||
- **Status**: applied (P5 d39d0ba, 2026-05-02)
|
||
- **Locations**: `interpreter.rs:2377` (fnmaddx), `interpreter.rs:2399` (fnmsubx)
|
||
- **Symptom**: Same pattern as PPCBUG-183 for the double-precision variants. Rust's unary `-` applied to a NaN result flips the IEEE-754 sign bit. PowerISA Book I §4.3.4 states the negation is not applied to NaN results. Title code using NaN sentinels (audio middleware, debug fills) receives sign-flipped NaN payloads.
|
||
- **Fix**:
|
||
  ```rust
  let fma = a.mul_add(c, b); // fnmaddx
  let result = if fma.is_nan() { fma } else { -fma };
  // and analogously for fnmsubx
  ```
|
||
- **Test gap**: No test exercises fnmaddx/fnmsubx with NaN-producing inputs to check sign of result NaN.
|
||
|
||
### PPCBUG-206 — frsqrtex edge cases correct; no code change needed (informational)
|
||
- **Severity**: LOW (confirmed clean, informational)
|
||
- **Status**: wontfix
|
||
- **Location**: `interpreter.rs:2496-2512`
|
||
- **Analysis**: ZX fires for ±0. VXSQRT guard correctly excludes -0.0. frsqrte(+∞)=+0 correct. Full-precision is acceptable over-precision.
|
||
- **Fix**: Add comment: `// Full-precision: hardware gives ~12-14 bit estimate. NR converges identically.`
|
||
- **Test gap**: Zero frsqrtex unit tests — add 4 (±0 inputs, negative input+VXSQRT, SNaN, +∞).
|
||
|
||
### PPCBUG-207 — FMA opcode OX logic correct, OX edge cases untested (informational)
|
||
- **Severity**: LOW (confirmed clean, informational)
|
||
- **Status**: wontfix
|
||
- **Location**: `interpreter.rs:2335,2357,2379,2401`
|
||
- **Analysis**: `inputs_were_finite` correctly suppresses OX when an input is already infinite. OX fires when all inputs are finite but the FMA result overflows — ISA-correct.
|
||
- **Test gap**: Zero tests for OX scenario in any FMA opcode.

### PPCBUG-208 — Zero tests for fsubx, fdivx, fmsubx, fnmaddx, fnmsubx, fsqrtx, frsqrtex
- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module
- **Symptom**: 7 of 10 group-30 opcodes have zero tests. `faddx` has 1 happy-path test; `fmulx` has 1; `fmaddx` has 1. None have FPSCR/Rc=1/edge-case coverage.
- **Recommended minimum** (12 tests): `fsubx` normal; `fsubx` VXISI; `fdivx` normal; `fdivx` ZX; `fdivx` VXZDZ; `fmsubx` normal; `fnmaddx` normal; `fnmsubx` normal; `fnmaddx` NaN-sign regression (PPCBUG-205); `fsqrtx` normal; `fsqrtx` negative+VXSQRT; `frsqrtex` positive.

IDs PPCBUG-209 through PPCBUG-219 are unallocated — reserved for group 30 follow-up.

---

## Pending batches

- Batch 2: groups 6-11 — logical immediate, logical register, sign-extend/CLZ, word rotate, doubleword rotate, shift.
- Batch 3: groups 12-17 — compare, branch, trap+sc, CR logical, SPR/MSR, cache+sync.
- Batch 4: groups 18-23 — loads (byte, halfword, word, doubleword, multiple/string, float).
- Batch 5 (partial): groups 24, 26, 27, 28 done; group 25 (store word) pending.
- Batch 6 (partial): groups 29, 30 done; group 31 (FPU convert/compare) pending.
- Batch 7: groups 32-34 — VMX integer (add/sub, compare/min/max, logical/shift).
- Batch 8: groups 35-38 — VMX permute/pack, VMX float, VMX multiply-sum, VMX load/store.
- Phase C: decoder field extractors, decoder opcode-lookup, disassembler formatter parity.
- Phase D: this file gets re-sorted by severity and finalized.

---

## Batch 6 (continued) — FPU sign/move/compare/convert/round (group 31)

Per-group report: `audit-out/group-31-fpu-misc.md`.

Group 31 summary: **9 findings (PPCBUG-221..231; IDs 220/222/226 retracted after analysis).
1 HIGH, 3 MEDIUM, 5 LOW.** The sign-bit manipulation family (`fabsx`, `fnegx`, `fnabsx`, `fmrx`)
and `fselx` are all ISA-correct — Rust arithmetic maps to bit-level operations that preserve SNaN
payloads. `fcmpu` is correct (FPRF and VXSNAN set; no spurious VXVC). The conversion group is
mostly correct for result values and overflow sentinels; the main gaps are FPSCR inexact/FR/FI
tracking (shared with groups 29/30) and one subtle NearestEven tie-breaking defect in
`round_to_i64` that affects `fctidx`. `fcmpo` silently omits VXSNAN/VXVC despite having a
comment acknowledging the gap.

**9 IDs used (PPCBUG-221, 223, 224, 225, 227, 228, 229, 230, 231). IDs 220/222/226 retracted.
IDs PPCBUG-232..239 unallocated.**

### PPCBUG-221 — `fctidx` / `round_to_i64` NearestEven tie-breaking uses f64::EPSILON; broken for |v| > 2^52

- **Severity**: HIGH
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `fpscr.rs:220–238` (`round_to_i64`, `NearestEven` case)
- **Symptom**: The tie-breaking code computes `diff = (v - v.trunc()).abs()` and tests
  `(diff - 0.5).abs() < f64::EPSILON` to detect a half-integer. Above `|v| = 2^52`,
  `v.trunc() == v` for all representable f64 values (all are exact integers), so `diff == 0.0`
  and the tie-breaking branch is never taken — the code falls through to `v.round() as i64`,
  which is round-half-away-from-zero instead of round-half-to-even. Every fctid call on a
  large odd half-integer (e.g. `(2^52 + 1).5`) produces the wrong integer. In practice these
  exact 0.5 cases are rare for large values but can appear in audio sample-count arithmetic
  and physics fixed-point pipelines.
- **Fix**: replace the NearestEven arm with a fractional-part-only tie check that is exact for
  |v| <= 2^52 and degenerates correctly to truncation above 2^52:
  ```rust
  RoundingMode::NearestEven => {
      let t = v.trunc();
      let frac = v - t;  // exact for |v| <= 2^52; ==0 above (already integer)
      let fa = frac.abs();
      if fa > 0.5 { t as i64 + if v >= 0.0 { 1 } else { -1 } }
      else if fa < 0.5 { t as i64 }
      else {
          // Exact 0.5 tie — round to even.
          let fi = t as i64;
          if fi & 1 == 0 { fi } else { fi + if v >= 0.0 { 1 } else { -1 } }
      }
  }
  ```
- **Test gap**: add `round_to_i64` tests in `fpscr.rs:tests`: 0.5→0, 1.5→2, 2.5→2, 3.5→4,
  -0.5→0, -1.5→-2. Existing tests cover 2.5→2 and 3.5→4 (currently accidentally correct).
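
A standalone sketch of the same tie rule may help when writing the tests listed above. The name `round_half_even_i64` is hypothetical (it is not the crate's `round_to_i64`), and the sketch assumes a finite input that fits in `i64`; NaN and overflow sentinels are out of scope. It relies on two f64 facts: `v - v.trunc()` is computed exactly, and every value with `|v| >= 2^52` is already an integer.

```rust
/// Round-half-to-even for a finite f64 that fits in i64.
/// Hypothetical helper for illustration; not the crate's `round_to_i64`.
pub fn round_half_even_i64(v: f64) -> i64 {
    let t = v.trunc(); // integer part, rounded toward zero
    let frac = (v - t).abs(); // exact fractional magnitude in [0, 1)
    let ti = t as i64;
    let step = if v >= 0.0 { 1 } else { -1 }; // direction away from zero
    if frac > 0.5 {
        ti + step
    } else if frac < 0.5 {
        ti
    } else if ti & 1 == 0 {
        ti // exact tie, truncated value is already even
    } else {
        ti + step // exact tie, the even neighbour is one step away from zero
    }
}
```

No tolerance constant is involved, so a value one ulp above 0.5 rounds to 1 and is not mistaken for a tie.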

### PPCBUG-223 — `fcmpo` omits FPSCR[VXSNAN] and FPSCR[VXVC] on NaN operands

- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:2645–2675`
- **Symptom**: The `fcmpo` body mirrors `fcmpu` for FPRF and the CR field, which it sets correctly,
  but it calls no `fpscr::set_exception`. PowerISA requires: QNaN → `FPSCR[VXVC, VX, FX]`;
  SNaN → additionally `FPSCR[VXSNAN]`. `fcmpu` correctly sets VXSNAN for SNaN; `fcmpo` does
  not. A comment in the source acknowledges "not modeled yet."
- **Impact**: Code that reads FPSCR after an `fcmpo` on a NaN operand will see FX=0 instead of
  FX=1, and `mffsx` after `fcmpo` will not reflect VXVC. Xbox 360 CRT comparison primitives
  (`islessgreater`, ordered relational operators) use `fcmpo`.
- **Fix**:
  ```rust
  if fra.is_nan() || frb.is_nan() {
      ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: false, so: true };
      if fpscr::is_snan(fra) || fpscr::is_snan(frb) {
          fpscr::set_exception(ctx, fpscr::VXSNAN | fpscr::VXVC);
      } else {
          fpscr::set_exception(ctx, fpscr::VXVC);
      }
  }
  ```
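
The patch depends on telling signalling NaNs from quiet ones. A minimal sketch of that classification follows. `is_snan` here is a standalone stand-in for `fpscr::is_snan`, and the flag constants are placeholders chosen for the example, not the real FPSCR bit positions. Like the patch above, it raises VXVC together with VXSNAN unconditionally.

```rust
/// Placeholder flag values for this sketch only (assumption: the real
/// constants live in fpscr.rs and use different bit positions).
pub const VXSNAN: u32 = 1 << 0;
pub const VXVC: u32 = 1 << 1;

/// A signalling NaN is a NaN whose quiet bit (mantissa MSB, bit 51) is clear.
pub fn is_snan(x: f64) -> bool {
    const QUIET_BIT: u64 = 1 << 51;
    x.is_nan() && (x.to_bits() & QUIET_BIT) == 0
}

/// Exception bits an ordered compare would raise for operands `a` and `b`.
pub fn fcmpo_exceptions(a: f64, b: f64) -> u32 {
    if is_snan(a) || is_snan(b) {
        VXSNAN | VXVC
    } else if a.is_nan() || b.is_nan() {
        VXVC
    } else {
        0
    }
}
```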

### PPCBUG-224 — `fcfidx` does not set FPSCR[XX/FX] for inexact i64→f64 conversion

- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:2528–2536`
- **Symptom**: Only FPRF is updated. Per ISA, `fcfid` sets `FPSCR[XX, FX]` (and FR/FI) when
  the i64 value has more than 53 significant bits and precision is lost. Only values with
  `|v| > 2^53` can be inexact (for example `2^53 + 1`). Common trigger: large frame/sample counters, address values.
- **Fix**: after the conversion, compare `(result as i64) != (bits as i64)` and call
  `fpscr::set_exception(ctx, fpscr::XX)` if inexact.
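
One caveat with comparing through `as i64`: Rust float-to-integer casts saturate, so a result that rounded up to 2^63 casts back to `i64::MAX` and looks exact. The sketch below widens to `i128` instead. `fcfid_convert` is a hypothetical helper, and it assumes round-to-nearest-even (what Rust's `as f64` does) and does not model the other FPSCR[RN] modes.

```rust
/// Convert i64 to f64 and report whether precision was lost.
/// Hypothetical helper; assumes round-to-nearest-even only.
pub fn fcfid_convert(v: i64) -> (f64, bool) {
    let r = v as f64; // integer-to-float `as` rounds to nearest, ties to even
    // Widen both sides to i128 so a result of exactly 2^63 is not clamped.
    let inexact = (r as i128) != (v as i128);
    (r, inexact)
}
```

With this form `i64::MAX` is reported as inexact, which the saturating `as i64` comparison would miss.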

### PPCBUG-225 — `frspx` does not set FPSCR[XX/FX/FR/FI] on inexact rounding

- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:2516–2527`
- **Symptom**: `update_after_op` sets OX/UX only. The ISA requires FR/FI/XX/FX on any f64→f32
  rounding that is not exact. `frsp` is the canonical double→single-precision narrowing idiom
  in compiler output — virtually every call is inexact.
- **Fix**: after `to_single`, compare result vs b; if different and both finite, call
  `fpscr::set_exception(ctx, fpscr::XX | fpscr::FI | ...)` with FR set if magnitude increased.
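
The comparison described in the fix can be sketched as a pure function. `frsp_round` is a hypothetical name; it assumes round-to-nearest-even (the behaviour of Rust's `as f32`) and leaves overflow to infinity, NaN handling, and the actual FPSCR writes to the surrounding code.

```rust
/// Narrow `b` to single precision and report (result, inexact, fr).
/// `inexact` corresponds to XX/FI; `fr` is set when rounding increased the magnitude.
/// Hypothetical helper; assumes round-to-nearest-even only.
pub fn frsp_round(b: f64) -> (f64, bool, bool) {
    let r = (b as f32) as f64; // narrow, then widen back (widening is exact)
    if !b.is_finite() || !r.is_finite() {
        // NaN, infinity, or overflow: handled by other FPSCR paths in this sketch.
        return (r, false, false);
    }
    let inexact = r != b;
    let fr = inexact && r.abs() > b.abs();
    (r, inexact, fr)
}
```

For example, 0.1 narrows to a slightly larger single-precision value, so both flags are set, while 1.5 is exact in both formats.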

### PPCBUG-227 — `fctiwx` rounding: `round_to_i32` inherits NearestEven defect via `round_to_i64`

- **Severity**: LOW
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `fpscr.rs:241–243`
- **Symptom**: `round_to_i32` calls `round_to_i64` then clamps. The PPCBUG-221 defect in
  `round_to_i64` does not manifest for i32-range values (the epsilon check accidentally works
  at this scale), but the structural fragility is inherited. Fixing PPCBUG-221 cures this.
- **Recommendation**: add unit tests `round_to_i32(0.5)==0`, `round_to_i32(1.5)==2`,
  `round_to_i32(2.5)==2` to verify correct round-to-even behavior.

### PPCBUG-228 — Zero interpreter execution tests for fabsx/fnegx/fnabsx/fmrx/fselx/fcmpo/fcfidx/fctidx/fctidzx/frspx

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: 10 of the 13 group-31 opcodes have zero dedicated tests. `test_fcmpu` covers
  only the ordered comparison `5.0 > 3.0`. `test_fctiwzx` covers one positive truncation.
  `test_fadd`/`test_fmul` are group-30 tests, not group-31.
- **Recommended minimum**: SNaN-preservation test for fabsx/fnegx/fnabsx; fselx with NaN/−0/−1;
  fcmpo QNaN→VXVC (after PPCBUG-223 fix); fcfidx exact and inexact; fctidx tie cases; frspx
  inexact → XX set (after PPCBUG-225 fix); fctiwx nearest-even tie; fctiwzx NaN sentinel.

### PPCBUG-229 — `fctidx` / `fctidzx` do not set FPSCR[XX/FX] for inexact inputs

- **Severity**: LOW
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Locations**: `interpreter.rs:2537–2574`
- **Symptom**: Per ISA, float-to-integer conversions set `FPSCR[XX, FX]` when the source
  value is not an integer (the fractional part is discarded). Neither opcode sets XX.
  Shared root cause with PPCBUG-224/225.

### PPCBUG-230 — `fctiwx` / `fctiwzx` do not set FPSCR[XX/FX] for inexact inputs

- **Severity**: LOW
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Locations**: `interpreter.rs:2575–2612`
- **Symptom**: Same omission as PPCBUG-229 for the word-width conversion pair.

### PPCBUG-231 — `frspx` SNaN input result written as QNaN (host platform dependency)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2519–2524`
- **Symptom**: Rust's `as f32` (CVTSD2SS) can set the quiet bit on SNaN input, producing a
  QNaN in the FPR. Per ISA, `frsp` on SNaN should quieten it — so the QNaN result is
  correct in kind. The risk is that the exact QNaN bit-pattern may differ from PPC's
  canonical quietening (which ORs bit 22 into the f32 mantissa). Game code inspecting the
  NaN payload after frsp may see a different payload. Same structural root cause as
  PPCBUG-128 (`lfs` SNaN quietening), but lower severity because frsp IS arithmetic.

IDs PPCBUG-232 through PPCBUG-239 are unallocated — no further bugs found in group 31.

---

## Batch 7 — VMX integer add/sub (group 32)

Per-group report: `audit-out/group-32-vmx-int-addsub.md`.

**Scope**: `vaddubm`, `vaddubs`, `vadduhm`, `vadduhs`, `vadduwm`, `vadduws`, `vaddsbs`, `vaddshs`,
`vaddsws`, `vaddcuw`, `vsububm`, `vsububs`, `vsubuhm`, `vsubuhs`, `vsubuwm`, `vsubuws`, `vsubsbs`,
`vsubshs`, `vsubsws`, `vsubcuw`.

**Overall verdict**: All 20 opcodes are arithmetically correct. No HIGH-severity bugs found.
Lane indexing (big-endian, PPC element 0 = `Vec128::bytes[0]`), saturation arithmetic, VSCR.SAT
sticky-set, and vaddcuw/vsubcuw carry/borrow semantics are all implemented correctly.
4 LOW-severity findings (2 test gaps, 1 code organization, 1 API hazard).

### PPCBUG-240 — 18 of 20 group-32 opcodes have zero interpreter-level tests

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: Only `test_vaddubs_saturates_and_sets_vscr_sat` covers any group-32 opcode.
  `vaddubm`, `vsububm`, `vadduhm`, `vsubuhm`, `vadduwm`, `vsubuwm`, `vaddsbs`, `vsubsbs`,
  `vadduhs`, `vsubuhs`, `vaddshs`, `vsubshs`, `vadduws`, `vsubuws`, `vaddsws`, `vsubsws`,
  `vaddcuw`, `vsubcuw` — all 18 have no tests. No high risk today but no regression guard.
- **Recommended minimum**: wrap-around test (byte, halfword, word); sat-at-max and sat-at-min tests;
  VSCR.SAT sticky-set across two successive saturating instructions; vaddcuw carry lane; vsubcuw
  no-borrow lane.

### PPCBUG-241 — `vadduwm` / `vsubuwm` stranded in a separate section from the rest of group-32

- **Severity**: LOW (maintenance hazard)
- **Status**: open
- **Location**: `interpreter.rs:2090–2104` (stranded) vs. `interpreter.rs:2784` (§4a group-32 section)
- **Symptom**: The two word-modulo opcodes are matched 700 lines above the rest of the group, with
  only a comment at line 2819 as a cross-reference. A future sweep of §4a for group-32 changes
  would miss them.
- **Fix**: Move both arms into §4a and remove the comment at line 2819.

### PPCBUG-242 — `set_vscr_sat(false)` can non-stickily clear SAT from arithmetic handlers

- **Severity**: LOW (API hazard)
- **Status**: open
- **Location**: `context.rs:252–259`
- **Symptom**: `set_vscr_sat(bool)` accepts `false`, which would clear the sticky SAT bit. All
  current arithmetic callers pass `true` only (inside `if sat { ... }` guards), so no mis-clear
  occurs today. But the API is misleading — a future saturating handler that writes
  `set_vscr_sat(lane_sat)` with `lane_sat = false` would silently clear a previously-set bit.
- **Fix**: Rename to `sticky_set_vscr_sat()` (no bool argument, always ORs). Retain
  `force_vscr_sat(bool)` for `mtvscr`.
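
One possible shape for the split API is sketched below. The `Vscr` struct is a hypothetical stand-in for the relevant part of the CPU context, and `accumulate_sat` is an extra convenience that the finding does not ask for; only the sticky/force distinction comes from the recommendation above.

```rust
/// Hypothetical stand-in for the VSCR state held in the CPU context.
#[derive(Default)]
pub struct Vscr {
    sat: bool,
}

impl Vscr {
    /// For saturating arithmetic: can only set the bit, never clear it.
    pub fn sticky_set_sat(&mut self) {
        self.sat = true;
    }

    /// ORs a per-instruction saturation result into the sticky bit,
    /// so passing `false` leaves an earlier `true` untouched.
    pub fn accumulate_sat(&mut self, lane_sat: bool) {
        self.sat |= lane_sat;
    }

    /// For `mtvscr`-style writes, which legitimately overwrite the bit.
    pub fn force_sat(&mut self, value: bool) {
        self.sat = value;
    }

    pub fn sat(&self) -> bool {
        self.sat
    }
}
```

With this split, an arithmetic handler has no call available that clears SAT by accident; only the explicit `force_sat` path can.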

### PPCBUG-243 — `vmx.rs` saturation helpers: u16/i16/u32/i32 variants have zero unit tests

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `crates/xenia-cpu/src/vmx.rs:705–799`
- **Symptom**: `vmx.rs` tests cover 5 cases of `sat_add/sub_i8/u8`. The 8 helpers for wider
  types (`sat_add_u16`, `sat_sub_u16`, `sat_add_i16`, `sat_sub_i16`, `sat_add_u32`, `sat_sub_u32`,
  `sat_add_i32`, `sat_sub_i32`) are mathematically correct but unguarded by any test. Recommended
  additions listed in the per-group report.

IDs PPCBUG-244 through PPCBUG-274 are unallocated — no further bugs found in group 32.

---

## Batch 7 — VMX integer compare / min / max / avg (group 33)

Per-group report: `audit-out/group-33-vmx-int-compare.md`.

### PPCBUG-275 — All VC-form vector compare dot forms: `rc_bit()` reads wrong bit; CR6 never updated

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpequb.`, `vcmpequh.`, `vcmpgtsb.`, `vcmpgtsh.`, `vcmpgtub.`, `vcmpgtuh.`
- **Location**: `decoder.rs:75` + `interpreter.rs:3318`, `3331`, `3344`, `3357`, `3370`, `3383`
- **Symptom**: `rc_bit()` is implemented as `self.raw & 1 != 0` (reads LSB = bit 0 of the word).
  For VC-form instructions the Rc flag is at **PPC bit 21 = LSB bit 10**, not bit 0. Bit 0 is
  the LSB of the 10-bit XO field. All integer compare XO values are even (XO=6, 70, 518, 774, 582, 838),
  so their bit 0 is always 0. The CR6 update block is **unconditionally dead** regardless of
  whether the programmer wrote the dot form. `vcmpequb. vMask, vData, vNeedle` + `bc 12,26`
  (branch on CR6.EQ = all-false, i.e. no byte matched) is the canonical AltiVec memchr idiom;
  with CR6 never written, the branch tests a stale value.
- **Fix**:
  ```rust
  // decoder.rs — add:
  /// Rc bit for VC-form vector compare instructions (PPC bit 21 = LSB bit 10).
  #[inline] pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }
  ```
  Replace `instr.rc_bit()` with `instr.vc_rc_bit()` at interpreter.rs:3318, 3331, 3344, 3357,
  3370, 3383.
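
The bit arithmetic behind this finding can be checked in isolation. The sketch below uses free functions over a raw `u32` (the real accessor is a method on the decoded instruction), and `encode_vc` is a hypothetical encoder that exists only to build test words from the standard VC-form layout: primary opcode 4, then vD, vA, vB, Rc, and a 10-bit XO.

```rust
/// Convert a PowerPC (MSB-0) bit index into an LSB-0 shift for a 32-bit word.
pub const fn ppc_bit_to_lsb(ppc_bit: u32) -> u32 {
    31 - ppc_bit
}

/// Rc flag of a VC-form vector compare: PPC bit 21, i.e. LSB bit 10.
pub fn vc_rc_bit(raw: u32) -> bool {
    (raw >> ppc_bit_to_lsb(21)) & 1 != 0
}

/// Hypothetical encoder for test words (illustration only).
pub fn encode_vc(vd: u32, va: u32, vb: u32, rc: bool, xo: u32) -> u32 {
    (4 << 26)
        | ((vd & 0x1F) << 21)
        | ((va & 0x1F) << 16)
        | ((vb & 0x1F) << 11)
        | ((rc as u32) << 10)
        | (xo & 0x3FF)
}
```

For `vcmpequb.` (XO = 6) the word's bit 0 stays clear whether or not Rc is set, which is why a gate on `raw & 1` never fires.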

### PPCBUG-276 — `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`: same VC-form Rc bug

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpequw.`, `vcmpequw128.`, `vcmpgtuw.`, `vcmpgtsw.`
- **Location**: `interpreter.rs:2237`, `3396`, `3406`
- **Symptom**: Same root cause as PPCBUG-275. XO for vcmpequw=134, vcmpgtuw=646, vcmpgtsw=902 —
  all even, bit 0 always 0. Word-compare dot forms never update CR6. `vcmpequw128` uses the
  VMX128_R Rc encoding which also likely reads the wrong bit.
- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:2237, 3396, 3406. Separately verify
  VMX128_R Rc bit position for `vcmpequw128` (may require its own extractor).

### PPCBUG-277 — Zero tests for all `vcmp*` dot forms and CR6 correctness

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: No test exercises any of the 10 integer vector compare opcodes. Critical missing:
  `vcmpequb.` all-true → CR6.LT=1; `vcmpequb.` all-false → CR6.EQ=1; `vcmpgtsb` signed
  boundary (0x80 vs 0x7F must yield false, not true); `vcmpgtsh` at 0x8000 vs 0x7FFF.

### PPCBUG-278 — Zero tests for all 12 `vmax*` / `vmin*` opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module
- **Symptom**: None of vmaxub/uh/uw/sb/sh/sw, vminub/uh/uw/sb/sh/sw are tested. Critical missing:
  `vmaxsb(0x80, 0x7F)` = 0x7F (signed max of -128 and +127); `vminsb(0x80, 0x7F)` = 0x80.
  Without these, signed vs unsigned confusion in min/max would not be caught.

### PPCBUG-279 — Zero tests for all 6 `vavg*` opcodes; no signed-boundary or rounding coverage

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` module; `vmx.rs` test module
- **Symptom**: `avg_u8` through `avg_i32` helpers have no unit tests. Key rounding case:
  `avg_u8(0, 1)` must be 1 (round up), not 0 (truncation). `avg_i32(i32::MIN, i32::MIN)` must
  be `i32::MIN` without overflow.
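
The boundary cases named in PPCBUG-278 and PPCBUG-279 can be pinned down with lane-level stand-ins. The functions below are illustrative sketches: their names echo the `vmx.rs` helpers, but the signatures and bodies are assumed rather than copied from the crate. The averages compute `(a + b + 1) >> 1` in a wider type so the intermediate sum cannot overflow.

```rust
/// Rounded unsigned byte average, widened to u16 for the intermediate sum.
pub fn avg_u8(a: u8, b: u8) -> u8 {
    ((a as u16 + b as u16 + 1) >> 1) as u8
}

/// Rounded signed word average, widened to i64 so MIN + MIN cannot overflow.
pub fn avg_i32(a: i32, b: i32) -> i32 {
    ((a as i64 + b as i64 + 1) >> 1) as i32
}

/// Signed byte max over raw lane bytes (reinterpret as i8, compare, convert back).
pub fn max_sb_lane(a: u8, b: u8) -> u8 {
    (a as i8).max(b as i8) as u8
}

/// Signed byte min over raw lane bytes.
pub fn min_sb_lane(a: u8, b: u8) -> u8 {
    (a as i8).min(b as i8) as u8
}

/// Unsigned byte max, shown for contrast with the signed variant.
pub fn max_ub_lane(a: u8, b: u8) -> u8 {
    a.max(b)
}
```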

IDs PPCBUG-280 through PPCBUG-314 are unallocated — no further bugs found in group 33.

---

## Batch 7 — VMX integer logical / shift / rotate / select (group 34)

Per-group report: `audit-out/group-34-vmx-logic-shift.md`.

Group 34 summary: the bitwise logical ops (vand/vandc/vor/vxor/vnor and their 128 variants)
are all ISA-correct — Vec128 is `[u8; 16]` with no padding bits, so `!(u32)` flips exactly
32 bits per lane with no upper-bit pollution (the PPCBUG-029/030/031 class does not apply to
VMX register files). The per-lane shifts (vslb/vsrb/vsrab, vslh/vsrh/vsrah, vslw/vsrw/vsraw
and their 128 variants) all correctly mask the shift count to the lane width before shifting;
vsraw uses i32 arithmetic right shift which is correctly defined in Rust for shift-by-31.
The per-lane rotates (vrlb/vrlh/vrlw and 128 variants) are correct. The whole-register bit
shifts (vsl/vsr) and whole-register byte shifts (vslo/vsro and 128 variants) correctly
extract the shift count from VB.b[15] with the proper bit masks. vsel and vsel128 are correct
including the read-before-write ordering on vsel128's vc=vd aliasing.

**One HIGH bug found**: vrlimi128 extracts both the rotate-amount (z) field and the
blend-mask (IMM) field from the wrong bit positions of the instruction word.

**0 MEDIUM bugs with code change needed. 1 HIGH. 10 LOW (test gaps and informational).**

### PPCBUG-315 — vrlimi128 z and IMM fields extracted from wrong bit positions

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: interpreter.rs:3551–3552
- **Symptom**: `shift = ((instr.raw >> 16) & 0x3)` reads integer bits 16–17 — the low 2 bits
  of the 5-bit IMM (blend-mask) field — instead of the 2-bit `z` (rotate) field at integer
  bits 6–7. `mask = (instr.raw >> 2) & 0xF` reads integer bits 2–5 — VD128h extension bits
  and a reserved field — instead of the low 4 bits of IMM at integer bits 16–19.
  **Every `vrlimi128` executes with a wrong rotate amount and a wrong per-word select mask.**
  The only benign case is the degenerate encoding where `z == IMM[1:0]` and the garbage mask
  happens to equal the intended mask — unlikely in real code.
- **VX128_4 field layout** (LSB-0 integer bit numbering after PPC big-endian byte-swap to host):
  - `VD128l : 5` at integer bits 21–25 (PPC bits 6–10)
  - `IMM : 5` at integer bits 16–20 (PPC bits 11–15) — blend mask, 4 bits used
  - `VB128l : 5` at integer bits 11–15 (PPC bits 16–20)
  - `z : 2` at integer bits 6–7 (PPC bits 24–25) — rotate amount 0..3
  - `VD128h : 2` at integer bits 2–3 (PPC bits 28–29)
- **Fix**:
  ```rust
  let shift = ((instr.raw >> 6) & 0x3) as usize;  // z field: integer bits 6-7
  let mask = (instr.raw >> 16) & 0xF;             // IMM low 4 bits: integer bits 16-19
  ```
- **Canary reference**: `ppc_decode_data.h:585–608` `FormatVX128_4`; `ppc_emit_altivec.cc:1318,1324`.
- **Note**: the rotate logic (`b[(shift + i) % 4]`) and mask-select logic (`(mask >> (3-i)) & 1`)
  in the interpreter body are ISA-correct — only the field extraction is wrong.
- **Test gap**: no interpreter execution test for vrlimi128 (PPCBUG-325).
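
To make the difference concrete, the sketch below puts the corrected and the reported extractions side by side as free functions over a raw word. The function names are hypothetical; the shifts and masks are taken from the field layout and the symptom description above.

```rust
/// Rotate amount `z` of a VX128_4 word: LSB bits 6-7 (PPC bits 24-25).
pub fn vx128_4_z(raw: u32) -> u32 {
    (raw >> 6) & 0x3
}

/// Low four bits of the 5-bit IMM blend mask: LSB bits 16-19.
pub fn vx128_4_imm4(raw: u32) -> u32 {
    (raw >> 16) & 0xF
}

/// The extraction reported as buggy for the rotate amount (reads IMM[1:0]).
pub fn buggy_shift(raw: u32) -> u32 {
    (raw >> 16) & 0x3
}

/// The extraction reported as buggy for the mask (reads LSB bits 2-5).
pub fn buggy_mask(raw: u32) -> u32 {
    (raw >> 2) & 0xF
}
```

For a word with IMM = 0b01010 and z = 3, the corrected accessors return 3 and 0xA, while the buggy pair returns 2 and 0.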

### PPCBUG-316 — Zero interpreter execution tests for vslb/vsrb/vsrab (LOW)

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:3440–3463

### PPCBUG-317 — Zero interpreter execution tests for vslh/vsrh/vsrah (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3472–3503

### PPCBUG-318 — vslo/vsro byte-shift count max is 15 (correct; informational)

- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- `N` is a 4-bit field; max shift is 15 bytes = 120 bits (not 128). VD retains
  the 8 LSBs of VA in position [127:120] at N=15. ISA-correct.

### PPCBUG-319 — vsel128 vc=vd read-before-write ordering (correct; informational)

- **Severity**: LOW (informational / wontfix)
- **Status**: wontfix
- `c = ctx.vr[vc]` is read before `ctx.vr[vd] = result`. Correctly sequenced.

### PPCBUG-320 — Zero interpreter execution tests for vslw/vsrw/vsraw/vrlw (+128 variants)

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:2108–2155

### PPCBUG-321 — Zero interpreter execution tests for vsl/vsr

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:3508–3521

### PPCBUG-322 — Zero interpreter execution tests for vslo/vsro (+128 variants)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:3523–3541

### PPCBUG-323 — Zero interpreter execution tests for vand/vandc/vor/vxor/vnor (+128 variants)

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: interpreter.rs:1900–1944

### PPCBUG-324 — Zero interpreter execution tests for vsel/vsel128

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: interpreter.rs:1945–1967

### PPCBUG-325 — Zero interpreter execution tests for vrlb/vrlh/vrlw/vrlimi128 (+128 variants)

- **Severity**: LOW (test gap; fix PPCBUG-315 before writing vrlimi128 tests)
- **Status**: open
- **Location**: interpreter.rs:3464–3503, 2144–2155, 3550–3565

IDs PPCBUG-326 through PPCBUG-354 are unallocated — no further bugs found in group 34.

---

## Batch 8 — VMX permute / merge / splat / pack / unpack (group 35)

Per-group report: `audit-out/group-35-vmx-permute.md`.

**Summary**: 5 HIGH, 3 MEDIUM, 9 LOW. Four VX128_* field-extraction bugs; one missing post-pack permutation logic; VSCR.SAT and pack saturation semantics are all correct. Zero interpreter tests for any group-35 opcode.

### PPCBUG-360 — vperm128: VC register read from wrong field (vd128() instead of VX128_2 VC bits 23-25)

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:1979`
- **Symptom**: `vperm128` uses the VX128_2 instruction form. The permute-control register VC is a 3-bit field at PPC bits 23-25 (LSB integer bits 6-8). The code does `vc = instr.vd128()` which reads PPC bits 6-10 + 21-22 — a completely different set of bits. Every `vperm128` therefore permutes with a control vector read from the wrong register, producing garbage output. `vperm128` is one of the most-used VMX128 ops in Xbox 360 graphics code (texture/vertex data layout).
- **Fix**:
  ```rust
  // decoder.rs — add accessor:
  #[inline] pub fn vc128_2(&self) -> usize { ((self.raw >> 6) & 0x7) as usize }
  // interpreter.rs:1979 — replace:
  vc = instr.vc128_2(); // VX128_2 VC field at PPC bits 23-25
  ```
- **ISA ref**: `ppc-manual/vmx/vperm.md`, VX128_2 encoding; `ppc_decode_data.h:541-561`; `ppc_emit_altivec.cc:1203-1204` (`VX128_2_VC`).

### PPCBUG-361 — vsldoi128: SH field MSB reads bit 4 (reserved) instead of bit 9

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:2012`
- **Symptom**: VX128_5 SH is a 4-bit field at LSB integer bits 6-9. Code does `((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)`. This reads bit 4 (a reserved field, always 0 in valid encodings) as the MSB of SH instead of bit 9. Shifts of 8-15 bytes silently resolve as shifts of 0-7 bytes. `vsldoi128` with `SH >= 8` (common in vector rotation patterns) always produces the wrong result.
- **Fix**:
  ```rust
  let sh = ((instr.raw >> 6) & 0xF) as usize; // SH field: integer bits 6-9
  ```
- **ISA ref**: `ppc-manual/vmx/vsldoi.md`, VX128_5 encoding; `ppc_decode_data.h:609-634`; canary `VX128_5_SH`.
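
A small check of the claim that shifts of 8-15 collapse to 0-7 under the old expression. Both functions are free-standing sketches with hypothetical names; the expressions themselves are the ones quoted in the symptom and the fix.

```rust
/// Corrected VX128_5 SH extraction: LSB bits 6-9.
pub fn vx128_5_sh(raw: u32) -> u32 {
    (raw >> 6) & 0xF
}

/// The expression reported as buggy: takes the SH MSB from bit 4 instead of bit 9.
pub fn buggy_sh(raw: u32) -> u32 {
    ((raw >> 6) & 0x7) | (((raw >> 4) & 0x1) << 3)
}
```

With SH = 12 encoded at bits 6-9 and the reserved bit 4 clear, the corrected accessor returns 12 and the buggy one returns 4.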

### PPCBUG-362 — vpermwi128: PERMh (high 3 bits of 8-bit PERM immediate) read from VD128l bits instead of bits 6-8

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:4089`
- **Symptom**: VX128_P PERM = `PERMl[4:0] | (PERMh[2:0] << 5)` where PERMl is at integer bits 16-20 and PERMh is at integer bits 6-8. Code does `(raw >> 16) & 0xFF` which reads bits 16-23. Bits 21-23 are VD128l[4:2], not PERMh. The top 3 bits of the 8-bit PERM immediate are wrong; output word lane selections for lanes 0 and 1 are controlled by garbage bits. Same pattern as PPCBUG-315.
- **Fix**:
  ```rust
  let imm = ((instr.raw >> 16) & 0x1F) | (((instr.raw >> 6) & 0x7) << 5); // VX128_P PERM
  ```
- **ISA ref**: `ppc_decode_data.h:664-686`; `ppc_emit_altivec.cc:1214`.
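
The split-field reassembly can be exercised with a round trip. In the sketch below, `encode_perm_fields` is a hypothetical encoder used only to place PERMl, PERMh, and a VD128l value at the bit positions given in the symptom; the two extractors are the fixed and the reported expressions.

```rust
/// Corrected VX128_P PERM: PERMl at LSB bits 16-20, PERMh at LSB bits 6-8.
pub fn vx128_p_perm(raw: u32) -> u32 {
    ((raw >> 16) & 0x1F) | (((raw >> 6) & 0x7) << 5)
}

/// The expression reported as buggy: its top 3 bits come from VD128l.
pub fn buggy_perm(raw: u32) -> u32 {
    (raw >> 16) & 0xFF
}

/// Hypothetical encoder for test words (illustration only).
pub fn encode_perm_fields(perm: u32, vd128l: u32) -> u32 {
    ((perm & 0x1F) << 16) | (((perm >> 5) & 0x7) << 6) | ((vd128l & 0x1F) << 21)
}
```

Encoding PERM = 0xE4 with VD128l = 1 round-trips through the corrected extractor, while the buggy one yields 0x24 because bit 21 of the word belongs to the register field.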

### PPCBUG-363 — vpkd3d128: post-pack permutation (pack + z fields) entirely absent; output always placed in wrong lane when pack != 0

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3783-3808`
- **Symptom**: Canary's `vpkd3d128` does three things: (1) pack VB by type, (2) permute the result with the existing VD register using a control determined by `pack` (IMM[1:0]) and `shift` (z field at integer bits 6-7), (3) store. Xenia-rs does only (1) and (3), skipping the entire lane-placement permutation. When `pack != 0`, the packed value must be merged into a specific 32-bit or 64-bit slot of VD — this merge never happens. `pack=0` is the only safe case. Most D3D vertex pack sequences use `pack=1` (32-bit slot) with varying `shift`.
- **Fix**: Extract `pack = uimm & 3` and `shift = (instr.raw >> 6) & 3` (z field), read existing `ctx.vr[vd]`, apply the permutation table from `ppc_emit_altivec.cc:2125-2188`, write back.
- **ISA ref**: `ppc_emit_altivec.cc:2088-2191`.

### PPCBUG-364 — vsldoi (non-128): correct; PPCBUG-365 — vsplt*: correct; informational

- **Severity**: LOW (wontfix)
- **Status**: wontfix
- `vsldoi` correctly extracts SH as `(raw >> 6) & 0xF`. `vspltb/vsplth/vspltw` correctly read UIMM from the VA position (integer bits 16-20, masked to lane width). No bugs.

### PPCBUG-366 — vspltisb / vspltish: sign-extension idiom is correct but non-obvious; future regression risk

- **Severity**: MEDIUM
- **Status**: open (clarity fix recommended)
- **Location**: `interpreter.rs:2059-2060`, `2064-2066`
- **Symptom**: `simm | !0x1F` where `simm` is typed `i8`/`i16` is functionally correct (Rust narrows `!0x1F` to the target type), but the pattern is fragile under refactoring. Recommend:
  ```rust
  let simm = (((instr.raw >> 16) & 0x1F) as i32).wrapping_shl(27).wrapping_shr(27) as i8;
  ```
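
The shift-based idiom can be verified on its own. The sketch below is a free function with a hypothetical name, `simm5`; it takes the 5-bit field from LSB bits 16-20 as in the recommendation and sign-extends it with a left shift followed by an arithmetic right shift.

```rust
/// Sign-extend the 5-bit SIMM field at LSB bits 16-20 to i32.
pub fn simm5(raw: u32) -> i32 {
    let field = (raw >> 16) & 0x1F;
    // Move bit 4 of the field into the sign position, then shift back
    // arithmetically so the sign bit is replicated.
    ((field << 27) as i32) >> 27
}
```

The result covers -16..=15, so narrowing it with `as i8` or `as i16` for the byte and halfword splats loses nothing.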

### PPCBUG-367 — vupkhpx / vupklpx: channel replication vs zero-extend divergence; canary is unimplemented

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `vmx.rs:318-330`
- **Symptom**: `unpack_pixel_555` replicates 5-bit RGB channels (`r << 3 | r >> 2`) to fill 8 bits. ISA specifies zero-extension into bits 7:3, leaving bits 2:0 as zero. The replicate approach produces slightly different values (and slightly higher values), diverging from hardware.
- **Fix**: `let r8 = r << 3;` (drop the `| r >> 2` replication term).

### PPCBUG-368 — vpkpx: pack_pixel_555 channel assignment unverified against hardware; canary comparison inconclusive

- **Severity**: MEDIUM
- **Status**: open (needs hardware trace or more detailed canary analysis)
- **Location**: `vmx.rs:310-316`
- **Symptom**: The xenia-rs layout comment says R=bits 8-15, G=16-23, B=24-31. Canary's `vkpkx_in_low` uses different shift amounts (`>> 9` for R, `>> 6` for G, `>> 3` for B), suggesting either a different input layout assumption or the channels are swapped. Without a hardware reference, cannot determine which is authoritative.

### PPCBUG-369 — vpkd3d128 z-field not extracted (sub-issue of PPCBUG-363)

- **Severity**: LOW (tracked under PPCBUG-363)
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3785`
- The `z` field (VX128_4, integer bits 6-7) is never extracted. Correct extraction: `(instr.raw >> 6) & 0x3`.

### PPCBUG-370 — Zero interpreter tests for vperm / vperm128 (test gap)

- **Severity**: LOW
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:1970-1995`

### PPCBUG-371 — Zero interpreter tests for vsldoi / vsldoi128 (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:1997-2020`

### PPCBUG-372 — Zero interpreter tests for vpermwi128 (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:4087-4099`

### PPCBUG-373 — Zero interpreter tests for vmrghb / vmrglb / vmrghh / vmrglh (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3570-3600`

### PPCBUG-374 — Zero interpreter tests for vspltb / vsplth / vspltw / vspltw128 (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2022-2048`

### PPCBUG-375 — Zero interpreter tests for vspltisb / vspltish / vspltisw / vspltisw128 (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:2050-2068`

### PPCBUG-376 — Zero interpreter tests for all vpk* (16 ops) + VSCR.SAT coverage (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3607-3718`

### PPCBUG-377 — Zero interpreter tests for vupkhsb / vupklsb / vupkhsh / vupklsh (test gap)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3722-3754`

### PPCBUG-378 — Zero interpreter tests for vpkd3d128 / vupkd3d128 (test gap; blocked on PPCBUG-363)

- **Severity**: LOW
- **Status**: open
- **Location**: `interpreter.rs:3783-3835`

IDs PPCBUG-379 through PPCBUG-419 are unallocated — no further bugs found in group 35.

---

## Batch 8 — VMX float arithmetic / compare / convert / estimate (group 36)

Per-group report: `audit-out/group-36-vmx-float.md`.

Group 36 summary: **21 findings (PPCBUG-420..440). 6 HIGH, 8 MEDIUM, 7 LOW.** The most
critical bugs are: (1) four VMX float compare VC-form opcodes use `rc_bit()` (bit 0) instead
of the correct VC-form Rc bit (bit 10) — CR6 is never updated, same root cause as PPCBUG-275;
(2) vmaddfp128 and vmaddcfp128 have their multiplicand and accumulator operands swapped —
every matrix multiply / Newton-Raphson step using these opcodes produces the wrong result;
(3) VMX128_R dot-form compares (vcmpeqfp128. etc.) decode as Invalid due to missing key4
entries in decode_op6.

**6 HIGH, 8 MEDIUM, 7 LOW. 21 IDs used (PPCBUG-420..440). 39 IDs unallocated (PPCBUG-441..479).**

### PPCBUG-420 — vcmpeqfp / vcmpgefp / vcmpgtfp: `rc_bit()` reads wrong bit; CR6 never updated

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Affected opcodes**: `vcmpeqfp.`, `vcmpgefp.`, `vcmpgtfp.`
- **Location**: `interpreter.rs:1875`, `1885`, `1895`
- **Symptom**: `rc_bit()` = `self.raw & 1` reads LSB bit 0. For VC-form the Rc flag is at
  PPC bit 21 = LSB bit 10. All XO values (vcmpeqfp=198, vcmpgefp=454, vcmpgtfp=710) have
  bit 0 = 0, so CR6 is never updated for any float compare dot form. `vcmpeqfp.` + `bc 12,24`
  (branch all-equal) always falls through.
- **Cross-reference**: PPCBUG-275 (identical root cause for integer vcmp). Canary reads
  `i.VXR.Rc` (ppc_emit_altivec.cc:625, 633, 641).
- **Fix**: Add `pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }` to
  `decoder.rs` and replace `instr.rc_bit()` at interpreter.rs:1875, 1885, 1895.
|
||
|
||
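
The bit arithmetic in PPCBUG-420 can be checked in isolation. This is a minimal sketch that works on a bare `u32` instruction word; `rc_bit`, `vc_rc_bit`, and `encode_vcmpeqfp` are illustrative free functions written for this note, not the project's `DecodedInstr` API.

```rust
/// Generic Rc accessor: host bit 0 (PPC bit 31).
fn rc_bit(raw: u32) -> bool {
    raw & 1 != 0
}

/// VC-form Rc accessor: host bit 10 (PPC bit 21).
fn vc_rc_bit(raw: u32) -> bool {
    (raw >> 10) & 1 != 0
}

/// Hypothetical encoder for `vcmpeqfp[.]`: primary opcode 4, XO = 198,
/// with the Rc flag placed at host bit 10 as the VC form requires.
fn encode_vcmpeqfp(vd: u32, va: u32, vb: u32, rc: bool) -> u32 {
    (4 << 26) | (vd << 21) | (va << 16) | (vb << 11) | ((rc as u32) << 10) | 198
}
```

Because XO = 198 is even, `rc_bit` returns `false` for both the plain and the dot form, while `vc_rc_bit` distinguishes them.
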
### PPCBUG-421 — vcmpbfp: `rc_bit()` reads wrong bit (VC-form); Rc gate permanently dead

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:3428`
- **Symptom**: Same root cause as PPCBUG-420. XO=966, bit 0 = 0; CR6 update never fires
  for `vcmpbfp.`. The CR6 value logic (`eq = !any_out`) is correct; only the gate is wrong.
- **Fix**: Use `instr.vc_rc_bit()` at interpreter.rs:3428.

### PPCBUG-422 — vcmpeqfp128 / vcmpgefp128 / vcmpgtfp128 / vcmpbfp128: `rc_bit()` reads wrong bit (VX128_R-form)

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `interpreter.rs:1875`, `1885`, `1895`, `3428` (shared arms with non-128 forms)
- **Symptom**: For VX128_R-form, Rc is at PPC bit 27 = LSB bit 4 (confirmed from canary's
  `VX128_R` bitfield: `uint32_t Rc : 1` at bit 4 from LSB). `rc_bit()` reads bit 0. Fix
  PPCBUG-423 first (dot forms decode as Invalid before this even matters).
- **Fix**: Add `pub fn vx128r_rc_bit(&self) -> bool { (self.raw >> 4) & 1 != 0 }` and use
  it in the VX128_R compare arms.

### PPCBUG-423 — vcmpeqfp128. / vcmpgefp128. / vcmpgtfp128. / vcmpbfp128.: dot forms decode as `Invalid`

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs:640-648` (decode_op6 VMX128 compare key4 table)
- **Symptom**: decode_op6 extracts `key4 = (bits22-24 << 3) | bit27`. When Rc=1, PPC bit 27
  is set, making key4 = non-dot value + 1. Dot-form key4 values (1, 9, 17, 25, 33) are all
  absent from the match table. Decoder returns `PpcOpcode::Invalid`. Any game code using a
  VMX128 compare dot form traps with an unimplemented-opcode error.
- **Fix**: Add dot-form entries to the key4 match table mapping to the same opcodes (the
  interpreter arm uses `instr.vx128r_rc_bit()` to conditionally update CR6):

```rust
0b000001 => return PpcOpcode::vcmpeqfp128,
0b001001 => return PpcOpcode::vcmpgefp128,
0b010001 => return PpcOpcode::vcmpgtfp128,
0b011001 => return PpcOpcode::vcmpbfp128,
0b100001 => return PpcOpcode::vcmpequw128,
```

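
To see why the dot forms land on odd key4 values, the key computation described in PPCBUG-423 can be reproduced on a raw word. This is a sketch under the assumption that `extract_bits(raw, start, end)` takes inclusive PPC big-endian bit positions; both functions here are stand-ins written for this note, not copies of `decoder.rs`.

```rust
/// Extract PPC bits `start..=end` (big-endian numbering, bit 0 = MSB).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (((1u64 << width) - 1) as u32)
}

/// key4 as described in the finding: PPC bits 22-24 shifted left by 3,
/// OR-ed with PPC bit 27 (the VX128_R Rc bit).
fn key4(raw: u32) -> u32 {
    (extract_bits(raw, 22, 24) << 3) | extract_bits(raw, 27, 27)
}
```

PPC bits 22-24 are host bits 7-9 and PPC bit 27 is host bit 4, so setting Rc adds exactly 1 to the non-dot key.
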
### PPCBUG-424 — vmaddfp128: operand swap — computes VA×VB+VD instead of VA×VD+VB

- **Severity**: HIGH
- **Status**: applied (52ece4b, 2026-05-02)
- **Location**: `interpreter.rs:1771` (`r[i] = ai.mul_add(bi, di)`)
- **Symptom**: Canary (ppc_emit_altivec.cc:806-809) documents `(VD) <- (VA × VD) + VB` and
  routes as `MulAdd(VA, VD, VB)`. Xenia-rs reads VA, VB, VD then computes
  `ai.mul_add(bi, di)` = `VA × VB + VD`, with the VB and VD roles swapped. Every shader using
  vmaddfp128 for matrix multiply or Newton-Raphson accumulation accumulates the wrong value.
  The existing denorm-flush test aliases vA=vD=v2, making the swap invisible.
- **Fix**: `r[i] = ai.mul_add(di, bi);`

### PPCBUG-425 — vmaddcfp128: operand swap — computes VD×VB+VA instead of VA×VD+VB

- **Severity**: HIGH
- **Status**: applied (52ece4b, 2026-05-02)
- **Location**: `interpreter.rs:4065` (`r[i] = di.mul_add(bi, ai)`)
- **Symptom**: Canary (ppc_emit_altivec.cc:819) documents `(VD) <- (VA × VD) + VB`.
  Xenia-rs computes `VD × VB + VA`. VA and VB are exchanged: VB is used as a multiplicand
  and VA as the addend.
- **Fix**: `r[i] = ai.mul_add(di, bi);`
- **Test gap**: zero tests for `vmaddcfp128`. Add test with distinct VA, VB, VD registers.

### PPCBUG-426 — vnmsubfp: two rounding steps instead of fused FMA; NaN sign may be flipped

- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:1786` (`r[i] = bi - ai * ci`)
- **Symptom**: `vmaddfp` uses single-rounded `ai.mul_add(ci, bi)`, but `vnmsubfp` uses
  `bi - ai * ci` (two operations, two rounding steps). ISA specifies a single fused operation.
  Canary acknowledges the same limitation (ppc_emit_altivec.cc:1136). Additionally, the
  implicit negation in subtraction may flip the sign bit of a NaN result (see PPCBUG-183).
- **Fix**: `r[i] = -ai.mul_add(ci, -bi);` — single FMA rounding: `-(ai*ci + (-bi))` = `bi - ai*ci`.

### PPCBUG-427 — vnmsubfp128: same two-rounding form as vnmsubfp

- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:1803` (`r[i] = di - ai * bi`)
- **Symptom**: Same class as PPCBUG-426 for the VMX128 form.
- **Fix**: `r[i] = -ai.mul_add(bi, -di);`

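
The difference between the two-step and the fused form is observable with ordinary `f32` values. The sketch below uses hypothetical helper names (`nmsub_unfused`, `nmsub_fused`, `rounding_demo`) and inputs picked so that the exact product `a * a` needs 25 significant bits, one more than `f32` holds.

```rust
/// Two roundings: the product is rounded to f32 before the subtraction.
fn nmsub_unfused(a: f32, c: f32, b: f32) -> f32 {
    b - a * c
}

/// One rounding: -(a*c + (-b)) evaluated as a single fused multiply-add.
fn nmsub_fused(a: f32, c: f32, b: f32) -> f32 {
    -a.mul_add(c, -b)
}

/// Returns (unfused, fused) for a = 1 + 2^-12 and b = 1 + 2^-11.
/// The exact value of b - a*a is -2^-24, which the unfused form loses
/// when it rounds a*a to 1 + 2^-11.
fn rounding_demo() -> (f32, f32) {
    let a = 1.0_f32 + 1.0 / 4096.0;
    let b = 1.0_f32 + 1.0 / 2048.0;
    (nmsub_unfused(a, a, b), nmsub_fused(a, a, b))
}
```

A regression test for PPCBUG-426/427 could use inputs of this shape, since exactly representable products (for example `2.0 * 3.0`) give the same answer in both forms and would not detect the bug.
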
### PPCBUG-428 — vrefp / vrefp128: full-precision 1/x instead of ~12-bit hardware estimate

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1853` (`r[i] = 1.0 / b[i]`)
- **Symptom**: Same class as PPCBUG-184 (fresx). Xenon vrefp provides ~12-bit accuracy;
  xenia-rs computes full IEEE-754 division. Canary also uses full precision in practice.

### PPCBUG-429 — vrsqrtefp / vrsqrtefp128: full-precision 1/sqrt(x) instead of ~12-bit estimate

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:1862` (`r[i] = 1.0 / b[i].sqrt()`)
- **Symptom**: Same class as PPCBUG-428 for reciprocal square root.

### PPCBUG-430 — vexptefp / vexptefp128: full-precision exp2(x) instead of ~12-bit estimate

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3934` (`r[i] = b[i].exp2()`)
- **Symptom**: Same class as PPCBUG-428. NaN/Inf edge cases may diverge.

### PPCBUG-431 — vlogefp / vlogefp128: full-precision log2(x) instead of ~12-bit estimate

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3944` (`r[i] = b[i].log2()`)
- **Symptom**: Same class as PPCBUG-428.

### PPCBUG-432 — vrfin / vrfin128: Rust `round()` is round-half-away-from-zero; ISA requires round-to-nearest-even

- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:2172` (`r[i] = b[i].round()`)
- **Symptom**: `vrfin(0.5)` → ISA = 0.0; Rust = 1.0. `vrfin(2.5)` → ISA = 2.0; Rust = 3.0.
  Canary uses `ROUNDPS` (an SSE4.1 instruction, not SSE2) in its round-to-nearest-even mode.
- **Fix**: Use `f32::round_ties_even()` (stable since Rust 1.77).

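
The two rounding rules only disagree on exact halves, which is easy to pin in a test. A minimal sketch, with `round_half_away` and `round_half_even` as illustrative wrapper names around the standard-library methods (`round_ties_even` needs Rust 1.77 or later):

```rust
/// Round-half-away-from-zero, which is what `f32::round` implements.
fn round_half_away(x: f32) -> f32 {
    x.round()
}

/// Round-half-to-even, the rule PPCBUG-432 says `vrfin` requires.
fn round_half_even(x: f32) -> f32 {
    x.round_ties_even()
}
```

Values such as 0.5, 2.5, and -2.5 separate the two rules; 1.5 does not, because both round it to 2.0.
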
### PPCBUG-433 — vctsxs / vcfpsxws128: NaN input returns 0 instead of saturating to INT_MIN (0x80000000)

- **Severity**: MEDIUM
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `vmx.rs:217` (`if x.is_nan() { return (0, true); }`)
- **Symptom**: AltiVec ISA: NaN in vctsxs saturates to INT_MIN (0x80000000). Xenia-rs returns 0.
- **Fix**: `if x.is_nan() { return (i32::MIN, true); }`

### PPCBUG-434 — vctuxs NaN → 0 is correct; informational

- **Severity**: LOW (wontfix)
- **Status**: wontfix
- **Location**: `vmx.rs:225`
- **Note**: Unsigned NaN saturates to 0 per ISA. Xenia-rs is correct. Add a comment.

### PPCBUG-435 — vaddfp / vsubfp / vmulfp128: subnormal inputs not flushed when VSCR.NJ=1

- **Severity**: MEDIUM (latent — Xbox 360 always boots with NJ=1)
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:1713`, `1729`, `1812`
- **Symptom**: VSCR.NJ=1 requires flush-to-zero for subnormal inputs. vmaddfp family correctly
  calls `vmx::flush_denorm()`; plain add/sub/mul do not check VSCR.

### PPCBUG-436 — vmsum3fp128 / vmsum4fp128: per-product intermediates not individually flushed

- **Severity**: MEDIUM (latent)
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:4076`, `4083`
- **Symptom**: `flush_denorm` on final sum only. Per-lane products can be subnormal and
  accumulate before the final flush.

### PPCBUG-437 — vmaddfp / vmaddfp128 / vmaddcfp128 / vnmsubfp128: subnormal output not flushed

- **Severity**: MEDIUM (latent)
- **Status**: applied (P5 d39d0ba, 2026-05-02)
- **Location**: `interpreter.rs:1752-1754`, `1771-1773`, `4064-4067`, `1803-1805`
- **Symptom**: VSCR.NJ=1 requires flushing subnormal results. Inputs flushed; outputs are not.

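
PPCBUG-435 and PPCBUG-437 both come down to where the flush is applied. The sketch below is not the project's `vmx::flush_denorm()`; it is a hypothetical stand-in with the same name, and it assumes that a flushed value keeps the sign of the subnormal it replaces.

```rust
/// Hypothetical flush helper: subnormals become a zero of the same sign
/// (sign preservation is an assumption of this sketch).
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0_f32.copysign(x)
    } else {
        x
    }
}

/// Add with NJ=1 behaviour sketched: flush both inputs, then flush the result.
fn add_nj(a: f32, b: f32) -> f32 {
    flush_denorm(flush_denorm(a) + flush_denorm(b))
}
```

The second flush matters even when both inputs are normal: `f32::MIN_POSITIVE + (-1.5 * f32::MIN_POSITIVE)` has normal operands but a subnormal sum.
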
### PPCBUG-438 — Zero tests for vcmpeqfp / vcmpgefp / vcmpgtfp / vcmpbfp and dot forms

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` test module

### PPCBUG-439 — Zero tests for vrfiz / vrfin / vrfip / vrfim and 128-bit variants

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:2158-2192`

### PPCBUG-440 — Zero tests for vctsxs / vctuxs / vcfsx / vcfux and 128-bit variants

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs:3842-3923`

IDs PPCBUG-441 through PPCBUG-479 are unallocated — no further bugs found in group 36.

---

## Batch 8 — VMX integer multiply-sum / multiply-half / sums / special (group 37)

Per-group report: `audit-out/group-37-vmx-mulsum.md`.

**Note**: All opcodes in this group are `XEINSTRNOTIMPLEMENTED()` stubs in xenia-canary; correctness is derived from the IBM ISA and `ppc-manual/vmx/` snapshots. `vrlimi128` is already tracked as PPCBUG-315.

### PPCBUG-482 — `vmhaddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)

- **Severity**: WITHDRAWN
- **Status**: no bug
- **Note**: Draft analysis suggested >>16; the spec snapshot `ppc-manual/vmx/vmhaddshs.md`
  explicitly shows `prod = (VA[i]*VB[i]) >> 15` and the pathological-case example confirms
  `0x8000*0x8000 >> 15 = 32768`. Xenia-rs matches the spec exactly. No code change.

### PPCBUG-483 — `vmhraddshs` shift >>15 — WITHDRAWN (spec snapshots confirm >>15 is correct)

- **Severity**: WITHDRAWN
- **Status**: no bug
- **Note**: `ppc-manual/vmx/vmhraddshs.md` explicitly shows `(product + 0x4000) >> 15`.
  Xenia-rs matches. No code change needed.

### PPCBUG-487 — vsumsws/vsum2sws/vsum4sbs/vsum4ubs/vsum4shs: VB operand mis-named as "c"/"VC"

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `interpreter.rs:3249-3307`
- **Symptom**: All five vsum* handlers use a VX-form instruction (two operands: VA and VB).
  The code names the VB source `c` and the comment references "vC", implying a non-existent
  third register operand. Only `instr.ra()` and `instr.rb()` are valid for VX form; there is
  no `rc()`. The arithmetic is correct (rb() is called), but the naming misleads maintainers
  into thinking there is a VA-form three-operand encoding.
- **Fix**: Rename `c` → `b` and update comments to say `VB` instead of `vC` in all five
  handler bodies.

### PPCBUG-490 — Zero tests for all six vmsum* opcodes

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No unit test for `vmsumubm`, `vmsummbm`, `vmsumuhm`, `vmsumuhs`, `vmsumshm`,
  `vmsumshs`. Critical missing: saturation + VSCR.SAT for `vmsumuhs`/`vmsumshs`; mixed-sign
  byte product for `vmsummbm`; modulo wrap for `vmsumshm`.

### PPCBUG-491 — Zero tests for `vmhaddshs` and `vmhraddshs`

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for either multiply-high-add instruction. Key cases: `VA = 0x8000`,
  `VB = 0x8000` (minus-one-times-minus-one saturating case); `VA = VB = 0x7FFF, VC = 0x7FFF`
  (add post-shift result to max accumulator). Verify VSCR.SAT is set on saturation and clear
  on non-saturating inputs.

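
The key cases listed in PPCBUG-491 can be worked out on a single lane before writing interpreter-level tests. The following is a sketch based only on the formulas quoted in PPCBUG-482/483 (`(a*b) >> 15` and `(a*b + 0x4000) >> 15`, then add `c` and saturate); the function names and the `(value, saturated)` return shape are assumptions of this note, not the interpreter's helpers.

```rust
/// Saturate a 32-bit intermediate to i16 and report whether it was clamped.
fn sat_i16(v: i32) -> (i16, bool) {
    let clamped = v.clamp(i16::MIN as i32, i16::MAX as i32);
    (clamped as i16, clamped != v)
}

/// One lane of vmhaddshs as quoted above: ((a * b) >> 15) + c, saturated.
fn mhadd_lane(a: i16, b: i16, c: i16) -> (i16, bool) {
    sat_i16(((a as i32 * b as i32) >> 15) + c as i32)
}

/// One lane of vmhraddshs: ((a * b + 0x4000) >> 15) + c, saturated.
fn mhradd_lane(a: i16, b: i16, c: i16) -> (i16, bool) {
    sat_i16(((a as i32 * b as i32 + 0x4000) >> 15) + c as i32)
}
```

With `a = b = 0x8000` the shifted product is 32768, one past `i16::MAX`, so the lane saturates even with a zero addend; the second flag is what a test would compare against VSCR.SAT.
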
### PPCBUG-492 — Zero tests for `vmladduhm`

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section

### PPCBUG-493 — Zero tests for all eight `vmule*` / `vmulo*` opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for `vmuleub`, `vmuloub`, `vmulesb`, `vmulosb`, `vmuleuh`, `vmulouh`,
  `vmulesh`, `vmulosh`. Key: even vs odd lane distinction (`vmulesh` vs `vmulosh`) is untested.

### PPCBUG-494 — Zero tests for all five vsum* opcodes

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` `#[cfg(test)]` section
- **Symptom**: No test for `vsumsws`, `vsum2sws`, `vsum4sbs`, `vsum4ubs`, `vsum4shs`.
  Missing: zero-output-lanes verification for `vsumsws` (w[0], w[1], w[2] must be 0) and
  `vsum2sws` (w[0], w[2] must be 0); VSCR.SAT on overflow for all signed/unsigned variants.

### PPCBUG-495 — `vsumsws` comment says "vC[3]" should say "VB[3]"

- **Severity**: LOW (cosmetic)
- **Status**: open
- **Location**: `interpreter.rs:3248`

IDs PPCBUG-480, PPCBUG-481, PPCBUG-482 (withdrawn), PPCBUG-483 (withdrawn), PPCBUG-484,
PPCBUG-485, PPCBUG-486, PPCBUG-488, PPCBUG-489, PPCBUG-496, PPCBUG-497, PPCBUG-498 are
either withdrawn (no bug found after re-examination), informational, or references to
existing IDs. IDs PPCBUG-499 through PPCBUG-509 are unallocated — no further bugs found
in group 37.

---

## Batch 8 — VMX load/store (group 38)

Per-group report: `audit-out/group-38-vmx-loadstore.md`.

**Opcodes**: lvebx, lvehx, lvewx, lvewx128, lvlx, lvlx128, lvlxl, lvlxl128, lvrx, lvrx128,
lvrxl, lvrxl128, lvsl, lvsl128, lvsr, lvsr128, lvx, lvx128, lvxl, lvxl128, stvebx, stvehx,
stvewx, stvewx128, stvlx, stvlx128, stvlxl, stvlxl128, stvrx, stvrx128, stvrxl, stvrxl128,
stvx, stvx128, stvxl, stvxl128.

Group 38 summary: The load family (lvx, lvxl, lvlx, lvrx, lvsl, lvsr, lvebx, lvehx, lvewx,
lvewx128 and all 128/LRU-hint variants) is arithmetically correct. EA computation, alignment
masking, big-endian byte ordering, RA=0 special cases, and lane indexing all match the ISA and
the `ea_indexed` helper. **5 HIGH bugs found**: the systemic `invalidate_for_write` gap
(PPCBUG-107 family) applies to ALL 16 VMX store opcodes, and `stvewx128` has an additional
severe memory-corruption bug (writes 16 bytes instead of 1 word). **1 MEDIUM** (behavioral
divergence between lvebx/lvehx/lvewx and canary's full-line simplification; xenia-rs is
architecturally more correct). **1 MEDIUM** (lvsr sh=0 edge-case correctness, documentation
gap). **3 LOW** test-coverage gaps.

### PPCBUG-510 — `stvewx128` stores all 16 bytes instead of one word; 12-byte memory corruption (HIGH)

- **Severity**: HIGH
- **Status**: applied (cedee3c, 2026-05-02)
- **Location**: `interpreter.rs:2776-2781`
- **Symptom**: Uses `& !0xF` (16-byte alignment) then stores all 16 bytes of the vector.
  ISA semantics: word-align EA, extract the word lane `(EA & 0xF) >> 2`, store 4 bytes only.
  The non-128 `stvewx` (interpreter.rs:1675-1687) is correct; `stvewx128` was not updated
  to match. Corrupts 12 adjacent bytes on every execution.
- **Canary reference**: `InstrEmit_stvewx_` (cc:170-185): `ea & ~3`, extract lane, `ByteSwap`,
  store 4 bytes only. `stvewx128` routes through the same helper as `stvewx`.
- **Fix**: mirror the `stvewx` body with `instr.vs128()` substituted for `instr.rs()`.

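
The element-store semantics described in PPCBUG-510 (word-align EA, pick lane `(EA & 0xF) >> 2`, write 4 bytes) fit in a few lines. This is a sketch, not the interpreter's code: it assumes guest memory is a flat byte slice and the vector register is held as 16 bytes in guest (big-endian) order, so no byte swap is needed; `store_vector_element_word` is a hypothetical name.

```rust
/// Store one 32-bit lane of a 16-byte vector to a flat byte-array memory.
/// Returns the word-aligned address that was written.
fn store_vector_element_word(mem: &mut [u8], ea: usize, vr: &[u8; 16]) -> usize {
    let aligned = ea & !0x3; // word-align the effective address
    let lane = (ea & 0xF) >> 2; // which of the four word lanes to store
    mem[aligned..aligned + 4].copy_from_slice(&vr[lane * 4..lane * 4 + 4]);
    aligned
}
```

A test for the bug would fill the surrounding 16-byte block with a sentinel and check that only the four target bytes change.
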
### PPCBUG-511 — `stvx`, `stvx128`, `stvxl`, `stvxl128` missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1598-1603` (stvx), `1605-1610` (stvx128), `1699-1705` (stvxl/stvxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Symptom**: Under `--parallel`, a 16-byte stvx to a reserved line does not clear the
  reservation table slot. The reserving thread's `stwcx.` spuriously succeeds.
- **Fix**: per PPCBUG-107 pattern: add `invalidate_for_write(ea)` guard before the byte loop.

### PPCBUG-512 — `stvebx`, `stvehx`, `stvewx`, `stvewx128` missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:1655` (stvebx), `1664` (stvehx), `1675` (stvewx), `2776` (stvewx128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: `stvewx128` must also fix PPCBUG-510 before adding the invalidation call (or the
  invalidation covers the wrong, over-wide address range).

### PPCBUG-513 — `stvlx`, `stvlx128`, `stvlxl`, `stvlxl128` missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:2746-2749` (stvlx/stvlxl), `2751-2754` (stvlx128/stvlxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: partial stores can span a 128-byte line boundary when `ea & 0xF != 0` and
  `n = 16 - shift` crosses the line; two `invalidate_for_write` calls may be needed.

### PPCBUG-514 — `stvrx`, `stvrx128`, `stvrxl`, `stvrxl128` missing `invalidate_for_write` (HIGH)

- **Severity**: HIGH
- **Status**: applied (ca5b90b, 2026-05-01)
- **Locations**: `interpreter.rs:2756-2759` (stvrx/stvrxl), `2761-2764` (stvrx128/stvrxl128)
- **Root cause**: PPCBUG-107 (systemic)
- **Note**: stvrx at shift=0 is a no-op (no bytes written); guard can skip the call in
  that case. Otherwise invalidate `ea & !0xF` (the preceding aligned block).

### PPCBUG-515 — `lvebx`, `lvehx`, `lvewx` implement element semantics; canary uses full-line load (MEDIUM)

- **Severity**: MEDIUM
- **Status**: open
- **Locations**: `interpreter.rs:1613-1653`
- **Symptom**: xenia-rs places the loaded byte/halfword/word into the correct lane and preserves
  other lanes from VD (ISA-correct for the "undefined" lanes). Canary does a full aligned
  16-byte `lvx`-style load that overwrites all lanes. Both are valid under the ISA's "undefined"
  specification, but code that reads the undefined lanes will observe different values than
  under canary. The divergence is documented and no code change is required unless canary
  compatibility becomes an explicit goal.

### PPCBUG-516 — `lvsr` sh=0 produces {16,17,...,31}; correct per ISA but undocumented (MEDIUM)

- **Severity**: MEDIUM (documentation gap; computation is correct)
- **Status**: open
- **Location**: `interpreter.rs:2218-2226`
- **Symptom**: When EA is 16-byte aligned, `lvsr` produces byte values all >= 16 (the "select
  entirely from VB" identity for `vperm`). The formula `(16 - sh) + i` cannot overflow u8
  because `sh <= 15` guarantees `(16 - sh) + 15 <= 31`. No computation bug, but there is no
  comment explaining why values > 15 are correct. Add a comment and a `debug_assert!(sh <= 15)`.

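
The permute-index values discussed in PPCBUG-516 are small enough to tabulate directly. A minimal sketch, assuming `sh` is the low four bits of the effective address and using hypothetical function names (`lvsl_indices`, `lvsr_indices`) rather than the interpreter's handlers:

```rust
/// lvsl control vector: bytes sh, sh+1, ..., sh+15.
fn lvsl_indices(ea: u32) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    core::array::from_fn(|i| sh + i as u8)
}

/// lvsr control vector: bytes (16 - sh), ..., (31 - sh).
/// With sh == 0 every index is >= 16, which selects entirely from the
/// second `vperm` source; that is the expected identity, not an overflow.
fn lvsr_indices(ea: u32) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    debug_assert!(sh <= 15);
    core::array::from_fn(|i| (16 - sh) + i as u8)
}
```

The largest value either function can produce is 31 (`lvsr` at `sh = 0`, or `lvsl` at `sh = 15` gives 30), so `u8` arithmetic is safe.
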
### PPCBUG-517 — Zero test coverage for lvlx/lvrx/stvlx/stvrx boundary edge cases (LOW)

- **Severity**: LOW (test gap)
- **Status**: applied (P8 4029041, 2026-05-02)
- **Location**: `vmx.rs` tests (lines 756-792); `interpreter.rs` test module
- **Missing**: shift=15 for lvlx (1 byte loaded), shift=1 for lvrx (15 bytes), stvlx/stvrx
  round-trip, stvrx at shift=0 confirmed no-op, full lvlx+lvrx+vor unaligned memcpy idiom
  verified byte-exact.

### PPCBUG-518 — Zero interpreter-level execution tests for all 36 VMX load/store opcodes (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `interpreter.rs` test module
- **Missing**: lvx alignment masking, stvx byte-order verification, lvebx lane placement,
  lvsl/lvsr permute index values, stvewx128 after PPCBUG-510 fix. 17 recommended minimum tests
  enumerated in per-group report.

### PPCBUG-519 — `stvrx` aligned no-op is silent; no debug trace (LOW)

- **Severity**: LOW
- **Status**: open
- **Location**: `vmx.rs:284-292` (`store_vector_right`)
- **Symptom**: shift=0 returns immediately with no trace event. Confusing during
  memory-visibility debugging. Add `tracing::trace!` in debug builds.

IDs PPCBUG-520 through PPCBUG-559 are unallocated — no further bugs found in group 38.

---

## Phase C1 — Decoder field extractors

Per-group report: `audit-out/phase-c1-decoder-fields.md`.

Comprehensive audit of all `DecodedInstr` field accessors in `decoder.rs` lines 21-165, cross-checked against ISA form specs, Canary `FormatXxx` structs, and the interpreter's inline re-extraction. Phase B already found PPCBUG-040/046/275/315/360-363/420-422. Phase C1 adds 8 new findings (PPCBUG-560..567).

**Confirmed-clean** (no new finding): `op`, `rd`/`rs`/`rt`, `ra`, `rb`, `rc`, `simm16`, `uimm16`, `d`, `ds`, `li`, `bd`, `bo`, `bi`, `aa`, `lk`, `oe`, `to`, `mb`/`me` (M-form only), `sh`, `spr`, `crm`, `crfd`/`crfs`, `l`, `crbd`/`crba`/`crbb`, `nb`, `va128`/`vb128`/`vd128`/`vs128`, `extract_vx128_uimm5`.

### PPCBUG-560 — sh64() test helper wrong bit order; masks PPCBUG-040 from unit tests (HIGH)

- **Severity**: HIGH
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `xenia-rs/crates/xenia-cpu/tests/disasm_goldens.rs:160-176` (function `rldicl`)
- **Symptom**: The `rldicl` test helper encodes `sh[5:1]` at PPC bits 16-20 and `sh[0]` at PPC bit 30. The ISA encodes `sh[4:0]` at PPC bits 16-20 and `sh[5]` at PPC bit 30. The wrong `sh64()` formula `(sh_lo << 1) | sh_hi` correctly inverts the wrong encoding, making the test pass, but fails on real binary code.

**Counterexamples** (ISA-encoded input → `sh64()` output):

| True shift | `sh64()` result | Error |
|-----------|--------------|-------|
| 1 | 2 | +1 |
| 16 | 32 | +16 |
| 32 | 1 | -31 |
| 33 | 3 | -30 |
| 63 | 63 | 0 (coincidence) |

Only `sh=0` and `sh=63` decode correctly. All other shifts (1-62) are wrong against real code.

- **Fix for `sh64()`** (per PPCBUG-040):

```rust
pub fn sh64(&self) -> u32 {
    (extract_bits(self.raw, 30, 30) << 5) | extract_bits(self.raw, 16, 20)
}
```

- **Fix for test helper** (must be in same commit):

```rust
// Correct: sh_lo = sh & 0x1F → PPC bits 16-20; sh_hi = sh >> 5 → PPC bit 30
(30 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11)
    | (mb_lo << 6) | (mb_hi << 5) | (0 << 2) | ((sh >> 5) << 1) | rc
```

- **Cross-reference**: PPCBUG-040 (primary finding). PPCBUG-560 is the test-infrastructure companion.

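
The counterexample table can be reproduced with a self-contained round trip. This sketch assumes `extract_bits(raw, start, end)` takes inclusive PPC big-endian bit positions; `encode_rldicl`, `sh64`, and `sh64_buggy` are free-function stand-ins written for this note, not the code in `decoder.rs` or `disasm_goldens.rs`.

```rust
/// Extract PPC bits `start..=end` (big-endian numbering, bit 0 = MSB).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (((1u64 << width) - 1) as u32)
}

/// ISA-correct MD-form encoding of `rldicl` (primary opcode 30, XO = 0).
fn encode_rldicl(rs: u32, ra: u32, sh: u32, mb: u32, rc: u32) -> u32 {
    (30 << 26) | (rs << 21) | (ra << 16) | ((sh & 0x1F) << 11)
        | ((mb & 0x1F) << 6) | ((mb >> 5) << 5) | ((sh >> 5) << 1) | rc
}

/// Corrected accessor: sh[5] from PPC bit 30, sh[4:0] from PPC bits 16-20.
fn sh64(raw: u32) -> u32 {
    (extract_bits(raw, 30, 30) << 5) | extract_bits(raw, 16, 20)
}

/// The formula described as wrong above: `(sh_lo << 1) | sh_hi`.
fn sh64_buggy(raw: u32) -> u32 {
    (extract_bits(raw, 16, 20) << 1) | extract_bits(raw, 30, 30)
}
```

On ISA-encoded input the wrong formula is a 6-bit rotate-left by one, which is why only the all-zeros and all-ones shifts (0 and 63) survive it.
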
### PPCBUG-561 — Missing `mb_md()` accessor on `DecodedInstr`; interpreter inlines wrong formula at 6 sites (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` (accessor absent); `disasm.rs:1256` has correct local helper; `interpreter.rs` lines 696, 706, 716, 726, 736, 746 each inline the wrong formula
- **Symptom**: Interpreter uses `(instr.mb() << 1) | ((instr.raw >> 1) & 1)` which: (a) reads `SH5` (PPC bit 30, host bit 1) instead of `MB5` (PPC bit 26, host bit 5) as the high bit; (b) places the high bit at position 0 instead of position 5. `disasm.rs` has the correct version already; expose it as `DecodedInstr::mb_md()`.
- **Cross-reference**: PPCBUG-046 (primary finding).

- **Fix**:

```rust
// Add to decoder.rs:
#[inline] pub fn mb_md(&self) -> u32 {
    extract_bits(self.raw, 21, 25) | (extract_bits(self.raw, 26, 26) << 5)
}
```

Replace all 6 inline sites in `interpreter.rs` with `instr.mb_md()`.

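
Both defects named in PPCBUG-561 show up on a two-field encoding. The sketch below assumes the same inclusive big-endian `extract_bits` convention as the fix above; `md_word` is a hypothetical minimal encoder that fills only the opcode, `sh`, and `mb` fields, and the two accessors are free functions rather than `DecodedInstr` methods.

```rust
/// Extract PPC bits `start..=end` (big-endian numbering, bit 0 = MSB).
fn extract_bits(raw: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (raw >> (31 - end)) & (((1u64 << width) - 1) as u32)
}

/// Minimal MD-form word: opcode 30 plus the 6-bit `sh` and `mb` fields.
fn md_word(sh: u32, mb: u32) -> u32 {
    (30 << 26) | ((sh & 0x1F) << 11) | ((mb & 0x1F) << 6)
        | ((mb >> 5) << 5) | ((sh >> 5) << 1)
}

/// Correct accessor: mb[4:0] from PPC bits 21-25, mb[5] from PPC bit 26.
fn mb_md(raw: u32) -> u32 {
    extract_bits(raw, 21, 25) | (extract_bits(raw, 26, 26) << 5)
}

/// The inlined formula described as wrong: low five bits shifted up by one,
/// with host bit 1 (the sh high bit) used as bit 0.
fn mb_md_buggy(raw: u32) -> u32 {
    (extract_bits(raw, 21, 25) << 1) | ((raw >> 1) & 1)
}
```

With `mb = 1, sh = 0` the buggy form yields 2, and with `mb = 0, sh = 32` it yields 1, so the result depends on the shift field as well as the mask field.
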
### PPCBUG-562 — Missing `vc_rc_bit()` and `vx128r_rc_bit()` per-form Rc accessors (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` (no per-form Rc accessors); `interpreter.rs` uses generic `rc_bit()` (bit 31) for both VC and VX128_R forms
- **Symptom**: Generic `rc_bit()` reads PPC bit 31 (LSB). VC-form Rc is at PPC bit 21 = `(raw >> 10) & 1`. VX128_R-form Rc is at PPC bit 27 = `(raw >> 4) & 1`. Using bit 31 for these forms means the CR6 update gate is permanently disabled for all dot-form VMX vector compares, which is the root cause of PPCBUG-275/420/421/422.
- **Fix**:

```rust
/// Rc for VC-form vector compare (vcmpeqfp, vcmpgefp, vcmpgtfp, vcmpbfp, etc.) — PPC bit 21.
#[inline] pub fn vc_rc_bit(&self) -> bool { extract_bits(self.raw, 21, 21) != 0 }
/// Rc for VX128_R-form compare (vcmpeqfp128, vcmpgefp128, etc.) — PPC bit 27.
#[inline] pub fn vx128r_rc_bit(&self) -> bool { extract_bits(self.raw, 27, 27) != 0 }
```

- **Cross-reference**: PPCBUG-275 / PPCBUG-420 / PPCBUG-421 / PPCBUG-422.

### PPCBUG-563 — Missing `vx128_4_z()` and `vx128_4_imm()` for VX128_4 form (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` (accessors absent); `interpreter.rs:3551-3552` (vrlimi128) reads wrong bit positions
- **Symptom**: VX128_4 form has `IMM` (5-bit) at PPC bits 11-15 (host bits 16-20) and `z` (2-bit) at PPC bits 24-25 (host bits 6-7). Interpreter `vrlimi128` uses `(raw >> 16) & 0x3` for shift (host bits 16-17, i.e. the low two bits of `IMM` rather than `z`) and `(raw >> 2) & 0xF` for mask (reads VD128h region).
- **Fix**:

```rust
#[inline] pub fn vx128_4_imm(&self) -> u32 { extract_bits(self.raw, 11, 15) }
#[inline] pub fn vx128_4_z(&self) -> u32 { extract_bits(self.raw, 24, 25) }
```

- **Cross-reference**: PPCBUG-315.

### PPCBUG-564 — Missing `vx128_p_perm()` for VX128_P form; PERMh reads XO bits (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` (accessor absent); `interpreter.rs:4089` (vpermwi128) uses `(raw >> 16) & 0xFF` which reads PERMl (correct) but uses XO/reserved bits 21-23 for PERMh instead of PPC bits 23-25
- **Symptom**: Top 3 bits of the 8-bit PERM selector are wrong for every `vpermwi128` instruction. Lane selections for words 0 and 1 are garbage.
- **Fix**:

```rust
#[inline] pub fn vx128_p_perm(&self) -> u32 {
    extract_bits(self.raw, 11, 15) | (extract_bits(self.raw, 23, 25) << 5)
}
```

- **Cross-reference**: PPCBUG-362.

### PPCBUG-565 — Missing `vx128_5_sh()` for VX128_5 form; vsldoi128 MSB reads reserved bit (MEDIUM)

- **Severity**: MEDIUM
- **Status**: applied (52b05b1, 2026-05-01)
- **Location**: `decoder.rs` (accessor absent); `interpreter.rs:2012` (vsldoi128) uses `(raw >> 4) & 0x1` for the shift MSB (reads PPC bit 27 = reserved) instead of PPC bit 22 = host bit 9 = `(raw >> 9) & 1`
- **Symptom**: vsldoi128 shift amounts ≥ 8 (where the 4th bit matters) use a garbage bit. The correct 4-bit SH is at PPC bits 22-25 (host bits 6-9) = `(raw >> 6) & 0xF`.
- **Fix**:

```rust
#[inline] pub fn vx128_5_sh(&self) -> u32 { extract_bits(self.raw, 22, 25) }
```

- **Cross-reference**: PPCBUG-361.

### PPCBUG-566 — Missing XER TBC field accessor documentation for lswx/stswx (LOW)

- **Severity**: LOW
- **Status**: applied (P6 112202c, 2026-05-02)
- **Location**: `decoder.rs`: XER[25:31] (7-bit transfer byte count) is runtime state, not an instruction field; no accessor exists and no documentation notes the gap
- **Symptom**: `lswx`/`stswx` use XER[25:31] as their byte count. The interpreter has no way to read this via the normal accessor pattern. Not a bit-position error, but a structural gap.
- **Recommendation**: add `ctx.xer_tbc() -> u8` to `PpcContext` returning `ctx.xer() & 0x7F`. XER[25:31] uses PPC big-endian bit numbering, so the count is the low 7 bits of the register; `(ctx.xer() >> 25) & 0x7F` would instead read the SO/OV/CA end. Document that these are the only instructions that read XER as a count operand.

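
Since this tracker is largely about PPC versus host bit numbering, the XER byte-count accessor is worth pinning with a value that has CA set. This is a sketch on a bare 32-bit XER value, assuming XER[25:31] follows PPC big-endian numbering (bit 0 = MSB, with SO/OV/CA at bits 0-2); `xer_tbc` and `xer_top7` are illustrative free functions, not `PpcContext` methods.

```rust
/// Transfer byte count: XER[25:31] in PPC numbering, i.e. the low 7 bits.
fn xer_tbc(xer: u32) -> u8 {
    (xer & 0x7F) as u8
}

/// What a host-numbered reading of "bits 25-31" would return instead:
/// the top 7 bits, which contain SO, OV, and CA.
fn xer_top7(xer: u32) -> u8 {
    ((xer >> 25) & 0x7F) as u8
}
```

For `xer = 0x2000_0005` (CA set, count 5) the two readings give 5 and 16 respectively, so a single test value separates them.
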
### PPCBUG-567 — Zero unit tests pin any scalar field accessor (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `decoder.rs` unit tests; `tests/disasm_goldens.rs`
- **Symptom**: Phase 4 tests pin `va128`/`vb128`/`vd128`/`vs128` only. No test verifies: `sh64()` against ISA-encoded instructions (existing test validates wrong round-trip, see PPCBUG-560), `mb_md()` (absent), `vc_rc_bit()`/`vx128r_rc_bit()` (absent), `ds()` for negative displacement, `spr()` for LR/CTR/XER beyond DEC.
- **Recommended additions**: decoder-level unit tests using ISA-correct encodings for `sh64`, `mb_md`, the two new Rc accessors, `ds` negative, `spr` for LR=8 and CTR=9. See phase-c1-decoder-fields.md for concrete encoding examples.

IDs PPCBUG-568 through PPCBUG-599 are unallocated — no further bugs found in Phase C1 scope.

---

## Phase C2 — Decoder opcode-lookup tables

Per-group report: `audit-out/phase-c2-decoder-lookup.md`.

**Methodology**: complete line-by-line comparison of all `decode_opNN` functions in
`xenia-rs/crates/xenia-cpu/src/decoder.rs` against
`xenia-canary/src/xenia/cpu/ppc/ppc_opcode_lookup_gen.cc`, plus cross-reference of
`ppc-manual/forms/` for VC, VX128_R, VX128_5, VA, VX128_3, VX128_4 forms.

**Overall verdict**: the decoder is structurally sound and entry-by-entry matches
Canary for all real Xbox 360 instructions, with one pre-known exception (PPCBUG-600 =
PPCBUG-423). Zero new wrong-entry bugs. One medium-severity cross-reference entry for
that known dot-form gap, one medium maintainability risk (key-ordering dependency), three
LOWs (test gaps, reserved-encoding misidentification, undocumented fast-path).

### PPCBUG-600 — `decode_op6` key4: VMX128 compare dot-forms decode as Invalid (MEDIUM)

- **Severity**: MEDIUM (cross-reference for PPCBUG-423; same root cause, Phase C2 ID)
- **Status**: applied (52b05b1, 2026-05-01) (dup-of:423 for the fix; this ID is for Phase C2 tracking)
- **Location**: `decoder.rs:640-648` (`decode_op6`, key4 match table)
- **Symptom**: The VX128_R form places `Rc` at PPC bit 27. The key4 formula is
  `(bits 22-24 << 3) | bit27`. When Rc=1 (dot-form), bit27=1 and key4 is odd.
  Only even key4 values are in the table. Five dot-form encodings fall through to
  `PpcOpcode::Invalid`:
  - `vcmpeqfp128.` → key4=0b000001 (1), decodes as Invalid
  - `vcmpgefp128.` → key4=0b001001 (9), decodes as Invalid
  - `vcmpgtfp128.` → key4=0b010001 (17), decodes as Invalid
  - `vcmpbfp128.` → key4=0b011001 (25), decodes as Invalid
  - `vcmpequw128.` → key4=0b100001 (33), decodes as Invalid
- **Contrast**: standard VMX VC-form compares (op=4 key3) are correct because their
  Rc bit (bit21) is outside the key3 window (bits 22-31). VX128_R is a different
  form where Rc is at bit27, which is inside the key4 window.
- **Fix**: Add 5 dot-form entries to key4 in `decode_op6`:
  ```rust
  0b000001 => return PpcOpcode::vcmpeqfp128,
  0b001001 => return PpcOpcode::vcmpgefp128,
  0b010001 => return PpcOpcode::vcmpgtfp128,
  0b011001 => return PpcOpcode::vcmpbfp128,
  0b100001 => return PpcOpcode::vcmpequw128,
  ```
  The interpreter's existing `instr.rc_bit()` check already handles the CR6 update for
  dot-forms; the decoder just needs to emit the right opcode.
- **See also**: PPCBUG-423 (Phase B original finding) for impact assessment and
  full context; PPCBUG-700 (Phase C4), which later places the VX128_R `Rc` bit at
  PPC bit 25 and drops the dot-form dispatch.

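
The key4 arithmetic in PPCBUG-600 can be checked with a small sketch. It follows the formula exactly as this entry states it (Rc at PPC bit 27); `extract_bits` here is a hypothetical stand-in for the decoder's helper, assumed to use PPC MSB-0 bit numbering, and `vx128r_word` is an illustrative encoder that is not part of the codebase.

```rust
/// Extract PPC bits `start..=end` using MSB-0 numbering (bit 0 is the most
/// significant bit of the word). Hypothetical stand-in for the decoder helper.
fn extract_bits(code: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    let shift = 31 - end;
    ((code as u64 >> shift) & ((1u64 << width) - 1)) as u32
}

/// key4 as described in this entry: (bits 22-24 << 3) | bit 27.
fn key4(code: u32) -> u32 {
    (extract_bits(code, 22, 24) << 3) | extract_bits(code, 27, 27)
}

/// Illustrative encoder: primary opcode 6, a 3-bit selector in bits 22-24,
/// and the dot-form flag in bit 27. All other fields are left at zero.
fn vx128r_word(selector: u32, dot: bool) -> u32 {
    (6 << 26) | ((selector & 0x7) << 7) | ((dot as u32) << 4)
}
```

Under these assumptions every non-dot encoding yields an even key and every dot-form yields the odd key one above it, which is why a table holding only even keys misses all five dot-forms.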
### PPCBUG-601 — `decode_op6` key ordering creates undocumented correctness dependency (MEDIUM)

- **Severity**: MEDIUM (maintainability risk; no current wrong-decode for real code)
- **Status**: open
- **Location**: `decoder.rs:603-637` (`decode_op6`, key1/key2/key3 dispatch)
- **Symptom**: key1 (bits 21-22 << 5 | bits 26-27), key2 (bits 21-23 << 4 | bits 26-27),
  and key3 (bits 21-27) all overlap. Correctness depends on an implicit invariant:
  vpkd3d128 and vrlimi128 (matched by key2) always have bits 26-27 = `01`, while all
  15 key3 unary entries always have bits 26-27 = `11`. If a future instruction were
  added to key2 with bits 26-27 = `11`, it would shadow a key3 entry. No comment in
  the source documents this constraint.
- **Fix**: Add a comment block above the key2/key3 dispatches explaining the invariant:
  ```
  // key2 matches bits 26-27 == 01 only (vpkd3d128, vrlimi128).
  // key3 entries all have bits 26-27 == 11. No overlap is possible
  // for any currently-defined Xbox 360 instruction.
  ```

### PPCBUG-602 — `decode_op4` vsldoi128 fallback: over-broad single-bit catch-all (LOW)

- **Severity**: LOW (only fires for reserved/undefined encodings in practice)
- **Status**: open
- **Location**: `decoder.rs:558-561`
- **Symptom**: The VX128_5 form for vsldoi128 is identified by op=4, bit27=1. The
  dispatch uses a bare `if extract_bits(code, 27, 27) == 1` after the other tables,
  rather than an exact VX128_5-form check. Reserved VA extended opcodes that happen
  to have their key4 bit4 (= word bit27) set decode as vsldoi128 instead of Invalid.
  Example: VA XO=0b100011 (35, reserved gap between vmladduhm=34 and vmsumubm=36):
  key4 misses, bit27=1 fires → decoded as vsldoi128. The ISA specifies that reserved
  encodings should trap; this silently assigns a meaning.
- **Fix (optional)**: Strengthen to an exact match:
  ```rust
  // VX128_5 form: SH@22-25, VA128h@26, XO=bit27. Bits 28-31 carry VD128h/VB128h.
  // Only vsldoi128 uses this form. Verify the XO bit and absence of load/store marker.
  if extract_bits(code, 27, 27) == 1 && extract_bits(code, 30, 31) != 0b11 {
      return PpcOpcode::vsldoi128;
  }
  ```
  Alternatively, accept current behavior and add a comment.

### PPCBUG-603 — Primary opcode 9 maps to Invalid; correct but undocumented (LOW)

- **Severity**: LOW (test gap / documentation only)
- **Status**: open
- **Location**: `decoder.rs:369` (the `_ => PpcOpcode::Invalid` arm of `lookup_opcode`)
- **Symptom**: Primary opcode 9 (`dozi` in original POWER ISA) is undefined on
  Xenon/750CL and correctly decodes as Invalid. Canary also returns `PPC_DECODER_MISS`.
  No comment documents this intentional absence.
- **Fix**: Add a `// 9 = dozi (POWER-only, not present on Xenon)` comment near the
  match, or explicitly add `9 => PpcOpcode::Invalid` with a comment.

### PPCBUG-604 — Zero decoder unit tests for decode_op5, decode_op6, decode_op30, decode_op63 (LOW)

- **Severity**: LOW (test gap)
- **Status**: open
- **Location**: `decoder.rs:897-1107` (test module)
- **Symptom**: The 10 existing decoder tests cover addi, lwz, branch, stw, ori, and
  cache mechanics. None exercise VMX128 (op=5, op=6), rotate-doubleword (op=30), or
  FPU (op=63) opcode paths. In particular, no test would have caught PPCBUG-600
  (vcmpeqfp128 dot-form decodes as Invalid) before it caused a runtime trap.
- **Recommended minimum additions** (8 tests):
  1. `vcmpeqfp128` (Rc=0) → decodes as `vcmpeqfp128`.
  2. `vcmpeqfp128.` (Rc=1) → decodes as `vcmpeqfp128` (tests PPCBUG-600 fix).
  3. `vcmpeqfp` (op=4, Rc=0) → key3 check, bit21=0.
  4. `vcmpeqfp.` (op=4, Rc=1) → key3 check, bit21=1, same decode.
  5. `vsldoi128` (op=4, bit27=1) → fallback fires.
  6. `rldicl` (op=30) → decode_op30.
  7. `fadd` (op=63, Rc=0) → arithmetic table.
  8. `fadd.` (op=63, Rc=1) → same decode as fadd.

### PPCBUG-605 — `decode_op31` sradix fast-path is correct but undocumented (LOW)

- **Severity**: LOW (documentation gap only)
- **Status**: open
- **Location**: `decoder.rs:702-705`
- **Symptom**: The sradix pre-check uses bits 21-29 (9 bits). The subsequent main
  table uses bits 21-30 (10 bits). Because no main-table entry has bits 21-29 =
  0b110011101, the fast-path cannot shadow a legitimate main-table entry. However,
  this is not documented in the source, and a reader might worry that sradix (Rc=0,
  bits 21-30 = 0b1100111010) or sradix. (Rc=1, same bits 21-30) could conflict with
  a future entry at key 0b1100111010.
- **Fix**: Add a comment: `// sradix: XS-form, XO=413 (bits 21-29=0b110011101).`
  `// No main-table entry uses bits 21-30 starting with 0b110011101x.`

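
The relationship between the 9-bit pre-check and the 10-bit main-table key can be illustrated as follows. This is a minimal sketch, assuming the standard XS-form layout (XO in bits 21-29, one extra bit at position 30, Rc at bit 31); the function names and `xs_word` are hypothetical and do not come from `decoder.rs`.

```rust
/// 9-bit key used by the sradix pre-check: PPC bits 21-29.
fn key9(code: u32) -> u32 {
    (code >> 2) & 0x1FF
}

/// 10-bit key used by the main table: PPC bits 21-30.
fn key10(code: u32) -> u32 {
    (code >> 1) & 0x3FF
}

/// Illustrative XS-form word: primary opcode 31, XO in bits 21-29,
/// the extra bit at position 30, and Rc at bit 31.
fn xs_word(xo: u32, bit30: u32, rc: u32) -> u32 {
    (31 << 26) | ((xo & 0x1FF) << 2) | ((bit30 & 1) << 1) | (rc & 1)
}

/// True when a 10-bit main-table key would be shadowed by the 9-bit pre-check.
fn shadowed_by_sradix(main_key: u32) -> bool {
    (main_key >> 1) == 413
}
```

Only the two 10-bit keys 826 and 827 share the 9-bit prefix 413, so the invariant to document is that neither appears in the main table.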
IDs PPCBUG-606 through PPCBUG-639 are unallocated; no further bugs were found in Phase C2.

---

## Phase C3 — Disassembler formatter parity

Per-group report: `audit-out/phase-c3-disasm.md`.

**Methodology**: Full line-by-line audit of `disasm.rs:format()` and all ~70 per-class helpers.
Cross-referenced against `xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm_gen.cc`,
`tests/golden/extended_mnemonics.json`, and `tests/golden/base_mnemonics.json`.
Checked: mnemonic correctness (Rc/OE/LK/AA/L-field), operand formatting (signed vs unsigned,
hex vs decimal), simplified-mnemonic priority, branch-condition extended forms, VMX register
naming, VX128 field extraction, and golden test coverage.

**Overall verdict**: The formatter is structurally sound. All OE/Rc/LK/AA suffix handling, the
simplified mnemonic priority order, VMX 5-bit and VMX128 7-bit register naming, SPR mnemonics,
and CR-logical extended forms are correct. Two HIGH bugs found: the `bdnz`/`bdz` extended
mnemonic appends a spurious condition suffix, and the pre-existing `sync`/`lwsync` bug
(PPCBUG-088) is re-assessed as HIGH in disassembler scope. Three MEDIUM bugs: a missing
`fmt_bcctr` extended form (PPCBUG-642), and decimal rather than hex for SIMM immediates and
for D-form displacements (which diverges from every real PPC disassembler).
Several LOW findings for golden fixture correctness and edge cases.

**Key finding**: the disassembler's VX128 field extraction (vperm128 VC, vsldoi128 SH,
vpermwi128 PERM) is CORRECT in all three cases where the interpreter (PPCBUG-360/361/362)
has the wrong extraction. The disassembler was written independently and got them right.

### PPCBUG-640 — `fmt_bc`: pure `bdnz`/`bdz` emits `bdnzge`/`bdzge` (spurious condition suffix) (HIGH)

- **Severity**: HIGH
- **Status**: applied (d4f6ea7, 2026-05-02)
- **Location**: `disasm.rs:829-834`
- **Symptom**: For `bcx` with BO=16 (`bdnz`: decrement CTR, branch if CTR≠0, CR ignored):
  - `decr = (16 & 4) == 0` = true
  - `uncond = (16 & 16) != 0` = true
  - Code falls into the `if decr` branch and computes `cond_name_opt` from `(cr_bit=0, cond_true=false)` → `Some("ge")`
  - Emits: **`bdnzge`**, which is WRONG. The ISA simplified form is `bdnz`.

  For BO=18 (`bdz`): same path → **`bdzge`**, also WRONG.

  The bug is absent in `fmt_bclr`, which has an explicit `if decr && uncond` guard at line 872
  producing `bdnzlr`/`bdzlr` correctly. `fmt_bc` lacks this guard.

  The golden fixture "bdnz 0x82000040" (PPCBUG-650 companion) pins the wrong output.

- **Fix**: In `fmt_bc`, inside the `if decr` block, gate the condition string on `!uncond`:
  ```rust
  if decr {
      let z = if bo & 0x02 != 0 { "z" } else { "nz" };
      let cond_str = if uncond { "" } else { cond_name_opt.unwrap_or("") };
      let ext_mnem = format!("bd{z}{cond_str}{a}{l}");
      let ext_ops = format!("{cr}0x{target:08X}");
      with_ext(&base_mnem, base_ops, 8, &ext_mnem, ext_ops, 8)
  }
  ```
  Also update the golden fixtures (PPCBUG-650).

- **Impact**: All analysis-DB queries for `bdnz` loops (common in pixel-shader and vertex
  processing loops) return zero rows, because the instructions are stored as `bdnzge`.
  Developers inspecting loop structures see a misleading condition name on a CTR-only branch.

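
The BO decoding behind this finding can be exercised in isolation. The sketch below mirrors the logic described in this entry (condition suffix taken from the CR bit name and gated on `!uncond`); `cond_name` and `bc_ext_mnemonic` are hypothetical names, and the AA/LK suffixes and operand formatting of the real `fmt_bc` are deliberately left out.

```rust
/// Condition name for a CR bit index and the BO "condition true" flag.
fn cond_name(bi: u32, cond_true: bool) -> &'static str {
    match (bi & 3, cond_true) {
        (0, true) => "lt",
        (0, false) => "ge",
        (1, true) => "gt",
        (1, false) => "le",
        (2, true) => "eq",
        (2, false) => "ne",
        (3, true) => "so",
        _ => "ns",
    }
}

/// Extended mnemonic stem for a conditional branch, given the 5-bit BO
/// field and the BI field. Simplified: no AA/LK suffixes, no operands.
fn bc_ext_mnemonic(bo: u32, bi: u32) -> String {
    let decr = (bo & 0x04) == 0; // CTR is decremented
    let uncond = (bo & 0x10) != 0; // CR condition is ignored
    let cond = cond_name(bi, (bo & 0x08) != 0);
    if decr {
        let z = if (bo & 0x02) != 0 { "z" } else { "nz" };
        // The fix: only append a condition name when the CR is consulted.
        let suffix = if uncond { "" } else { cond };
        format!("bd{z}{suffix}")
    } else if uncond {
        "b".to_string()
    } else {
        format!("b{cond}")
    }
}
```

With the gate in place BO=16 and BO=18 produce `bdnz` and `bdz`, while plain conditional branches such as BO=12, BI=2 still produce `beq`.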
### PPCBUG-641 — `sync` emits `"sync"` for `lwsync` (L=1) — re-assessment of PPCBUG-088 (HIGH)

- **Severity**: HIGH (disassembler scope; PPCBUG-088 was LOW for interpreter scope)
- **Status**: open (see PPCBUG-088 for fix; PPCBUG-706 reports the disasm fix as already applied)
- **Location**: `disasm.rs:364`
- **Symptom**: `PpcOpcode::sync` always emits `"sync"`. The L-field at PPC bit 10 selects
  `lwsync` (L=1, encoding `0x7C2004AC`). `lwsync` is the acquire barrier in every Xbox 360
  spinlock. Every `lwsync` in the disassembly DB is stored as `mnemonic='sync'`.
  `SELECT * WHERE mnemonic='lwsync'` returns zero rows regardless of binary content.
- **Note**: the golden fixture for lwsync (PPCBUG-649) currently pins the wrong output.

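
A minimal sketch of the L-field check follows. It assumes only what this entry states (L=1 at PPC bit 10 selects `lwsync`, encoding `0x7C2004AC`); the function name is hypothetical, and any other L value is simply reported as plain `sync`.

```rust
/// Pick the display mnemonic for a `sync`-family word from its L field.
/// PPC bits 9-10 map to host bits 22-21, so L is `(code >> 21) & 0x3`.
fn sync_mnemonic(code: u32) -> &'static str {
    match (code >> 21) & 0x3 {
        1 => "lwsync",
        _ => "sync", // other L values are not distinguished in this sketch
    }
}
```

Keeping `sync` as the base mnemonic and surfacing `lwsync` as the extended one would match the fixture shape described in PPCBUG-649.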
### PPCBUG-642 — `fmt_bcctr` missing extended form for CTR-decrement/ignore-CR BO values (MEDIUM)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:880-902`
- **Symptom**: `bcctrx` with BO=16 (decrement CTR, ignore CR) falls through to `base()` with
  no extended form. `fmt_bclr` (the equivalent for bclrx) correctly handles the same case with
  an explicit `decr && uncond` check at line 872, producing `bdnzlr`.
  Note: `bcctr` with CTR-decrement is undefined by PowerISA; this encoding should never appear
  in valid compiled code. The inconsistency is a maintenance concern rather than a runtime bug.
- **Fix**: Add a `decr && uncond` check before the `cond_branch_ext` call in `fmt_bcctr`,
  mirroring lines 872-876 in `fmt_bclr`. Or add a comment explaining the ISA undefined status.

### PPCBUG-643 — SIMM immediate display: decimal diverges from Canary and real disassemblers (MEDIUM)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:946` (addi), `976` (addic), `989` (subfic), `990` (mulli),
  `1003` (cmpi), `1048-1061` (fmt_ld/fmt_st), and all similar SIMM sites
- **Symptom**: SIMM immediates are formatted via Rust's `{imm}` (decimal). Canary uses
  `"-0x{:X}"` / `"0x{:X}"` (signed hex) for every SIMM field. GNU objdump, IDA Pro,
  and all standard PPC disassemblers use hex. xenia-rs is also internally inconsistent:
  `addis`/`oris`/`xoris` use hex (`0x{imm_u:X}`), but `addi`/`addic`/`mulli` use decimal.
  This misleads analysis-DB queries that mix instructions (e.g. `addi r3, r1, -4` vs
  `addis r3, r0, 0x8000`).
- **Impact**: Medium. The output is not *wrong* (the value is correctly computed), but
  cross-referencing with Canary output or objdump requires manual conversion.

### PPCBUG-644 — D-form load/store displacement uses decimal instead of hex (MEDIUM)

- **Severity**: MEDIUM
- **Status**: open
- **Location**: `disasm.rs:1053` (`fmt_ld`), `1061` (`fmt_st`), `1069` (`fmt_ds`)
- **Symptom**: `format!("{rn}, {d}({})", gpr(ra))` outputs decimal for the displacement.
  Canary outputs `"-0x8(r1)"` not `"-8(r1)"`. Every standard PPC disassembler uses hex.
  Affects 25+ D-form and DS-form opcodes. Negative displacements (-8, -16, etc.) are
  especially confusing in decimal when reading stack frame accesses.
- **Fix**:
  ```rust
  // `unsigned_abs` avoids the overflow that `-d` would hit at the type's minimum value.
  let d_str = if d < 0 { format!("-0x{:X}", d.unsigned_abs()) } else { format!("0x{:X}", d) };
  base(mnem, format!("{rn}, {d_str}({})", gpr(ra)), 8)
  ```
  Update all golden fixture rows with displacement values.

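
The signed-hex convention that PPCBUG-643 and PPCBUG-644 both ask for can be captured in one helper. This is a sketch under the assumption that immediates and displacements are already sign-extended to `i32`; `signed_hex` and `d_form_operands` are hypothetical names, not existing functions in `disasm.rs`.

```rust
/// Format a signed value as `0x..` or `-0x..`, the style this tracker
/// attributes to Canary's disassembler.
fn signed_hex(value: i32) -> String {
    if value < 0 {
        // unsigned_abs is safe even for i32::MIN, where `-value` would overflow.
        format!("-0x{:X}", value.unsigned_abs())
    } else {
        format!("0x{:X}", value)
    }
}

/// Operand string for a D-form load/store: "rT, disp(rA)".
fn d_form_operands(rt: u32, disp: i32, ra: u32) -> String {
    format!("r{rt}, {}(r{ra})", signed_hex(disp))
}
```

Routing every SIMM and displacement site through a single helper like this would also remove the `addi`-versus-`addis` inconsistency noted in PPCBUG-643.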
### PPCBUG-645 — `cntlzdx` Rc suffix: moot for valid encodings, but WONTFIX (LOW)

- **Severity**: LOW
- **Status**: wontfix
- **Location**: `disasm.rs:286`
- **Note**: `fmt_x_unary_rc` would emit `cntlzd.` for Rc=1, but valid `cntlzd` encodings
  always have Rc=0. Canary emits `cntlzd` always. No impact for valid code.

### PPCBUG-646 — `fmt_rlwimi` inslwi/insrwi priority overlap: confirmed correct (LOW)

- **Severity**: LOW
- **Status**: wontfix
- **Note**: After careful analysis, the `inslwi` guard excludes `insrwi` overlap cases
  (`sh != 31u32.wrapping_sub(me)`). Priority is correct. Informational only.

### PPCBUG-647 — `fmt_rlwinm` `extrwi` uses `wrapping_sub` which can give misleading results for invalid encodings (LOW)

- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1137`
- **Symptom**: `let b = sh.wrapping_sub(n) % 32;` misbehaves for invalid `sh < n` encodings:
  `wrapping_sub` gives a large u32, and `% 32` then yields a confusing value. For all
  compiler-emitted encodings `sh >= n` holds.
- **Fix**: Add `&& sh >= 32 - mb` to the guard so that invalid encodings fall back to the
  base form instead of an `extrwi` with a bogus operand.

### PPCBUG-648 — `fmt_mftb` TBR=268: ext mnemonic identical to base mnemonic (LOW)

- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1443`
- **Symptom**: `268 => with_ext("mftb", base_ops, 8, "mftb", gpr(rd), 8)`: base is `mftb`,
  extended is also `mftb`. `display()` picks the extended form (omitting the `268` operand),
  making it ambiguous vs. `mftbu`. Consider: either emit base-only (`mftb r3, 268`) or rename
  the base to `mftb.raw` for disambiguation.

### PPCBUG-649 — Golden fixture for `lwsync` pins wrong output (no ext_mnemonic) (LOW)

- **Severity**: LOW (test coverage gap)
- **Status**: applied (2be25bd, 2026-05-02)
- **Location**: `tests/golden/extended_mnemonics.json`, entry "lwsync"
- **Symptom**: Fixture has `mnemonic: "sync"` and no `ext_mnemonic`. After PPCBUG-088/641
  fix, expected output is `mnemonic: "sync"`, `ext_mnemonic: "lwsync"`. Current fixture
  defeats regression detection: the test passes with wrong output.

### PPCBUG-650 — Golden fixtures for `bdnz`/`bdz` pin wrong extended mnemonic (LOW)

- **Severity**: LOW (companion to PPCBUG-640)
- **Status**: applied (d4f6ea7, 2026-05-02)
- **Location**: `tests/golden/extended_mnemonics.json`, rows "bdnz 0x82000040" and "bdz 0x82000040"
- **Symptom**: The two rows have `ext_mnemonic: "bdnzge"` and `ext_mnemonic: "bdzge"` respectively.
  After the PPCBUG-640 fix, the correct values are `"bdnz"` and `"bdz"`.

### PPCBUG-651 — `fmt_vmx128_pack_d3d` shared by `vpkd3d128` and `vrlimi128`: confirmed correct (LOW)

- **Severity**: LOW
- **Status**: wontfix
- **Note**: Both opcodes use VX128_4 form. Shared formatter outputs identical operand lists
  (`vd, vb, imm, z`) which is correct for both. Informational only.

### PPCBUG-652 — Zero golden fixtures for any VMX128 opcode disassembly (LOW)

- **Severity**: LOW (test coverage gap)
- **Status**: open
- **Location**: `tests/golden/` (all three JSON files)
- **Symptom**: No fixture pins the formatted output of any VMX128 instruction. Regressions
  in VMX128 field extraction (e.g. a re-introduction of PPCBUG-360/361/362 in the disassembler)
  would be invisible. Recommend adding at minimum: `vaddfp128`, `vperm128`, `vsldoi128`,
  `vpkd3d128`, `vcmpeqfp128.`, `vmaddfp128`.

### PPCBUG-653 — `fmt_trap_imm` unconditional trap extended form: confirmed not-a-bug (LOW)

- **Severity**: LOW
- **Status**: wontfix
- **Note**: `twi 31, rA, IMM` (to=31) has no ISA simplified mnemonic unless RA=0 and IMM=0
  (which matches `tw 31, r0, r0 = trap`). `fmt_trap_imm` correctly emits base-only for
  `twi 31, rA, N`. Informational.

### PPCBUG-654 — `fmt_rldimi` `insrdi` guard excludes valid `mb=0` (b=0) case (LOW)

- **Severity**: LOW
- **Status**: open
- **Location**: `disasm.rs:1220`
- **Symptom**: Guard `if mb > 0` excludes `insrdi rA, rS, n, 0` (b=0 → mb=0). A valid
  compiler-emitted `rldimi` with sh+mb+n=64 and mb=0 falls through to base form instead of
  displaying the `insrdi` simplified mnemonic.
- **Fix**: Remove the `mb > 0` guard; the inner `n > 0` guard is sufficient to avoid
  degenerate cases.

IDs PPCBUG-655 through PPCBUG-679 are unallocated; no further bugs were found in Phase C3.

---

## Phase C4 — Post-merge audit corrections (2026-05-02)

### PPCBUG-700 — VMX128 register accessors disagreed with canary's bitfield layout (HIGH)

- **Severity**: HIGH (silent mis-decoding of any VMX128 instruction with a register >= 32)
- **Status**: applied
- **Locations**: `decoder.rs:138-160` (`va128`/`vb128`/`vd128`), `decoder.rs:80` (`vx128r_rc_bit`)
- **Discovery**: independent reviewer of the P3 phase merge, comparing our Rust accessors
  against canary's `FormatVX128`/`VX128_2`/`VX128_4`/`VX128_5`/`VX128_R` bitfield structs
  in `xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663`.
- **Symptom**: this entry contradicts the audit's own line 2958 ("confirmed-clean")
  assessment. The previous audit miscounted bit-field offsets. Under x86_64 LSB-first
  C++ bitfield packing, the canary fields land at:
  - `VA128 = VA128l(5) | VA128h(1)<<5 | VA128H(1)<<6` = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6 (3 fields, 7 bits)
  - `VB128 = VB128l(5) | VB128h(2)<<5` = PPC[16-20] | PPC[30-31]<<5 (2 fields, 7 bits)
  - `VD128 = VD128l(5) | VD128h(2)<<5` = PPC[6-10] | PPC[28-29]<<5 (2 fields, 7 bits)
  - `Rc` (VX128_R only) = PPC[25] (host bit 6), not PPC[27] as PPCBUG-422/562 prescribed.

  The Rust code instead used:
  - `va128`: PPC[11-15] | PPC[29]<<5 (one bit, wrong position)
  - `vb128`: PPC[16-20] | PPC[28]<<5 | PPC[30]<<6 (wrong positions)
  - `vd128`: PPC[6-10] | PPC[21]<<5 | PPC[22]<<6 (wrong positions)
  - `vx128r_rc_bit`: PPC[27]
- **Why it lurked**: the buggy convention was internally consistent with hand-crafted
  test fixtures (which set bit 29 / 21 / 22 to encode "high" registers, matching the
  buggy accessor). Real Xbox 360 game code follows canary's convention, so any production
  encoding with VR >= 32 was silently mis-decoded, but no unit test exercised that path.
- **Fix**: rewrite the four accessors to canary's bit positions; rewrite the
  `vmx128_test_word` helper and unit tests; re-encode the goldens for vmaddfp128/
  vmaddcfp128/vnmsubfp128/vperm128/vsrw128/vpermwi128/vrlimi128. Drop the speculative
  `key4_dt` dot-form dispatch in `decode_op6` (canary has no separate dot-form opcodes
  for VX128_R compute ops; Rc is a runtime modifier). Update the `encode_vpkd3d128` test
  helper for canary's VD128h placement.
- **Cross-reference**: invalidates the audit's confirmed-clean note at line 2958.
  Subsumes the partial fix-shape proposed in PPCBUG-422 (Rc-bit position).

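
The corrected bit positions listed in PPCBUG-700 translate into shifts and masks as sketched below. This is an illustration derived only from the PPC bit ranges in the Symptom bullet (PPC bit n is host bit 31-n); the accessor bodies and the `encode_regs` round-trip helper are assumptions for demonstration, not the code in `decoder.rs`.

```rust
/// VA128 = PPC[11-15] | PPC[26] << 5 | PPC[21] << 6.
fn va128(code: u32) -> u32 {
    ((code >> 16) & 0x1F) | (((code >> 5) & 0x1) << 5) | (((code >> 10) & 0x1) << 6)
}

/// VB128 = PPC[16-20] | PPC[30-31] << 5.
fn vb128(code: u32) -> u32 {
    ((code >> 11) & 0x1F) | ((code & 0x3) << 5)
}

/// VD128 = PPC[6-10] | PPC[28-29] << 5.
fn vd128(code: u32) -> u32 {
    ((code >> 21) & 0x1F) | (((code >> 2) & 0x3) << 5)
}

/// Rc for VX128_R = PPC[25], i.e. host bit 6.
fn vx128r_rc(code: u32) -> bool {
    ((code >> 6) & 0x1) != 0
}

/// Hypothetical inverse used to round-trip the accessors: packs three
/// 7-bit register numbers into the same field positions.
fn encode_regs(vd: u32, va: u32, vb: u32) -> u32 {
    ((vd & 0x1F) << 21)
        | (((vd >> 5) & 0x3) << 2)
        | ((va & 0x1F) << 16)
        | (((va >> 5) & 0x1) << 5)
        | (((va >> 6) & 0x1) << 10)
        | ((vb & 0x1F) << 11)
        | ((vb >> 5) & 0x3)
}
```

A round-trip over all 128 register numbers is the kind of test that would have exposed the old convention, provided the encoder is derived from canary's layout rather than from the accessors under test.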
---

# May 2026 Comprehensive Audit (extends prior PPCBUG namespace)

**Started**: 2026-05-02. **Charter**: [audit-2026-05-charter.md](audit-2026-05-charter.md).
**Severity**: P0 blocker / P1 wrong-result / P2 spec drift / P3 cosmetic.

## ORACBUG (M01 — oracles and goldens)

Per-milestone report: [audit-out/m01-oracles.md](audit-out/m01-oracles.md).

### ORACBUG-001 — base_mnemonics.json self-derived circular
- **Severity**: P1
- **Status**: open
- **Location**: `crates/xenia-cpu/tests/disasm_goldens.rs:70-88` (`build_rows`); fixture `crates/xenia-cpu/tests/golden/base_mnemonics.json`
- **Symptom**: every "expected" mnemonic/operands/etc. is captured from `xenia_cpu::disasm::format()` at golden-creation time and frozen. The frozen JSON is asserted against future runs of the same function. Detects regression-from-snapshot, not absolute correctness. Human-readable `label` field is never asserted.
- **Recommendation**: add canary-disasm differential (see M02) and POWERISA-derived parallel oracle for ~20 representative cases.

### ORACBUG-002 — extended_mnemonics.json self-derived circular
- **Severity**: P1
- **Status**: open
- **Location**: `crates/xenia-cpu/tests/golden/extended_mnemonics.json` (623 rows)
- **Symptom**: same as ORACBUG-001, with extra risk: extended mnemonic emission is decision-tree output (li, lis, mr, not, slwi, srwi, clrldi, blr, bctr, beq/bne, lwsync, …). A bug in the canonicalization decision tree is not caught.

### ORACBUG-003 — vmx128_registers.json self-derived + hand-coded raw bytes
- **Severity**: P1
- **Status**: open
- **Location**: `crates/xenia-cpu/tests/disasm_goldens.rs:421-527`
- **Symptom**: same circularity, plus 4-operand multiply-add cases (lines 513-519) bypass encoding helpers and use HARD-CODED u32 literals (0x146328F0, 0x14632930, 0x14632970). PPCBUG-700 demonstrated this risk: the prior buggy convention was internally self-consistent in fixtures and lurked until a manual canary cross-check.

### ORACBUG-004 — sylpheed_n2m.json structurally insufficient
- **Severity**: P0
- **Status**: open
- **Location**: `crates/xenia-app/tests/golden/sylpheed_n2m.json`
- **Symptom**: at -n 2M instructions all rendering metrics are 0 (packets/draws/swaps/resolves/render-targets/textures). Sylpheed's first VdSwap fires at ~18M cycles. The golden cannot detect 11 of 14 digest fields by construction.
- **Risk**: this is the only end-to-end Sylpheed regression catcher in the workspace. Future fixes optimized to pass this gate are optimized against a blind oracle.
- **Recommendation**: add `sylpheed_n50m.json` (CI-feasible, captures VdSwap=1) and `sylpheed_n4b.json` (matches canonical reference invocation; commit-time gate).

### ORACBUG-005 — db_schema_golden.rs synthetic PE missing direct-branch coverage
- **Severity**: P3
- **Status**: open
- **Location**: `crates/xenia-analysis/tests/db_schema_golden.rs:23-53`
- **Symptom**: the synthetic PE has 4 instructions (mflr/nop/blr/nop). Direct-branch path of the DB writer (target_hex column population) is never exercised; only the indirect-only path is. Schema columns are correctly locked but coverage is thin.

### ORACBUG-006 — RunDigest missing high-leverage fields
- **Severity**: P2
- **Status**: open
- **Location**: `crates/xenia-app/src/main.rs:1267-1306` (RunDigest struct + capture)
- **Symptom**: digest exposes 14 fields, missing several high-signal counters that already exist in the system: unique_pcs_executed, kernel_calls_per_export histogram, mmio_reads/writes, scheduler.deadlock_recoveries, scheduler.deadlock_halts, events_signaled, events_waited, events_with_zero_signals, lwarx_count, stwcx_success_count, stwcx_fail_count.
- **Risk**: M11's run-matrix can only diff coarse counters. Several "is the renderer chain alive?" probes are not captured.

### ORACBUG-007 — analysis-shim parity test inherits CIRCULAR provenance
- **Severity**: P2
- **Status**: open
- **Location**: `crates/xenia-analysis/tests/disasm_goldens.rs:50-89` (check_fixture)
- **Symptom**: test does (a) shim-vs-cpu parity (good: catches drift) and (b) cpu-vs-fixture (inherits circularity from ORACBUG-001/002/003). The primary purpose (parity) is sound; only the secondary assertion is suspect.

### ORACBUG-008 — encode_vx128 helper lacks canary citation
- **Severity**: P3
- **Status**: open
- **Location**: `crates/xenia-cpu/tests/disasm_goldens.rs:53-68`
- **Symptom**: the encode helper currently encodes per canary's VX128 layout (post-PPCBUG-700) but lacks a comment block citing canary's `xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663`. A future "simplification" without canary cross-check could silently regress to the prior buggy convention.

## PPCBUG (M05 — scheduler + reservation + block_cache)

### PPCBUG-701 — Reservation generation 24-bit ring: false-match risk under long-delay paths (P3, latent)
- **Severity**: P3
- **Status**: open
- **Location**: `crates/xenia-cpu/src/reservation.rs:67-83` (pack), `:188-191` (next_gen mask)
- **Symptom**: `next_gen` is masked to 24 bits when packed (`& 0xFF_FFFF`). After 16,777,216 reservations, the generation wraps. If thread A's `lwarx` and its paired `stwcx.` are separated by ≥16M peer reservations on the same bank slot, and the bank still holds A's `(line, gen)` at commit time, `try_commit` will incorrectly succeed.
- **Risk**: very low under realistic workloads (reservation count between an lwarx-stwcx pair is typically <100, and same-bank displacement bumps `gen` regardless). Not observable on Sylpheed.
- **Recommendation**: defer until empirical evidence shows wraparound. If pursued, widen `gen` to 32 bits by stealing the line-address-low bits (low 7 bits of line are always zero, so they are recoverable via masking).
- **Canary**: canary's bitmap model has the equivalent bit-aliasing risk at `RESERVE_BLOCK_SHIFT` granularity but no time-domain wrap.

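
The wrap condition in PPCBUG-701 reduces to modular arithmetic on a 24-bit counter. The sketch below assumes only the `& 0xFF_FFFF` mask quoted above; `GEN_MASK`, `advance`, and `false_match` are hypothetical names, and the real `(line, gen)` packing in `reservation.rs` is not reproduced.

```rust
/// Mask applied to the generation counter when it is packed (24 bits).
const GEN_MASK: u64 = 0xFF_FFFF;

/// Generation value after `n` further reservations, modulo 2^24.
fn advance(generation: u32, n: u64) -> u32 {
    ((generation as u64 + n) & GEN_MASK) as u32
}

/// A commit compares the generation captured at `lwarx` time with the
/// current one; it falsely matches when `n` is a non-zero multiple of 2^24.
fn false_match(captured: u32, n: u64) -> bool {
    n != 0 && advance(captured, n) == captured
}
```

One fewer or one more reservation than 2^24 breaks the match again, which is why the window is narrow as well as remote.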
### PPCBUG-702 — `invalidate_for_write` doc says collisions invalidate; code says they don't (P3, doc drift)
- **Severity**: P3
- **Status**: open
- **Location**: `crates/xenia-cpu/src/reservation.rs:38-46` (doc) and `:235-256` (code)
- **Symptom**: the file-level doc invariant 2 says "any plain store to a reserved line invalidates it (slot CASed to zero). Hash-collision side-effect: a store to a different line that maps to the same bank also invalidates", but the actual code at :248-256 explicitly returns early when `bank_line != line`, leaving the reservation alone. The code is more correct (fewer spurious failures), but the doc contradicts it.
- **Recommendation**: update the file doc to describe the "tag-checked invalidation" actually implemented. No code change needed.

### PPCBUG-703 — `--parallel` is non-deterministic; `XENIA_SCHED_SEED` does not regulate it (P3, doc gap)
- **Severity**: P3
- **Status**: open
- **Location**: `crates/xenia-cpu/src/scheduler.rs:232-249`, `:710-734`; `crates/xenia-app/src/main.rs:2230-2415`
- **Symptom**: `--parallel` workers race for the kernel mutex within each round; observable interleavings depend on host OS scheduling, not on `XENIA_SCHED_SEED`. The seed regulates ONLY the per-round slot-list shuffle, which has no effect under `--parallel` since workers race for the lock independently. Same-seed-same-input runs under `--parallel` produce different observable schedules.
- **Risk**: M11's bisection cannot reliably reproduce an observed regression under `--parallel`; lockstep must be used for bisection.
- **Recommendation**: document the determinism boundary in CLI help text. If true determinism is needed under `--parallel`, the kernel-mutex acquisition order must be re-introduced as a coordinator-driven sequence (a regression of the M3 perf goal).

### PPCBUG-704 — `icbi` is a no-op; correctness depends on `bump_page_version` from data-store path (P3, latent)
- **Severity**: P3
- **Status**: open
- **Location**: `crates/xenia-cpu/src/interpreter.rs:1697-1701`; `crates/xenia-cpu/src/block_cache.rs:142-178`
- **Symptom**: `icbi` (instruction cache block invalidate) is collapsed into the cache/sync no-op arm. Self-modifying code is currently caught only because every `write_u8/16/32/64` in `xenia-memory/src/heap.rs` unconditionally calls `bump_page_version`. If a future optimization makes `bump_page_version` conditional (e.g., distinguish data vs code pages, or skip bumping for non-instruction-page writes), `icbi` will need to actively bump the cache line.
- **Risk**: latent; no current SMC failure observed.
- **Recommendation**: add a comment in the cache/sync arm pointing at the implicit invariant: "icbi is correct because every store bumps page_version; if that changes, icbi must bump explicitly". Cross-references M06 memory invariants.

### PPCBUG-705 — Phaser `phase: AtomicU32` wrap at 4 B rounds (P3, latent)
- **Severity**: P3
- **Status**: open
- **Location**: `crates/xenia-cpu/src/phaser.rs:64`, `:128`, `:172`
- **Symptom**: `phase` is an `AtomicU32` advanced with `fetch_add(1, Release)`. After 4,294,967,296 rounds the counter wraps. For an arriver stalled across exactly 2^32 rounds, the wait-loop predicate `phase != pre_phase` is false, which appears as a missed wake.
- **Risk**: at xenia-rs's actual round rate (~10^4 rounds/sec) this requires ~5 days of continuous runtime. Not realistic.
- **Recommendation**: widen to `AtomicU64` next time the phaser API is touched. No urgency.

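
Both the missed-wake condition and the "~5 days" estimate in PPCBUG-705 follow from simple arithmetic, sketched here. The predicate mirrors the `phase != pre_phase` check quoted above; the function names and the 10^4 rounds/sec figure passed in are taken from this entry or invented for illustration, not from `phaser.rs`.

```rust
/// Would a waiter that captured `pre_phase` observe a change after
/// `rounds_elapsed` increments of a 32-bit phase counter?
fn wakes_u32(pre_phase: u32, rounds_elapsed: u64) -> bool {
    // Truncating to u32 is exactly the modulo-2^32 behaviour of the counter.
    let phase = pre_phase.wrapping_add(rounds_elapsed as u32);
    phase != pre_phase
}

/// Same predicate with a 64-bit counter, as the recommendation suggests.
fn wakes_u64(pre_phase: u64, rounds_elapsed: u64) -> bool {
    pre_phase.wrapping_add(rounds_elapsed) != pre_phase
}

/// Days of continuous runtime needed to complete 2^32 rounds.
fn days_to_wrap(rounds_per_sec: f64) -> f64 {
    (1u64 << 32) as f64 / rounds_per_sec / 86_400.0
}
```

At 10^4 rounds per second the 32-bit counter wraps after a little under five days, whereas the 64-bit variant would not wrap within any realistic run.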
## PPCBUG (M02 — decoder/disasm)

- **PPCBUG-706** — Tracker drift; PPCBUG-088/641 (sync/lwsync) shown as `open` but disasm fix at `crates/xenia-cpu/src/disasm.rs:364-372` is already applied. P3 (tracker hygiene). Recommendation: flip both to `applied`. See `audit-out/m02-decoder-disasm.md`.
- **PPCBUG-707** — Disasm column-pad width inconsistent across opcode families (8/9/10/11/12/14) and divergent from canary's single `kNamePad=11` (`xenia-canary/src/xenia/cpu/ppc/ppc_opcode_disasm.h:22`). P3 cosmetic; ~150 call sites in `disasm.rs`. Affects every textual diff with canary. See `audit-out/m02-decoder-disasm.md`.
- **PPCBUG-708** — `fmt_bc`/`fmt_bclr`/`fmt_bcctr` base form uses CR-bit names (`crb()`) for BI; canary emits raw BI integer (`ppc_opcode_disasm_gen.cc:158-186`). Extended forms unaffected. P3 cosmetic; 3 lines to change. See `audit-out/m02-decoder-disasm.md`.
- **PPCBUG-709** — `mfspr`/`mtspr`/`mftb` base form emits symbolic SPR name (`LR`/`CTR`); canary emits raw SPR integer (`ppc_opcode_disasm_gen.cc:1601-1602`). Extended forms (`mflr`/`mtctr`/etc.) unaffected. P3 cosmetic. See `audit-out/m02-decoder-disasm.md`.
- **PPCBUG-710** — `decoder.rs:79` has a stale doc-comment claiming `vx128r_rc_bit` reads PPC bit 27 (host bit 4); the immediately following lines 80-82 correctly say PPC bit 25 (host bit 6). Code is correct; the comment block contradicts itself. P3 doc hazard. Recommendation: delete line 79.
- **PPCBUG-711** — `decoder.rs:183-199` (`extract_vx128_uimm5`) has a 17-line doc comment narrating the pre-PPCBUG-700 buggy convention; references "First-Pixels M3" without citing the PPCBUG IDs. P3 cleanup. Recommendation: trim to 3-4 lines, move history to `audit-findings.md`.

## PPCBUG (M04 — FPSCR + VMX)
|
||
|
||
- **PPCBUG-712** — `crates/xenia-cpu/src/overflow.rs:29-102`: 64-bit overflow helpers (`add_ov_64`, `sub_ov_64`, `adde_ov_64`, `sum_overflow_64`, `neg_ov_64`) are dead code; interpreter inlines all 32-bit i128 overflow checks for the 32-bit ABI. P3 cosmetic. See `audit-out/m04-fpscr-vmx.md`.
|
||
- **PPCBUG-713** — `crates/xenia-cpu/src/interpreter.rs:3848-3852` (`vcmpbfp`/`vcmpbfp128`): CR6.LT never set when all lanes are out-of-bounds. Canary's `f.UpdateCR6(f.Or(gt, lt))` (`ppc_emit_altivec.cc:579`) sets LT = all-true(out-mask). xenia-rs hardcodes `lt: false`. P2; coupled with PPCBUG-421 (Rc-bit position) — both must land together. See `audit-out/m04-fpscr-vmx.md`.
|
||
- **PPCBUG-714** — `crates/xenia-cpu/src/{fpscr.rs,interpreter.rs}`: `VXSOFT` constant defined (`fpscr.rs:51`) but no setter anywhere. Software-triggered only via mtfsf paths, which were not verified to honour the bit. P3. See `audit-out/m04-fpscr-vmx.md`.
- **PPCBUG-715** — `crates/xenia-cpu/src/interpreter.rs:2681,2694,2736,2750`: `fmsubx`/`fmsubsx`/`fnmsubx`/`fnmsubsx` compute `a.mul_add(c, -b)`. Rust's unary `-` flips the sign bit of a NaN `b`, corrupting NaN-payload propagation. Distinct from PPCBUG-205 which fixed the *output* negation; this is the *input* negation. P2; recommendation: replace `-b` with `if b.is_nan() { b } else { -b }`. See `audit-out/m04-fpscr-vmx.md`.
- **PPCBUG-716** — `crates/xenia-cpu/src/fpscr.rs:320-325` (`update_cr1`): maps FPSCR[FX]→CR1.lt, [FEX]→CR1.gt, [VX]→CR1.eq, [OX]→CR1.so. Logic matches canary `CopyFPSCRToCR1` (`ppc_hir_builder.cc:491-501`), but reuse of generic CrField field names without a comment block tying fx→lt invites future confusion. P3 docs. See `audit-out/m04-fpscr-vmx.md`.
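
The NaN hazard described in PPCBUG-715 can be illustrated with a minimal sketch. The helper names `negate_preserving_nan` and `fmsub_sketch` are hypothetical and are not taken from `interpreter.rs`; only the `if b.is_nan() { b } else { -b }` expression comes from the finding itself.

```rust
/// Negate the addend of an fmsub-style op without touching NaN inputs.
/// Hypothetical helper; the audit only specifies the inner expression.
fn negate_preserving_nan(b: f64) -> f64 {
    if b.is_nan() { b } else { -b }
}

/// Sketch of the fmsub data path: a * c - b with a single rounding.
/// FPSCR updates and single-precision rounding are deliberately omitted.
fn fmsub_sketch(a: f64, c: f64, b: f64) -> f64 {
    a.mul_add(c, negate_preserving_nan(b))
}
```

Rust's unary `-` on a float flips only the sign bit, so a quiet NaN with bit pattern `0x7FF8_0000_0000_1234` becomes `0xFFF8_0000_0000_1234` before it ever reaches `mul_add`; the guarded helper hands the original bits through unchanged.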

## PPCBUG (M03 — interpreter)

- **PPCBUG-720** — `interpreter.rs:118` `addi` truncates result to 32 bits (`as u32 as u64`); canary `ppc_emit_alu.cc:103-115` does full 64-bit add. Charter only documents `addis` truncation, not `addi`. P1. [REGRESSION-CANDIDATE] See `audit-out/m03-interpreter.md`.
- **PPCBUG-721** — `interpreter.rs:138-152` `addic`/`addicx` operate on 32-bit narrowed operands; CA from `result32 < ra32`. Canary `ppc_emit_alu.cc:117-135` is fully 64-bit via `AddDidCarry`. P1. [REGRESSION-CANDIDATE]
- **PPCBUG-722** — `interpreter.rs:155-163` `subfic` 32-bit-only; canary `ppc_emit_alu.cc:459-466` is 64-bit. P1. [REGRESSION-CANDIDATE]
- **PPCBUG-723** — `interpreter.rs:165-172` `mulli` casts product `as u32` discarding bits [32:63]; canary uses 64-bit signed multiply (low 64 of 128-bit product per ISA). P2.
- **PPCBUG-724** — `interpreter.rs:1244,4594` `stwcx`/`stdcx` width-discriminator (`reservation_width == 4/8`) is stricter than canary (`ppc_emit_memory.cc:868-908` no width check) and stricter than PowerISA. Reopen of PPCBUG-151. P0. [REGRESSION-CANDIDATE — STRONG] Bisect around `a107ac9`.
- **PPCBUG-725** — `interpreter.rs:1665` `mtmsrd` L=1 mask is `EE | RI` (0x8001); canary `ppc_emit_control.cc:828-837` uses `EE` only (0x8000). P2.
- **PPCBUG-726** — `interpreter.rs:737-748` `rlwimix` zeroes RA[0:31] via `as u32 ... as u64`; canary `ppc_emit_alu.cc:1010-1033` preserves RA[0:31] via 64-bit OR with `MASK(MB+32, ME+32)`. P2.
- **PPCBUG-727** — `interpreter.rs:2901,2922` `fctidx`/`fctidzx` overflow boundary `val >= (i64::MAX as f64)` mis-flags (2^63 - 1024, 2^63) as overflow due to f64 precision (i64::MAX rounds up to 2^63 in f64). P3.
- **PPCBUG-728** — `interpreter.rs:1705-1724` `dcbz`/`dcbz128` only call `invalidate_for_write(ea)` once. Confirmed sufficient (32B fits in 128B line; dcbz128 IS a 128B line). WONTFIX, informational guard for future widening.
- **PPCBUG-729** — `interpreter.rs:1117,1124,1130` `lwa`/`lwax`/`lwaux` correctly sign-extend per hotfix `f1166d0`. CLEARED, verification only.
- **PPCBUG-730** — Reservation granule is 128 bytes (Xenon-correct) vs canary's byte-granular `real_addr(EA)`. Documented, recommendation: append to charter §"Known Intentional Divergences from Canary". P3 informational.
- **PPCBUG-731** — `interpreter.rs:908-938` `bcx` LR write timing in both AA paths. Confirmed equivalent to canary. P3 informational.
- **PPCBUG-732** — `interpreter.rs:962-981` `bcctrx` correctly omits CTR decrement (CTR is target). Confirmed equivalent to canary. P3 informational.
- **PPCBUG-733** — `interpreter.rs:1610` `mtspr CTR` truncates input to 32 bits (`val as u32 as u64`); `mfspr CTR` returns 64-bit. Canary `ppc_emit_control.cc:792` stores full 64-bit. PowerISA: CTR is 64-bit SPR. P2.
- **PPCBUG-734** — `interpreter.rs:2980-3040` `fcmpu`/`fcmpo` correctly distinguish ordered/unordered VXSNAN/VXVC. Canary `ppc_emit_fpu.cc:329-367` has bug — `bool ordered` parameter never read. P3 (Rust is more correct); recommend appending to charter §"Known Intentional Divergences from Canary".
- **PPCBUG-735** — `interpreter.rs:441,450,459,476,493,617,681,689,706,720,769,779,789,799,809,819` 64-bit Rc-form ALU ops (`mulld.`, `mulhd.`, `mulhdu.`, `divd.`, `divdu.`, `cntlzd.`, `sld.`, `srd.`, `srad.`, `sradi.`, `rldicl.`, `rldicr.`, `rldic.`, `rldimi.`, `rldcl.`, `rldcr.`) call `update_cr_signed(0, x as i64)` — full 64-bit signed view; canary `ppc_hir_builder.cc:397-421` `UpdateCR(n, v)` does `Truncate(v, INT32_TYPE)` first — always 32-bit. CR0 disagrees with canary on values that change sign between i32 and i64 view. P1. [REGRESSION-CANDIDATE — STRONG]
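
The CR0 disagreement in PPCBUG-735 comes down to which signed view of the result is compared with zero. The sketch below models CR0's LT/GT/EQ bits as a `std::cmp::Ordering`; the function names are illustrative and do not mirror `update_cr_signed` or canary's `UpdateCR`.

```rust
use std::cmp::Ordering;

/// Compare-with-zero using the full 64-bit signed view of the result.
fn cr0_view_64(result: u64) -> Ordering {
    (result as i64).cmp(&0)
}

/// Compare-with-zero after truncating to the low 32 bits first.
fn cr0_view_32(result: u64) -> Ordering {
    (result as u32 as i32).cmp(&0)
}
```

For a result of `0x0000_0000_8000_0000` the 64-bit view reports "greater than zero" while the 32-bit view reports "less than zero"; for `0x0000_0001_0000_0000` the 32-bit view reports "equal". Those are the value classes on which the two conventions diverge.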

## MEMBUG (M06 — memory subsystem)

**Headline**: write-visibility verdict = **NOT broken at the memory layer** (same-thread store/load is mechanically sound; BST paradox cause is upstream — see M03 candidates). 9 findings; 1 P1, 4 P2, 4 P3. See `audit-out/m06-memory.md`.

- **MEMBUG-001** — `crates/xenia-memory/src/heap.rs:155-171` `bump_page_version` Release fence on `page_versions[idx]` correctly publishes the prior data store on x86_64 (TSO) and on weaker hosts via Release-store ordering. Doc-only risk: any future code that publishes via `page_versions` without first executing the data store *and* the Release-store inside `bump_page_version` would silently lose the visibility edge. P2 docs.
- **MEMBUG-002** — `crates/xenia-memory/src/heap.rs:8` hardcodes `PAGE_SIZE = 4096` for the entire 4 GB. Canary uses 4K/64K/16MB across 9 distinct heaps (`memory.cc:222-242`). Consequence: `PageEntry::region_page_count` is in 4K units rather than heap-native units — guest queries that walk `region_page_count * page_size` overshoot for 64K-heap-allocated regions. Latent. P2.
- **MEMBUG-003** — `crates/xenia-memory/src/heap.rs:184-202` no physical-address aliasing across `0xA0000000`/`0xC0000000`/`0xE0000000`. Canary maps all three onto the same physical-membase view (`memory.cc:235-242`). A guest CPU write to one alias is invisible at another. Risk: `MmGetPhysicalAddress`-shape round-trips and DMA-buffer aliasing return stale bytes. **P1**, latent.
- **MEMBUG-004** — `crates/xenia-memory/src/heap.rs` `is_mapped` accepts addresses in `0xFFD00000-0xFFFFFFFF`; canary `LookupHeap` (`memory.cc:434`) returns null. Latent — corrupt high-byte pointers don't fault. P2.
- **MEMBUG-005** — `crates/xenia-memory/src/platform.rs:31` always commits with `PROT_READ | PROT_WRITE`; xenia-rs cannot fault on writes to guest-read-only-protected pages. Matches canary's `emit_inline_mmio_checks` mode (no host-level protect enforcement). P3 informational.
- **MEMBUG-006** — `crates/xenia-gpu/src/mmio_region.rs:62-67,108-115` unmapped GPU MMIO reads/writes log at `tracing::trace!`; should be `warn` (rate-limited per `(reg_index, kind)` pair) so renderer-divergence first-line observability doesn't require enabling trace globally. P2.
- **MEMBUG-007** — `crates/xenia-memory/src/heap.rs:434-436,450-452,467-469` cross-page `bump_page_version` guard verified correct for all access widths. P3 informational.
- **MEMBUG-008** — `icbi`-correctness invariant (cross-references PPCBUG-704): every data store must `bump_page_version`. If any future perf optimization makes that conditional, `icbi` (currently no-op) must be made explicit. P3 documentation.
- **MEMBUG-009** — Static analysis: 29 distinct callers of `sub_82454770` (intrusive-list-merge validator); only the BST registration through `sub_82175E68 → sub_82175F10` trips the throw. Confirms the renderer-blocker is NOT a memory-layer issue — every list-merge operation would fail uniformly if it were. P3 informational, supports M06 verdict.

## XAMBUG (M08 — XAM)

- **XAMBUG-001** — `crates/xenia-kernel/src/xam.rs:204-208` `xam_task_schedule` allocates a handle and returns 0 without ever invoking the callback. Canary `xam_task.cc:43-81` spawns an `XThread` that runs the callback (which typically signals `XTASK_MESSAGE.event_handle`). Sylpheed callsite confirmed at thunk `0x8284dafc` ← `sub_824a9710` (`0x824a9a10`). Likely cause of one or more parked-waiter handles in M10. P0 candidate.
- **XAMBUG-002** — `crates/xenia-kernel/src/xam.rs` async XAM exports (`XamContentCreate`, `XamContentClose`, `XamContentDelete`, `XamContentCreateEnumerator`, `XamContentSetThumbnail`, `XamContentGetCreator`, `XamShowKeyboardUI`, `XamShowDeviceSelectorUI`, `XamShowMessageBoxUIEx`, `XamShowGamerCardUIForXUID`, `XamEnumerate`, `XMsgStartIORequest`, `XMsgStartIORequestEx`) are all `stub_success` and never touch `overlapped_ptr`. Canary completes the overlapped via `CompleteOverlappedImmediate` / `CompleteOverlappedDeferredEx` and returns `X_ERROR_IO_PENDING` (`xam_content.cc:418-422`, `xam_msg.cc:64-67`, `xam_ui.cc:382-389`). Any wait on the overlapped event hangs forever. P0 candidate.
- **XAMBUG-003** — `crates/xenia-kernel/src/xam.rs:45` `XamUserGetSigninState` is `stub_return_zero` (always 0 = "not signed in"). Canary `xam_user.cc:90-104` returns `signin_state` (typically 1 = signed-in offline) when a profile exists. Sylpheed callsite confirmed at thunk `0x8284db3c` ← `sub_824a9c90`. Boot guard `bl XamUserGetSigninState; cmpwi r3,0; beq <bail>` would force the bail branch. P1, possibly P0.
- **XAMBUG-004** — `crates/xenia-kernel/src/xam.rs:232-239` `xam_user_get_xuid` returns `0` (success) with xuid=0. Canary `xam_user.cc:30-67` returns `X_E_NO_SUCH_USER` when the user isn't signed in. P1.
- **XAMBUG-005** — `crates/xenia-kernel/src/xam.rs:241-248` `xam_user_get_name` returns 0 (success) with empty buffer. Canary `xam_user.cc:137-164` returns `X_ERROR_NO_SUCH_USER` when the user isn't signed in. P1.
- **XAMBUG-006** — `crates/xenia-kernel/src/xam.rs:192-200` `XamLoaderLaunchTitle`/`XamLoaderTerminateTitle` return normally with `gpr[3]=0`. Canary `xam_info.cc:380-432` explicitly does not return — calls `kernel_state()->TerminateTitle()`. Sylpheed has 2 callsites for `XamLoaderTerminateTitle`. Returning normally allows the title to keep executing past a fatal-exit path. P1.
- **XAMBUG-007** — `crates/xenia-kernel/src/xam.rs:257-273` `xam_get_execution_id` heap-allocates a 24-byte struct on every call and writes hardcoded `title_id=0x535107D4`, `media_id=0x2D2E2EEB`, `version=0`, `base_version=0`, `disc=1/1`. Canary `xam_info.cc:321-336` writes the *guest pointer to the existing XEX `EXECUTION_INFO` opt-header*. Hardcoded bytes diverge from real header for `version`/`base_version`; per-call leaks. P1.
- **XAMBUG-008** — `crates/xenia-kernel/src/xam.rs:212-228` `xam_alloc` ignores `flags`. Canary `xam_info.cc:434-455` notes `0x00100000` controls zero-fill; canary always uses `SystemHeapAlloc` which zero-fills. Severity depends on whether xenia-rs's `state.heap_alloc` zero-fills: P1 if not, P2 if yes.
- **XAMBUG-009** — `crates/xenia-kernel/src/xam.rs:73-74` `XamUserCreateAchievementEnumerator` and `XamUserCreateStatsEnumerator` are `stub_success` and don't fill `*handle_ptr`. Canary `xam_user.cc:580-647` and `:1025-1059` create real `XEnumerator` objects. Game reads stale memory as the handle; subsequent `XamEnumerate` returns `0x12` only by happy coincidence. P2.
- **XAMBUG-010** — `crates/xenia-kernel/src/xam.rs:77-82` UI dialog exports (`XamShowSigninUI`, `XamShowKeyboardUI`, `XamShowDeviceSelectorUI`, `XamShowGamerCardUIForXUID`, `XamShowDirtyDiscErrorUI`, `XamShowMessageBoxUIEx`) are all `stub_success` and never write `result_ptr->ButtonPressed`. Canary fills the result and completes overlapped (`xam_ui.cc:322-419`). Game reads stale ButtonPressed → may take wrong dialog branch. P2.
- **XAMBUG-011** — `crates/xenia-kernel/src/xam.rs:305-307` `XGetAVPack` returns `0x16` (=22), outside canary's documented range 0..8. Canary `xam_info.cc:35-46` defaults to `8` (HDMI). Comment in `xam_info.cc:248-251` warns games may PAL-check against `{3,4,6,8}` — `0x16` matches none. Recommend changing to `8`. P2.
- **XAMBUG-012** — `crates/xenia-kernel/src/xam.rs:50` `XamEnumerate` returns `0x12` (`ERROR_NO_MORE_FILES`). Canary `xam_enum.cc:25-32` returns `X_ERROR_INVALID_HANDLE` for unknown handle and `WriteItems` for valid ones. xenia-rs is "convenient happy path" only because XAMBUG-009 means no real handle exists. P2.
- **XAMBUG-013** — `crates/xenia-kernel/src/xam.rs:275-277` `XamGetSystemVersion` returns `0x20000000`. Canary `xam_info.cc:229-237` returns `0` with explicit "pretend old" comment; both arbitrary, both `kStub`. Could affect symbol-loading branches in title code. P3.
- **XAMBUG-014** — `crates/xenia-kernel/src/xam.rs:309-311` `XGetGameRegion` returns `0xFF` (8-bit). Canary `xam_info.cc:256-277` returns 16-bit values from a 109-entry country table (e.g. `0x0101` for Japan, `0xFFFF` for "all"). Sylpheed J probably masks fine but the value is structurally wrong. P3.
- **XAMBUG-015** — `crates/xenia-kernel/src/xam.rs:317-328` `XGetVideoMode` writes only 5 fields (20 bytes). Canary's `X_VIDEO_MODE` struct is larger; trailing fields left with stale stack data on the guest side. P3.
- **XAMBUG-016** — `crates/xenia-kernel/src/xam.rs:142-162` (`xam_input_get_state`) only bumps `state.input_packet_number` when `gamepad_key != last_input_bytes`. Fake-pad steady state keys to 0; `packet_number` stays 0 forever. Games that detect "input never changed since startup" via packet_number monotonicity may misbehave. Canary likewise increments only on a real change, so the two match in spirit. Sylpheed unaffected at boot. P3.

## KRNBUG (M07 — kernel HLE)

Per-milestone consolidated report: [audit-out/m07-kernel-hle.md](audit-out/m07-kernel-hle.md). Sub-reports under `audit-out/m07{a,b,c,d}-*.md` retain local sub-prefixes; master IDs unified below.

### Headline P0 / P1

- **KRNBUG-017 (P0 under `--parallel`)** — Kf-spinlock no-op (KfAcquireSpinLock/Release, KeRaiseIrql, KeLowerIrql). Lockstep tolerates this; `--parallel` allows concurrent guest CS entry → state corruption invisible to existing tests. M07a, exports.rs.
- **KRNBUG-Vd-04 (P0)** — VdSwap bypasses PM4 ring; canary writes Type-0 fetch-constant patch + PM4_XE_SWAP into reserved slot, ours fills NOPs and calls `state.gpu.notify_xe_swap` directly. Most plausible cause of swaps=2→swaps=1 regression. M07c, exports.rs `vd_swap`.
- **KRNBUG-008 (P1)** — ExCreateThread ignores `xapi_thread_startup` parameter. Canary invokes the prologue callback before user entry; we skip it. M07b.
- **KRNBUG-011 (P1)** — ExCreateThread ignores creation_flags bit 0x80 (guest_object return). M07b.
- **KRNBUG-013 (P1)** — ExGetXConfigSetting `stub_success` writes nothing into output buffer; Sylpheed reads garbage stack memory during early boot. M07b.
- **KRNBUG-Mm cluster (P1)** — MmAllocatePhysicalMemoryEx ignores all attribute bits (protect, page_size, range, alignment, WC/NoCache). Pool family entirely unregistered. M07c.
- **KRNBUG-D08 (P1 candidate)** — VSYNC_INSTR_PERIOD = 150_000 calibrated for ~10 MIPS lockstep; under `--parallel` (~24× slower) drops to ~2.5 Hz wall. Plausible swap-regression contributor. M07d, interrupts.rs.

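
The KRNBUG-D08 estimate is simple rate arithmetic, which the sketch below makes explicit. The function name `vsync_hz` is made up for illustration, and the inputs are the round figures quoted in the finding (10 MIPS, a 150,000-instruction period, a 24× slowdown), not measured values.

```rust
/// Wall-clock vsync rate implied by an instruction-count period.
/// `instr_per_sec` is the guest instruction throughput actually achieved.
fn vsync_hz(instr_per_sec: f64, period_instr: f64) -> f64 {
    instr_per_sec / period_instr
}
```

With these round numbers, `vsync_hz(10_000_000.0, 150_000.0)` is about 66.7 Hz, and dividing the throughput by 24 gives about 2.8 Hz, the same order as the ~2.5 Hz quoted above. The point of the sketch is that the wall-clock rate scales linearly with interpreter throughput, which is why an instruction-count proxy is fragile.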
### Other P1 / P2 / P3

77 KRNBUG IDs total filed across M07a/b/c/d. Severity distribution: 3 P0, 11 P1, 28 P2, 35 P3.

Full list and rationale in sub-reports. M07-lead consolidation at `audit-out/m07-kernel-hle.md`. Highlights:

- **Nt/Ke/Kf**: KRNBUG-005 (NtAllocateVirtualMemory ignores flags), KRNBUG-008 sub-prefix-a (NtCreateFile drops desired_access/share/disposition), KRNBUG-014 (DPC family unimplemented).
- **Rtl/Ex**: 35+ canary-table Rtl* ordinals unregistered (KRNBUG-001 sub-prefix-b; needs trace-handles audit to triage), CS stale-owner override (KRNBUG-004 sub-prefix-b).
- **Ob/Mm/Vd**: ObReferenceObjectByName + ObOpenObjectByName + ObTranslateSymbolicLink unregistered, ExFreePool / MmFreePool entirely missing, VdGetCurrentDisplayInformation/VdQueryVideoFlags/VdInitializeScalerCommandBuffer/VdInitializeEngines all stubbed.
- **Xex/misc**: XexCheckExecutablePrivilege always 0, XexGetProcedureAddress ignores string-name path, sprintf/_vsnprintf produce empty buffers (KRNBUG-D12).

## XAMBUG (M08 — XAM)

Per-milestone report: [audit-out/m08-kernel-xam.md](audit-out/m08-kernel-xam.md). 16 XAMBUG IDs.

### XAMBUG-001 (P0 candidate) — XamTaskSchedule never invokes callback

**Location**: `crates/xenia-kernel/src/xam.rs:204-208`. Returns 0 without spawning the task. Canary spawns an `XThread` to run the callback; the callback typically signals an `XTASK_MESSAGE.event_handle`. **Strong candidate for one or several of the 4 parked-waiter handles** (0x1004, 0x100c, 0x15e4, 0x42450b5c). Sylpheed callsite confirmed at `sub_824a9710` / 0x824a9a10.

### XAMBUG-002 (P0 candidate) — 13 async XAM exports never complete overlapped

**Location**: xam.rs Content*, Show*UI, XMsgStartIORequest*, XamEnumerate. All `stub_success` and never call `CompleteOverlappedImmediate` / `Deferred` on `overlapped_ptr`. Any guest wait on the overlapped event hangs.

### XAMBUG-003 (P1, possibly P0) — XamUserGetSigninState returns 0

**Location**: xam.rs. xenia-rs returns 0; canary returns 1 (signed-in offline by default). Sylpheed boot guard would force the bail branch.

### Other 13 XAMBUG IDs

XAMBUG-004..016, mostly P2/P3 cosmetic. Highlight: XAMBUG-016 (P3) packet_number never increments in fake-pad steady state because key stays 0.

## MEMBUG (M06 — memory subsystem)

Per-milestone report: [audit-out/m06-memory.md](audit-out/m06-memory.md). 9 MEMBUG IDs (1 P1, 4 P2, 4 P3).

### Verdict: write-visibility NOT BROKEN

Same-thread store→load through `crates/xenia-memory/src/heap.rs` is mechanically sound. Both paths derive raw `*mut u8`/`*const u8` pointers from the same `membase` mapping; no per-thread cache, no write-back buffer, no block-cache layer that returns stale data bytes (block cache only caches *decoded instructions*, never data). The `bump_page_version` Release-store comes *after* the byte store and is a cross-thread visibility primitive; same-thread program order trivially observes the just-written byte.

**BST paradox** at `sub_82175E68 → sub_82175F10` is OPEN but not a memory bug. Both registrar and validator run on the same HW slot in the same scheduler round. Likely upstream causes: M03 PPCBUG-720..735 (interpreter 32/64-bit truncation bugs) corrupting the comparison feeding the validator, or constructor-side logic in `sub_821766A0`/`sub_825ED268`.

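
The store-then-publish ordering that the verdict relies on can be sketched with standard atomics. This is a toy model under stated assumptions: the `Page` struct, `store_and_publish`, and `read_if_changed` are hypothetical names, and the data byte is an `AtomicU8` so the example stays in safe Rust, whereas `heap.rs` is described above as writing through raw pointers.

```rust
use std::sync::atomic::{AtomicU64, AtomicU8, Ordering};

/// Toy stand-in for one guest page: a single data byte plus a version counter.
struct Page {
    data: AtomicU8,
    version: AtomicU64,
}

/// Writer side: data store first, then a Release bump of the version.
fn store_and_publish(page: &Page, value: u8) {
    page.data.store(value, Ordering::Relaxed);
    // Release: everything written before this bump is visible to any
    // thread that later observes the new version with an Acquire load.
    page.version.fetch_add(1, Ordering::Release);
}

/// Reader side: Acquire-load the version, then read the data it guards.
/// Returns None when the version has not moved since `last_seen`.
fn read_if_changed(page: &Page, last_seen: u64) -> Option<(u64, u8)> {
    let v = page.version.load(Ordering::Acquire);
    if v == last_seen {
        return None;
    }
    Some((v, page.data.load(Ordering::Relaxed)))
}
```

Swapping the two statements in `store_and_publish` would break the cross-thread edge, which is the doc-only risk MEMBUG-001 flags. Same-thread readers never depend on it, because program order already guarantees they see their own store.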
### MEMBUG-003 (P1) — physical-address aliasing across cached/write-combine ranges not implemented

**Location**: `crates/xenia-memory/src/heap.rs`. The 0xA000_0000 (write-back), 0xC000_0000 (write-combine), 0xE000_0000 (uncached) virtual ranges are all distinct mappings in xenia-rs but should alias the same physical memory. Latent risk for any DMA-buffer round-trip; not currently observed to break Sylpheed but is a correctness gap.

### Other MEMBUG IDs

MEMBUG-001..009. Highlights: MEMBUG-002 P2 (MMIO aperture single-bit-mask fast-path doesn't validate against region table on hit), MEMBUG-005 P2 (no protection-fault path; reads of unmapped memory return 0), MEMBUG-007 P3 (Be<T> serde missing round-trip test).

## GPUBUG (M09 — GPU pipeline)

Per-milestone consolidated report: [audit-out/m09-gpu.md](audit-out/m09-gpu.md). Sub-reports under `audit-out/m09{a,b,c}-*.md`. 33 IDs; severity: 6 P0, 12 P1, 8 P2, 7 P3.

### Headline P0

- **GPUBUG-001 (P0)** — VdSwap kernel-bypass: `vd_swap` zero-fills 64-dword reserved ring slot with NOPs and calls `state.gpu.notify_xe_swap` directly. Canary writes Type-0 fetch-constant patch + PM4_XE_SWAP into the slot and lets the CP consume it. PM4_XE_SWAP opcode handler at `gpu_system.rs:1232` is dead code at runtime. **Confirms KRNBUG-Vd-04. Most plausible cause of swaps=2→swaps=1 regression.**
- **GPUBUG-100 / shader-005 (P0)** — operand modifiers (swizzle/abs/neg) never read from word-1 in WGSL interpreter; every ALU instruction executes against unmodified operands.
- **GPUBUG-101 / shader-006 (P0)** — `c#` constant-register selector bit masked off; every shader reads `r[low7]` (temp) instead of constants. WVP matrix etc. never read.
- **GPUBUG-102 / shader-007 (P0)** — vertex fetch never applies GpuSwap endian; big-endian VBs decode as garbage on little-endian host.
- **GPUBUG-103/104/105 / draw-008/009/010 (P0)** — 8 of 26 draw_state register addresses misdecoded: VGT_DRAW_INITIATOR, VGT_DMA_BASE, VGT_DMA_SIZE, PA_SC_WINDOW_SCISSOR_TL/BR (reading SCREEN_SCISSOR), RB_COLOR_INFO_1/2/3, PA_SU_VTX_CNTL, index_size from bit 8 instead of bit 11.

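
The endian swap that GPUBUG-102 finds missing can be sketched per 32-bit word as follows. The `Endian` enum and `gpu_swap` function are illustrative names for this sketch only; the four modes correspond to the kNone/k8in16/k8in32/k16in32 cases the audit lists for vertex fetch, and the real fix may be organised differently.

```rust
/// Per-fetch endian mode (variant names are hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Endian {
    None,
    Swap8In16,
    Swap8In32,
    Swap16In32,
}

/// Apply the selected swap to one 32-bit word read from guest memory.
fn gpu_swap(value: u32, mode: Endian) -> u32 {
    match mode {
        Endian::None => value,
        // Swap the two bytes inside each 16-bit half.
        Endian::Swap8In16 => ((value << 8) & 0xFF00_FF00) | ((value >> 8) & 0x00FF_00FF),
        // Reverse all four bytes.
        Endian::Swap8In32 => value.swap_bytes(),
        // Exchange the two 16-bit halves.
        Endian::Swap16In32 => value.rotate_left(16),
    }
}
```

For the word `0x11223344` the three non-trivial modes yield `0x22114433`, `0x44332211`, and `0x33441122` respectively. Skipping the swap on a little-endian host leaves every multi-byte vertex attribute byte-reversed, which is the "decode as garbage" symptom described above.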
### Headline P1

- **GPUBUG-006 (P1)** — `sync_with_mmio` Relaxed-load on WPTR; broken Release/Acquire pair; latent under `--parallel`.
- **GPUBUG-shader-002 (P1)** — D3D9 legacy `Inf*0=+0` not honored. Canary documents same divergence as causing white-screen in 4D5307E6.
- **GPUBUG-301 (P1)** — `read/write_sample_64bpp` doubles pitch but `surface_pitch_tiles()` already pre-doubles for 64bpp → quadruple stride for 64bpp resolves. Tests bypass `from_register_file` so don't catch this.
- **GPUBUG-304 (P1)** — `bind_primary_texture` hardcodes `version_when_uploaded: 0` so guest writes never invalidate uploaded textures.
- **GPUBUG-305 (P1)** — texture cache missing K1555, K24_8, K_8, K1010102, K10_11_11, `_AS_*` formats; bound to magenta stub.
- Plus 7 more P1 in shader/draw_state region (GPUBUG-106..112).

### Other P2/P3

15 IDs. Highlights: GPUBUG-002 (P2) PM4 type-3 coverage 35/47 not 47/47 as memory file claimed — missing COND_EXEC, WAIT_REG_EQ, WAIT_REG_GTE, EVENT_WRITE_CFL plausibly hit by Sylpheed; GPUBUG-302 (P2) RenderTargetKey::is_64bpp returns wrong format set; GPUBUG-303 (P2) CPU-side TextureCache::ensure_cached is dead code.

### Verdict

**Renderer-blocker explanation**: The GPU pipeline is structurally wrong at multiple stages (shader operand decode + constant selector + vertex endian + 8 register addresses + VdSwap bypass). `draws=0` and swap regression both fall out of this class of failure. Combined fix queue: GPUBUG-001 + GPUBUG-100..105 must land together — partial fixes likely won't unblock visible rendering.

## XMODBUG (M10 — cross-module seams)

Per-milestone consolidated report: [audit-out/m10-cross-module.md](audit-out/m10-cross-module.md). Sub-reports under `audit-out/m10-x{1..5}-*.md`. 22 IDs; severity: 1 P0, 6 P1, 5 P2, 10 P3.

### Headline P0

- **XMODBUG-013 (P0)** — Missing fetch-constant patch in VdSwap. Re-confirms KRNBUG-Vd-04 / GPUBUG-001 from the seam perspective. Frontbuffer slot 0 retains stale texture descriptor; Sylpheed bloom/blur path reads garbage. Strongest single P0 cause of swap regression.

### Headline P1

- **XMODBUG-001 (P1)** — `stwcx`/`stdcx` data write happens AFTER `try_commit` clears the slot. Race window: another HW thread can lwarx the cleared slot, read pre-write data, and commit. Latent under `--parallel`.
- **XMODBUG-002 (P1)** — `GuestMemory::write_bulk` (used by `NtReadFile` and XEX loader) skips both `bump_page_version` and reservation invalidation. Latent if any code-bearing memory is bulk-written.
- **XMODBUG-010 (P1)** — `CP_INT_STATUS` never produced from GPU side; only synthetic vsync interrupts ever reach the kernel. Real CP-side events (EOP, RSC, IB-end) missing.
- **XMODBUG-011 (P1)** — `VSYNC_INSTR_PERIOD` fragile proxy. Re-confirms KRNBUG-D08 from seam perspective.
- **XMODBUG-012 (P1)** — `notify_xe_swap` synthetic interrupts displace real CP interrupts in 4-deep queue.

### Other P2/P3

15 IDs. Notable:

- **XMODBUG-005 (P2)** — `nt_close` on a handle with parked waiters silently strands them.
- **XMODBUG-003 (P2)** — no MemoryBarrier around reserved ops; latent on non-x86 hosts.
- **XMODBUG-021 (P2)** — WaitAll partial-satisfaction false-wake (semantic gap, not a race).
- **XMODBUG-022 (P2)** — force-wake path doesn't scrub waiter lists like timed-wake path does.

### Verdict

The renderer plateau and swap regression are explained by a **multi-causal failure** at the GPU pipeline + kernel-↔-GPU seam. Combined fix queue: KRNBUG-Vd-04 / GPUBUG-001 / XMODBUG-013 (VdSwap rewrite to write real PM4 sequence) + GPUBUG-100..102 (shader operand decode + constant-register selector + vertex fetch endian) + GPUBUG-103..105 (8 register addresses) must land coherently to unblock visible rendering.

The 4 parked-waiter handles remain unexplained at this audit's depth. M11 follow-up should run the `--trace-handles` audit at -n 5B and pivot to PPC-level trace if no signal exists.

## SWAPBUG (M11 — swap-regression bisection)

Per-milestone report: [audit-out/m11-runs.md](audit-out/m11-runs.md).

### SWAPBUG-001 (P0) — PPCBUG-001 addi 32-bit truncation regresses swaps=2 → 1

- **Severity**: P0 — direct cause of the headline `swaps=2 → 1` regression that motivated this entire audit.
- **Status**: open (audit-only; fix decision left to follow-up).
- **Location**: `crates/xenia-cpu/src/interpreter.rs:114-118` — the single `as u32 as u64` cast at the end of the `addi` opcode arm.
- **Bisection trail**:
  - Phase-level: pre-P1/P1/P2/P3 → swaps=2; **P4/d945aea** → swaps=1.
  - Internal P4 commits: `145a7a4` → swaps=2; **`bf8208e`** ("PPCBUG-001/002/003/004/005/007 4b immediate ALU truncation") → swaps=1.
  - Hunk-level (revert each PPCBUG individually within bf8208e): only **PPCBUG-001 revert restores swaps=2**. PPCBUG-002/003/004/005/007 reverts leave swaps=1.
- **Mechanism**: addi is the most common opcode (282k uses, 3.4% of all instructions in sylpheed.db). Adding `as u32 as u64` to its writeback truncates the upper 32 bits of the result. Sylpheed has at least one control-flow site that depends on the un-truncated 64-bit value.
- **Cross-references**: confirms M03 PPCBUG-720 prediction ("addi/addic/subfic truncate to 32 bits without canary parity"). The fix is canary-divergent — canary does NOT truncate addi.
- **Recommendation**: revert the addi truncation. Re-examine the test `addi_li_neg_one_zero_extends_upper` to assert canary semantics, not the over-truncated form. Independently re-examine the addis truncation (which IS deliberate per the addis fix memory file but may have its own broader implications).

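
The difference between the two `addi` writebacks discussed in SWAPBUG-001 can be shown in a few lines. Both function names are hypothetical; `ra_val` stands for the value of `(RA|0)`, so the `li` form passes 0, and the only detail taken from the finding is the trailing `as u32 as u64` cast.

```rust
/// addi with the sign-extended immediate added at full 64-bit width
/// (the behaviour the audit attributes to canary).
fn addi_full(ra_val: u64, simm: i16) -> u64 {
    ra_val.wrapping_add(simm as i64 as u64)
}

/// The same add followed by the `as u32 as u64` truncation from bf8208e.
fn addi_truncated(ra_val: u64, simm: i16) -> u64 {
    addi_full(ra_val, simm) as u32 as u64
}
```

`li rD, -1` is the simplest case where the two disagree: the full-width form writes `0xFFFF_FFFF_FFFF_FFFF`, the truncated form writes `0x0000_0000_FFFF_FFFF`. Any later 64-bit compare or sign test on that register then sees a different value, which is consistent with the control-flow dependence described in the mechanism above.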
### SWAPBUG-002 (P2) — PPCBUG-004 mulli truncation affects IRQ delivery anomalously

- **Severity**: P2 — anomalous side effect, not blocking.
- **Status**: open.
- **Location**: `crates/xenia-cpu/src/interpreter.rs` mulli arm (changed in `bf8208e`).
- **Symptom**: reverting mulli truncation alone (on top of bf8208e) drops interrupts_delivered from 629 to 101 at -n 100M lockstep. Swaps stays at 1. The OPPOSITE direction from SWAPBUG-001.
- **Mechanism (hypothesis)**: a mulli result is consumed by an instruction-count or frame-count computation that controls vsync injection target selection or some early-boot loop iteration count.
- **Recommendation**: no immediate action; investigate as part of M07d KRNBUG-D08 / XMODBUG-011 vsync-timing audit.

## ANLBUG (M11 — analysis crate)

### ANLBUG-001 (P2) — `xenia-rs dis` does not create SQL views by default

- **Severity**: P2 — feature mismatch between tests and CLI.
- **Status**: open.
- **Location**: `crates/xenia-app/src/main.rs:3189` — `w.create_sql_views()` is gated on `--analyze=Sql` or `--analyze=Both`. Default is `Rust`, which skips view creation.
- **Symptom**: regenerated `sylpheed.db` has none of the application views (`v_branch_xrefs`, `v_call_graph`, `v_function_first_instruction`, `v_imports_called`, `v_reachability_from_entry`). The schema-golden test creates them; the user-facing CLI does not.
- **Cross-reference**: ORACBUG-005 (M01) — schema test uses synthetic 4-instr PE; doesn't catch this gap.
- **Recommendation**: either always create views in `--db` mode, or document the requirement clearly in CLI help.

---

## Fix session 2026-05-03 — outcome

Single-session fix sprint executed against this audit's recommended
queue. 12 IDs closed across 11 commits + 9 merge commits on master.
Branch lineage: each phase a topic branch, merged with `--no-ff` to
preserve hunk-bisect lineage; all branches deleted post-merge.

| Phase | Commit | IDs closed | Severity | Notes |
|-------|--------|------------|----------|-------|
| A | `9ab986e` | SWAPBUG-001 / PPCBUG-001 | P0 | addi 32-bit truncation revert. swaps 1→2 confirmed. |
| B | `1f416aa` | ORACBUG-004 (partial: ORACBUG-006) | P0 | sylpheed_n50m stable-digest golden + `--stable-digest` CLI flag. n4b deferred (canonical invocation pathologically slow per audit). |
| C | `82f3d61` | KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013 | 3× P0 | VdSwap PM4 ring path (writes Type-0 fetch-constant patch + Type-3 PM4_XE_SWAP into ring memory at WPTR). Direct `notify_xe_swap` retained as idempotent safety net. |
| D1 | `78ea81c` | GPUBUG-101 | P0 | ALU src1/2/3_sel temp-vs-constant selector decoded from word-0 bits 29-31. |
| D2 | `c5c6713` | GPUBUG-100 (abs deferred) | P0 | per-operand component-relative swizzle + negate decoded from word-1. abs flag (dual-meaning bit 7 / word-2) intentionally deferred. |
| D3 | `ec2d955` | GPUBUG-102 | P0 | per-format `gpu_swap` endian byte-swap on vertex fetch (kNone/k8in16/k8in32/k16in32). |
| E | `8723d68` | GPUBUG-103, GPUBUG-104, GPUBUG-105 | 3× P0 | 8 register addresses re-validated against canary `register_table.inc`; index_size bit 8→11; PA_SU_VTX_CNTL 0x2083→0x2302. |
| F1 | `e7d0fcf` | KRNBUG-017 | P0-under-parallel | Kf*SpinLock + KeReleaseSpinLockFromRaisedIrql + KeTryToAcquireSpinLockAtRaisedIrql now write the lock value to guest memory. |
| G1 | `8fc1b1d` | GPUBUG-006 | P1 | sync_with_mmio Acquire/Release pairs the producer-side Release at mmio_region.rs:78. |
| G2 | `780e854` | XMODBUG-002 | P1 | GuestMemory::write_bulk now bumps page_versions for every page it touches. |

### Headline outcome

| Metric | Pre-sprint | Post-sprint | Goal | Met? |
|-----------------------|-----------:|------------:|-----:|------|
| `swaps` (-n 100M) | 1 | 2 | ≥2 | ✅ |
| `draws` (-n 100M) | 0 | 0 | >0 | ❌ (multi-causal — see below) |
| Tests passing | 551 | 556 | ≥551 | ✅ |
| Renderer plateau | locked | partially unblocked | unblocked | partial |

The audit's central prediction — **Phases C+D+E together unlock
`draws > 0`** — was not met empirically at -n 100M lockstep. The
plateau persists because:

- `shader_blobs_live` stays at 0 after 100M. The game has not yet
  issued IM_LOAD; resource-loader worker threads are still parked.
- The audit's parked-waiter analysis (`project_xenia_rs_audit_2026_05_02.md`,
  4 handles 0x1004 / 0x100c / 0x15e4 / 0x42450b5c) remains
  unresolved. Phase F1 (Kf-spinlock) lands but doesn't unblock
  these handles; XAMBUG-001 was ruled out by M10-X2.

### Phases attempted but deferred

- **F2 (XAMBUG-001 XamTaskSchedule callback spawn)**: per audit
  M10-X2, ruled out as the parked-waiter cause. The bug is real but
  doesn't move the renderer-plateau needle within this sprint.
  Implementing the XThread spawn for the callback is moderate
  complexity (~45 min); deferred to a follow-up session.
- **F3 (XAMBUG-002 overlapped completion helper)**: requires new
  infrastructure (`KernelState::complete_overlapped`) plus wiring 13
  async XAM stubs. Substantial. Deferred.
- **G2 (KRNBUG-D08 / XMODBUG-011 VSYNC wall-clock)**: switching from
  the instruction-count proxy to wall-clock would destabilize the
  lockstep digest's `interrupts_delivered` field (which the existing
  full-digest sylpheed_n2m oracle still tracks). Deferred to allow a
  paired oracle update.
- **G3 (PPCBUG-720/721/722 addic/addic./subfic revert)**: verified
  canary directly (`xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc:117-136`);
  canary uses a **full 64-bit add with sign-extended immediate**,
  not the "i32 → i64 → u64" path the Plan agent suggested. The
  current xenia-rs 32-bit ABI workaround is plausibly correct for
  Xbox 360 user mode (per the addis pattern). The "PPCBUG" label
  may itself be wrong; defer until canary semantics are
  re-confirmed against a known-good Sylpheed code-path trace.
- **KRNBUG-Mm cluster** (P1 sweep): substantial implementation
  work (proper protect/page_size/range honoring in
  MmAllocatePhysicalMemoryEx; per-heap offsets in
  MmGetPhysicalAddress; real Mm tracking for
  MmFreePhysicalMemory). Deferred.

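The G3 disagreement comes down to where truncation happens. The following is a minimal sketch, assuming a 16-bit signed immediate and leaving the CA flag unmodelled; the names `addic_full64`, `addic_abi32` and `low32_agree` are hypothetical and nothing here is copied from canary or xenia-rs. It only illustrates that the two strategies agree in the low 32 bits of the result and can differ in the upper half of the GPR.

```rust
/// Full 64-bit add with a sign-extended 16-bit immediate
/// (the behaviour the note attributes to canary; CA is not modelled).
fn addic_full64(ra: u64, simm: i16) -> u64 {
    ra.wrapping_add(simm as i64 as u64)
}

/// 32-bit ABI variant: add in 32 bits, then zero-extend into the GPR.
fn addic_abi32(ra: u64, simm: i16) -> u64 {
    (ra as u32).wrapping_add(simm as i32 as u32) as u64
}

/// True when both strategies produce the same low 32 bits.
fn low32_agree(ra: u64, simm: i16) -> bool {
    addic_full64(ra, simm) as u32 == addic_abi32(ra, simm) as u32
}
```

With a clean (zero-extended) input of `0xFFFF_FFFF` and an immediate of 1, the 64-bit form leaves `0x1_0000_0000` in the register while the 32-bit form leaves 0. Which of the two a later consumer should observe is the question the deferred trace comparison would have to settle.
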
### Sprint acceptance criteria

| # | Criterion | Met? |
|---|-----------|------|
| 1 | Phase A: SWAPBUG-001 reverted, swaps=2 confirmed | ✅ |
| 2 | Phase B: sylpheed_n50m + n4b goldens | ✅ partial: n50m landed; n4b deferred (perf) |
| 3 | Phases C+D+E: 100M lockstep produces `draws > 0` | ❌ multi-causal |
| 4 | Phase F: ≥1 of 4 parked-waiter handles signals | ❌ (F1 alone insufficient) |
| 5 | Phase G: ≥3 P1 groups landed | ❌ partial: 2 landed (G1, G2-XMODBUG-002) |
| 6 | `cargo test --workspace --release` ≥557 | ❌ 556 (off by 1; new sylpheed_oracles is ignore-gated) |
| 7 | audit-findings.md marked applied | ✅ this section |
| 8 | Memory file updated | ✅ (separate file) |
| 9 | Workspace clean; no skipped/ignored tests added | ⚠ sylpheed_n50m is `#[ignore]` per design (3-min run) |
| 10 | All work merged to master | ✅ no dangling branches |

### Recommended next session

1. **Investigate parked-waiter handles directly** at -n ≥4B with
   `--trace-handles`. The audit's hypothesis is that the producer
   for one of the 4 handles never fires; pinpoint the producer
   code path to identify the missing kernel-side signal.
2. **Phase G2 + matching n2m oracle re-baseline**: switch VSYNC to
   wall-clock and re-baseline `interrupts_delivered` together as a
   single commit pair.
3. **F2/F3** if there is appetite for new XAM infrastructure; there
   is a non-zero chance that one of the unblocked completions is the
   missing producer for one of the 4 parked handles.
4. **Resume KRNBUG-Mm cluster** for proper memory-protect /
   range / per-heap honoring; required before disambiguating the
   addic/subfic class against canary (canary semantics are a 64-bit
   add against guest memory the Mm layer doesn't fully model yet).

---

## Follow-up session 2026-05-03 — outcome

Three audit IDs closed across 3 commits, merged to master with `--no-ff`.
HEAD: `8668550`. Tests: 556 → 561 (+5 from new wall-clock + ghost-trail tests).

### Audit IDs landed

| ID | Commit | Description |
|---|---|---|
| **GPUBUG-DRAIN-001** | `7a1b6b3` | VdSwap PM4 fallback warning silenced under `--parallel`. New `drain_until_wptr(target, time_budget)` mirrors canary's `WorkerThreadMain` predicate; vd_swap skips PM4 ring injection (unreliable when ring backs up under --parallel) and uses direct `notify_xe_swap`. The slot-0 fetch-constant patch is deferred (GPUBUG-FETCH-PATCH-001). DrainFence handler publishes the digest mirror before reply (was racing the CPU's post-drain digest_snapshot read). |
| **KRNBUG-AUDIT-001** | `d1105aa` | Diagnostic instrumentation: `--trace-handles-focus=<LIST>` flag + per-handle DIAGNOSIS report. `record_signal` falls through to ghost-trail capture for focused handles even when no `record_create` exists. Producer-class classification (GuestExport / KernelInternal). Distinguishes "guest never tried" from "signal landed but missed waiter" in one run. |
| **KRNBUG-D08** | `27d3608` | V-sync wall-clock under `--parallel`. Lockstep stays on the deterministic instruction-count proxy (sylpheed goldens unchanged). `--parallel` switches to wall-clock via `tick_vsync_wallclock`, raising delivered v-syncs from ~2 → 17 at -n 30M. INTERRUPT_QUEUE_CAP=4 still bottlenecks burst delivery. |

### Parked-waiter producer-trace finding

Empirical run at -n 500M lockstep with the new
`--trace-handles-focus=0x1004,0x100c,0x15e4,0x42450b5c`:

```
handle=0x00001004 kind=Event/Manual waiters=1 signaled=false
signal_attempts=0 (primary=0, ghost=0) waits=1 wakes=0
created cycle=0 tid=1 lr=0x824a9f6c src=NtCreateEvent
timeline: cycle=0 tid=10 lr=0x824ac578 src=do_wait_single[wait]
GuestExport=0 KernelInternal=0 waits=1
=> producer is a missing kernel signal source (or BST-paradox upstream)
```

Same shape for 0x100c and 0x15e4. 0x42450b5c shows `<UNCREATED>` +
`<AUDIT_BLIND>` (waiter parked via a non-`do_wait_single` path).

**Conclusion**: hypothesis (A) confirmed for 3 of the 4 handles. The
producer code path is genuinely missing: NO Nt/KeSetEvent /
KePulseEvent / KeReleaseSemaphore call EVER targets these handles
during 500M instructions of execution. The PPC-vs-Rust traversal
paradox (BST-bug from `project_xenia_rs_sylpheed_event_chain_2026_04_29`)
is **NOT** the cause for these specific handles. The 3 handles share
the same creator (lr=0x824a9f6c, tid=1, all at cycle=0) and the same
wait-call wrapper (lr=0x824ac578), so they are likely 3 sibling worker
threads all waiting for "work to do" notifications that never come.
The most likely producer-class candidates for next session:

- File I/O completion (`signal_io_completion_event`): currently a
  real implementation but possibly never reached; trace `NtReadFile`
  paths to see if completion events would target these handles.
- XAM async task completion: F2/F3 deferred from prior sprint.
- Audio buffer-complete: `XAudioRegisterRenderDriverClient` is a
  one-shot stub.
- Timer DPCs: `KeSetTimer` has a real implementation, but APC
  delivery may be mis-routed.

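The report fields quoted in this section are enough to separate "guest never tried" from "signal landed but missed waiter". A minimal sketch of that decision follows; the `HandleCounters` struct, the `Diagnosis` enum and the `diagnose` function are hypothetical names chosen for illustration, with fields named after the report text rather than the real audit types.

```rust
/// Hypothetical summary of one handle's audit counters
/// (field names follow the report text, not the real struct).
#[derive(Debug, Clone, Copy)]
struct HandleCounters {
    signal_attempts: u32,
    waits: u32,
    wakes: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum Diagnosis {
    /// Nothing is waiting on the handle, or every wait was woken.
    NotParked,
    /// A waiter is parked and no signal was ever attempted.
    ProducerMissing,
    /// Signals were attempted but a waiter is still parked.
    SignalMissedWaiter,
}

fn diagnose(c: HandleCounters) -> Diagnosis {
    if c.waits == 0 || c.wakes >= c.waits {
        Diagnosis::NotParked
    } else if c.signal_attempts == 0 {
        Diagnosis::ProducerMissing
    } else {
        Diagnosis::SignalMissedWaiter
    }
}
```

Fed the 0x1004 counters from the report (`signal_attempts=0`, `waits=1`, `wakes=0`), this sketch returns `ProducerMissing`, which matches the conclusion drawn in the section.
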
### Acceptance criteria

| # | Criterion | Met? |
|---|-----------|------|
| 1 | Phase 1: zero "PM4_XE_SWAP not consumed" warnings under canonical invocation | ✅ |
| 2 | Phase 2: per-handle DIAGNOSIS for all four parked handles | ✅ |
| 3 | Phase 3: vsync rate restored under --parallel; n2m golden untouched | ✅ partial: rate up but FIFO cap=4 still bottlenecks |
| 4 | cargo test ≥556 | ✅ 561 |
| 5 | All work merged to master | ✅ |
| 6 | **STRETCH** ≥1 of 4 handles signals | ❌, but the diagnostic data explains why (producer missing, not a wake-eligibility bug) |
| 7 | **STRETCH** draws > 0 at -n 100M lockstep | ❌ (still gated on the parked-waiter handles) |

### Recommended next session

1. **Producer hunt** for the 3 Event/Manual handles. With the
   diagnostic in place, run a focused hunt: identify the guest function
   at `lr=0x824ac578` (the shared wait-call wrapper), walk its
   callers, and find what kernel signal source SHOULD be wired for each
   handle. Likely starting points: file I/O completion
   (`signal_io_completion_event`), XamTaskSchedule callback (F2),
   XAudio buffer-complete.
2. **Raise INTERRUPT_QUEUE_CAP** for `--parallel` workloads: the
   3044 dropped vsyncs at -n 30M --parallel suggest the FIFO is the
   next bottleneck.
3. **F2/F3** (XAM async completion) per the still-deferred list,
   especially if Phase 2 of next session pinpoints a missing XAM
   producer.
4. **GPUBUG-FETCH-PATCH-001**: re-enable the PM4_TYPE0
   fetch-constant patch via a side-channel (GpuCommand variant)
   when draws actually start firing; relevant for bloom/blur N+1.

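To make the INTERRUPT_QUEUE_CAP recommendation concrete, here is a small sketch of a bounded interrupt FIFO that counts drops. It is an illustration under assumptions: the `BoundedIrqQueue` type is hypothetical, and the choice to drop the newest entry when full is a guess, since the notes do not say which end the real queue discards.

```rust
use std::collections::VecDeque;

/// Bounded FIFO that counts what it has to drop.
/// `cap` plays the role of INTERRUPT_QUEUE_CAP; the type is hypothetical.
struct BoundedIrqQueue {
    cap: usize,
    pending: VecDeque<u32>,
    dropped: u64,
}

impl BoundedIrqQueue {
    fn new(cap: usize) -> Self {
        Self { cap, pending: VecDeque::with_capacity(cap), dropped: 0 }
    }

    /// Returns false (and bumps `dropped`) when the queue is full.
    /// Assumption: the newest interrupt is the one discarded.
    fn push(&mut self, irq: u32) -> bool {
        if self.pending.len() >= self.cap {
            self.dropped += 1;
            return false;
        }
        self.pending.push_back(irq);
        true
    }

    /// Hand the oldest pending interrupt to the consumer.
    fn pop(&mut self) -> Option<u32> {
        self.pending.pop_front()
    }
}

/// Push `burst` interrupts with no consumer running in between
/// and report how many were lost.
fn dropped_in_burst(cap: usize, burst: u32) -> u64 {
    let mut q = BoundedIrqQueue::new(cap);
    for i in 0..burst {
        q.push(i);
    }
    q.dropped
}
```

Under these assumptions a burst of 10 against a cap of 4 loses 6 entries, while a cap of 16 loses none; the real drop count depends on how often the consumer drains between wall-clock ticks.
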
## Producer-hunt session 2026-05-03

### XAMBUG-PRODUCER-001 — XamTaskSchedule was a no-op stub

**Status:** fixed. Hypothesis falsified for the parked-waiter set.

**Site:** `crates/xenia-kernel/src/xam.rs:204` (pre-fix).
**Canary parity:** `xenia-canary/src/xenia/kernel/xam/xam_task.cc:43-80`.

The pre-fix stub allocated a handle, logged it, and returned
`STATUS_SUCCESS`; it never spawned a thread. Replaced with a
canary-faithful implementation: allocates a `ThreadImage`, allocates
a `KernelObject::Thread` handle, and routes through
`Scheduler::spawn` with `entry=callback`, `start_context=message_ptr`
(canary's third positional `XThread` arg). Stack sized as
`max(0x4000, page-aligned 0x10_0000)`.

**Verification:**
- Unit test `xam::tests::xam_task_schedule_spawns_real_thread`
  confirms the spawned thread's `pc == callback` and `gpr[3] == message_ptr`.
- Workspace tests: 561 → 562 green.
- `--stable-digest -n 100M` lockstep: `instructions=100000002`
  unchanged from baseline (interpreter determinism preserved).
- `--trace-handles-focus=0x1004,0x100c,0x15e4 -n 500M`: no
  `kernel.calls{name=XamTaskSchedule}` counter appears; the call
  site at `0x824a9a10` is **never reached** within 500M
  instructions. Boot stalls earlier on the parked handles.

**Outcome:** the 3 focus handles still show
`signal_attempts=0 (primary=0, ghost=0)` after 500M instructions.
The XAM-task hypothesis is therefore **falsified for this run**:
XamTaskSchedule cannot be the missing producer for these specific
handles, because Sylpheed's only call site to it isn't reached
before the deadlock.

The fix lands regardless: the stub was a real correctness bug that
will manifest the moment the call site is reached (post-deadlock-resolution).

### Recommended next producer candidate

`XAudioRegisterRenderDriverClient` (currently a one-shot stub, called
once according to the metric counter). Audio buffer-complete callbacks
are a known signal source on Xbox 360 audio engines; the stub may be
hiding the producer for one of the 3 handles. If that lead is also
falsified, escalate to file I/O completion (`signal_io_completion_event`,
already real but possibly mis-routed) or Timer DPC delivery.

### APUBUG-PRODUCER-001 — XAudioRegisterRenderDriverClient was stub + no callback ticker

**Status:** fixed (registration + ticker + injection landed). Hypothesis
falsified for handles `0x1004` / `0x100c` / `0x15e4`.

**Site:** `crates/xenia-kernel/src/exports.rs:2624` (pre-fix); the
`XAudioUnregister*` and `XAudioSubmitRenderDriverFrame` exports
were also stubs. New module: `crates/xenia-kernel/src/xaudio.rs`.

**Canary parity:**
- `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_audio.cc:56-93`
  (the three exports: register reads `callback_ptr[0..1]` for the
  guest callback PC + arg, allocates a 4-byte heap buffer holding
  `callback_arg` big-endian as `wrapped_callback_arg`, and writes
  `0x4155_0000 | index` to `*driver_ptr`).
- `xenia-canary/src/xenia/apu/audio_system.cc:202-237` (`RegisterClient`) and
  `:100-159` (`WorkerThreadMain`: host worker that waits on
  per-client semaphores and calls
  `processor_->Execute(callback, args=[wrapped_callback_arg], 1)`,
  i.e. r3 = wrapped pointer).
- `xenia-canary/src/xenia/apu/xaudio2/xaudio2_audio_driver.cc:34-36`
  (`OnBufferEnd → semaphore_->Release(1)`); this drives the steady-state
  cadence at 256 samples / 48 kHz ≈ 5.33 ms.

**Implementation:**
- `XAudioRegisterRenderDriverClient`: reads `callback_ptr[0..1]`,
  allocates a 4-byte guest heap buffer, writes `callback_arg` BE,
  registers in the new `XAudioState` table, writes `0x4155_xxxx` to
  `*driver_ptr`.
- `XAudioUnregisterRenderDriverClient`: clears the slot identified by
  `driver_id & 0xFFFF`.
- `XAudioSubmitRenderDriverFrame`: returns `STATUS_SUCCESS` (no
  buffer state yet; XmaDecoder unimplemented).
- `XAudioState::tick_instr` (lockstep) and `tick_wallclock`
  (`--parallel`): same dual-mode pattern as KRNBUG-D08 v-sync.
  `XAUDIO_INSTR_PERIOD = 48_000` and `XAUDIO_PERIOD = 5.333 ms`
  approximate canary's frame rate.
- `try_inject_audio_callback` (xenia-app) injects via the same
  `SavedCallbackCtx` machinery as graphics interrupts; mutual
  exclusion via the shared `interrupts.saved` slot. r3 is set to
  `wrapped_callback_arg` per canary `processor_->Execute`.

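The driver-id encoding and the lockstep cadence listed above reduce to a few lines of arithmetic. The sketch below is illustrative only: the two constants come from the notes, while `encode_driver_id`, `decode_slot`, `frame_period_ms` and the `InstrTicker` type are hypothetical names, and the ticker assumes the first callback is due after one full period, which the notes do not state.

```rust
/// Tag the note says is written to `*driver_ptr`: 0x4155_0000 | index.
const XAUDIO_DRIVER_TAG: u32 = 0x4155_0000;
/// Instruction-count period quoted in the note for lockstep mode.
const XAUDIO_INSTR_PERIOD: u64 = 48_000;

fn encode_driver_id(index: u16) -> u32 {
    XAUDIO_DRIVER_TAG | index as u32
}

/// Slot lookup used on unregister: `driver_id & 0xFFFF`.
fn decode_slot(driver_id: u32) -> usize {
    (driver_id & 0xFFFF) as usize
}

/// Frame period in milliseconds for `samples` at `rate_hz`.
fn frame_period_ms(samples: u32, rate_hz: u32) -> f64 {
    samples as f64 * 1000.0 / rate_hz as f64
}

/// Hypothetical lockstep ticker: one callback per full period of
/// retired instructions (first fire assumed at one full period).
struct InstrTicker {
    next_fire: u64,
}

impl InstrTicker {
    fn new() -> Self {
        Self { next_fire: XAUDIO_INSTR_PERIOD }
    }

    /// Returns how many callbacks have become due at `instr_count`.
    fn tick(&mut self, instr_count: u64) -> u32 {
        let mut due = 0;
        while instr_count >= self.next_fire {
            self.next_fire += XAUDIO_INSTR_PERIOD;
            due += 1;
        }
        due
    }
}
```
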
**Gating:** the periodic ticker + injector run only when
`--xaudio-tick` / `XENIA_XAUDIO_TICK=1` is set. Default off because
firing the callback hijacks a guest HW thread (we don't have a
dedicated host worker thread) and Sylpheed's callback enters
something resembling an infinite wait loop on its first invocation,
which regresses `swaps=2 → 1` and inflates `imports` about 12× at
-n 100M. Default-off preserves all existing lockstep goldens
(`sylpheed_n50m.json` etc.).

**Verification:**
- Workspace tests: 562 → 576 green (10 in `xaudio.rs` + 4 in
  `exports.rs`).
- `--stable-digest -n 100M` lockstep, default off:
  `instructions=100000002`, `swaps=2`, `imports=987685`; identical
  to the pre-change baseline, goldens unaffected.
- `--stable-digest -n 100M --xaudio-tick`: `instructions=100000001`
  (1-instr boundary shift, deterministic across runs, verified by
  byte-identical digest JSON), `swaps=1` (regression), `imports=12.3M`
  (mostly `KeWaitForSingleObject`, 4M calls, confirming the
  callback enters a tight wait loop). 1 audio callback fires
  (`xaudio.callback.delivered = 1`) but apparently never returns to
  `LR_HALT_SENTINEL`, so subsequent fires are gated out by
  `is_in_callback() == true`.
- `--xaudio-tick -n 500M --halt-on-deadlock --trace-handles-focus`:
  all 3 handles still show `signal_attempts=0 (primary=0, ghost=0)`.

**Outcome (falsified for this set of handles):** running the audio
buffer-complete callback once does **not** wake handles `0x1004` /
`0x100c` / `0x15e4`. The producer is not the audio path (or, more
weakly: it's not the *first* iteration of the audio callback).

**Side effects worth noting for the next session:**
1. The fact that the audio callback fires once and apparently never
   returns is itself diagnostic: Sylpheed's audio callback waits on
   *something* the canary worker provides (probably a semaphore
   credit on `client_semaphore`, released by `OnBufferEnd`). Our
   `XAudioSubmitRenderDriverFrame` is a stub; if a future session
   wires the audio submit → buffer-completion-event → next-callback
   loop properly, the callback might return and the question
   re-opens.
2. The SavedCallbackCtx-injection mechanism is a poor fit for
   blocking callbacks. Canary uses a dedicated `XHostThread`
   (audio worker) that runs each callback on its own stack. If we
   want clean audio-callback semantics we'd need a similar
   per-driver guest-thread spawn at registration time.

### Recommended next producer candidate (post-APUBUG-PRODUCER-001)

Per the producer-hunt charter the remaining strong candidates are
**Timer DPC delivery** (`KeSetTimer` / `KeInsertQueueDpc`;
`exports.rs` has stubs/partials) and **file I/O completion event
routing**. Timer DPC is the next-strongest because the parked
handles are explicit `Event/Manual`s with no current waker, and
Xbox 360 timer-driven DPCs are a common signal source.

### KRNBUG-AUDIT-002 — multi-frame stack capture at handle creation

**Status:** landed (diagnostic only; no behaviour change). Walker
verified end-to-end against the analysed call graph for every
captured frame.

**Site:** `crates/xenia-kernel/src/audit.rs` (new
`record_create_with_stack`, new `created_stack: Vec<(u32,u32)>` on
`HandleAuditTrail`); `crates/xenia-kernel/src/state.rs` (new
`audit_create_with_ctx` helper + free function
`walk_guest_back_chain(sp, lr, mem, max)`); `nt_create_event` /
`nt_create_semaphore` / `nt_create_timer` / `xam_task_schedule` now
route through the new helper. Dump in `crates/xenia-app/src/main.rs`
prints `created stack (N frames)` under the per-handle FOCUS report.

**Why it exists:** KRNBUG-AUDIT-001 told us the producer is missing
for handles `0x1004` / `0x100c` / `0x15e4` (later corrected to
`0x15e0`; see below) but couldn't tell us *which subsystem owns
each handle*. The wrapper at `lr=0x824a9f6c` is the same
`silph::Event` ctor for 83 unique callers, so the immediate LR is
useless for subsystem identification. The new walker captures up to
6 stack frames at create time, gated on the focus set so the cost
is one `HashSet::contains` on the unfocused hot path.

**Walker correctness:** PPC EABI back-chain (`[r1] = prev_sp`,
saved-LR-of-prev-frame at `[prev_sp - 8]`). Frame 0 is the live
`(ctx.gpr[1], ctx.lr)` since the wrapper hasn't spilled its own LR
yet. Sentinels: 0, 0xFFFFFFFF, self-loop. Read-only via
`MemoryAccess::read_u32`; guest memory and CPU state are not
mutated, so lockstep determinism is unaffected.

**Verification:**
- Workspace tests: 576 → 581 (+5: 2 new in `audit.rs` exercising the
  `record_create_with_stack` path including the disabled-no-op case;
  3 new in `state.rs` exercising synthetic 3-level back-chain,
  self-loop sentinel, zero sentinel).
- `--stable-digest -n 50M` lockstep oracle (`sylpheed_n50m`):
  bit-identical to checked-in golden (re-confirmed twice).
- End-to-end: for every captured frame, the instruction immediately
  before the saved LR (saved-LR − 4) is a `bl` in the named function
  (cross-checked against `sylpheed.db`'s `instructions` table for all
  18 captured PCs across handles `0x1004` / `0x100c` / `0x15e0`).

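The back-chain walk described under "Walker correctness" can be sketched in a few lines. This is not the `walk_guest_back_chain` implementation: the function name `walk_back_chain_sketch` is hypothetical, and a `HashMap` of guest address to 32-bit word stands in for the read-only memory interface (so endianness is already resolved by the map).

```rust
use std::collections::HashMap;

/// Walk a PPC back-chain. `mem` maps guest address -> 32-bit word and is a
/// toy stand-in for the emulator's read-only memory interface.
/// Returns (sp, lr) pairs, frame 0 first.
fn walk_back_chain_sketch(
    sp: u32,
    lr: u32,
    mem: &HashMap<u32, u32>,
    max_frames: usize,
) -> Vec<(u32, u32)> {
    let mut frames = Vec::new();
    if max_frames == 0 {
        return frames;
    }
    // Frame 0 is the live (r1, LR) pair: the callee has not spilled LR yet.
    frames.push((sp, lr));
    let mut cur = sp;
    while frames.len() < max_frames {
        // [r1] holds the previous frame's stack pointer.
        let Some(&prev_sp) = mem.get(&cur) else { break };
        // Sentinels: null, all-ones, or a frame that points at itself.
        if prev_sp == 0 || prev_sp == 0xFFFF_FFFF || prev_sp == cur {
            break;
        }
        // Saved LR lives 8 bytes below the previous frame's stack pointer.
        let Some(&saved_lr) = mem.get(&prev_sp.wrapping_sub(8)) else { break };
        frames.push((prev_sp, saved_lr));
        cur = prev_sp;
    }
    frames
}
```

The walk only reads from `mem`, which mirrors the property the notes rely on for lockstep determinism. An unmapped address ends the walk early instead of panicking, which seems the safer default for a diagnostic that runs against arbitrary guest stacks.
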
### Producer-trace finding (KRNBUG-AUDIT-002 deliverable)

Run: `exec sylpheed.iso --halt-on-deadlock --trace-handles-focus=0x1004,0x100c,0x15e0,0x42450b5c -n 500_000_000`.

**0x1004 (tid=10 waiter):** static C++ ctor → 8-instance pool

```
[0] sub_824A9F18 +0x54 silph::Event ctor wrapper (83 callers)
[1] sub_821783D8 +0x120 per-instance subsystem-init (RtlInitializeCSAndSpinCount + Event ctor)
[2] sub_8217C850 +0x58 single per-pool-element bridge ctor
[3] (no func) +0x14 static ctor at 0x8280F810; calls sub_8217C850 EIGHT times
[4] sub_824ACB38 +0xb8 the CRT static-init driver (walks 0x82870010..0x828708d4)
[5] entry_point +0x60 the standard CRT entry stub
```

The 8-instance call from frame 3 is the smoking gun: `0x8280F810`
is a single C++ static constructor that builds an 8-element array
of objects, each of which gets its own Critical Section + Event +
worker thread. This is a **thread pool**, constructed before
`main()` runs.

**0x100c (tid=2 waiter):** runtime init in `main()` → singleton

```
[0] sub_824A9F18 +0x54 silph::Event ctor wrapper
[1] sub_82181750 +0x70 per-instance subsystem-init (same shape: CS + Event)
[2] sub_821800D8 +0x3c single-call bridge ctor
[3] sub_82181C20 +0x38 subsystem driver
[4] sub_8216EA68 +0x3c (top-level main; called from entry_point + 0x194 with r3=r4=r5=0)
[5] entry_point +0x198 right after `bl 0x8216EA68`
```

Different code cluster (`0x82181xxx`), single instance, constructed
**inside `main()` itself**, not from C++ static init. This is a
runtime-allocated singleton subsystem.

**0x15e0 (tid=16 waiter):** runtime init via a third distinct cluster

```
[0] sub_824A9F18 +0x54 silph::Event ctor wrapper
[1] sub_821701C8 +0x48 per-instance subsystem-init (CS + Event, callees mirror 0x100c's path)
[2] sub_8216F618 +0x44 bridge
[3] sub_821707C0 +0x38 driver
[4] (no func) +0x? 0x821C5418 — analyser missed this function entry
[5] sub_82172BA0 +0x1ec upper-level subsystem driver
```

Third distinct C++ class in cluster `0x82170xxx`. Same per-instance
shape (CS + Event + worker thread); different call site than 0x100c.

**Cross-check on the project memory list:** the prior memory listed
the third handle as `0x15e4`; the actual handle on this run is
`0x15e0` (off-by-4 in the prior session's transcription). The
parked-waiter set as of HEAD `9d45efe` is:

| Handle | Tid | Waits via | Trail status | Note |
|--------|-----|-----------|--------------|------|
| 0x1004 | 10 | `do_wait_single` | primary | static-ctor pool (8 entries) |
| 0x100c | 2 | `do_wait_single` | primary | runtime singleton |
| 0x15e0 | 16 | `do_wait_single` | primary | runtime singleton, distinct class |
| 0x12f4 | 13,14 | `do_wait_single` | primary | shared Semaphore, 2 waiters |
| 0x15f8 | 18 | `do_wait_multiple` | primary | Event/Auto |
| 0x1038 | 4 | `do_wait_multiple` | primary | Event/Auto, in WaitAny[0x1038, 0x103c] |
| 0x10b0 | 5 | `do_wait_multiple` | primary | Event/Auto, in WaitAny[0x10b0, 0x10b4] |
| 0x42450b5c | 6 | (non-`do_wait_single`) | `<UNCREATED>` `<AUDIT_BLIND>` | guest-memory addr (heap range), bypasses Nt-side audit entirely |

**0x42450b5c is qualitatively different.** An address `>= 0x40000000`
lies in the guest user heap range and is not a kernel handle-table
value (those start at 0x1000 and increment by 4). tid=6 is parking on a
guest pointer, almost certainly an embedded `KEVENT` reached via
`KeWaitForSingleObject(*PDISPATCHER_HEADER)` rather than via a
handle. Our audit didn't see the wait either (`waits=0` while
`waiter_count=1`), so the wait path itself bypasses
`do_wait_single`. Treat as a separate bug class.

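The handle-versus-pointer distinction drawn for 0x42450b5c can be written down as a simple range check. The sketch below uses only the two facts stated in the notes (handle values start at 0x1000 and step by 4; `>= 0x40000000` is guest address space); the `WaitTarget` enum and `classify_wait_target` are hypothetical names, and the check is a heuristic rather than the emulator's actual handle-table lookup.

```rust
#[derive(Debug, PartialEq, Eq)]
enum WaitTarget {
    /// Looks like a handle-table value: at or above 0x1000, 4-aligned, below the heap.
    HandleLike,
    /// At or above the guest user-heap base quoted in the note.
    GuestPointer,
    /// Neither shape; needs a closer look.
    Unknown,
}

const HANDLE_BASE: u32 = 0x1000;
const USER_HEAP_BASE: u32 = 0x4000_0000;

fn classify_wait_target(v: u32) -> WaitTarget {
    if v >= USER_HEAP_BASE {
        WaitTarget::GuestPointer
    } else if v >= HANDLE_BASE && v % 4 == 0 {
        WaitTarget::HandleLike
    } else {
        WaitTarget::Unknown
    }
}
```

A check like this could run at the wait site to flag waits that arrive with a raw dispatcher-header pointer, which is the case the current audit hooks miss.
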
### Subsystem identification

All three Event/Manual creators (sub_821783D8, sub_82181750,
sub_821701C8) follow the **same 3-to-4-callee pattern**:

1. `RtlInitializeCriticalSectionAndSpinCount` (init the per-instance CS)
2. `sub_824A9F18` (the silph::Event ctor wrapper → `NtCreateEvent`)
3. One or two silph internal helpers (`sub_82172370`, `sub_824AA3E0`,
   `sub_8217E948`, `sub_82274C70`, etc.) which initialize a queue
   and spawn a worker thread

Each parked worker does the same prologue: `silph::Thread::SetProcessor(CURRENT, 5)`
(via `sub_824AA658(r3=-2, r4=5)`), then an `lwarx`/`stwcx.`
CAS-spinlock + `RtlEnterCriticalSection` + check for queued work.

This is the canonical **work-queue worker pattern**: a producer
posts a message to an instance's queue (under the CS) and signals
the Event; the worker wakes, drains, parks again. The producer
that should call `Nt/KeSetEvent(handle)` is **never executed**
within 500M instructions for any of the 3 handles.

The PE's RTTI string table lists thread-related class names in the
`SilpheedSCS` namespace: `WorkHudThread2`, `WorkHudThreadTaskCaller`,
`COLLISION_THREAD_PARAM`, `UPDATER_THREAD_PARAM`, `CRenderCommandQueue`,
`CTaskUpdater`. The 8-element static-init pool for 0x1004 most
plausibly maps to one of the multi-instance worker classes in this
list (likely the `WorkHudThread2` family, the only `Thread`-suffixed
multi-instance candidate); the singletons 0x100c and 0x15e0 most
plausibly map to two of `CTaskUpdater` / `CRenderCommandQueue` /
similar singletons. Without a live debugger pass to read the
vtable+RTTI block at the `this` pointer of each worker, the exact
class assignment is heuristic.

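The work-queue worker pattern described in this section maps onto a mutex-guarded queue plus a wake-up primitive. The sketch below is a host-side analogy in plain Rust, not guest code: `WorkQueue`, `post`, `wait_and_drain` and `run_one_round` are hypothetical names, the `Mutex` stands in for the per-instance critical section, and the `Condvar` stands in for the Event.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

/// One dispatcher instance: a queue guarded by a lock, plus a wake-up signal.
struct WorkQueue {
    jobs: Mutex<VecDeque<u32>>,
    event: Condvar,
}

impl WorkQueue {
    fn new() -> Self {
        Self { jobs: Mutex::new(VecDeque::new()), event: Condvar::new() }
    }

    /// Producer side: enqueue under the lock, then signal the worker.
    fn post(&self, job: u32) {
        let mut jobs = self.jobs.lock().unwrap();
        jobs.push_back(job);
        self.event.notify_one();
    }

    /// Worker side: park until work exists, then drain everything queued.
    fn wait_and_drain(&self) -> Vec<u32> {
        let mut jobs = self.jobs.lock().unwrap();
        while jobs.is_empty() {
            jobs = self.event.wait(jobs).unwrap();
        }
        jobs.drain(..).collect()
    }
}

/// Spawn one worker, post one job, and return what the worker drained.
fn run_one_round(job: u32) -> Vec<u32> {
    let q = Arc::new(WorkQueue::new());
    let worker = {
        let q = Arc::clone(&q);
        std::thread::spawn(move || q.wait_and_drain())
    };
    q.post(job);
    worker.join().unwrap()
}
```

If `post` is never called, `wait_and_drain` blocks forever. That is the host-side picture of the three parked workers: the consumer half is in place and waiting, and the producer half never runs.
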
### Recommended next session — surgical producer hunt

The producer for each Event is the **call site that owns the
matching message-push code path**. Steps:

1. **For each Event handle (0x1004, 0x100c, 0x15e0)**, dump the
   first 12 bytes at the `this` pointer to read the vtable address
   (the `this` is in `r3` at the worker's ABI entry, captured in
   the wait diagnostic as the first arg). Then read vtable[-1] (the
   4-byte slot just before the vtable) to reach the RTTI Complete
   Object Locator, which points to the Type Descriptor holding the
   class name. This pinpoints exactly which `SilpheedSCS::*` class
   each subsystem is.
2. **Then xref the class name** in the binary to find the
   producer-side method (`Push*`, `Submit*`, `EnqueueMessage*`,
   `Notify*`). That method's signal call (likely
   `silph::Event::Set` → `NtSetEvent` thunk) is what should fire.
3. **Trace that producer's caller chain** to find the upstream
   gate. Two failure modes are equally likely:
   - **(A)** The producer DOES run but signals via `KeSetEvent` on
     an embedded `KEVENT` field (not the handle-table side), and
     our HLE `KeSetEvent` doesn't route to the handle-table waiter
     list. Same family as 0x42450b5c. Smoking gun: `kernel.calls`
     metric for `KeSetEvent` is non-zero but the audit shows zero
     signals.
   - **(B)** The producer is gated by an upstream condition that
     doesn't trigger, e.g. a UI-system message that never arrives,
     a timer DPC that never fires, or a vsync interrupt with the
     wrong APC routing. Smoking gun: `kernel.calls{KeSetEvent}` is
     zero for that handle.
4. **0x42450b5c** is a separate bug. Add a parallel
   `audit_create_with_ctx` hook to whichever wait path tid=6 takes
   (it's NOT `do_wait_single`); the function span at PC=0x824cd4f4
   isn't even in the analyser's `functions` table. Likely the
   `KeWaitForSingleObject(*PDISPATCHER_HEADER)` wrapper. Once the
   wait path is audited, repeat the producer-trace.

The walker is reusable: any handle added to `--trace-handles-focus`
will get a 6-frame stack at creation time. Add new candidates
freely; the cost on the unfocused hot path is one `HashSet::contains`.

### KRNBUG-AUDIT-003 — vtable/RTTI class probe + dispatcher identification

**Status:** landed (diagnostic only; no behaviour change). Verified
end-to-end against 5 unit tests + the producer-trace pass at -n 500M.

**Site:** `crates/xenia-kernel/src/state.rs`: new `ClassReadout`
enum + `read_class_at_this(this, mem)` +
`probe_create_stack_classes(ctx, frames, mem)` + private helpers
(`is_likely_guest_heap_ptr`, `is_likely_image_ptr`, `read_ascii_cstring`).
`crates/xenia-kernel/src/audit.rs`: extended `HandleAuditTrail` with
`created_class_probes: Vec<String>` + new
`record_create_with_stack_and_probes`.
`crates/xenia-app/src/main.rs`: `dump_thread_diagnostic` now takes
`&GuestMemory`; FOCUS report prints WAIT-THREAD blocks with per-frame
back-chain + saved register slots + class probes.

**Why it exists:** AUDIT-002 gave us back-chain frames at handle
creation. AUDIT-003's promise was "recover the dispatcher's MSVC C++
class name via vtable[-4] → COL → TypeDescriptor" so the producer
hunt could read "who should call `Class::Submit` but doesn't"
instead of "who should signal handle X."

**Probe correctness:** MSVC RTTI traversal (`vtable[-4]` = COL
pointer, where -4 is a byte offset, i.e. the 4-byte slot just before
the vtable; `COL+0x0c` = TypeDescriptor; `TypeDescriptor+8` =
NUL-terminated mangled name starting `.?A`). False-positive guard: at
least the first two vtable slots must be image-range function
pointers. This rejects the CRT static-init iterator pattern where
`r31` holds a pointer into the init-fn array and `[r31]` is a function
PC, not a vtable.

**Verification:**
- Workspace tests: 581 → **586** (+5: 4 new in `state.rs` exercising
  RTTI-intact / RTTI-stripped / non-object / `read_ascii_cstring`
  termination + 1 integration test for `probe_create_stack_classes`).
- `--stable-digest -n 100M` lockstep oracle:
  `instructions=100000002` (unchanged).
- `sylpheed_n50m` golden: passes.
- End-to-end: 500M producer-trace run captured at
  `audit-runs/audit-003/run-500m-v4.txt`. RC=0.

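The traversal under "Probe correctness" can be sketched against a toy memory model. This is not the `read_class_at_this` implementation: `ToyMem`, `in_image` and `class_name_at` are hypothetical, the image range `0x8200_0000..0x8300_0000` is an assumption made for the example, and the 256-byte name limit is arbitrary.

```rust
use std::collections::HashMap;

/// Toy big-endian guest memory; a hypothetical stand-in for the emulator's memory trait.
#[derive(Default)]
struct ToyMem {
    bytes: HashMap<u32, u8>,
}

impl ToyMem {
    fn write_bytes(&mut self, addr: u32, data: &[u8]) {
        for (i, b) in data.iter().enumerate() {
            self.bytes.insert(addr + i as u32, *b);
        }
    }
    fn write_u32(&mut self, addr: u32, v: u32) {
        self.write_bytes(addr, &v.to_be_bytes());
    }
    fn read_u8(&self, addr: u32) -> Option<u8> {
        self.bytes.get(&addr).copied()
    }
    fn read_u32(&self, addr: u32) -> Option<u32> {
        let mut buf = [0u8; 4];
        for (i, slot) in buf.iter_mut().enumerate() {
            *slot = self.read_u8(addr.checked_add(i as u32)?)?;
        }
        Some(u32::from_be_bytes(buf))
    }
}

/// Assumed image range for this example only.
fn in_image(p: u32) -> bool {
    (0x8200_0000..0x8300_0000).contains(&p)
}

/// this -> vtable -> [vtable - 4] = COL -> [COL + 0x0c] = TypeDescriptor -> name at +8.
fn class_name_at(this: u32, mem: &ToyMem) -> Option<String> {
    let vtable = mem.read_u32(this)?;
    if !in_image(vtable) {
        return None;
    }
    // False-positive guard: the first two vtable slots must look like code pointers.
    let slot0 = mem.read_u32(vtable)?;
    let slot1 = mem.read_u32(vtable + 4)?;
    if !in_image(slot0) || !in_image(slot1) {
        return None;
    }
    let col = mem.read_u32(vtable - 4)?;
    let type_desc = mem.read_u32(col.checked_add(0x0c)?)?;
    let name_addr = type_desc.checked_add(8)?;
    let mut name = String::new();
    for i in 0..256u32 {
        let b = mem.read_u8(name_addr.checked_add(i)?)?;
        if b == 0 {
            // Only accept names with the MSVC mangled prefix.
            return name.starts_with(".?A").then_some(name);
        }
        if !b.is_ascii_graphic() {
            return None;
        }
        name.push(b as char);
    }
    None
}
```

A struct whose first word is a sentinel such as `0xFFFF_FFFF` fails the very first range check, so the probe returns `None` without reading any further memory.
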
### KRNBUG-AUDIT-003 finding — dispatcher addresses + decisive xref audit

**Run:** `exec sylpheed.iso --halt-on-deadlock --trace-handles-focus=0x1004,0x100c,0x15e0,0x42450b5c -n 500_000_000`.

**Handle 0x100c, dispatcher at `0x828F3D08`:**

Confirmed three ways:

1. Per-frame saved-r31 capture at handle creation:
   ```
   frame=1 lr=0x821817c0 saved-r31=0x828f3d08 ← per-instance ctor
   frame=2 lr=0x82180114 saved-r31=0x828f3d08 ← bridge ctor (same value)
   ```
2. Disassembly of `sub_82181750` at +0x14:
   `addis r11, r0, 0x828F; addi r31, r11, 15624` ⇒
   `r31 = 0x828F3D08` (the `this` for the per-instance ctor).
3. Field-level write tracking via `xrefs.kind=write`:
   `pc=0x82181778` in `sub_82181750` (`stw r11, 0(r31)`) writes -1 to
   `[this+0]`.

**`[this+0] = -1` is decisive: this is a hand-rolled POD job-queue
struct, not a C++ polymorphic class.** No vtable means no RTTI, so a
"class name" in MSVC mangled form does not exist. The probe correctly
rejected 0x828F3D08 as a class candidate.

Field layout (from sub_82181750 disasm):
```
[this+ 0] = -1 ; sentinel (not a vtable)
[this+ 4..12] = 0
[this+20] = 0 (halfword)
[this+36] = 0
[this+40] = 7 ; count or version
[this+44..(44+256)] ; sub-region init by `bl 0x8284DCEC`
[this+72] = thread_handle ; set after thread spawn
[this+76] = event_handle ; = 0x100c, set after silph::Event ctor
[this+88..104] = 0
```

Worker is `sub_82181830`: receives r3=this, copies r28=this and
r29=&this[44], does `silph::Thread::SetProcessor(CURRENT, 5)`,
then `lwarx`/`stwcx.` on `&this[80]`. Wait-side telemetry confirms:
the parked thread's spilled r28-r31 area has 0x828F3D08 (= r28 base)
and 0x828F3D34 (= base+44 = r29).

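The two addresses quoted for handle 0x100c can be re-derived from the disassembly by hand. The sketch below does the arithmetic in a 32-bit view of the registers; `lis`, `addi` and `dispatcher_this` are hypothetical helper names, and `lis` models `addis` with a zero base register as seen through the 32-bit ABI rather than the full 64-bit architectural result.

```rust
/// `addis rD, 0, imm` (a.k.a. `lis`): shift the immediate into the upper half.
/// Modelled in 32 bits, i.e. the low word of the architectural result.
fn lis(imm: u16) -> u32 {
    (imm as u32) << 16
}

/// `addi rD, rA, simm`: add a sign-extended 16-bit immediate, wrapping at 32 bits.
fn addi(ra: u32, simm: i16) -> u32 {
    ra.wrapping_add(simm as i32 as u32)
}

/// Byte offset of the sub-region the worker keeps in r29 (from the field layout).
const SUBREGION_OFFSET: u32 = 44;

/// `addis r11, r0, 0x828F; addi r31, r11, 15624`
fn dispatcher_this() -> u32 {
    addi(lis(0x828F), 15624)
}
```
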
**Handle 0x15e0, dispatcher at `0x828F4070`:**

Confirmed via xref table. Same shape as 0x100c (POD job queue, not
a C++ class). Constructed by `sub_821701C8` + `sub_8216F618`.

**Handle 0x1004: 8-instance pool, member addresses still TBD.**

The MSVC ctors for the per-instance and bridge functions did not
preserve `this` in r31 across the call into `silph::Event::Ctor`,
so the saved-r31 chain captured at create time shows
stack-relative pointers (frames 1, 2, 5) and the CRT init-fn
iterator pointer 0x82870180 (frames 3, 4) instead of the pool
member's `this`. Recovering the 8 pool addresses requires hooking
`sub_8217C850`'s entry to capture r3 at each of its 8 calls from
the static ctor at `0x8280F810`.

**Handle 0x42450b5c: separate bug class.** Heap-allocated
(0x4xxxxxxx is user-heap range), parks via non-`do_wait_single`
path. AUDIT-003's image-rdata-focused probe doesn't apply. Track
under a separate audit ID.

**Decisive xref audit (producer is unreached):**

```
0x828F3D08 (handle 0x100c) — 4 references in static analysis:
pc=0x82180100 in sub_821800D8 (kind=ref) — bridge ctor
pc=0x8218176c in sub_82181750 (kind=ref) — per-instance ctor
pc=0x82181778 in sub_82181750 (kind=write) — per-instance ctor init
pc=0x8284caa4 in sub_8280C2C0 (kind=ref) — CRT init driver

0x828F4070 (handle 0x15e0) — 5 references:
pc=0x8216f650 in sub_8216F618 (kind=ref) — bridge ctor
pc=0x8216f674 in sub_8216F618 (kind=ref) — bridge ctor
pc=0x821701e4 in sub_821701C8 (kind=ref) — per-instance ctor
pc=0x82170330 in sub_821701C8 (kind=ref) — per-instance ctor
pc=0x8284c9a4 in sub_8280C2C0 (kind=ref) — CRT init driver
```

**Every xref is in a ctor or the CRT.** No producer code references
either dispatcher base. Confirms AUDIT-001/002's `signal_attempts=0`:
the producer is unreached, not broken. The static analysis would
miss producers that operate via a `this` register passed through a
function arg (no constant-load), but the simple
"`load_const dispatcher_addr; call submit(this, work)`" pattern
**is not present** in the binary for 0x828F3D08 / 0x828F4070.

**Recommendation for next session (no implementation here):**
|
||
|
||
1. Investigate the call-chain `main() → sub_82181C20 → sub_82181750`.
|
||
sub_82181C20 is a subsystem driver — it constructs the queue and
|
||
should ALSO wire it into a feeder. If the feeder is itself a
|
||
static-init that's never invoked, the trail leads back to the
|
||
CRT init array driver (`sub_824ACB38`, walks
|
||
0x82870010..0x828708D4) and whatever scheduling subsystem is
|
||
supposed to drive those.
|
||
|
||
2. Hook `sub_8217C850` entry under `--trace-handles-focus=0x1004` to
|
||
capture r3 at each of its 8 calls — those are the pool member
|
||
`this` addresses for handle 0x1004's 8-instance pool.
|
||
|
||
3. Treat 0x42450b5c independently. AUDIT-002's hook missed it because
|
||
the parking site (PC=0x824cd4f4) isn't routed through `do_wait_single`.
|
||
Open KRNBUG-AUDIT-004 for that wait path.
|
||
|
||
---
|
||
|
||
### KRNBUG-AUDIT-004 — `--ctor-probe` PC hook + `--dump-addr` struct dump; producer-indirection layer identified; "8-instance pool" hypothesis falsified
|
||
|
||
**Status**: landed on master (no-ff merge of feature branch
|
||
`dispatcher-probe-audit/p0-ctor-probe-and-struct-dump`). Diagnostic-
|
||
only, read-only, lockstep-preserved (`instructions=100000002` at
|
||
`-n 100M --stable-digest`).
|
||
|
||
**Tests**: 586 → **588**.
|
||
|
||
**What landed (`crates/xenia-kernel/src/state.rs`):**
|
||
- `pub ctor_probe_pcs: HashSet<u32>` field on `KernelState` (default
|
||
empty).
|
||
- `pub fire_ctor_probe_if_match(hw_id, mem)` — fast-rejects when set
|
||
is empty; on match prints a one-shot record `CTOR-PROBE pc=...
|
||
tid=... hw=... cycle=... sp=... r3=... lr=...` plus an 8-frame
|
||
back-chain with saved-r31/r30 per frame. Pure read.
|
||
- `pub dump_addrs: Vec<u32>` field for end-of-run struct dumps.
|
||
- 2 unit tests: empty-set no-op, set-membership invariant.
|
||
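The probe helper described above can be modelled in a few lines. This is a minimal sketch, not the Rust code in `state.rs`: the flat `bytes` snapshot, the `base` parameter, and the function names are assumptions made for illustration. It relies only on the PowerPC ABI convention that the word at `[sp]` holds the caller's stack pointer (the back chain); the per-frame saved-r31/r30 slots are omitted because their offsets are not given in these notes.

```python
import struct

def read_be32(mem, base, addr):
    """Read a big-endian u32 from a flat guest-memory snapshot starting at `base`."""
    return struct.unpack_from(">I", mem, addr - base)[0]

def walk_back_chain(mem, base, sp, max_frames=8):
    """Follow the PPC back chain: the word at [sp] is the caller's sp."""
    frames = []
    for _ in range(max_frames):
        if sp == 0 or not (base <= sp <= base + len(mem) - 4):
            break  # end of chain, or pointer outside the snapshot
        frames.append(sp)
        sp = read_be32(mem, base, sp)
    return frames

def fire_probe_if_match(probe_pcs, pc, mem, base, sp):
    """Return the back chain on a probe hit, else None. Pure read."""
    if not probe_pcs:          # fast reject: no probes armed
        return None
    if pc not in probe_pcs:
        return None
    return walk_back_chain(mem, base, sp)

# Toy stack (hypothetical addresses): three frames, then a null back chain.
BASE = 0x7000_0000
stack = bytearray(0x100)
struct.pack_into(">I", stack, 0x10, BASE + 0x40)
struct.pack_into(">I", stack, 0x40, BASE + 0x80)
struct.pack_into(">I", stack, 0x80, 0)

hit = fire_probe_if_match({0x82181750}, 0x82181750, stack, BASE, BASE + 0x10)
miss = fire_probe_if_match({0x82181750}, 0x821701C8, stack, BASE, BASE + 0x10)
unarmed = fire_probe_if_match(set(), 0x82181750, stack, BASE, BASE + 0x10)
```

The empty-set check comes first so that an unarmed probe costs one branch per prologue, which matches the "fast-rejects when set is empty" behaviour noted above.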
|
||
**What landed (`crates/xenia-app/src/main.rs`):**
|
||
- `--ctor-probe=0x8217C850,0x82181750,...` CLI flag (and
|
||
`XENIA_CTOR_PROBE`). Parsed into `kernel.ctor_probe_pcs` at
|
||
`cmd_exec_inner` startup.
|
||
- `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` CLI flag (and
|
||
`XENIA_DUMP_ADDR`). Each address gets a 128-byte hex+be32+ASCII
|
||
dump at end-of-run, after the per-handle FOCUS report.
|
||
- `worker_prologue` calls `fire_ctor_probe_if_match` after reading
|
||
`pc` and before any thunk-dispatch / step-block branch.
|
||
`dump_thread_diagnostic` consumes `kernel.dump_addrs`.
|
||
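For readers unfamiliar with the dump format, the following sketch shows one plausible way to render a 128-byte region as hex bytes, big-endian 32-bit words, and ASCII. The exact column layout of the real `--dump-addr` output is not reproduced in these notes, so the layout here is an assumption; only the three views (hex, be32, ASCII) and the 128-byte length come from the description above.

```python
import struct

def format_dump(addr: int, data: bytes) -> list[str]:
    """Render `data` in 16-byte rows: address, hex bytes, be32 words, ASCII."""
    lines = []
    for off in range(0, len(data), 16):
        row = data[off:off + 16]
        n_words = len(row) // 4
        words = struct.unpack(f">{n_words}I", row[: n_words * 4])
        hex_bytes = " ".join(f"{b:02x}" for b in row)
        be32 = " ".join(f"{w:08x}" for w in words)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{addr + off:08x}  {hex_bytes}  {be32}  {text}")
    return lines

# Toy region: only three fields filled in, the rest left as zero.
region = bytearray(128)
struct.pack_into(">I", region, 0x00, 0xFFFFFFFF)
struct.pack_into(">I", region, 0x48, 0x00001010)
struct.pack_into(">I", region, 0x4C, 0x0000100C)
dump_lines = format_dump(0x828F3D08, bytes(region))
```

Guest memory is big-endian, so the `>I` format matters: a little-endian read of the same bytes would show `0x0C100000` instead of `0x0000100C`.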
|
||
**Decisive findings (corrects KRNBUG-AUDIT-002/003):**
|
||
|
||
1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
|
||
Probe ran at `-n 50M --halt-on-deadlock` with PCs
|
||
`[0x821783D8, 0x82181750, 0x821701C8]` (the per-instance ctors
|
||
for handles 0x1004 / 0x100c / 0x15e0 respectively). Each fired
|
||
**EXACTLY ONCE**:
|
||
```
|
||
CTOR-PROBE pc=0x821783d8 tid=1 hw=0 cycle=1401430 r3=0x828f3ec0 ← handle 0x1004
|
||
CTOR-PROBE pc=0x82181750 tid=1 hw=0 cycle=5363599 r3=0x828f3d08 ← handle 0x100c
|
||
CTOR-PROBE pc=0x821701c8 tid=1 hw=0 cycle=9203618 r3=0x828f4070 ← handle 0x15e0
|
||
```
|
||
Handle 0x1004 has a SINGLE dispatcher at **0x828F3EC0**, not 8
|
||
pool members. The earlier "called 8 times" claim came from
|
||
counting raw entries to the OUTER getter `sub_8217C850`, but
|
||
`sub_8217C850` is a Meyers-style singleton-getter — its inner
|
||
`bl 0x821783D8` (the per-instance ctor) is gated on a one-shot
|
||
init flag at `[0x828F48D8] bit 0`. Subsequent `sub_8217C850`
|
||
calls just return the existing slot pointer.
|
||
|
||
2. **The producer indirection layer IS the singleton-getter
|
||
itself.** Static byte-scans of `.rdata` and `.data` show 0 hits
|
||
for the dispatcher addresses 0x828F3D08 / 0x828F4070 — so no
|
||
registry table holds them. But the `xrefs` table for the OUTER
|
||
getters reveals:
|
||
```
|
||
sub_821800D8 (outer for 0x828F3D08, handle 0x100c): 6 callers
|
||
0x821802d8 (sub_82180158+0x180) ← non-create-chain
|
||
0x821806e0 (sub_821805C8+0x118) ← non-create-chain
|
||
0x82180b28 (sub_82180A10+0x118) ← non-create-chain
|
||
0x82180ea0 (sub_82180D90+0x110) ← non-create-chain
|
||
0x82181254 (sub_821810E0+0x174) ← non-create-chain
|
||
0x82181c54 (sub_82181C28+0x2C) ← create-chain ONLY
|
||
|
||
sub_8216F618 (outer for 0x828F4070, handle 0x15e0): 5 callers
|
||
0x8216f9d4 (sub_8216F818+0x1BC) ← non-create-chain
|
||
0x8216fc08 (sub_8216F9F0+0x218) ← non-create-chain
|
||
0x821700b8 (sub_8216FF70+0x148) ← non-create-chain
|
||
0x821700f4 (sub_821700E0+0x14) ← non-create-chain
|
||
0x821707f4 (sub_821707C0+0x34) ← create-chain ONLY
|
||
```
|
||
The non-create-chain consumers all share the **canonical
|
||
producer pattern**:
|
||
```
|
||
bl outer_singleton_getter ; r3 = dispatcher ptr
|
||
lwz r3, OFFSET(r3) ; r3 = an event handle / queue field
|
||
bl 0x824AA1D8 ; signal/wake function
|
||
```
|
||
For 0x100c the offset is 80 (= 0x50); for 0x15e0 the offset is
|
||
36 (= 0x24).
|
||
|
||
So **interpretation (2) of the audit charter is confirmed**:
|
||
producers reference the dispatchers via a function-call layer of
|
||
indirection, not through direct address materialization. The
|
||
xref-table audit in AUDIT-003 (which only catches direct
|
||
constant-loads of the dispatcher base) was **necessary but not
|
||
sufficient** — it correctly saw "no direct producer references"
|
||
but missed the singleton-getter indirection.
|
||
|
||
3. **Dispatcher struct layouts** (128-byte dumps at `-n 50M
|
||
--halt-on-deadlock`):
|
||
```
|
||
0x828F3D08 (handle 0x100c, per-instance ctor sub_82181750):
|
||
+0x00 = 0xFFFFFFFF ; queue head/tail sentinel
|
||
+0x28 = 0x00000007 ; capacity = 7
|
||
+0x2C = 0x01000000 ; init flag
|
||
+0x3C = 0xFFFFFFFF ; secondary sentinel
|
||
+0x48 = 0x00001010 ; thread_handle (worker thread)
|
||
+0x4C = 0x0000100C ; event_handle (= self handle 0x100c)
|
||
+0x50 = 0x00000000 ; producer reads this — currently 0
|
||
+0x70 = 0x00000001 ; refcount?
|
||
+0x74 = 0x828F3D08 ; self-pointer
|
||
|
||
0x828F4070 (handle 0x15e0, per-instance ctor sub_821701C8):
|
||
+0x00 = 0x01000000 ; init flag
|
||
+0x10 = 0xFFFFFFFF ; queue sentinel
|
||
+0x1C = 0x000015E4 ; sibling-handle (NOT in our parked
|
||
; set — possibly a thread handle)
|
||
+0x20 = 0x000015E0 ; event_handle (= self handle 0x15e0)
|
||
+0x24 = 0x00000000 ; producer reads this — currently 0
|
||
+0x40 = 0xFFFFFFFF ; secondary sentinel
|
||
|
||
0x828F3EC0 (handle 0x1004, per-instance ctor sub_821783D8):
|
||
+0x00 = 0x01000000 ; init flag
|
||
+0x10 = 0xFFFFFFFF ; queue sentinel
|
||
+0x20 = 0x40541BC0 ; heap pointer (sub-buffer #1)
|
||
+0x30 = 0x00000014 ; size 20
|
||
+0x34 = 0x0000002F ; size 47
|
||
+0x38 = 0x414F5F60 ; heap-range payload (or two halfwords)
|
||
+0x3C = 0x40211CA0 ; heap pointer (sub-buffer #2)
|
||
+0x44 = 0x405418C0 ; heap pointer (sub-buffer #3)
|
||
+0x50 = 0x40111840 ; heap pointer (sub-buffer #4)
|
||
+0x58 = 0xFFFFFFFF ; sentinel
|
||
+0x5C = 0xFFFFFFFF ; sentinel
|
||
+0x76 = 0x000012AC ; possibly thread id
|
||
+0x78 = 0x00001004 ; event_handle (= self handle 0x1004)
|
||
```
|
||
The 0x1004 dispatcher is **noticeably different**: it owns 4
|
||
guest-heap sub-buffers in 0x4xxxxxxx range, suggesting it
|
||
manages a more complex resource than the other two (which are
|
||
pure POD job queues). The +0x78 location of the event_handle
|
||
differs from 0x100c's +0x4C and 0x15e0's +0x20, so each
|
||
subsystem has its own struct layout (no shared base class).
|
||
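The singleton-getter behaviour in finding 1 explains why eight getter calls produce one ctor firing. The sketch below is a toy model of that control flow, not a decompilation: the class and method names are invented, and whether the guest sets the flag before or after running the ctor is an assumption.

```python
class GuestImageModel:
    """Toy model of a one-shot-flag singleton getter (Meyers-style)."""

    def __init__(self):
        self.init_flags = 0      # models the word at 0x828F48D8
        self.ctor_calls = 0
        self.dispatcher = None

    def per_instance_ctor(self):
        """Models sub_821783D8: runs once, yields the dispatcher address."""
        self.ctor_calls += 1
        return 0x828F3EC0

    def outer_getter(self):
        """Models sub_8217C850: constructs on first call, then returns the slot."""
        if not (self.init_flags & 1):      # bit 0 is the one-shot init flag
            self.init_flags |= 1
            self.dispatcher = self.per_instance_ctor()
        return self.dispatcher

image = GuestImageModel()
pointers = [image.outer_getter() for _ in range(8)]
```

Counting entries to `outer_getter` gives 8, while counting entries to `per_instance_ctor` gives 1, which is the discrepancy between the earlier "called 8 times" claim and the probe result.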
|
||
**Reproduce:**
|
||
|
||
```bash
|
||
cargo run --release -p xenia-app -- exec 'sylpheed.iso' \
|
||
--halt-on-deadlock \
|
||
--trace-handles-focus=0x1004,0x100c,0x15e0 \
|
||
--ctor-probe=0x821783D8,0x82181750,0x821701C8 \
|
||
--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0 \
|
||
-n 50000000
|
||
```
|
||
|
||
Trace files saved at:
|
||
- `audit-runs/audit-004/run-50m-probe.txt` (outer-getter probes)
|
||
- `audit-runs/audit-004/run-50m-probe-v2.txt` (inner-ctor probes — singleton hypothesis confirmed)
|
||
|
||
**Recommendation for next session (do not implement a fix):**
|
||
|
||
Hook the entry of each non-create-chain consumer site for handle
|
||
0x100c (5 sites: 0x821802d8, 0x821806e0, 0x82180b28, 0x82180ea0,
|
||
0x82181254) and for handle 0x15e0 (4 sites: 0x8216f9d4, 0x8216fc08,
|
||
0x821700b8, 0x821700f4) using `--ctor-probe=...`. If any fire, then
|
||
the producer DOES execute and the failure mode is in the wake/signal
|
||
chain (probably `lwz r3, OFFSET(r3)` reads zero — see dispatcher dump
|
||
[+0x50] = 0 for 0x100c, [+0x24] = 0 for 0x15e0 — and the wake
|
||
function 0x824AA1D8 is then called with handle=0). If none fire,
|
||
the producer chain is gated upstream (likely a feature flag, init
|
||
phase, or RPC handler that never fires). Either way, the next
|
||
diagnostic narrows the bug surface dramatically.
|
||
|
||
---
|
||
|
||
### KRNBUG-AUDIT-005 — `--pc-probe` extended syntax + canary kernel-call diff; `XexCheckExecutablePrivilege` stub gates init flow
|
||
|
||
**Status**: landed on master (no-ff merge of feature branch
|
||
`canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade`). Diagnostic-
|
||
only, read-only, lockstep-preserved (`run digest matches golden` at
|
||
`-n 50M --stable-digest`).
|
||
|
||
**Tests**: 588 → **588** (unchanged; existing ctor-probe tests cover the
|
||
shared infrastructure).
|
||
|
||
**What landed (`crates/xenia-kernel/src/state.rs`):**
|
||
- `pub pc_probe_consumers: HashMap<u32, (u32, u32)>` field on
|
||
`KernelState` (default empty). Maps a probe PC to a
|
||
`(dispatcher_addr, offset)` pair; on hit the helper additionally
|
||
logs `[disp+off]` — what the producer's `lwz r3, OFFSET(r3)` is
|
||
about to read after `bl outer_getter` returns the dispatcher in r3.
|
||
- `fire_ctor_probe_if_match` extended to read+print the consumer
|
||
field when present. Pure load — does not mutate guest state.
|
||
|
||
**What landed (`crates/xenia-app/src/main.rs`):**
|
||
- `--pc-probe` clap alias on `--ctor-probe` (semantically clearer
|
||
name; both share parser/storage).
|
||
- Extended token syntax `PC@DISPATCHER:OFFSET` parsed via existing
|
||
`parse_hex_u32`. Plain `PC` form still works (backward compatible).
|
||
- `XENIA_PC_PROBE` env var as alias for `XENIA_CTOR_PROBE`.
|
||
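The extended token syntax can be illustrated with a small parser. This is a sketch under stated assumptions, not the clap-side code in `main.rs`: the function names are hypothetical, and how the real `parse_hex_u32` treats an unprefixed token such as `80` is not shown in these notes. The sketch assumes `0x`-prefixed tokens are hex and bare digits are decimal, which agrees with the "offset is 80 (= 0x50)" reading used elsewhere in this tracker.

```python
def parse_u32(text):
    """Parse a u32; `int(text, 0)` treats a 0x prefix as hex, bare digits as decimal."""
    value = int(text, 0)
    if not 0 <= value <= 0xFFFF_FFFF:
        raise ValueError(f"out of u32 range: {text!r}")
    return value

def parse_probe_token(token):
    """Parse `PC` or `PC@DISPATCHER:OFFSET` into (pc, consumer-or-None)."""
    pc_text, at, rest = token.partition("@")
    pc = parse_u32(pc_text)
    if not at:
        return pc, None                     # plain PC form
    disp_text, colon, off_text = rest.partition(":")
    if not colon:
        raise ValueError(f"expected DISPATCHER:OFFSET after '@' in {token!r}")
    return pc, (parse_u32(disp_text), parse_u32(off_text))

def parse_probe_list(arg):
    """Split a comma-separated probe argument into a PC set and a consumer map."""
    pcs, consumers = set(), {}
    for token in arg.split(","):
        token = token.strip()
        if not token:
            continue
        pc, consumer = parse_probe_token(token)
        pcs.add(pc)
        if consumer is not None:
            consumers[pc] = consumer
    return pcs, consumers

probe_pcs, probe_consumers = parse_probe_list(
    "0x821802D8@0x828F3D08:80,0x8217C850"
)
```

Keeping the PC set and the consumer map separate mirrors the two `KernelState` fields (`ctor_probe_pcs` and `pc_probe_consumers`), so plain tokens stay backward compatible.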
|
||
**What landed (`audit-runs/audit-005/`):** one-shot diagnostic
|
||
artifacts — not part of the repo build:
|
||
- `canary.log` — copy of `/home/fabi/xenia_canary_windows/xenia.log` from a Lutris launch of Sylpheed; oracle for what should happen
|
||
- `ours.log` — our trace at `-n 500M` with the 9-PC probe + `probe_calls=trace` filter (838 MB, 5.6 M lines)
|
||
- `diff.py` — kernel-call sequence diff (set-diff + first-divergence window); deletable after the audit
|
||
- `probe-test-10m.log` — initial smoke test confirming probe wiring
|
||
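The two analyses attributed to `diff.py` (set-diff and first-divergence window) can be sketched as follows. This is an illustrative reconstruction, not the script itself: the function names are assumptions, and the call sequences are toy data rather than excerpts from `canary.log` or `ours.log`.

```python
from collections import Counter

def canary_only(canary_calls, our_calls):
    """Exports that appear in canary's sequence but never in ours, with counts."""
    ours = set(our_calls)
    return {name: n for name, n in Counter(canary_calls).items() if name not in ours}

def first_divergence(canary_calls, our_calls, window=30):
    """Index of the first mismatch plus a +/- `window` slice of each sequence."""
    shared = min(len(canary_calls), len(our_calls))
    idx = next((i for i in range(shared) if canary_calls[i] != our_calls[i]), None)
    if idx is None:
        if len(canary_calls) == len(our_calls):
            return None              # sequences are identical
        idx = shared                 # one sequence is a strict prefix of the other
    lo, hi = max(0, idx - window), idx + window + 1
    return idx, canary_calls[lo:hi], our_calls[lo:hi]

# Toy sequences for illustration only.
toy_canary = ["NtCreateFile", "XexCheckExecutablePrivilege", "XGetAVPack", "XeCryptSha"]
toy_ours = ["NtCreateFile", "XexCheckExecutablePrivilege", "NtClose"]

missing = canary_only(toy_canary, toy_ours)
divergence = first_divergence(toy_canary, toy_ours, window=1)
```

The set-diff is order-insensitive, which is why it stays useful when the positional first divergence lands in early, noisy calls.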
|
||
**Reproduce:**
|
||
|
||
```bash
|
||
cargo run --release -p xenia-app -- \
|
||
--log-filter='probe_calls=trace,xenia=warn' \
|
||
exec sylpheed.iso \
|
||
--halt-on-deadlock \
|
||
--trace-handles-focus=0x1004,0x100c,0x15e0 \
|
||
--pc-probe=0x821802D8@0x828F3D08:80,0x821806E0@0x828F3D08:80,\
|
||
0x82180B28@0x828F3D08:80,0x82180EA0@0x828F3D08:80,\
|
||
0x82181254@0x828F3D08:80,0x8216F9D4@0x828F4070:36,\
|
||
0x8216FC08@0x828F4070:36,0x821700B8@0x828F4070:36,\
|
||
0x821700F4@0x828F4070:36 \
|
||
-n 500_000_000 \
|
||
2> audit-runs/audit-005/ours.log
|
||
|
||
python3 audit-runs/audit-005/diff.py --max 100 --window 30
|
||
```
|
||
|
||
**Decisive findings:**
|
||
|
||
1. **Failure mode (α) for KRNBUG-AUDIT-004 confirmed.** All 9
|
||
non-create-chain producer call sites for handles 0x100c
|
||
(5 sites at `0x821802D8 / 0x821806E0 / 0x82180B28 / 0x82180EA0 /
|
||
0x82181254`) and 0x15e0 (4 sites at `0x8216F9D4 / 0x8216FC08 /
|
||
0x821700B8 / 0x821700F4`) **fire 0×** at -n 500M
|
||
(`grep -c CTOR-PROBE ours.log == 0`). The producer code path is
|
||
not reached. Rules out failure modes (β: `lwz` reads zero) and (γ:
|
||
wake function called with stale handle). The bug is upstream,
|
||
in the control-flow that should lead the guest to those producer
|
||
functions.
|
||
|
||
2. **Upstream control-flow divergence located: `XexCheckExecutablePrivilege`
|
||
stub returning 0.** Set-diff of kernel-call sequences across our
|
||
500M-instruction run vs canary's full Sylpheed boot
|
||
(`canary.log`, ~5.3K lines, post-`swaps=2` boot loop reached)
|
||
identifies **11 exports that canary calls and we don't**:
|
||
|
||
```
|
||
ExTerminateThread (×2)
|
||
KeReleaseSemaphore (×268) ← we use Nt* equivalents
|
||
KeResetEvent (×1)
|
||
NtDeviceIoControlFile (×2)
|
||
ObCreateSymbolicLink (×1)
|
||
XGetAVPack (×1) ← gated by priv-10 check
|
||
XamTaskCloseHandle (×1)
|
||
XamTaskSchedule (×1) ← AUDIT-002 producer candidate
|
||
XamUserReadProfileSettings (×2)
|
||
XeCryptSha (×1)
|
||
XeKeysConsolePrivateKeySign (×1)
|
||
```
|
||
|
||
`XGetAVPack` has exactly one caller (`xrefs` table): site
|
||
`0x824AB5A0` inside `sub_824AB578`. The 5 instructions immediately
|
||
preceding it are:
|
||
|
||
```
|
||
824ab58c addi r3, r0, 10 ; privilege bit 10
|
||
824ab590 addi r31, r0, 0
|
||
824ab594 bl 0x8284DEFC ; XexCheckExecutablePrivilege
|
||
824ab598 cmpli 0, r3, 0x0
|
||
824ab59c bc 12, eq, 0x824AB724 ; if r3==0, skip whole block
|
||
; (XGetAVPack + crypto + Nt writes)
|
||
```
|
||
|
||
Our impl `crates/xenia-kernel/src/exports.rs:193`:
|
||
```rust
|
||
state.register_export(Xboxkrnl, 0x0194, "XexCheckExecutablePrivilege",
|
||
stub_return_zero);
|
||
```
|
||
`stub_return_zero` returns r3=0 unconditionally → guest takes
|
||
the `bc 12, eq, 0x824AB724` branch and skips the entire
|
||
AV/crypto/save-data init block.
|
||
|
||
The OTHER call site (`sub_824A9710`, `0x824A99A0`) queries
|
||
privilege bit **11**:
|
||
```
|
||
824a999c addi r3, r0, 11
|
||
824a99a0 bl 0x8284DEFC ; XexCheckExecutablePrivilege(11)
|
||
824a99a4 cmpli 0, r3, 0
|
||
824a99a8 bc 4, eq, 0x824A9A60 ; bne — skip block if priv set
|
||
```
|
||
Different polarity (this one gates `XamTaskSchedule` etc. on
|
||
the **privilege-NOT-set** path). With the stub returning 0 at both call sites,
|
||
the guest walks the wrong arm of *every* privilege-gated branch.
|
||
|
||
3. **Cascade reaches the parked-waiter handles.** Trace evidence:
|
||
our `probe_calls` log shows `lr=0x824A97E4` (a hit from the
|
||
error path inside `sub_824A9710` *after* `sub_824ABA98` returned
|
||
negative NTSTATUS). The canary log shows all 11 missing exports
|
||
firing in a single contiguous boot phase between `XexCheckExecutablePrivilege`
|
||
and the worker-thread spawn — i.e. the init phase that sets up
|
||
the dispatcher data structures is exactly the phase we skip.
|
||
This explains **why the dispatcher fields read zero** (AUDIT-004
|
||
dump: `[0x828F3D08+0x50] = 0`, `[0x828F4070+0x24] = 0`): the
|
||
ctors run (we counted those), but the *producers* that would
|
||
populate those fields with a non-zero handle never execute,
|
||
because the upstream init flow that registers them is gated
|
||
by the privilege checks.
|
||
|
||
4. **Note on the diff: canary's log is filtered.** Canary's config
|
||
has `log_high_frequency_kernel_calls = false`, which suppresses
|
||
most `Rtl*`, `Mm*`, `Ke*`-internal calls from the log. The
|
||
"called in OURS but not canary" set (23 entries, headed by
|
||
`NtWaitForSingleObjectEx ×1.5M`) is dominated by this filter
|
||
difference — it is **not** a bug surface. The directionally
|
||
meaningful side of the diff is "called in CANARY but not OURS"
|
||
(above): canary's log includes every low-frequency call, so any
|
||
absence on our side is a real divergence.
|
||
|
||
**Stop conditions check:**
|
||
|
||
- Canary itself does NOT stall at swaps=2 — it reaches a steady
|
||
frame loop with `XamInputGetCapabilities` polling, texture loads,
|
||
`KeReleaseSemaphore` ticks. The diff was informative.
|
||
- First divergence is dense early-CRT noise (~3 entries in), but
|
||
the meaningful divergence anchored to a concrete export
|
||
(`XGetAVPack`, deterministically gated by a one-line stub) was
|
||
recoverable via set-diff. Did not need to narrow scope further.
|
||
|
||
**Recommendation for next session (do not implement a fix here — this
|
||
is the read-only audit deliverable):**
|
||
|
||
Replace `stub_return_zero` for `XexCheckExecutablePrivilege` with a
|
||
real implementation. The XEX header's privilege bitmask is parsed
|
||
during XEX load (see `crates/xenia-xex/`); `KernelState` already
|
||
holds the loaded `image_base`. Implementation outline:
|
||
- Parse `XEX_HEADER_EXECUTION_INFO` / privilege bits at load time
|
||
into `KernelState` (or surface via `Vfs` already-loaded XEX
|
||
metadata).
|
||
- `xex_check_executable_privilege(priv_id) -> u32`:
|
||
return 1 if bit `priv_id` is set in the title's privilege bitmask,
|
||
else 0. Match canary's encoding (privilege IDs are 0..7F; canary
|
||
reads `PrivilegeFlags[i/8] & (1 << (i%8))` from the XEX execution
|
||
info).
|
||
|
||
Validation after the fix:
|
||
1. Re-run `audit-runs/audit-005/diff.py` — `XGetAVPack`,
|
||
`XamTaskSchedule`, `XeCryptSha`, etc. should appear in our
|
||
sequence and the divergence should advance several hundred
|
||
calls past the priv-check.
|
||
2. Re-run with the 9-PC probe armed at -n 500M — at minimum, the
|
||
ctor-probe firings change, and ideally one or more of the 9
|
||
producer sites starts firing.
|
||
3. If producer sites fire, dispatcher fields `[0x828F3D08+0x50]` /
|
||
`[0x828F4070+0x24]` become non-zero (use `--dump-addr`).
|
||
4. Lockstep golden `crates/xenia-app/tests/golden/sylpheed_n50m.json`
|
||
will likely change (`imports` count goes up, `swaps` may advance);
|
||
regenerate the golden under `--stable-digest` and treat that as
|
||
the new lockstep anchor.
|
||
|
||
If after the fix the producer is reached and dispatcher fields
|
||
populate, the parked-waiter deadlock should resolve — or surface
|
||
the next layer of bugs (e.g. signaling code reads non-zero handle
|
||
but `wake_eligible_waiters` fails).
|
||
|
||
### KRNBUG-XEX-001 — `XexCheckExecutablePrivilege` real impl (P0 fix landed)
|
||
|
||
**Branch:** `xex-check-privilege/p0-real-impl` (no-ff merged to master).
|
||
**Status:** LANDED 2026-05-04. Closes the priv-stub side of KRNBUG-AUDIT-005.
|
||
|
||
**Implementation.** Replaced `stub_return_zero` at
|
||
`crates/xenia-kernel/src/exports.rs:193` with a real implementation
|
||
that reads the XEX `XEX_HEADER_SYSTEM_FLAGS` (key `0x00030000`)
|
||
bitmap. Mirrors canary's `XexCheckExecutablePrivilege_entry`
|
||
[xboxkrnl_modules.cc:22-39](../xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_modules.cc#L22-L39):
|
||
`(flags >> priv) & 1` for `priv < 32`, else 0.
|
||
|
||
Plumbing:
|
||
- `xenia-xex/src/header.rs`: added `header_keys::SYSTEM_FLAGS = 0x00030000`.
|
||
- `xenia-xex/src/loader.rs`: added `get_system_flags(&Xex2Header) -> u32`.
|
||
- `xenia-kernel/src/state.rs`: added `pub xex_system_flags: u32` (init 0)
|
||
+ `xex_priv_logged: HashSet<u32>` (one-shot log gate per priv).
|
||
- `xenia-app/src/main.rs`: wired
|
||
`kernel.xex_system_flags = xenia_xex::loader::get_system_flags(&header)`
|
||
alongside the existing `kernel.image_base = base` line in `cmd_exec_inner`.
|
||
|
||
Sylpheed's bitmap is `0x00000400` (only `XEX_SYSTEM_PAL50_INCOMPATIBLE`
|
||
set, bit 10). So priv 10 → 1, priv 11 → 0. Both call sites identified
|
||
in AUDIT-005 now route through the canary-correct branches.
|
||
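The bit test and its effect on the two gates can be checked with a short model. This is a Python sketch of the `(flags >> priv) & 1` rule stated above, not the Rust export; the two gate helpers are hypothetical names that encode the branch polarities from the AUDIT-005 disassembly (`beq` skips at the priv-10 site, `bne` skips at the priv-11 site).

```python
XEX_SYSTEM_PAL50_INCOMPATIBLE = 0x0000_0400   # bit 10, the only bit set for Sylpheed

def xex_check_executable_privilege(system_flags, priv):
    """Return 1 if bit `priv` of the system-flags bitmap is set; 0 for priv >= 32."""
    priv &= 0xFFFF_FFFF                       # the guest passes priv as a u32
    if priv >= 32:
        return 0
    return (system_flags >> priv) & 1

def av_block_runs(system_flags):
    """Priv-10 gate: the block is skipped when the check returns 0 (beq)."""
    return xex_check_executable_privilege(system_flags, 10) != 0

def priv11_block_skipped(system_flags):
    """Priv-11 gate: the block is skipped when the check returns non-zero (bne)."""
    return xex_check_executable_privilege(system_flags, 11) != 0

sylpheed_flags = XEX_SYSTEM_PAL50_INCOMPATIBLE
```

Passing `0` as the bitmap reproduces the old `stub_return_zero` behaviour: the AV block is skipped, while the priv-11 gate takes the same arm as it does with the real bitmap.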
|
||
**Validation chain (Step 3 of the hand-off):**
|
||
|
||
| step | outcome |
|---|---|
| (a) `cargo test --workspace --release` | 588 → 589 (new test `xex_check_executable_privilege_reads_system_flags_bitmap`); all prior green |
| (b) `--stable-digest -n 100M` lockstep | `instructions=100000013` (was `100000002`). 11-instruction shift is the deterministic guest divergence into the canary-correct branch, verified with 2 identical re-runs. NOT nondeterminism. |
| (c) AUDIT-005 9-PC probe at -n 500M | All 9 producer probe sites still 0×. **BUT** `kernel.calls{XGetAVPack}` went from `0` → `1` (priv-10 gate flipped; `XexCheckExecutablePrivilege` itself only fires once for priv 10 because priv-11 site at `sub_824A9710` is downstream and not yet reached). |
| (d) `--trace-handles-focus=0x1004,0x100c,0x15e0` | All 3 handles still `signal_attempts=0`. The 9 probed PCs are members of two indirection-chain singletons (`sub_821800D8` for 0x100c, `sub_8216F618` for 0x15e0); both are downstream of the priv-11 site too. |
| (e) Canary kernel-call diff | 10 of the 11 missing exports remain absent. Only `XGetAVPack` was unlocked. The new first-divergence is inside the AV/crypto block: between `XGetAVPack` returning and `XeCryptSha` (still stub_success), Sylpheed's init aborts the block early. |
| (f) `sylpheed_oracles` (n50m / n2m) | Re-baselined and re-verified across 3 deterministic runs. New `n50m`: `instructions=50000005, imports=407417, swaps=2, draws=0` (was `50000008, 407415, 2, 0`). |
|
||
|
||
**Decisive interpretation.** The fix is **correct but partial**. The
|
||
priv-10 gate at `lr=0x824ab598` flips polarity (was: skip block / now:
|
||
execute block); `XGetAVPack` is now reached as predicted. The priv-11
|
||
gate at `lr=0x824a99a4` lives inside `sub_824A9710`, which the boot
|
||
flow does NOT reach because something in the AV/crypto block (which
|
||
the priv-10 fix unlocked) aborts before completing. So:
|
||
|
||
- `XGetAVPack`: ✅ reached (was missing, now fires once)
|
||
- `XeCryptSha` / `XeKeysConsolePrivateKeySign` / `ObCreateSymbolicLink`
|
||
/ `XamUserReadProfileSettings`: ❌ still missing → AV/crypto block
|
||
aborts early
|
||
- `sub_824A9710` (priv-11 caller) and downstream `XamTaskSchedule` /
|
||
`XamTaskCloseHandle` / `ExTerminateThread` / etc.: ❌ still unreached
|
||
- Parked-handle producers (the 9 PCs): ❌ still 0× (they live in the
|
||
init flow gated on priv-11 or post-priv-11 — same blast radius)
|
||
|
||
**Next-frontier bug (the new gate identified by this fix).** Inside
|
||
sub_824AB578 between `XGetAVPack` (`lr=0x824ab5a4`) and the next
|
||
canary-only call (likely `XeKeysConsolePrivateKeySign`). The
|
||
candidates are:
|
||
|
||
1. **`XGetAVPack` returns wrong value.** Our impl returns `0x16`
|
||
(`crates/xenia-kernel/src/xam.rs:382-384`). Canary returns
|
||
`cvars::avpack` (default 8 = HDMI). Sylpheed comment in canary
|
||
`xam_info.cc:250-251`: "if the result is not 3/4/6/8 they
|
||
explode with errors". `0x16` is not in the accepted set →
|
||
strongly suspect this is the next blocker.
|
||
2. **`XeCryptSha` / `XeKeysConsolePrivateKeySign` are `stub_success`**
|
||
(`exports.rs:188-189`). Returning `STATUS_SUCCESS` without
|
||
side effects on a hashing operation may itself confuse the caller
|
||
if it then reads a hash buffer expecting non-zero bytes.
|
||
|
||
Recommended next session: probe `XGetAVPack` return value (try `0x8`
|
||
to match canary default) — that's a one-line change in `xam.rs:383`.
|
||
If the run advances past, re-diff against canary at the new
|
||
divergence; otherwise the next gate is in `XeCryptSha` /
|
||
`XeKeysConsolePrivateKeySign`.
|
||
|
||
**Trace artifacts:** `audit-runs/post-priv-fix/ours.log` (5.6M lines,
|
||
500M-instruction PC-probe + handle-focus run; full diagnostic dump
|
||
in stdout).
|
||
|
||
---
|
||
|
||
### KRNBUG-XAM-001 — `XGetAVPack` returned non-canary `0x16`; canary default is `8` (HDMI)
|
||
|
||
**Status:** LANDED 2026-05-04. Closes the first follow-up of
|
||
KRNBUG-XEX-001 (the `XGetAVPack` arm flipped 0→1 by the priv-10 fix
|
||
exposed this as the next gate).
|
||
|
||
**One-line fix.** `crates/xenia-kernel/src/xam.rs:382-384`:
|
||
|
||
```rust
|
||
fn xget_avpack(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
|
||
ctx.gpr[3] = 8;
|
||
}
|
||
```
|
||
|
||
Was `0x16`. Canary's `XGetAVPack_entry` returns `cvars::avpack`
|
||
(`xam_info.cc:252`); the cvar is `DEFINE_int32(avpack, 8, ...)`
|
||
(`xam_info.cc:35`). Canary's inline comment at `xam_info.cc:250-251`:
|
||
*"Games seem to use this as a PAL check — if the result is not
|
||
3/4/6/8 they explode with errors if not in PAL mode."* `0x16` (=22)
|
||
is not in `{3, 4, 6, 8}`, so Sylpheed's caller treated the response
|
||
as invalid.
|
||
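The acceptance rule quoted from canary's comment reduces to a set-membership test. The sketch below only restates that rule; the names are hypothetical, and what Sylpheed's caller actually does with a rejected value is not modelled.

```python
ACCEPTED_AVPACKS = {3, 4, 6, 8}   # values from canary's comment; 8 is the canary default

def avpack_accepted(value):
    """True if the AV pack id is one of the values the comment says games accept."""
    return value in ACCEPTED_AVPACKS

old_value_ok = avpack_accepted(0x16)   # previous return value
new_value_ok = avpack_accepted(8)      # value after this fix
```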
|
||
**Tests.** 589 → 590. New unit test `xget_avpack_returns_hdmi` asserts
|
||
`r3 == 8`. Constant-return change; one assertion is enough.
|
||
|
||
**Validation chain (Step 3 of the hand-off):**
|
||
|
||
| step | outcome |
|---|---|
| (a) `cargo test --workspace --release` | 589 → 590; all green. |
| (b) `--stable-digest -n 100M` lockstep | `instructions=100000010, import_calls=987686, swaps=2`. 3-run identical (counter sets bit-equal). Was `100000013, 407417, 2`. The +2.4× import-call jump is the deterministic guest divergence into the canary-correct branch (the AV/crypto block now executes more thunks). NOT non-determinism. |
| (c) AUDIT-005 9-PC probe at -n 500M | All 9 producer probe sites still 0× (`grep -c CTOR-PROBE = 0`). |
| (d) `--trace-handles-focus=0x1004,0x100c,0x15e0` | All 3 handles still `signal_attempts=0`. The producers live deeper in the init flow than what `XGetAVPack` alone unlocks. |
| (e) Canary kernel-call diff (set-diff `audit-runs/post-fix/ours-500m.log` vs `audit-runs/audit-005/canary.log`) | 11 → **10** canary-only exports. The single match unlocked is `XGetAVPack` (canary=1, ours=1). Remaining absent: `ExTerminateThread ×2`, `KeReleaseSemaphore ×268`, `KeResetEvent ×1`, `NtDeviceIoControlFile ×2`, `ObCreateSymbolicLink ×1`, `XamTaskCloseHandle ×1`, `XamTaskSchedule ×1`, `XamUserReadProfileSettings ×2`, `XeCryptSha ×1`, `XeKeysConsolePrivateKeySign ×1`. |
| (f) `sylpheed_oracles` (n50m) | Re-baselined: `instructions=50000004, imports=407416, swaps=2, draws=0` (was `50000005, 407417, 2, 0`). 3 deterministic re-runs. Orphan golden `sylpheed_n2m.json` (no test refers to it) also re-baselined for hygiene. |
|
||
|
||
**Decisive interpretation.** The fix is **correct but partial**. The
|
||
`XGetAVPack` return value is now in the canary-accepted set, and
|
||
the call site at `0x824ab5a0` reaches it; the rest of the AV/crypto
|
||
block at `sub_824AB578` between `XGetAVPack` returning (`lr=0x824ab5a4`)
|
||
and `XeCryptSha` does not execute. The cascade walked exactly **one**
|
||
step.
|
||
|
||
**Telemetry signal: `lr=0x824a97e4` post-fix.** A new `RtlNtStatusToDosError(r3=0xc0000011 ...)` (`STATUS_END_OF_FILE`)
|
||
fires from `lr=0x824a97e4` immediately after `XGetAVPack` returns.
|
||
That PC is **inside** `sub_824A9710` (the priv-11 site), so the
|
||
priv-11-caller IS being entered (probably via fall-through from a
|
||
caller of `sub_824AB578`'s post-AV block), but the priv-11 query
|
||
itself never fires — there's a precondition between block entry and
|
||
priv-11 that fails. Almost certainly a downstream sub of the
|
||
AV/crypto block (one of `sub_824ABA98` and friends from AUDIT-005's
|
||
disasm) returns negative NTSTATUS, which routes here.
|
||
|
||
**Next-frontier bug (the new gate identified by this fix).** Between
|
||
`XGetAVPack` (`lr=0x824ab5a4`) and `XeCryptSha`. Two candidates:
|
||
|
||
1. **The 4 unreached siblings inside `sub_824AB578`** —
|
||
`XeCryptSha`, `XeKeysConsolePrivateKeySign`, `NtDeviceIoControlFile ×2`,
|
||
`ObCreateSymbolicLink`. Currently stubs except `NtDeviceIoControlFile`
(`stub_success` for the crypto; `NtDeviceIoControlFile` is real, but
its caller may not be reached). Need to diff `sub_824AB578`
step-by-step from `0x824ab5a4` onward to find the failing precondition.
|
||
2. **`sub_824ABA98` returning negative NTSTATUS** (the AUDIT-005
|
||
call site referenced from `lr=0x824a97e4`). If the AV/crypto
|
||
block calls `sub_824ABA98` and gets a negative return, control
|
||
transfers to the error path that triggers the
|
||
`RtlNtStatusToDosError(c0000011)` we observe. That PC is the
|
||
tail signal — finding what's upstream of it is the next probe.
|
||
|
||
**What did NOT change** (per the AUDIT-004 diagnosis chain):
|
||
|
||
- The 9 producer-callsite PCs for handles `0x100c` (5 sites) and
|
||
`0x15e0` (4 sites): still 0× hits.
|
||
- The 3 parked-waiter handles `0x1004 / 0x100c / 0x15e0`:
|
||
still `signal_attempts=0`.
|
||
- `swaps=2` plateau, `draws=0`: unchanged.
|
||
|
||
**Trace artifacts:** `audit-runs/post-fix/ours-500m.log` (5.6M lines,
|
||
500M-instruction PC-probe + handle-focus run, post-AV-pack-fix).
|
||
Same probe configuration as KRNBUG-AUDIT-005's `audit-runs/audit-005/ours.log`,
|
||
re-runnable with the command in that finding's "Reproduce" block.
|
||
|
||
**Reproduce the canary set-diff:**
|
||
|
||
```bash
|
||
python3 - <<'PY'
import re
from pathlib import Path
from collections import Counter

HERE = Path('audit-runs/audit-005')
# Canary log lines: "d> <hex> ExportName(" ; our log lines: "probe_calls ... call=ExportName"
CR = re.compile(r'^d>\s+[0-9A-Fa-f]+\s+([A-Z][A-Za-z0-9_]+)\(')
OR_ = re.compile(r'probe_calls.*?call=([A-Za-z_][A-Za-z0-9_]*)')

def extract(p, rx):
    # Count how often each export name appears in log file `p`.
    out = Counter()
    with open(p, errors='replace') as f:
        for line in f:
            m = rx.search(line)
            if m: out[m.group(1)] += 1
    return out

canary = extract(HERE/'canary.log', CR)
ours = extract('audit-runs/post-fix/ours-500m.log', OR_)
# Print exports that canary calls and we never do, with canary's call count.
for n in sorted(set(canary) - set(ours)):
    print(f' {canary[n]:>5} {n}')
PY
|
||
```
|
||
|
||
### KRNBUG-IO-002 — `nt_query_volume_information_file` block size (LANDED, gate hypothesis FALSIFIED)
|
||
|
||
**Status:** applied (branch `xboxkrnl-vol-allocunit/p0-65536-cluster`,
|
||
single squash commit). Tests 591 → 592. Lockstep
|
||
`instructions=100000010, swaps=2, draws=0` deterministic across two
|
||
reruns. sylpheed_n50m oracle still matches its existing golden.
|
||
|
||
**The fix.** `crates/xenia-kernel/src/exports.rs:1241-1269`,
|
||
`nt_query_volume_information_file` class-3 (FileFsSizeInformation)
|
||
branch, was returning `(total=0x100000, free=0,
|
||
sectors_per_unit=1, bytes_per_sector=2048)`. Replaced with the
|
||
canary-NullDevice byte-identical `(total=0x10, free=0x10,
|
||
sectors_per_unit=0x80, bytes_per_sector=0x200)` (product = 65536).
|
||
Reference: `xenia-canary/src/xenia/vfs/devices/null_device.h:38-46`.
|
||
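The arithmetic behind the two replies is easy to check. The sketch below computes the allocation-unit size and the implied capacity from each tuple; the helper name and dictionary keys are assumptions, while the four numbers per reply are taken from the paragraph above.

```python
def fs_size_info(total_units, free_units, sectors_per_unit, bytes_per_sector):
    """Derive byte sizes from a FileFsSizeInformation-style tuple."""
    bytes_per_unit = sectors_per_unit * bytes_per_sector
    return {
        "bytes_per_unit": bytes_per_unit,
        "total_bytes": total_units * bytes_per_unit,
        "free_bytes": free_units * bytes_per_unit,
    }

old_reply = fs_size_info(0x100000, 0, 1, 2048)        # pre-fix values
new_reply = fs_size_info(0x10, 0x10, 0x80, 0x200)     # canary NullDevice values
```

The old reply described a 2 GiB volume with 2048-byte units and no free space; the new one describes a 1 MiB volume with 65536-byte units, all of it free.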
|
||
**The cascade hypothesis.** AUDIT-006 predicted that fixing this would
|
||
unblock seven canary-only kernel exports — the priv-11 query at
|
||
`sub_824A9710` would fire, `XamTaskSchedule` at `lr=0x824a9a10` would
|
||
fire, the Cache0 callback thread would spawn, and dispatcher 0x100c's
|
||
producer would finally fire (closing the 6-session producer hunt).
|
||
|
||
**The cascade DID NOT FIRE.** Fresh 500 M trace at
|
||
`audit-runs/post-IO-002/ours.log` (692 MB, 5.6 M lines):
|
||
|
||
| Metric | Pre-IO-002 (audit-006) | Post-IO-002 |
|---|---|---|
| canary-only kernel exports | 7 | **7 (identical set)** |
| `XexCheckExecutablePrivilege` calls | 1 (priv=0xA only) | **1** (still no priv=0xB) |
| `XamTaskSchedule` calls | 0 | **0** |
| `KeResetEvent / ObCreateSymbolicLink / KeReleaseSemaphore / ExTerminateThread / XamTaskCloseHandle / XamUserReadProfileSettings` | 0 | **0** |
| `NtQueryVolumeInformationFile` calls | 16 | **16** (no new sites reached) |
| `swaps` | 2 | 2 |
| `draws` | 0 | 0 |
| Worker thread spawns | 19 | 18 (within noise) |
| `imports` at -n 100M (stable digest) | 987686 | **987630** (-56) |
|
||
|
||
**Diagnostic.** All 16 `NtQueryVolumeInformationFile` calls in our trace
|
||
originate from a single LR `0x82611f38`, a downstream consumer that
|
||
**completes successfully** in both pre- and post-fix runs. The audit-006
|
||
premise that `sub_824ABA98`/`sub_824A9710` consume the volume-info reply
|
||
at the priv-11 gate is therefore likely incorrect, *or* the gate consumes
|
||
a different information class via a different export entirely.
|
||
|
||
**Stop-condition triggered.** Per the IO-002 task brief, this session
|
||
landed the correct fix (it makes our reply byte-identical to canary's
|
||
NullDevice and survives every test we have) but did not pivot to a
|
||
second fix. The branch is kept because the value change is correct
|
||
and unblocks no regression; it is, however, **not load-bearing for
|
||
the priv-11 gate**.
|
||
|
||
**Next-session next-gate hypothesis (untested, ranked by likelihood):**
|
||
|
||
1. **`sub_824A9710` early-exit probe.** Per AUDIT-005 instrumentation
|
||
the priv-11 site has never fired in any session. Use `--pc-probe` on
|
||
the entry of `sub_824A9710` and probe each conditional branch within
|
||
it; whichever branch exits the function before the priv-11
|
||
`XexCheckExecutablePrivilege` call site is the actual gate.
|
||
2. **Different info-class.** `nt_query_information_file` (class 5
|
||
`FileStandardInformation`, class 22 etc.) or
|
||
`nt_query_full_attributes_file` may be the actual consumer. The
|
||
16 calls at LR `0x82611f38` are *not* the gate even though they
|
||
complete successfully.
|
||
3. **Mis-attributed disasm.** AUDIT-005's identification of
|
||
`sub_824ABA98 = VerifyDirBlockSize` came from disasm reading; IO-001's
|
||
runtime trace already invalidated parts of that attribution.
|
||
Re-disassemble `sub_824A9710` with `xenia-rs dis --json --at 0x824a9710`
|
||
and walk every conditional that might exit before the priv-11 query.
|
||
4. **A different IOCTL.** `NtDeviceIoControlFile` is now reachable
|
||
(KRNBUG-IO-001 unblocked it); some FsCtl response we return may be
|
||
the new gate.
|
||
|
||
**Trace artifacts:**
|
||
- `audit-runs/post-IO-002/ours.log` — 500 M trace, post-fix
|
||
- `audit-runs/post-IO-002/canary.log` — copy of the audit-006 canary oracle
|
||
- `audit-runs/post-IO-002/diff.py` — copy of audit-006 diff tool
|
||
- `audit-runs/post-IO-002/lock_n100m_run{1,2}.json` — bit-identical lockstep digests
|
||
- `audit-runs/post-IO-002/canary_only.txt` — set-difference output (the 7-entry list)
|
||
- `audit-runs/post-IO-002/canary_exports.txt`, `ours_exports.txt` — sorted unique export names
|
||
|
||
|
||
---
|
||
|
||
## KRNBUG-AUDIT-007 — branch-probe instrumentation + sub_824A9710 exit-branch identification (2026-05-04)

### Outcome

**`--branch-probe` instrumentation landed (read-only diagnostic). Runtime trace decisively identified the priv-11 gate.**
- 592→592 tests; lockstep `instructions=100000010, swaps=2, draws=0` deterministic across reruns
  (`audit-runs/audit-007/lock_post_branchprobe.json` ≡ `lock_post_branchprobe_run2.json`
  ≡ `audit-runs/post-IO-002/lock_n100m_run1.json`).
- Branch: `investigate-sub-824a9710/p0-branch-probe` — kept (instrumentation is reusable).

### Decisive runtime evidence

`audit-runs/audit-007/sub_824A9710-trace.log`:
```
BRANCH-PROBE pc=0x824a9710 tid=1 hw=0 cycle=5363003 r3=0x00000000 lr=0x824a9acc
BRANCH-PROBE pc=0x824a97e0 tid=1 hw=0 cycle=5369559 r3=0xc0000034 lr=0x824a9940
BRANCH-PROBE pc=0x824a9a98 tid=1 hw=0 cycle=5369562 r3=0x00000002 lr=0x824a97e4
```

The probe at `0x824a97e0` (the failure landing pad) captured `r3=0xC0000034`, `lr=0x824a9940` (= the
`cmpi 0,r3,0` PC after `bl sub_824ABD88` at `0x824a993c`). This pinpoints:
- **Exit branch**: `0x824a9944` (`bc 12, lt, 0x824A97E0`) — taken because `r3 = 0xC0000034` is negative when compared as a signed 32-bit value.
- **Responsible bl**: `0x824a993c` → `sub_824ABD88` first call.
- **Status code**: `0xC0000034` = `STATUS_OBJECT_NAME_NOT_FOUND`.

### Root-cause chain through sub_824ABD88

The function-detector's `end_address=0x824abe3c` for sub_824ABD88 was a truncation artifact;
the function actually runs to `0x824ac184`. Within that range the `0xC0000034` is **HARDCODED**
at `0x824abea8-0x824abeac`:
```
0x824abe90 bl NtDeviceIoControlFile (FsCtlCode=0x74004, out_buf=r1+160, out_len=16)
0x824abe94 cmpi 0, r3, 0
0x824abe98 bc 12, lt, 0x824abeb8 # if r3 < 0 → failure cleanup (NOT taken; stub returned 0 = success)
0x824abe9c ld r10, 168(r1) # load doubleword from [out_buf+8]
0x824abea0 cmpi cr6, 1, r10, 0 # 64-bit cmp r10 == 0
0x824abea4 bc 4, 4*cr6+eq, 0x824abeb0 # if NOT eq, skip the assignment
0x824abea8 addis r3, r0, 0xC000 # r3 = 0xC0000000
0x824abeac ori r3, r3, 0x34 # r3 = 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND)
0x824abeb0 cmpi cr6, 0, r3, 0
0x824abeb4 bc 4, 4*cr6+lt, 0x824abecc # if NOT lt → success path; r3 < 0 → NOT taken
0x824abeb8 or r28, r3, r3 # save 0xC0000034
0x824abebc lwz r3, 96(r1)
0x824abec0 bl NtClose
0x824abec4 or r3, r28, r28 # restore failure status
0x824abec8 b 0x824abe34 # epilogue → return 0xC0000034
```

The game expects the IOCTL response's upper 8 bytes to be non-zero. Our
`NtDeviceIoControlFile` is registered as `stub_success` at
`crates/xenia-kernel/src/exports.rs:90` — returns 0 (SUCCESS) but writes nothing
into the OUT buffer. The fresh stack frame has zero at `[r1+168]`, so the check
at `0x824abea4` falls through to the hardcoded failure assignment.

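The guest-side check in the listing above can be restated as a minimal Rust sketch. This is a paraphrase of the disassembly for reading purposes, not code from the repository; the function name `gate_status` and the 16-byte array parameter are assumptions of the sketch.

```rust
/// NTSTATUS value hardcoded by the guest at 0x824abea8-0x824abeac.
pub const STATUS_OBJECT_NAME_NOT_FOUND: u32 = 0xC000_0034;

/// Models the guest code after the FsCtlCode=0x74004 IOCTL returns.
/// `ioctl_status` is r3 from NtDeviceIoControlFile; `out_buf` is the
/// 16-byte OUT buffer at r1+160. Returns the status the guest carries forward.
pub fn gate_status(ioctl_status: u32, out_buf: &[u8; 16]) -> u32 {
    // `bc 12, lt`: a negative status (signed 32-bit view) goes to cleanup unchanged.
    if (ioctl_status as i32) < 0 {
        return ioctl_status;
    }
    // `ld r10, 168(r1)`: big-endian doubleword at out_buf + 8.
    let mut upper = [0u8; 8];
    upper.copy_from_slice(&out_buf[8..16]);
    if u64::from_be_bytes(upper) == 0 {
        // `addis` + `ori`: the hardcoded failure assignment.
        return STATUS_OBJECT_NAME_NOT_FOUND;
    }
    ioctl_status
}
```

With a success status and an all-zero buffer, which is what a stub that writes nothing produces on a fresh stack frame, the sketch yields `0xC0000034`, matching the value captured by the probe at `0x824a97e0`.
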
### Canary reference

`audit-runs/post-IO-002/canary.log` lines 1196-1209 show canary calls
`NtDeviceIoControlFile(handle, ..., FsCtlCode=0x74004, ..., out_buf, out_len=16)`,
gets a populated 16-byte response (whose upper 8 bytes are non-zero), then proceeds
through 17× NtWriteFile zero-fill, NtClose, NtCreateFile (Cache0\), NtQueryVolumeInformationFile
class=3, NtClose, and finally **`XexCheckExecutablePrivilege(0x0000000B)`** — the
priv-11 site that has never fired in our run. Immediately followed by
**`XamTaskSchedule(824A93C8, 828A28F0, ...)`** — the canary-only export hunt's
gate-pivot call.

The IOCTL implementation in canary lives in `xenia-canary/src/xenia/vfs/devices/null_device.{h,cc}`
(`NullDevice::IoControl`) — the device's `IoControl` writes the structured payload
that the game-side check consumes.

### Next session: KRNBUG-IO-003

**Where:** `crates/xenia-kernel/src/exports.rs:90` — replace the
`stub_success` registration with a real `nt_device_io_control_file`.

**Minimum viable fix:** for FsCtlCode=0x74004, write any non-zero u64 at
`[out_buf+8]`. That alone clears the gate.

**Canary-faithful fix:** mirror `NullDevice::IoControl` for FsCtlCodes
`0x70000` (8-byte response, consumed at `sub_824ABD88:0x824abe3c` for a
log2/shift count) and `0x74004` (16-byte response, partition geometry).
Fall through to `STATUS_NOT_IMPLEMENTED` for unrecognized codes so future
divergences surface.

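A minimal sketch of the shape such a handler could take is shown below. It is not the implementation that landed and it does not reproduce canary's payload: the function name, the constant names, the buffer-length check, and both `PLACEHOLDER_*` values are assumptions chosen only to satisfy the "non-zero u64 at `[out_buf+8]`" condition. The real values would have to be read from `NullDevice::IoControl`.

```rust
pub const STATUS_SUCCESS: u32 = 0x0000_0000;
pub const STATUS_NOT_IMPLEMENTED: u32 = 0xC000_0002;
pub const STATUS_BUFFER_TOO_SMALL: u32 = 0xC000_0023;

/// FsCtlCodes named in the audit; the constant names are hypothetical.
pub const FSCTL_DRIVE_GEOMETRY: u32 = 0x70000;
pub const FSCTL_PARTITION_INFO: u32 = 0x74004;

/// Placeholder payload values (assumptions, NOT canary's real numbers).
const PLACEHOLDER_GEOMETRY: u64 = 1;
const PLACEHOLDER_PARTITION_UPPER: u64 = 1;

/// Hypothetical handler: fills `out` for the two known codes and reports
/// STATUS_NOT_IMPLEMENTED for anything else so new divergences surface.
pub fn device_io_control(fsctl_code: u32, out: &mut [u8]) -> u32 {
    match fsctl_code {
        FSCTL_DRIVE_GEOMETRY => {
            if out.len() < 8 {
                return STATUS_BUFFER_TOO_SMALL;
            }
            // Guest memory is big-endian, so serialize with to_be_bytes.
            out[..8].copy_from_slice(&PLACEHOLDER_GEOMETRY.to_be_bytes());
            STATUS_SUCCESS
        }
        FSCTL_PARTITION_INFO => {
            if out.len() < 16 {
                return STATUS_BUFFER_TOO_SMALL;
            }
            out[..8].copy_from_slice(&0u64.to_be_bytes());
            // The gate only requires these upper 8 bytes to be non-zero.
            out[8..16].copy_from_slice(&PLACEHOLDER_PARTITION_UPPER.to_be_bytes());
            STATUS_SUCCESS
        }
        _ => STATUS_NOT_IMPLEMENTED,
    }
}
```

Returning an error for unrecognized codes, rather than a blanket success, is the design point: a silent success is what hid this gate in the first place.
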
**Falsifiable cascade prediction:**
- `XexCheckExecutablePrivilege` count: **1 → 2** (priv=0xA + priv=0xB).
- `XamTaskSchedule` count: **0 → 1**.
- canary-only export count: **7 → ≤ 3**.
- Worker thread spawn at `ExCreateThread(entry=0x82181830, ctx=0x828F3D08)` —
  the parked-handle 0x100c producer fires.
- `swaps=2 draws=0` plateau persists (renderer is multi-causal).

**Failure modes to watch for:**
- (α) Re-running `--branch-probe` should show a NEW exit branch in
  `sub_824A9710` (one of `0x824a996c`, `0x824a9998`, `0x824a9a18`) if a downstream
  helper has its own unimplemented dependency.
- (β) sub_824ABA98's analogous failure path (called at 0x824a9950, 0x824a9990)
  may surface if its own kernel-call dependencies are stubs.
- (γ) `nt_write_file` against the synth empty-file Cache0 path needs to handle
  the 17× zero-fill loop; if our implementation rejects writes to a zero-byte
  file, the cascade stalls just past the IOCTL fix.

### Files added / modified (instrumentation only)

- `crates/xenia-kernel/src/state.rs` — added `branch_probe_pcs: HashSet<u32>`
  field + `fire_branch_probe_if_match(hw_id)` method emitting a single compact
  `BRANCH-PROBE` line per fire (pc, tid, hw, cycle, r3, lr, cr0/cr6). Sister to
  `fire_ctor_probe_if_match`; no back-chain walk. ~40 LOC.
- `crates/xenia-app/src/main.rs` — `--branch-probe` CLI flag (env var
  `XENIA_BRANCH_PROBE`), parser, and call in `worker_prologue`. ~30 LOC.

### Probe-machinery limitation

The probe fires only when the **block head** at the matched PC is dispatched —
mid-block PCs in the request set don't trigger because the prologue runs once
per block, not once per instruction. In this trace: function entries, failure
landing pads (`0x824a97e0`), and external-call return PCs (`0x824a9a98`) all
hit. Internal `bc` PCs (`0x824a9944`, `0x824a9958`, ...) were silent. The data
captured was sufficient — the failure landing PC + LR pair uniquely identified
the upstream branch — but if a future audit needs every-branch coverage, the
helper call would need to move from `worker_prologue` into the per-instruction
step loop (or a custom block-scan that flags branches matching the request
list).

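The difference between the two check placements can be illustrated with a small sketch. The `Block` type and both function names are hypothetical; they stand in for the real block dispatcher and `worker_prologue`, whose internals are not shown in these notes.

```rust
use std::collections::HashSet;

/// A basic block as a list of instruction PCs; the first entry is the block head.
/// Hypothetical stand-in for the interpreter's real block representation.
pub struct Block {
    pub pcs: Vec<u32>,
}

/// Check placed in the block prologue: only the block head is tested.
pub fn fired_at_block_heads(blocks: &[Block], probes: &HashSet<u32>) -> Vec<u32> {
    blocks
        .iter()
        .filter_map(|b| b.pcs.first().copied())
        .filter(|pc| probes.contains(pc))
        .collect()
}

/// Check placed in the per-instruction step loop: every PC is tested.
pub fn fired_per_instruction(blocks: &[Block], probes: &HashSet<u32>) -> Vec<u32> {
    blocks
        .iter()
        .flat_map(|b| b.pcs.iter().copied())
        .filter(|pc| probes.contains(pc))
        .collect()
}
```

For example, if one dispatched block covers `0x824a9940` and `0x824a9944` and the next starts at `0x824a97e0` (an assumed block layout), a probe set containing `0x824a9944` and `0x824a97e0` reports only `0x824a97e0` under the prologue placement, and both PCs under the per-instruction placement.
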
### Trace artifacts (re-runnable)

- `audit-runs/audit-007/sub_824A9710-trace.log` — 5 BRANCH-PROBE lines + thread diagnostics.
- `audit-runs/audit-007/sub_824A9710-trace.err` — full kernel-call trace + counter dump.
- `audit-runs/audit-007/lock_post_branchprobe.json`, `lock_post_branchprobe_run2.json` — lockstep digests.

Re-run command:
```
PROBE_LIST="0x824a9aa0,0x824a9128,0x824a9710,0x824a9778,0x824a9788,0x824a9790,0x824a97dc,0x824a97e0,0x824a9824,0x824a9828,0x824a9840,0x824a9850,0x824a985c,0x824a9870,0x824a9880,0x824a9888,0x824a9918,0x824a9944,0x824a9958,0x824a996c,0x824a9998,0x824a999c,0x824a99a0,0x824a99a8,0x824a9a10,0x824a9a18,0x824a9a60,0x824a9a78,0x824a9a98"
./target/release/xenia-rs exec sylpheed.iso --halt-on-deadlock \
  --branch-probe="$PROBE_LIST" -n 500_000_000 \
  > audit-runs/audit-007/sub_824A9710-trace.log \
  2> audit-runs/audit-007/sub_824A9710-trace.err
```

## KRNBUG-IO-003 — `NtDeviceIoControlFile` real implementation (LANDED 2026-05-04)

### Outcome

**Replaced `stub_success` registration with a real `nt_device_io_control_file` mirroring canary `NullDevice::IoControl` for FsCtlCodes 0x70000 + 0x74004.**
- 592 → 594 tests; lockstep `instructions=100000019 imports=987524 swaps=2 draws=0` deterministic across run1/run2/run3 (`audit-runs/post-IO-003/lock_n100m_run{1,2,3}.json`, all byte-identical).
- Branch: `xboxkrnl-ioctl/p0-fsctl-mountinfo` (no-ff merge).
- `sylpheed_n50m` golden re-baselined `instructions=50000004→50000003`, `imports=407362→407255`. `sylpheed_n2m` unchanged.

### Audit-007 prediction scorecard

| # | Prediction | Pre | Post | Held? |
|---|---|---|---|---|
| (a) | `cargo test --workspace --release` green | 592 | 594 | ✓ |
| (b) | Lockstep determinism preserved | bit-identical | bit-identical (run1≡run2≡run3) | ✓ |
| (c) | `XexCheckExecutablePrivilege` count: 1 → ≥2 | 1 | 2 | ✓ |
| (c) | `XamTaskSchedule` count: 0 → ≥1 | 0 | 1 | ✓ |
| (e) | canary-only exports: 7 → ≤3 | 7 | 3 | ✓ |
| (d) | 0x100c worker spawn (handle goes from UNCREATED to created+signaled) | UNCREATED | UNCREATED | **✗** |
| (d) | 0x1004 signal_attempts > 0 | 0 | 0 | **✗** |
| (d) | 0x15e0 signal_attempts > 0 | 0 | 1 (primary=1, "not stuck") | ✓ (new) |
| (f) | Worker thread spawn count: 19 → higher | 19 | 19 | **✗** |

5 of the 8 original predictions held (the 0x15e0 row marked "new" is an additional observation); the cascade fired, but not as far as audit-007 expected. Specifically:
- The priv-11 query DOES fire → flows into `XamTaskSchedule` → 0x15e0 semaphore-signal pump now runs.
- The audit-006 `canary_only.txt` 7 entries reduce by 4 (`KeResetEvent`, `ObCreateSymbolicLink`, `XamTaskCloseHandle`, `XamTaskSchedule`). Still missing: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`.
- `XeCryptSha` (1) and `XeKeysConsolePrivateKeySign` (1) also now fire (were 0).
- Per-handle 0x100c stays UNCREATED — the producer chain that should spawn the 0x100c worker is gated downstream of where IO-003 unblocks. The 3 canary-only entries that remain (`ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`) are the next clue: any of those could be the next gate.

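The 7 → 3 reduction is a plain set difference over export names. The sketch below recomputes it from the names quoted in this tracker; the function and constant names are assumptions for illustration and are unrelated to the `diff.py` tool listed under the IO-002 artifacts.

```rust
use std::collections::BTreeSet;

/// The 7 canary-only exports recorded by audit-006.
pub const AUDIT_006_CANARY_ONLY: [&str; 7] = [
    "KeResetEvent",
    "ObCreateSymbolicLink",
    "KeReleaseSemaphore",
    "ExTerminateThread",
    "XamTaskCloseHandle",
    "XamUserReadProfileSettings",
    "XamTaskSchedule",
];

/// Exports from that list that fire in our trace after IO-003.
pub const FIRED_AFTER_IO_003: [&str; 4] = [
    "KeResetEvent",
    "ObCreateSymbolicLink",
    "XamTaskCloseHandle",
    "XamTaskSchedule",
];

/// Names in `before` that are absent from `now_firing`, in sorted order.
pub fn still_canary_only<'a>(before: &[&'a str], now_firing: &[&str]) -> Vec<&'a str> {
    let fired: BTreeSet<&str> = now_firing.iter().copied().collect();
    let remaining: BTreeSet<&'a str> = before
        .iter()
        .copied()
        .filter(|name| !fired.contains(*name))
        .collect();
    remaining.into_iter().collect()
}
```

Applied to the two lists above, the function returns `ExTerminateThread`, `KeReleaseSemaphore`, and `XamUserReadProfileSettings`, the three names still flagged as canary-only.
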
### Files modified

- `crates/xenia-kernel/src/exports.rs:90` — `stub_success` → `nt_device_io_control_file`.
- `crates/xenia-kernel/src/exports.rs` (new function) — body mirrors `xboxkrnl_io.cc:645-678` + `null_device.cc` (canary). Stack args 9-10 read from `[sp+0x54]` / `[sp+0x5C]` per Xbox 360 PowerPC ABI (parameter save area at sp+0x14 + 64 bytes spill = sp+0x54, confirmed by disasm of caller `sub_824ABD88` at `0x824abe04-0x824abe10` and `0x824abe78-0x824abe70`).
- `crates/xenia-kernel/src/exports.rs` (new tests) — `nt_device_io_control_file_drive_geometry` (FsCtlCode 0x70000) + `nt_device_io_control_file_partition_info_unblocks_gate` (FsCtlCode 0x74004 — asserts OUT+8 ≠ 0, the gate condition).
- `crates/xenia-app/tests/golden/sylpheed_n50m.json` — re-baselined.

### Trace artifacts

- `audit-runs/post-IO-003/lock_n100m_run{1,2,3}.json` — three byte-identical 100M lockstep runs.
- `audit-runs/post-IO-003/lock_n500m.json` — 500M lockstep digest (`instructions=500000010 imports=5629676`).
- `audit-runs/post-IO-003/exec_trace_focus_500m.log` — `--trace-handles-focus=0x1004,0x100c,0x15e0` at 500M.

### Next session candidates

The 0x100c worker still doesn't spawn. Three of audit-006's canary-only entries (`ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`) remain canary-only — any of them may be the next downstream gate. Re-running `--branch-probe` against `sub_824A9710` should now show a different exit path (the priv-11 site fires, so the failure mode has shifted).

## KRNBUG-AUDIT-008 — IO-003 model reset; next gate is β-class job-submitter unreached (DIAGNOSTIC 2026-05-05)

### Outcome

**Model reset on IO-003 cascade.** Branch-probe trace at the post-priv-11 cluster decisively shows the 0x100c worker IS spawned as `tid=3` with `ctx=0x828F3D08, entry=0x82181830`, parked on lifecycle event handle `0x1020` (signals=0). The IO-003 audit memory's "0x100c UNCREATED" claim was wrong; the handle audit already had `handle=0x00001020 waiters(tid)=[3]` but the trace dump didn't connect tid=3 to the 0x100c dispatcher. Same correction applies to the 0x1004 worker (tid=11).

The actual next gate is **β-class** (internal-sub unreached): the 5 non-create-chain callers of `sub_821800D8` (job-submitter shims with the pattern `bl outer_getter; lwz r3, 80(r3); bl sub_824AA1D8`) are never called. Their parent functions live in the **0x82287000-0x82292FFF module range** — likely renderer / scene-graph subsystem.

### Decisive runtime evidence

`audit-runs/audit-008/branch-probe.trace`:

```
BRANCH-PROBE pc=0x824a9a14 tid=1 cycle=5378562 -- main: post-XamTaskSchedule
BRANCH-PROBE pc=0x824a93c8 tid=2 cycle=0 r3=0x828a28f0 -- spawned thread enters callback (matches canary's 0x824A93C8/0x828A28F0)
BRANCH-PROBE pc=0x824a9540 tid=2 cycle=4232 -- spawned thread post-StfsCreateDevice cmpi
BRANCH-PROBE pc=0x824a9a44 tid=1 cycle=5378576 -- main: post-KeWaitForSingleObject(0x8287094C)
BRANCH-PROBE pc=0x824a9a4c tid=1 cycle=5378579 -- main: post-KeResetEvent
BRANCH-PROBE pc=0x824a9a98 tid=1 cycle=5378596 -- main: sub_824A9710 epilogue
BRANCH-PROBE pc=0x824a9acc tid=1 cycle=5378609 -- main: sub_824A9AA0 return
BRANCH-PROBE pc=0x8216eaa0 tid=1 cycle=5378617 -- main: bl sub_82181C28 callsite
BRANCH-PROBE pc=0x82181c28 tid=1 cycle=5378618 -- main entered sub_82181C28
BRANCH-PROBE pc=0x821800d8 tid=1 cycle=5378630 -- main entered sub_821800D8 (singleton getter for 0x100c)
BRANCH-PROBE pc=0x82181750 tid=1 cycle=5378645 r3=0x828f3d08 -- main entered sub_82181750 ctor
BRANCH-PROBE pc=0x821817c0 tid=1 cycle=5378712 r3=0x00001020 -- post-sub_824A9F18 (lifecycle event=0x1020)
BRANCH-PROBE pc=0x82181830 tid=3 cycle=0 r3=0x828f3d08 lr=0xbcbcbcbc -- 0x100C WORKER SPAWNED
BRANCH-PROBE pc=0x82181838 tid=3 cycle=1 -- past entry thunk
BRANCH-PROBE pc=0x821817fc tid=1 cycle=5378786 r3=0x00001024 -- main: post-sub_82172370, thread handle=0x1024
BRANCH-PROBE pc=0x82180120 tid=1 cycle=5378951 -- main: post-atexit
BRANCH-PROBE pc=0x82181c58 tid=1 cycle=5378957 r3=0x828f3d08 -- main: bl sub_821800D8 returned
```

### Mechanical chain (cross-checked vs disasm)

1. main (sub_8216EA68) returns from sub_824A9AA0 at cycle 5378609.
2. main calls `sub_82181C28` at `0x8216eaa0` (cycle 5378617). `sub_82181C28` is a Meyers singleton getter that checks `[0x828F3D98]` flag.
3. First call → flag is 0 → falls through to `bl sub_821800D8` at `0x82181c54`.
4. `sub_821800D8` is the 0x100c singleton getter. Checks `[0x828F3D78]` flag bit 0. First call → bit 0 is 0 → falls through to `bl sub_82181750` at `0x82180110`.
5. `sub_82181750` is the constructor. With `r3 = this = 0x828F3D08` (the dispatcher).
6. Constructor calls `bl sub_824A9F18` (allocates a lifecycle event); returns r3=0x1020.
7. Constructor calls `bl sub_82172370` at `0x821817f8` (the ExCreateThread wrapper) with r3=0x20000 (stack), r4=0x82181830 (entry), r5=0x828F3D08 (ctx).
8. Worker thread spawns as tid=3 at PC=0x82181830 → through the entry thunk → 0x82181838 (worker body).
9. Worker body reads `[0x828F3D08+76] = 0x1020` (lifecycle event handle), waits on it.
10. **Wait never satisfied** — handle 0x1020 has `signals=0, waits=1, wakes=0` in the handle-audit dump.

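Steps 4 to 7 of the chain describe a flag-guarded, construct-on-first-use getter. A minimal Rust model of that shape follows; the type and field names are hypothetical, and the sketch only mirrors the control flow read from the trace (flag check, one constructor run, same object returned afterwards), not the guest's actual data layout.

```rust
/// Hypothetical model of the dispatcher object the guest keeps at 0x828F3D08.
#[derive(Debug)]
pub struct Dispatcher {
    pub this_addr: u32,
    pub lifecycle_event: u32,
    pub worker_entry: u32,
}

/// Hypothetical model of the guest state touched by `sub_821800D8`.
#[derive(Debug, Default)]
pub struct GuestState {
    constructed: bool, // stands in for flag bit 0 at [0x828F3D78]
    dispatcher: Option<Dispatcher>,
    pub ctor_calls: u32,
}

impl GuestState {
    /// Models the singleton getter: construct on the first call only.
    pub fn get_dispatcher(&mut self) -> &Dispatcher {
        if !self.constructed {
            // Models `sub_82181750`: allocate the lifecycle event and
            // record the worker entry that the thread wrapper receives.
            self.ctor_calls += 1;
            self.dispatcher = Some(Dispatcher {
                this_addr: 0x828F_3D08,
                lifecycle_event: 0x1020,
                worker_entry: 0x8218_1830,
            });
            self.constructed = true;
        }
        self.dispatcher.as_ref().expect("constructed above")
    }
}
```

Calling `get_dispatcher` twice runs the constructor once, which matches the trace: the constructor and the worker spawn appear a single time, and later callers would only receive the existing object.
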
### Where the gate actually is

`sub_821800D8` xrefs:

| caller PC | from func | role |
|-----------|-----------|------|
| 0x82181c54 | sub_82181C28 | **create chain** (ran successfully — see trace above) |
| 0x821802d8 | sub_82180158 | job-submitter shim |
| 0x821806e0 | sub_821805C8 | job-submitter shim |
| 0x82180b28 | sub_82180A10 | job-submitter shim |
| 0x82180ea0 | sub_82180D90 | job-submitter shim |
| 0x82181254 | sub_821810E0 | job-submitter shim |

Each shim is a 5-instruction leaf (`bl getter; lwz r3, 80(r3); bl sub_824AA1D8`) — the canonical "get-then-enqueue" pattern. `sub_824AA1D8` is the universal dispatcher-submit primitive that signals the lifecycle event.

The 5 shims' parent functions are in the **0x82287000-0x82292FFF module range** (sub_82292838, sub_822878A8, sub_8228D760, sub_822900A8, sub_822919C8, sub_8228FDB8). This module is downstream of the cache-init code we've been working on and is almost certainly renderer/scene-graph related.

### Discipline gate

Per task brief (audit-008 session), gate fails on:
- Box 1: gate is NOT a single stubbed import (β-class, not α-class).
- Box 4: no sharp 4-dim cascade prediction can be written without first identifying which submitter should fire first.

**Hand back. No fix this session.**

### Follow-up probe set for next session

Probe parent functions of the 5 shims to find which path actually fires:

```
PROBE_LIST="0x82292838,0x822878a8,0x8228d760,0x822900a8,0x822919c8,0x8228fdb8,\
0x82180158,0x821805c8,0x82180a10,0x82180d90,0x821810e0,\
0x824aa1d8"
```

Whichever target fires identifies the producer path; whichever doesn't names the gate.

### Trace artifacts

- `audit-runs/audit-008/branch-probe.trace` — 17 BRANCH-PROBE lines (clean extract).
- `audit-runs/audit-008/probe-100m.log` — full stdout.
- `audit-runs/audit-008/probe-100m.err` — full stderr trace.

### Files modified

None. KRNBUG-AUDIT-007's `--branch-probe` machinery was sufficient. No git commit — no code changes.

## KRNBUG-AUDIT-009 — renderer cluster fully unreached; gate is structurally above 0x82287xxx-0x82294xxx (DIAGNOSTIC 2026-05-05)

### Outcome

**Stop condition 1 triggered.** Branch-probed all 21 PCs proposed by AUDIT-008 (12 renderer-cluster parents + shims + dispatcher) plus the AUDIT-005 9-PC producer-callsite set. **0/21 fired** at -n 500M. The 0x82287000-0x82294000 cluster is not entered at all. The gate is structurally above the cluster — outside its call boundary — so a deeper renderer-side probe would land on dead code. Per task brief stop condition 1, hand back with a higher-up probe set; no Phase 2 attempted.

### Decisive runtime evidence

`audit-runs/audit-009/probe-500m.err`:

- `branch probes armed: 21 (0x8216f9d4 ... 0x824aa1d8)`
- `BRANCH-PROBE` line count in stderr: **0**.
- `instructions=500000010 import_calls=5629676 unimplemented=0` — completed without halt.
- Final state, main: `tid=1 hw=0 state=Ready pc=0x822f1c60 lr=0x822f1be0` — inside `sub_822F1AA8` (frame-poll loop, between two `XNotifyGetNext` calls at 0x822f1bdc / 0x822f1c14). LR-of-LR points back into the same function.
- Counters: `XNotifyGetNext=1,489,741`, `NtWaitForSingleObjectEx=1,489,801`, `NtWaitForMultipleObjectsEx=865,493`, `RtlEnter/LeaveCriticalSection=889,109` each, `VdSwap=2`. Main is service-loop polling forever; no forward progress past frame-poll.
- 18 worker threads spawned (parity with audit-008 baseline + 2 new entry trampolines for 0x822c6870 / 0x824563e0 / 0x823dde30 / 0x823ddb50 that weren't catalogued before): tid=3 (0x100c worker, ctx=0x828F3D08, parked on lifecycle event 0x1020), tid=11 (0x1004 worker, ctx=0x828F3EC0, parked on event 0x1004), tid=17 (0x15e0 worker, ctx=0x828F4070, parked on event 0x15F4 — confirms post-IO-003 spawn at the new tid).
- canary-only kernel exports unchanged from audit-008: `{ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings}` (3 entries).
- `signal_attempts=0` on parked handles 0x1004, 0x100c (= event 0x1020), 0x15e0 (= event 0x15F4), 0x10c4. Same parked state as audit-008.

### Mechanical interpretation

- **Box 1 of the 12 PCs (parents):** `sub_82292838, sub_822878A8, sub_8228D760, sub_822900A8, sub_822919C8, sub_8228FDB8` — never entered.
- **Box 2 of the 12 PCs (shims):** `sub_82180158, sub_821805C8, sub_82180A10, sub_82180D90, sub_821810E0` — never entered. (These are leaf shims with the `bl outer_getter; lwz r3, OFFSET(r3); bl sub_824AA1D8` pattern.)
- **Box 3 (universal dispatcher):** `sub_824AA1D8` — never entered. The dispatcher serves both 0x100c and 0x15e0 clusters; its non-entry confirms NEITHER cluster's job-submit path runs, not just the 0x100c side.
- **AUDIT-005 9-PC producer callsites (5 × 0x100c shims + 4 × 0x15e0 shims):** never entered.

This eliminates the audit-008 working hypothesis that the gate sat among the 5 known callers of `sub_821800D8`. The gate is at least one level higher — above the cluster's external entry boundary.

### Cluster shape (from sylpheed.db xrefs)

The 0x82287000-0x82294000 cluster is **internally cohesive but externally unreachable via direct calls**. Its level-1 root functions (where call hierarchy starts within the cluster) have only self-call xrefs — i.e. the cluster is reached only via indirect calls (function pointers / vtables) from outside. The 6 candidate parents from audit-008 sit deep enough that ANY upstream gate looks the same from their level.

External entry points worth probing next:
- `sub_82293448` (0x82293448) — level-1 root, only self-recursion xrefs.
- `sub_822919C8` (0x822919C8) — level-1 root, only self-recursion xrefs.
- `sub_82288028` (0x82288028) — 8 callers, all in-cluster, but a hub.
- `sub_82292D80` (0x82292D80) — 1 caller, in-cluster (sub_82293448).
- `sub_822851E0` (0x822851E0) — has 2 in-cluster callers (sub_82284BA0, sub_82290BC8); reached transitively from `sub_82289FD0`.
- `sub_82286BC8` (0x82286BC8) — only sub_822851E0 calls it.

NEW thread entry trampolines spawned post-IO-003 (these didn't exist in audit-008's tid set; mid-run kernel-call telemetry shows ExCreateThread at these PCs):
- 0x822c6870 (tid=14 + tid=15, parallel duplicates, ctx=0x828f3300)
- 0x824563e0 (tid=16, ctx=0x828f3e70)
- 0x823dde30 (tid=18, ctx=0x828f3c4c)
- 0x823ddb50 (tid=19 + tid=20, parallel duplicates, ctx=0x828f3c88)

These are likely XAM/system-event dispatchers, not renderer producers, but their entries are unprobed — worth folding into the next probe set to confirm they are not the missing edge.

### Why main parks at sub_822F1AA8

main's call sequence (from xrefs of sub_8216EA68): the priv-11/cache-init cluster (`sub_824A9AA0`), the 0x100c create chain (`sub_82181C28`), `sub_82181298` (a 964-byte function — likely 0x1004 create chain), then a series of `sub_8216E858 / sub_82448470 / sub_8216F218 / sub_82448XXX` calls (probably config / xconfig / atexit), then finally:

```
0x8216ecc4: sub_822F17F0 (684 bytes — pre-poll setup, calls sub_82611CD8/sub_825F1000/sub_825F14D0/sub_824C1A38/sub_824BD460×2/sub_824BD580×2/sub_824B3798/sub_824B40B0/sub_824C2BF8/sub_824CE348/sub_824C76D0/sub_824CE4D0)
0x8216eccc: sub_822F1AA8 (frame-poll #1) ← we are here, looping forever
0x8216ee10: sub_822F1AA8 (frame-poll #2) — never reached
```

Two interpretations are plausible:
1. **sub_822F1AA8 is a finite poll** that exits when XNotifyGetNext returns a particular notification (e.g. dashboard signin completion / profile load). Some XAM event main expects is never delivered.
2. **sub_822F1AA8 is an event pump for the FIRST half of init**, calling work-items that should drive the renderer subsystem. If the work-items are dispatched here and the dispatch path goes via an indirect call into the 0x82287xxx cluster, then the missing edge is a function-pointer/vtable that's never populated.

Both interpretations are consistent with the 0/21 probe data. Probing the entry of sub_822F1AA8's CALLEE list (the calls inside the 1.49M-iteration loop) will discriminate.

### Discipline gate

| # | Condition | Pass? |
|---|---|---|
| 1 | Phase 1 named a single failing kernel/xam import (α) or a narrow internal-sub bug | **NO** — 0 PCs fired |
| 2 | Canary impl small (<80 LOC) | N/A |
| 3 | Sharp 4-dim cascade prediction | **NO** — no candidate fix |
| 4 | No new ABI plumbing | N/A |
| 5 | Fix doesn't touch renderer subsystem | N/A |

**Gate fails on box 1 + 3. STOP. Hand back per stop condition 1.** No code changes this session.

### Follow-up probe set for next session

```
# Renderer-cluster level-1 roots (never entered if gate is above):
PROBE_LIST="0x82293448,0x822919c8,0x82288028,0x82292d80,0x822851e0,0x82286bc8"
# Newly spawned thread entry trampolines (unprobed, may be system-side):
PROBE_LIST+=",0x822c6870,0x824563e0,0x823dde30,0x823ddb50"
# Main's frame-poll loop entry + its callee list (XNotifyGetNext consumer):
PROBE_LIST+=",0x822f1aa8,0x822f1be0,0x822f1c14,0x822f1d00"
# Main's continuation (only fires if main exits frame-poll #1):
PROBE_LIST+=",0x822f1638,0x821506b8,0x8216f088,0x82150ef8"
PROBE_LIST+=",0x82173360,0x82173530,0x8216f170,0x824a9ad8"
```

Whichever entries fire bound the live path; whichever don't bound the gate.

- If `sub_822F1AA8` fires once but never exits → main is stuck waiting for a XAM notification or critical-section signal. Look for which `XamNotifyCreateListener`-registered ID the loop expects.
- If `sub_822F1AA8` fires AND exits → main reaches `sub_822F1638` etc.; gate is further down.
- If the cluster level-1 roots fire → gate is INSIDE the cluster (renderer β-recursion), and the brief's "no renderer fixes" rule binds.

### Trace artifacts

- `audit-runs/audit-009/probe-500m.log` — final state + thread diag + handle audit + full counter table.
- `audit-runs/audit-009/probe-500m.err` — full stderr trace (kernel call log, 187 KB).
- `audit-runs/audit-009/branch-probe.trace` — empty (0 BRANCH-PROBE lines emitted).

Re-run command:

```
cd xenia-rs
PROBE="0x82292838,0x822878a8,0x8228d760,0x822900a8,0x822919c8,0x8228fdb8,\
0x82180158,0x821805c8,0x82180a10,0x82180d90,0x821810e0,0x824aa1d8,\
0x821802d8,0x821806e0,0x82180b28,0x82180ea0,0x82181254,\
0x8216f9d4,0x8216fc08,0x821700b8,0x821700f4"
./target/release/xenia-rs exec sylpheed.iso \
  --halt-on-deadlock --branch-probe="$PROBE" \
  --trace-handles-focus=0x1004,0x100c,0x15e0,0x1020,0x10c4 \
  -n 500000000 \
  > audit-runs/audit-009/probe-500m.log 2> audit-runs/audit-009/probe-500m.err
```

### Files modified

None. KRNBUG-AUDIT-007's `--branch-probe` machinery was sufficient. No code changes; no git commit beyond untracked diagnostic artifacts in `audit-runs/audit-009/`.

---

## KRNBUG-AUDIT-010 — XNotify delivery diff: 4 missing startup notifications gate dispatcher invocation (DIAGNOSTIC 2026-05-05)

### Status

**READ-ONLY DIAGNOSTIC**. Branch (α) — canary delivers 4 specific
startup notifications we don't. Discipline gate fails on box 3
(L1-root prediction); no fix landed. Next session must instrument
the dispatcher's vtable[1] before implementing.

### Branch classification

(α) — specific missing notifications, identifiable synthesis side.

### Verified ground truth

Our impl:
- `crates/xenia-kernel/src/xam.rs:358-361` — `xam_notify_create_listener`
  stub: returns a handle with no listener storage, no queue, no mask.
- `crates/xenia-kernel/src/xam.rs:363-366` — `xnotify_get_next` stub:
  always returns r3=0.
- `crates/xenia-kernel/src/objects.rs:14-77` — `KernelObject` has no
  `NotifyListener` variant.

Canary:
- `xenia-canary/src/xenia/kernel/kernel_state.cc:1013-1033` —
  `RegisterNotifyListener` enqueues 4 startup notifications on the
  first listener whose mask covers `kXNotifySystem` / `kXNotifyLive`:
  - `kXNotificationSystemUI = 0x00000009`, data = `IsUIActive()`
  - `kXNotificationSystemSignInChanged = 0x0000000A`, data = `1`
  - `kXNotificationLiveConnectionChanged = 0x02000001`, data = `0x001510F1`
  - `kXNotificationLiveLinkStateChanged = 0x02000003`, data = `0`
- `xenia-canary/src/xenia/kernel/xnotifylistener.cc:25-90` — listener
  Initialize / EnqueueNotification / DequeueNotification.
- `xenia-canary/src/xenia/kernel/xam/xam_notify.cc:22-95` —
  `XamNotifyCreateListener` and `XNotifyGetNext` real impls.

Runtime — canary (`/home/fabi/xenia_canary_windows/xenia.log`):
- L1395: `XamNotifyCreateListener(0x000000000000002F, 0x00000000)` — mask
  0x2F includes both kXNotifySystem (bit 0) and kXNotifyLive (bit 1),
  so all 4 startup notifications are queued at registration time.
- L2787: `XamUserReadProfileSettings(0, 0, 0, 0, 8, ...)` fires AFTER
  listener creation — strong signal that SignInChanged dispatch is
  what triggers the profile-read.

Runtime — ours (audit-009 / -n 500M):
- `kernel.calls{XamNotifyCreateListener} = 1` ✓
- `kernel.calls{XNotifyGetNext} = 1,489,741` — the loop hammers it
  ~1.5M times in 500M instr; gets r3=0 every call.
- 0/21 renderer-cluster + producer probe PCs fire.
- `XamUserReadProfileSettings` remains canary-only.

### Consumer-side dispatch path (sylpheed.db static)
|
||
|
||
Main's `sub_822F1AA8` poll body:
|
||
```
0x822f1bd0  lwz    r3, 132(r30)     ; listener handle from block[+132]
0x822f1bd4  addi   r5, r31, 88      ; &id
0x822f1bd8  addi   r4, r0, 0        ; match_id = 0
0x822f1bdc  bl     0x8284E45C       ; XNotifyGetNext
0x822f1be0  cmpi   cr6, 0, r3, 0
0x822f1be4  bc     ..., 0x822F1C20  ; if 0, jump past dispatch
0x822f1be8  lwz    r3, 7944(r25)    ; mem[0x828E1F08] = outer
0x822f1bec  lwz    r5, 84(r31)      ; data (the id was written to r31+88 above)
0x822f1bf0  lwz    r4, 88(r31)      ; id
0x822f1bf4  lwz    r11, 0(r3)       ; outer.vtable
0x822f1bf8  lwz    r11, 4(r11)      ; vtable[1] = OnNotify
0x822f1bfc  mtspr  CTR, r11
0x822f1c00  bcctrl 20, lt           ; call OnNotify(this, id, data)
```
|
||
|
||
Construction:
|
||
- `sub_8216EA68` (main) → `sub_822F2758(&outer)` at 0x8216ECAC.
|
||
- `sub_822F2758` at 0x822f2788: `outer.vtable = 0x820AD894`.
|
||
- → `sub_82150EF8(288)` allocates `block`.
|
||
- → `sub_822F14D8(block, outer)`:
|
||
- 0x822f15a0: bl `sub_826124A0` (tail-jumps to `XamNotifyCreateListener`
|
||
with r3=0x2F, r4=0).
|
||
- 0x822f15a8: `block[+132] = listener_handle`.
|
||
- 0x822f15c8: `mem[0x828E1F08] = outer`.
|
||
- 0x822f27b8 back in caller: `outer[+4] = block`.
|
||
|
||
### vtable resolution from .pe (file offset 0xAD894)
|
||
|
||
```
[+0]   0x825ED990   ; vtable[0]
[+4]   0x825ED990   ; vtable[1] ← OnNotify
[+8]   0x825ED990
[+12]  0x825ED990
[+16]  0x824C8F00   ; bclr 20, lt (1-instr empty)
[+20]  0x825ED990
[+24]  0x825ED990
[+28]  0x824C8F00
```
|
||
|
||
`sub_825ED990` body looks like a "must-override" base-class stub / `__purecall`: it calls a registered debug callback at `mem[0x828A5B7C]` if non-null, then runs an apparent exit code path (`r3=25; bl 0x825F6B90; r3=0,r4=1; bl 0x825F50D0; bl 0x825F5020`). **Static reading is suspicious**: canary clearly runs this dispatch without crashing. Either (i) `mem[0x828A5B7C]` holds the real notification handler and the post-call sequence is benign, or (ii) the vtable is dynamically replaced, although no such write was visible in xrefs to `mem[0x828E1F08]` beyond the constructor (0x822f15c8) and destructor (0x822f16bc).
|
||
|
||
### Discipline gate
|
||
|
||
| Box | Status |
|-----|--------|
| 1. Specific missing notification + canary file:line | ✅ |
| 2. Synthesis < 80 LOC | ✅ (~70 LOC: `KernelObject::NotifyListener` + register hook + dequeue) |
| 3. Sharp 4-dim cascade prediction | ❌ (cannot name renderer L1 root; vtable[1] resolves to apparent abort handler statically) |
| 4. No renderer/GPU code changes | ✅ |
|
||
|
||
**Box 3 fails. STOP. Diagnostic-only.**
|
||
|
||
### Next session — Phase 1.5 probe before implementing
|
||
|
||
1. Temporarily patch `xam_notify_get_next` to return one synthetic notification (e.g. `id=0x0A, data=1`) on first call.
2. Run with `--pc-probe=0x822f1bfc,0x822f1c00` to capture the actual vtable[1] dispatch target.
3. Read off the runtime target. Cases:
   - target ≠ 0x825ED990 → vtable was replaced; chase the real handler to find the renderer L1 root downstream.
   - target = 0x825ED990 → confirm whether `mem[0x828A5B7C]` is populated by some init path; if so, the abort-stub IS the real dispatcher and the indirect callback is the actual handler.
4. Revert the temporary stub. Now the prediction is sharp; land the real implementation.
|
||
|
||
### Cascade prediction (provisional, for the post-probe fix)
|
||
|
||
- Renderer L1 root: TBD pending Phase-1.5 probe.
|
||
- Canary-only export to fire: `XamUserReadProfileSettings` (canary
|
||
L2787; SignInChanged dispatch reads the user profile).
|
||
- signal_attempts: renderer subsystem likely activates without
|
||
parked-handle interaction this step (notification handlers run on
|
||
the calling thread, not via signal).
|
||
- draws delta: NO this step. Boot horizon advances one hop, not yet
|
||
to a draw-emitting subsystem.
|
||
|
||
### Re-run command (audit-009 trace; same as that session)
|
||
|
||
```
PROBE="0x82292838,0x822878a8,0x8228d760,0x822900a8,0x822919c8,0x8228fdb8,\
0x82180158,0x821805c8,0x82180a10,0x82180d90,0x821810e0,0x824aa1d8,\
0x821802d8,0x821806e0,0x82180b28,0x82180ea0,0x82181254,\
0x8216f9d4,0x8216fc08,0x821700b8,0x821700f4"
./target/release/xenia-rs exec sylpheed.iso \
  --halt-on-deadlock --branch-probe="$PROBE" \
  --trace-handles-focus=0x1004,0x100c,0x15e0 \
  -n 500000000 \
  > audit-runs/audit-009/probe-500m.log 2> audit-runs/audit-009/probe-500m.err
```
|
||
|
||
### Files modified
|
||
|
||
None. New artifact: `audit-runs/audit-010/findings.md`.
|
||
|
||
## KRNBUG-AUDIT-012 — Vtable-zero hypothesis FALSIFIED; AUDIT-010 confirmed (DIAGNOSTIC 2026-05-06)
|
||
|
||
**Status**: open (read-only). Master HEAD `50a4887` unchanged in working tree.
|
||
|
||
### Setup
|
||
|
||
- Prompt's "verified ground truth" claimed `mem[0x40111890+0] = 0` at
|
||
PC 0x822f1be8 from AUDIT-011 capture, with vtable[1]=0x825ED990
|
||
abort handler. Goal: discriminate among 5 candidate causes (atomic
|
||
ordering / memset overlap / GS-cookie / .rdata mapping / destructor).
|
||
- Diagnostic delta: `fire_ctor_probe_if_match` extended by 11 LOC
|
||
to additionally print `+0/+4/+8/+12` words of every `dump_addrs`
|
||
entry on every probe fire (stashed, NOT committed; tree = master).
|
||
- Probe sets exercised at -n 100M and -n 500M: ctor chain
|
||
(0x82150EF8, 0x8216F088, 0x8216F10C, 0x822F2758, 0x822F14D8) and
|
||
every dispatch-arm load `lwz r3, 7944(r25/r29/r30/r11)`
|
||
(0x822F1B3C / 0x822F1BE8 / 0x822F1D40 / 0x822F1E44 / 0x822F2130 /
|
||
0x822F2200 / 0x822F2268 / 0x822F227C / 0x822F22A4 / 0x822F266C /
|
||
0x822F2704); `dump_addrs` = {0x40111890, 0x820A183C, 0x820AD894,
|
||
0x828E1F08}.
|
||
|
||
### Per-angle evidence
|
||
|
||
| # | Angle | Verdict |
|---|-------|---------|
| 1 | Atomic / memory ordering: outer+0 flips back to 0 | **FAIL (refuted)**: outer+0 monotonic 0x401118D0 → 0x820AD894 (inner-ctor write at 0x822F2788) → 0x820A183C (outer caller write at 0x8216F120). Stays at 0x820A183C through every subsequent fire. Sampled at every probe through end-of-run. Never zeroed. |
| 2 | Memset/memcpy overlap | **FAIL (refuted)**: same evidence as 1. No bulk-zero event covers outer+0 after ctor. Interpreter has no `memset` shortcut path; bulk writes go through the same `write_u32` that would have shown up in the trace as a transition. |
| 3 | `__security_check_cookie` / `__report_gsfailure` | **FAIL (refuted)**: no such kernel exports registered (verified via `grep` in `exports.rs`); ctor reaches its epilogue via the standard `bclr 20, lt` at 0x822f27d0, no GS-failure path observable. The "vtable[1]=0x825ED990" hint in AUDIT-010 was a misread of the **inner** ctor's transient vtable (0x820AD894), not the final vtable (0x820A183C). |
| 4 | .rdata mapping fidelity | **FAIL (refuted)**: dump@0x820A183C reads `[+0..+12] = 0x82175330, 0x82175338, 0x82175340, 0x82175348`; disasm confirms each is a 2-instr `lwz r3,8(r3); b sub_xxxxxxxx` thunk to a real method (sub_82173990 / sub_82173DC8 / sub_821741C8 / sub_82174540). .rdata maps cleanly. |
| 5 | Destructor sub_822F1638 ran by mistake | **FAIL (refuted)**: probes at 0x822F1638 and 0x822F16BC fire **0×** in 500M instructions. Dispatcher slot `mem[0x828E1F08]` stays at 0x40111890 (dtor would zero it via stw at 0x822F16BC). Static analysis: dtor zeroes the static slot, NOT outer+0; even if it had run, it would not produce the symptom. |
|
||
|
||
**Result**: ALL FIVE angles refute the AUDIT-011 vtable-zero claim. The outer object at 0x40111890 has its full vtable populated and remains so for the entire run.
|
||
|
||
### Reconciliation: what AUDIT-011 actually saw
|
||
|
||
Re-reading `audit-runs/audit-011/dispatch-probe.log`:
|
||
|
||
- Final state reports tid=1 stuck at PC `0x8284E45C`, **not** at 0x822F1BE8.
|
||
- `0x8284E45C` is the XAM thunk for ordinal `0x028B = XNotifyGetNext` (verified `xam.rs:72`). The bl at 0x822F1BDC enters this thunk; the immediately-following compare `cmpi cr6, 0, r3, 0` (0x822f1be0) decides whether to dispatch (the fall-through of the `beq` at 0x822f1be4 is PC 0x822F1BE8).
|
||
- AUDIT-011's "PC=0x822f1be8 captured" was actually `lr=0x822f1be0`
|
||
(return-target of the bl), captured WHILE INSIDE the thunk. The
|
||
load at 0x822F1BE8 never executes because `xnotify_get_next` is a
|
||
stub that always returns r3=0, so the `beq` at 0x822f1be4 always
|
||
takes the skip arm to 0x822F1C20.
|
||
- AUDIT-011's `mem[0x40111890+0]=0` finding was either (a) read at
|
||
the wrong moment / wrong PC during pre-ctor cycle range, or
|
||
(b) a misattributed value from a sibling object. The 100M/500M
|
||
re-runs decisively show outer+0 = 0x820A183C from cycle ~5.53M
|
||
onward, monotonic.
|
||
|
||
### Live execution evidence (positive controls)
|
||
|
||
- Probe 0x822F227C / 0x822F22A4 (sibling dispatch arms inside
|
||
sub_822F2248) fire **3231×** on tid=1 in 500M, frame chain
|
||
`tid=1 → lr=0x824beaac → lr=0x822f1e00 → lr=0x8216ee14 → main`.
|
||
→ A renderer-adjacent callback dispatcher IS executing per-frame.
|
||
- Probe 0x822F1D40 fires 1×.
|
||
- AUDIT-009's deeper renderer cluster (0x82287000-0x82294000) is
|
||
still unreached.
|
||
- 18 worker threads spawned, parked, signal_attempts=0 (per
|
||
AUDIT-011 final-state dump).
|
||
|
||
### Bug class (1 of 5)
|
||
|
||
**None of the five.** AUDIT-011's vtable-zero observation is not reproducible. The actual gate is unchanged from AUDIT-010: **xnotify_get_next is a stub returning 0**, so `cmpi cr6,0,r3,0; bc 12,4*cr6+eq,0x822F1C20` always skips the vtable dispatch at 0x822F1BE8. Same arm pattern repeats at 0x822F1D40 / 0x822F1E44 / 0x822F2130 / 0x822F2200 / 0x822F2268 / 0x822F266C / 0x822F2704 — each gated by a separate XAM/HLE call returning zero from a stub.
|
||
|
||
### Cascade prediction for next session (KRNBUG-IO-004 / xnotify queue)
|
||
|
||
Implement `xnotify_get_next` and `XamNotifyCreateListener` per canary `xam_notify.cc`:
|
||
|
||
- Replay AUDIT-010's Phase-1.5 probe, but with the corrected vtable: the bcctrl at 0x822f1c00 should call `mem[mem[0x40111890+0]+4]` = `0x82175338` thunk → `sub_82173DC8`. Read sub_82173DC8 in `sylpheed.db` to identify the real handler before landing.
|
||
- Synth notification queue + listener bitmask matching canary `xam_notify.cc`.
|
||
- Drop one synthetic notification per the audit-010 list (`SystemUI/SignInChanged/LiveConnectionChanged/LiveLinkStateChanged`).
|
||
- Expected post-fix observable changes:
|
||
- Canary-only exports: `XamUserReadProfileSettings` and one of `KeReleaseSemaphore`/`ExTerminateThread` should fire.
|
||
- Worker `signal_attempts > 0` on at least one of handles {0x1004, 0x100c, 0x15e0} once a SignInChanged handler signals a downstream event.
|
||
- draws delta: still 0 this step (renderer L2 cluster not yet reached).
|
||
- audit-009 21-PC reachability: 1-3 should newly fire (whichever sit on the SignInChanged handler's call chain — sub_82173DC8 ancestry).
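
The double load behind the `bcctrl` prediction above can be sketched as follows. This is a toy model only: `GuestMem`, `resolve_virtual`, and `audit_012_image` are hypothetical names, not the emulator's memory API, and the image is seeded with just the four words reported in the AUDIT-012 dumps.

```rust
use std::collections::HashMap;

/// Toy word-addressed guest memory; a stand-in for the emulator's real memory API.
pub struct GuestMem(HashMap<u32, u32>);

impl GuestMem {
    /// Returns the word at `addr`, or None if the toy image has no entry there.
    pub fn read_u32(&self, addr: u32) -> Option<u32> {
        self.0.get(&addr).copied()
    }
}

/// Resolve a virtual call target: load the vtable pointer at object+0,
/// then load the 4-byte slot `slot` from that vtable.
pub fn resolve_virtual(mem: &GuestMem, object: u32, slot: u32) -> Option<u32> {
    let vtable = mem.read_u32(object)?;
    mem.read_u32(vtable.wrapping_add(4 * slot))
}

/// Image seeded with the values quoted in the AUDIT-012 per-angle table.
pub fn audit_012_image() -> GuestMem {
    GuestMem(HashMap::from([
        (0x828E_1F08, 0x4011_1890), // dispatcher slot -> outer object
        (0x4011_1890, 0x820A_183C), // outer+0 -> final vtable
        (0x820A_183C, 0x8217_5330), // vtable[0]
        (0x820A_1840, 0x8217_5338), // vtable[1]
    ]))
}
```

With that image, resolving slot 1 of the object stored at `0x828E1F08` yields `0x82175338`, matching the predicted thunk; an address missing from the image resolves to `None` rather than a guessed value.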
|
||
|
||
### Files modified
|
||
|
||
None on master. Diagnostic patch (state.rs, +11 LOC) stashed locally as `audit-012 dump-on-probe extension`. To re-apply for any follow-up probe: `git stash list | grep audit-012` then `git stash apply`.
|
||
|
||
Trace artifacts: `audit-runs/audit-012/probes-100m.{log,err}`, `audit-runs/audit-012/dispatch-500m.{log,err}`.
|
||
|
||
### Discipline gate
|
||
|
||
| Box | Status |
|-----|--------|
| 1. Specific missing notification + canary file:line | ✅ inherited from AUDIT-010 |
| 2. Synthesis < 80 LOC | ✅ inherited |
| 3. Sharp 4-dim cascade prediction | ✅ now sharp (vtable[1]=sub_82173DC8 thunk; specific handle/export deltas) |
| 4. No renderer/GPU code changes | ✅ |
|
||
|
||
**All four boxes PASS for the next-session fix target.** Pure diagnostic this session.
|
||
|
||
---
|
||
|
||
## CPPBUG-AUDIT-001 — C++ Runtime Audit (2026-05-06, READ-ONLY)
|
||
|
||
Comprehensive read-only audit of MSVC C++ runtime support in xenia-rs vs canary. Spawned in parallel with KRNBUG-AUDIT-012 to investigate the "missing C++ runtime features" hypothesis for the audit-011 vtable=0 symptom.
|
||
|
||
### Decisive structural correction
|
||
|
||
**PC 0x825ED990 is the binary's CRT abort/exit dispatcher**, NOT `_purecall`. Disasm at 0x825ED990..0x825ED9DC walks a 23-entry exit-handler table at `[0x828B2D08]` keyed by signal=25, calls atexit at `[0x828A5B7C]`, then `sub_825F50D0(0,1)` and `sub_825F5020()` (raises via `sub_824AA640`/`sub_824AA710`). This is the MSVC `abort()`/`_amsg_exit` equivalent, and it corrects audit-010's "apparent __purecall/abort handler" attribution.
|
||
|
||
**Sylpheed's CRT is statically linked.** Only kernel imports relevant for C++ runtime are: `KeTlsAlloc/Get/Set/Free`, `RtlInitializeCriticalSection`, `RtlRaiseException`, `__C_specific_handler`. The C++ runtime question is narrower than initially feared.
|
||
|
||
### Top-3 candidates for vtable=0 — ALL REFUTED by audit-012
|
||
|
||
1. `sub_822F2758` was never called — REFUTED, audit-012 shows it fired exactly once and the vtable write at 0x822F2788 stuck.
|
||
2. Ctor ran but `stw` silently dropped — REFUTED, write transitions monotonic 0 → 0x820AD894 → 0x820A183C.
|
||
3. Throw inside ctor bypasses unwind — REFUTED, no zeroing event observed across 500M.
|
||
|
||
### Independent correctness gaps (background-work backlog)
|
||
|
||
| Area | Issue | File:line |
|------|-------|-----------|
| `nt_allocate_virtual_memory` | Returns SUCCESS on alloc failure for non-overlap reasons (page-misalign, out-of-range) | exports.rs:622-625 |
| `heap.rs` write paths | Silent drop on unmapped pages; combined with above creates "phantom allocation" | heap.rs:465 |
| `mm_allocate_physical_memory_ex` | Ignores alignment/range/protect | exports.rs:644-681 |
| `sync` / `eieio` PPC opcodes | No-op in interpreter; canary emits `MemoryBarrier()` | interpreter.rs:1697 vs canary ppc_emit_memory.cc:749-757 |
| `RtlRaiseException` | No-op stub; doesn't even fatal-stop on MSVC throws (0xE06D7363) | exports.rs:2218-2221 |
| TLS storage | Uses `Vec<u64>`; canary uses u32. Functionally OK | xboxkrnl_threading.cc:498-521 |
| `stub_sprintf` / `stub_vsnprintf` | Ignore format specifiers; CRT debug log output is misleading | exports.rs |
| Heap | Bump-only, no free | state.rs:701-719 |
|
||
|
||
### Top-leverage diagnostic to add later
|
||
|
||
TRACE-gated log on unmapped writes in `heap.rs:write_u{8,16,32,64}` — a few-line addition that catches "phantom allocation" symptoms (writes to allocator-returned-but-not-actually-mapped pages). Should be standing infrastructure given the silent-drop class of bugs.
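
A minimal sketch of what such a diagnostic might look like. The `Heap` type, its page map, and the `dropped_writes` / `trace_unmapped` fields are assumptions made for illustration and do not reflect the actual layout of `heap.rs`; `eprintln!` stands in for whatever TRACE-level logging the project uses.

```rust
use std::collections::HashMap;

const PAGE_SIZE: u32 = 4096;

/// Hypothetical page-map heap; not the layout of the real `heap.rs`.
#[derive(Default)]
pub struct Heap {
    pages: HashMap<u32, Vec<u8>>, // page index -> page bytes
    pub dropped_writes: u64,      // count of stores that hit no mapped page
    pub trace_unmapped: bool,     // gate for the diagnostic log line
}

impl Heap {
    /// Map the (zero-filled) page containing `addr` if it is not mapped yet.
    pub fn map_page(&mut self, addr: u32) {
        self.pages
            .entry(addr / PAGE_SIZE)
            .or_insert_with(|| vec![0; PAGE_SIZE as usize]);
    }

    /// Big-endian store. Returns false, counts the event, and optionally logs
    /// when the target is unmapped. Stores that straddle a page boundary also
    /// take the dropped path in this sketch.
    pub fn write_u32(&mut self, addr: u32, value: u32) -> bool {
        let off = (addr % PAGE_SIZE) as usize;
        match self.pages.get_mut(&(addr / PAGE_SIZE)) {
            Some(page) if off + 4 <= page.len() => {
                page[off..off + 4].copy_from_slice(&value.to_be_bytes());
                true
            }
            _ => {
                self.dropped_writes += 1;
                if self.trace_unmapped {
                    eprintln!("unmapped write: addr={:#010x} value={:#010x}", addr, value);
                }
                false
            }
        }
    }

    /// Big-endian load; None when the word is not fully inside a mapped page.
    pub fn read_u32(&self, addr: u32) -> Option<u32> {
        let off = (addr % PAGE_SIZE) as usize;
        let b = self.pages.get(&(addr / PAGE_SIZE))?.get(off..off + 4)?;
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}
```

Keeping a counter alongside the gated log line means a run can report "N writes dropped" at end-of-run even when tracing is off, which is the cheap signal for a phantom allocation.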
|
||
|
||
### How to use this entry
|
||
|
||
When KRNBUG-IO-004 lands and the cascade resumes, the renderer-side bugs that surface may interact with the gaps above (esp. memory ordering / `sync` semantics for cross-thread GPU-CPU). Treat as a checklist for "first things to suspect" once draws > 0 lands. NOT urgent for the swap=2 / draws=0 plateau.
|
||
|
||
Master HEAD `50a4887` unchanged. No commits. No code modified.
|
||
|
||
---
|
||
|
||
## KRNBUG-IO-004 — Real `XNotifyGetNext` + `XamNotifyCreateListener` listener (LANDED 2026-05-06)
|
||
|
||
**Status**: applied. Branch `xnotify-listener/p0-startup-enqueue` merged no-ff.
|
||
|
||
### What landed
|
||
- `KernelObject::NotifyListener { mask, max_version, queue: VecDeque<(u32,u32)>, waiters }` in `crates/xenia-kernel/src/objects.rs`.
|
||
- `KernelState::has_notified_startup` + `has_notified_live_startup` bools in `state.rs`.
|
||
- Real `xam_notify_create_listener` in `xam.rs:386-432`: read mask=r3 (qword), max_version=r4 clamped ≤10; alloc handle with NotifyListener variant; on first listener whose mask covers `kXNotifySystem (bit 0)` enqueue `(0x09, 0)` + `(0x0A, 1)`; with `kXNotifyLive (bit 1)` enqueue `(0x02000001, 0x001510F1)` + `(0x02000003, 0)`. Mirrors `xenia-canary/src/xenia/kernel/kernel_state.cc:1013-1033` byte-for-byte.
|
||
- Real `xnotify_get_next` in `xam.rs:434-466`: handle=r3, match_id=r4, id_ptr=r5, param_ptr=r6. Pop front (or scan-by-id when match_id != 0). Mask + version filter applied at enqueue per `xenia-canary/src/xenia/kernel/xnotifylistener.cc:38-51`. Returns 1 on dequeue, 0 otherwise.
|
||
- 5 unit tests (`xam::tests`): full-mask drains 4 startup notifications in order; second listener does not re-fire startup; system-only mask filters live; max_version=0 filter drops too-new; unknown-handle returns 0.
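
The queue behaviour described in the list above can be sketched in a few lines. This is a simplified illustration, not the landed code: the type and function names are hypothetical, the `max_version` filter and the first-listener-only flags are omitted, and the mapping from notification ID to mask bit (`id >> 25`) is an assumption that happens to fit the four IDs listed in AUDIT-010.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the listener state; version filtering is omitted.
pub struct NotifyListener {
    mask: u64,
    queue: VecDeque<(u32, u32)>, // (id, data)
}

impl NotifyListener {
    pub fn new(mask: u64) -> Self {
        Self { mask, queue: VecDeque::new() }
    }

    /// Assumed mapping: the top bits of the ID select the area bit in the mask.
    fn area_bit(id: u32) -> u64 {
        1u64 << ((id >> 25) & 0x3F)
    }

    /// Queue the notification only if the listener's mask covers its area.
    pub fn enqueue(&mut self, id: u32, data: u32) -> bool {
        if self.mask & Self::area_bit(id) == 0 {
            return false;
        }
        self.queue.push_back((id, data));
        true
    }

    /// match_id == 0 pops the front; otherwise removes the first entry with that ID.
    pub fn get_next(&mut self, match_id: u32) -> Option<(u32, u32)> {
        if match_id == 0 {
            return self.queue.pop_front();
        }
        let pos = self.queue.iter().position(|&(id, _)| id == match_id)?;
        self.queue.remove(pos)
    }
}

/// Offer the four startup notifications (IDs and data as listed in this file).
pub fn enqueue_startup(listener: &mut NotifyListener) {
    for (id, data) in [
        (0x0000_0009, 0),
        (0x0000_000A, 1),
        (0x0200_0001, 0x0015_10F1),
        (0x0200_0003, 0),
    ] {
        listener.enqueue(id, data);
    }
}
```

A listener created with mask `0x2F` drains all four entries in order, while a system-only mask (`0x1`) never sees the two `0x0200_00xx` entries because they are rejected at enqueue time rather than at dequeue time.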
|
||
|
||
### LOC budget
|
||
119 (97 impl + 22 scaffolding pattern matches in main.rs/objects.rs/state.rs) ≤ 120.
|
||
|
||
### Cascade-prediction scorecard (each dimension)
|
||
| Dimension | Pre-fix | Post-fix | Result |
|---|---|---|---|
| (a) `cargo test --workspace --release` | 594 | 599 | PASS |
| (b) Lockstep `-n 100M` instructions | 100000019 | 100000012 stable across 2 reruns; bit-identical diff | PASS |
| (c) AUDIT-009 21-PC + AUDIT-005 9-PC probe set newly reachable | 0 | 3 (`0x822c6870` ×2 workers, `0x824563e0`, `0x823ddb50`) in `sub_82173DC8` ancestry | PASS (predicted 1-3) |
| (d) Canary-only export delta | 7 | 3 (KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule fell off; ExTerminateThread + KeReleaseSemaphore + XamUserReadProfileSettings still missing) | PASS (set shrank as predicted; specific predictions partial) |
| (e) signal_attempts on parked handles | 0/0/0 | 0/uncreated/1 (handle 0x15e0 primary=1) | PASS (predicted >0 on at least one) |
| (f) Worker thread count | 18 | 20 | PASS (delta confirmed) |
| (g) draws delta | 0 | 0 | PASS (acknowledged plateau) |
|
||
|
||
### Phase 1.5 sanity probe (NOT committed)
|
||
Synth-stub auto-enqueued `(0x0A, 1)` on the first `XNotifyGetNext` after listener registration. Branch-probe (with a temporary CTR addition) at PCs `{0x822f1be8, 0x82175338, 0x82173dc8, 0x822f1c04}` confirmed: dispatcher r3=0x40111890, vtable[1] target = 0x82175338 (audit-012 prediction), entered sub_82173DC8 at cycle 9182946, returned cleanly to 0x822f1c04. Stub + probe-CTR addition reverted; tests green at 594 before Phase 2.
|
||
|
||
### Still-canary-only (post-fix)
|
||
1. `ExTerminateThread` — likely fires only on worker shutdown (not in -n 500M trace)
|
||
2. `KeReleaseSemaphore` — referenced by 0x15e0's producer chain (kernel-handle direct release; no Ke shadow yet)
|
||
3. `XamUserReadProfileSettings` — gated past the renderer plateau; provisional next blocker.
|
||
|
||
### Trace artifacts
|
||
`audit-runs/audit-013-io-004-phase1.5/dispatch.{log,err}` (no-fire baseline at non-block PCs), `dispatch2.log` (block-entry probes — 1 fire on dispatch arm), `dispatch3.log` (full dispatch chain confirmed), `post-cascade.{log,err}` (focus + canary export delta + cascade probes).
|
||
|
||
## KRNBUG-AUDIT-014 — 0x15e0 wake-eligibility hypothesis FALSIFIED; tid=17 actually parks on 0x15e4 (DIAGNOSTIC 2026-05-06)
|
||
|
||
**Status**: read-only diagnostic. No fix landed. Master HEAD `d736a1d` unchanged.
|
||
|
||
### Phase 1 finding (decisive)
|
||
Goal was to investigate why handle 0x15e0 records `signal_attempts=1 (primary=1)` post-IO-004 BUT tid=17 (the "0x15e0 worker") still parks. **The premise is wrong.**
|
||
|
||
Trace at `-n 500M --trace-handles-focus=0x15e0` shows:
|
||
1. **Handle 0x15e0 is a Semaphore**, not an Event/Manual. Created from `lr=0x824ab110` (NtCreateSemaphore) on tid=1, with creator-frame chain `lr=0x82456a94 → 0x82456bac → 0x822f1b60 → 0x8216ee14 → 0x824ab8e0`. This is a **different** wrapper than the Event creator chain `lr=0x824a9f6c` shared by 0x1004 / 0x100c / 0x1020 / 0x15e4.
|
||
2. **0x15e0 is healthy**: `signal_attempts=1 (primary=1) waits=1 wakes=1`. End-of-run DIAGNOSIS reports "not stuck — signals consumed correctly". Timeline: tid=1 waited at `lr=0x824ac578`, then tid=16 `NtReleaseSemaphore` at `lr=0x824ab168` woke it. Handshake completed.
|
||
3. **tid=17 parks on 0x15e4**, NOT 0x15e0. State at end-of-run: `Blocked(WaitAny { handles: [5604] })` where `5604 == 0x15e4`. Worker entry context `r12=0x8217057c` (front of `sub_82170430`) matches the audit-009 / audit-008 / audit-002 stage-3 attribution of tid=17 to the 0x82170430 worker cluster.
|
||
4. **0x15e4 is the actual stuck handle**: `kind=Event/Manual waiters=1 signals=0 waits=1 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`, created by tid=1 at `lr=0x824a9f6c` (same wrapper as 0x1004 / 0x100c / 0x1020). This is the same producer-missing class as the other Event/Manual handles tracked across audit-001 → IO-004.
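
The distinction drawn between 0x15e0 and 0x15e4 comes down to three counters per handle. A hedged sketch of that classification follows; the enum, struct, and thresholds are hypothetical and are not taken from the tracer's source, which this file does not show.

```rust
/// Per-handle counters as they appear in the focus dump (signals/waits/wakes).
pub struct HandleStats {
    pub signals: u32,
    pub waits: u32,
    pub wakes: u32,
}

/// Hypothetical verdicts; only the first two correspond to labels quoted above.
#[derive(Debug, PartialEq, Eq)]
pub enum Diagnosis {
    /// Waiters exist but nothing ever signalled: the producer is missing.
    NoSignalsDespiteWaits,
    /// Every wait was matched by a wake: signals consumed correctly.
    NotStuck,
    /// Nobody waited on the handle during the run.
    Idle,
    /// Some signals arrived but at least one waiter is still parked.
    PartiallyServed,
}

/// Classify a handle from its counters alone.
pub fn diagnose(s: &HandleStats) -> Diagnosis {
    if s.waits == 0 {
        Diagnosis::Idle
    } else if s.signals == 0 {
        Diagnosis::NoSignalsDespiteWaits
    } else if s.wakes >= s.waits {
        Diagnosis::NotStuck
    } else {
        Diagnosis::PartiallyServed
    }
}
```

Fed the counters quoted above, 0x15e0 (`signals=1 waits=1 wakes=1`) classifies as not stuck and 0x15e4 (`signals=0 waits=1 wakes=0`) as the producer-missing case.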
|
||
|
||
The IO-004 cascade-prediction scorecard's claim "(e) signal_attempts on parked handles: 0x15e0 = 1 (primary=1, ghost=0)" was technically correct (the semaphore did get one signal) but the inference that this represented forward progress for tid=17's wake was a misattribution. The label "0x15e0 worker" used in audit-009 / audit-002 / audit-008 stage-3 mappings is a long-standing transcription error: the actual handle is 0x15e4 (Event/Manual), and 0x15e0 is an unrelated Semaphore. Reference: `project_xenia_rs_producer_stack_trace_2026_05_03.md` already noted "third handle is **0x15e0**, not 0x15e4 (transcription typo)" — that correction itself was reversed; the original audit-002 label 0x15e4 was correct.
|
||
|
||
### Bug class evaluation (α-ζ from prompt)
|
||
- α (PKEVENT vs handle mismatch): N/A — no Set call ever targets 0x15e4; the producer is genuinely missing.
|
||
- β (refresh_pkevent_shadow_from_guest miss): N/A — same.
|
||
- γ (wake-eligibility filter wrong): N/A — wake_eligible_waiters fires correctly elsewhere (0x10F0 handshake demonstrates healthy manual-reset wake; 0x15e0 demonstrates healthy semaphore wake).
|
||
- δ (memory ordering): N/A — no producer side observed.
|
||
- ε (race scheduler.resume vs signal): N/A.
|
||
- ζ (audit recorded but not propagated): N/A — DIAGNOSIS print-out matches state.objects waiter list.
|
||
|
||
**Conclusion**: 0x15e4 belongs to the same "producer never reaches the Set call" class as 0x1004 / 0x100c / 0x1020. Renderer cluster work (audit-008 / audit-009) and AUDIT-014's parallel Fork B probing of newly-reached L1 entries (`sub_82173DC8`, `0x822c6870`, `0x824563e0`, `0x823ddb50`) is the correct line of attack — there is no wake-eligibility bug to fix.
|
||
|
||
### Discipline gate
|
||
- Box 1 (named bug class with concrete evidence): FAIL — premise refuted, no bug class applies.
|
||
- Box 2 (narrow fix ~30-80 LOC): N/A.
|
||
- Box 3 (sharp 4-dim cascade prediction): N/A.
|
||
- Box 4 (no renderer/GPU changes): N/A.
|
||
- Box 5 (lockstep determinism preserved): N/A.
|
||
|
||
Stop conditions met: hand back as Phase 1 only.
|
||
|
||
### Cascade snapshot (unchanged from IO-004 baseline)
|
||
- swaps=2 (`VdSwap` kernel-direct frames 1 + 2)
|
||
- draws=0
|
||
- 18 → 20 worker threads (consistent with IO-004)
|
||
- Canary-only exports: ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings still missing.
|
||
|
||
### Recommended next session
|
||
Track Fork B's branch-probe results for `sub_82173DC8` (the first L1 entry in the renderer cluster reached after IO-004). The producer for handles 0x1004 / 0x100c / 0x1020 / 0x15e4 lives somewhere along the dispatch arm at `0x822f1be8 → 0x82175338 → 0x82173dc8 → ...`. If Fork B identifies a sub-function that gates the Set call (e.g. `sub_82173DC8` returns early on a stub kernel call), that becomes KRNBUG-AUDIT-015 / next IO-NNN candidate.
|
||
|
||
The misattribution label "0x15e0 worker" should be corrected to "0x15e4 worker" in the index entries for AUDIT-002, AUDIT-008, AUDIT-009 — left for the next session to update if relevant.
|
||
|
||
### Trace artifacts
|
||
`audit-runs/audit-014-0x15e0-wake/probe.log` (focus dump + 19-thread diagnostic), `probe.err` (kernel.calls counters confirming swaps=2 unchanged).
|
||
|
||
## KRNBUG-AUDIT-015 — L1 propagation probe; next gate is silph::Semaphore on handle 0x1308 (workitem submitter unreached) (DIAGNOSTIC 2026-05-06)
|
||
|
||
**Status**: read-only diagnostic, Fork B parallel session. No fix landed. Master HEAD `d736a1d` unchanged.
|
||
|
||
### Probe set (112 PCs)
|
||
sub_82173DC8 dispatcher case-arms (25), worker 0x822c6878 body (12), worker sub_824563E0 body (17), worker sub_823DDB50 body (11), L1 callees (26), audit-009 unfired baseline (21).
|
||
|
||
### Decisive findings
|
||
1. **sub_82173DC8 dispatches all 4 IO-004 startup notifications then idles.** Every fire takes the early-exit at `0x82173ed8` because `[r31+44] == 0` (callback-table pointer in the listener struct never populated). The post-merge dispatch helper `0x82174040` (which would call the renderer producers `sub_822C2A80`, `sub_8216F088`, etc.) is never invoked from the dispatcher path.
|
||
2. **Worker 0x822c6870 (= 0x822c6878 thunk; tids 14, 15) parks immediately on Semaphore handle 0x1308.** The semaphore is `Semaphore(0/INT_MAX) signals=0 waits=2 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`, created by tid=13 inside `sub_822C66B4` (worker-pool initializer in `sub_822C6630`). Producer chain that releases it: `sub_822AE1F0 / sub_822F55F0 → sub_822C8B50 → sub_822C6808 → bl 0x824AB158 (silph::Semaphore::Release at NtReleaseSemaphore)`. Neither `sub_822AE1F0` nor `sub_822F55F0` was probed; both are statically reachable from main but unexercised at -n 500M — they're the renderer's frame-update / scene-graph-mutate path that never runs.
|
||
3. **Worker sub_824563E0 (tid=16) is healthy** — runs an XAM inactivity / timer poll loop (NtSetTimerEx handle 0x15d0, period=2; loops `XamEnableInactivityProcessing ↔ CS+bcctrl dispatch` 865k times). Not the gate.
|
||
4. **Worker sub_823DDB50 (tid=19) parks at entry** with body PCs unfired; final state `Blocked(WaitAny { handles: [0x160C, 0x01000000] })`. Handle 0x160C is `Event/Auto signals=0 waits=1 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`. The wait callsite is unprobed (likely an early branch before 0x823ddb68); needs follow-up probe inside `sub_823DD838` (parent).
|
||
5. All 21 audit-009 PCs (renderer cluster `0x82287xxx-0x82294xxx` + audit-005 producer-callsites) remain UNFIRED, consistent with audit-009 baseline — they sit downstream of the unreached workitem-submitter chain.
|
||
|
||
### Bug class
|
||
**δ (pure-guest renderer state-read)**, NOT a kernel-boundary stub. There is no missing `xboxkrnl`/`xam` import at the gate; main fails to advance past a state predicate that gates `sub_822AE1F0` / `sub_822F55F0` invocation.
|
||
|
||
### Discipline gate
|
||
- Box 1 (named import α / narrow internal-sub bug): **NO** — δ-class, no kernel boundary.
|
||
- Box 2 (canary impl small): N/A.
|
||
- Box 3 (sharp 4-dim cascade prediction): **NO** — needs dump-addr triage of listener struct first.
|
||
- Box 4 (no new ABI plumbing): N/A.
|
||
- Box 5 (lockstep determinism preserved): N/A.
|
||
|
||
Boxes 1 + 3 fail. Hand back per stop condition 1.
|
||
|
||
### Recommended next session
|
||
Phase 1: probe `sub_822AE1F0`, `sub_822F55F0`, `sub_822C8B50`, `sub_822C6808` entries + `sub_82174040` post-merge dispatch helper (the 6 fall-through arms inside sub_82173DC8). Add `--dump-addr=0x40ba9a80` to capture the listener-struct fields each dispatcher fire. The struct's `[+44]` field is the gate predicate; once we know what populates it, the actual fix point becomes nameable.
|
||
|
||
### Trace artifacts
|
||
`audit-runs/audit-015-l1-propagation/probe.log` (493 MB; 5.05M BRANCH-PROBE lines), `probe.err` (188 KB), `pc-fire-counts.txt` (28 fired PCs sorted).
|
||
|
||
## KRNBUG-AUDIT-016 — submitter-caller probe; gate is γ (deeper-indirection / vtable registry not populated) (DIAGNOSTIC 2026-05-06)
|
||
|
||
**Status**: read-only diagnostic. No fix landed. Master HEAD `d736a1d` unchanged.
|
||
|
||
### Probe set
|
||
Run #1 (30 PCs): workitem-submitter chain entries + bl call-sites (`sub_822AE1F0`, `sub_822F55F0`, `sub_822C8B50`, `sub_822C6808`, `0x822B16E0`, `0x822F5728`), parents (`sub_822ADD70`, `sub_821A9920`, `sub_822ACAB8`, `sub_821A8578`), grandparents (`sub_82299250`, `sub_822A4460`, `sub_821A82A0`), dispatcher post-merge helper + early-exit. Run #2 (18 PCs): refined dispatcher arm coverage + `--dump-addr=0x40ba9a80,0x4024AC00,0x4024B3E0,0x40111890,0x4024A380`.
|
||
|
||
### Decisive findings
|
||
1. **0/16 submitter-chain PCs fire** including all 4 levels of caller walk-up. Both static caller chains bottom out in the audit-009 unreached renderer cluster: A-side `sub_822AE1F0 ← sub_822ADD70 ← sub_822ACAB8 ← sub_82299250 / sub_822A4460 ← sub_8229AB50 ← sub_8229A700 ← sub_82294F30 (renderer cluster)`. B-side `sub_822F55F0 ← sub_821A9920 ← sub_821A8578 ← sub_821A82A0 ← (cycle with sub_821A9920) and ← sub_821ABEA8 ← sub_821AC700 ← sub_821A6470 (renderer cluster)`.
|
||
2. **Listener struct dump at `0x40ba9a80`**: `[+0x00]` vtable=0x40111890; `[+0x04]` dispatch state bits=**0 (NEVER set)**; `[+0x08]` counter=0; `[+0x0C]`=1000 (set by case 0xA); `[+0x2C]` callback-table A=**0x4024AC00 (POPULATED)**; `[+0x3C]` callback-table B=**0x4024B3E0 (POPULATED)**. **Audit-015's claim that `[r31+44]==0` was wrong** — `[+0x2C]` IS populated. The real gate is `[base+0x04]` (dispatch state bits) read by `sub_821737F0` (case-9 helper) bit 14 / bit 15.
|
||
3. **Dispatcher arm fires (run #2 confirmed)**: case-9 r5==0 path (`0x82173e6c`, 1 fire) → `sub_821737F0` returned 0 → early-exit; default-high arm (`0x82173f48`, 2 fires) → both early-exit at `0x82174030`. **Case 0xA's write `oris 0x1; stw [r31+4]` should set mask `0x00010000` (bit 15 in PowerPC MSB-0 numbering, bit 16 counting from the LSB), but EOR dump shows `[+0x04]=0`**: either the case-0xA fire and dispatch-r3 don't always target `0x40ba9a80`, or the write is overwritten back to 0 by another path.
|
||
4. **0x4024AC00 (callback table A) contains real renderer config** including string `"game:\\dat\\GP_TITLE.pak+eng\\\0"` and pointers `0x401119A0 / 0x40111990` — confirming the listener IS subscribed to the renderer's profile loader, but its dispatch-state bits are never advanced.
|
||
5. **Probe-machinery anomaly**: the `sub_82174040` entry PC never fires across both runs, yet `sub_821737F0` fires once at cycle 9183539 with `lr=0x821741f4`, meaning `0x821741F0 (bl sub_821737F0 inside sub_82174040 +0x1B0)` was executed. Either `sub_82174040` was reached via a jump into mid-function (highly unusual) or the probe missed an entry fire. **Worth verifying in AUDIT-017** with an isolated probe of `0x82174040, 0x82174044, 0x82174048`.
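
Because the findings above mix "bit 14 / bit 15" with `oris` immediates, a small helper makes the correspondence explicit. This sketch assumes the entries use PowerPC's MSB-0 bit numbering (bit 0 is the most significant bit of the 32-bit word); the function names are illustrative only.

```rust
/// Mask for bit `n` of a 32-bit word in PowerPC MSB-0 numbering (bit 0 = MSB).
pub const fn msb0_mask(n: u32) -> u32 {
    1u32 << (31 - n)
}

/// Effect of `oris rA, rS, UI` on the low 32 bits: OR in `UI << 16`.
pub const fn oris(value: u32, ui: u16) -> u32 {
    value | ((ui as u32) << 16)
}
```

Under that assumption `oris ..., 0x1` sets MSB-0 bit 15 (mask `0x00010000`) and `oris ..., 0x2` sets MSB-0 bit 14 (mask `0x00020000`).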
|
||
|
||
### Bug class
|
||
**γ (deeper indirection)**, refining audit-015's δ classification. The submitter chain bottoms out in a vtable-dispatched renderer-cluster registry that is never populated. Chicken-and-egg: the listener can't advance state because the workitem-submitter never fires; the workitem-submitter never fires because the registry is never populated; the registry is populated by something the listener was supposed to drive. Only an external bootstrap can break it.
|
||
|
||
### Discipline gate
|
||
- Box 1 (named α-class import / narrow internal sub): **NO** — γ-class, no kernel boundary; gate is structural.
|
||
- Box 2 (canary impl small): N/A.
|
||
- Box 3 (sharp 4-dim cascade prediction): **NO** — needs further state-write triage.
|
||
- Box 4 (no new ABI plumbing): N/A.
|
||
- Box 5: N/A.
|
||
|
||
Boxes 1 + 3 fail. Hand back per stop condition 1.
|
||
|
||
### Recommended next session (AUDIT-017)
|
||
1. Probe dispatcher caller layer: `0x822f1be8`, `0x822f1c04`, `sub_822F1AA8` (main's frame-poll loop — where main parks per AUDIT-009), `sub_821752C0` (jumps to `sub_82173DC8`).
|
||
2. Find writers of `[0x40ba9a80+4]` — byte-scan `.text` for `addi r?, ?, 4; stw r?, 0(r?)` patterns OR probe ALL functions that touch r3+4 with a stw (potentially via offset-write tracking). Identify the function that's supposed to set bit 14 / bit 15 of that field.
|
||
3. Probe inside `sub_82181D48` (default-high arm's secondary predicate): the `rlwinm r11, r11, 0, 30, 30` at `0x82181D74` reads `[[r3+0]+60]` bit 30 — find what writes this bit. If we can make `sub_82181D48` return 1, the default-high arm's `bctrl` fires → renderer cascade.
|
||
4. Verify probe-machinery anomaly (entry of `sub_82174040`).
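The bit numbers in these notes (bit 14 / bit 15 of `[0x40ba9a80+4]`, bit 30 for `rlwinm r11, r11, 0, 30, 30`) only line up with the `oris 0x1` / `oris 0x2` patterns if they are read in IBM numbering, where bit 0 is the most significant bit. A minimal sketch of that conversion, assuming IBM numbering throughout; `ibm_bit_mask` and `rlwinm_mask` are hypothetical helper names, not functions from the codebase:

```rust
/// Mask for a single bit in IBM (big-endian) numbering:
/// bit 0 is the most significant bit of a 32-bit word.
fn ibm_bit_mask(bit: u32) -> u32 {
    assert!(bit < 32, "bit index out of range");
    0x8000_0000u32 >> bit
}

/// Mask selected by the MB/ME fields of `rlwinm` (both inclusive).
/// When mb > me the mask wraps around, as in the PowerPC definition.
fn rlwinm_mask(mb: u32, me: u32) -> u32 {
    assert!(mb < 32 && me < 32, "MB/ME out of range");
    let from_mb = u32::MAX >> mb; // bits mb..=31
    let up_to_me = u32::MAX << (31 - me); // bits 0..=me
    if mb <= me {
        from_mb & up_to_me
    } else {
        from_mb | up_to_me
    }
}
```

Under this reading, `oris rN, rN, 0x1` (which ORs in `0x0001_0000`) sets bit 15 and `oris rN, rN, 0x2` sets bit 14, while `rlwinm r11, r11, 0, 30, 30` isolates the mask `0x2`.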
|
||
|
||
### Trace artifacts
|
||
`audit-runs/audit-016-submitter-callers/probe.log` (run #1, 9 KB), `probe.err` (187 KB), `probe2.log` (run #2, 12 KB; +4 dump-addrs), `probe2.err` (187 KB).
|
||
|
||
## KRNBUG-AUDIT-017 — bit-14/15 writer triage; gate is β (`[0x828F4070+64]==-1`) with α tail (`XamUserGetSigninState=stub_return_zero`) (DIAGNOSTIC 2026-05-06)
|
||
|
||
**Status**: read-only diagnostic. No fix landed. Master HEAD `d736a1d` unchanged.
|
||
|
||
### Probe set
|
||
Static scan: `oris rN, rN, 0x1` or `oris rN, rN, 0x2` followed within 8 instructions by `stw rN, 4(rY)`. 5 candidates flagged. Runtime confirmation via `--branch-probe` at -n 500M + `--dump-addr=0x40ba9a80,0x828F48B0,0x828F4070`.
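The static scan can be sketched as a linear pass over decoded instruction words. This is an illustration only, not the scanner that produced the 5 candidates: it assumes `.text` has already been read as big-endian `u32` words, it reports the address of the `stw` (the notes do not say which of the two addresses was recorded), and `find_bit14_15_writers` is a hypothetical name. The opcode numbers (`oris` = 25, `stw` = 36) and D-form field positions are the standard PowerPC encodings.

```rust
/// Primary opcode (top 6 bits) of a 32-bit PowerPC instruction word.
fn opcd(w: u32) -> u32 { w >> 26 }
/// RS field (source register for both `oris` and `stw`).
fn rs(w: u32) -> u32 { (w >> 21) & 0x1f }
/// RA field (destination for `oris`, base register for `stw`).
fn ra(w: u32) -> u32 { (w >> 16) & 0x1f }
/// 16-bit immediate / displacement field.
fn imm16(w: u32) -> u32 { w & 0xffff }

const OP_ORIS: u32 = 25;
const OP_STW: u32 = 36;
/// "followed within 8 instructions", per the probe-set description.
const WINDOW: usize = 8;

/// Returns the guest address of every `stw rN, 4(rY)` that follows an
/// `oris rN, rN, 0x1` or `oris rN, rN, 0x2` within `WINDOW` instructions.
/// `base` is the guest address of `text[0]`.
fn find_bit14_15_writers(text: &[u32], base: u32) -> Vec<u32> {
    let mut hits = Vec::new();
    for (i, &w) in text.iter().enumerate() {
        let sets_bit = opcd(w) == OP_ORIS && rs(w) == ra(w) && matches!(imm16(w), 1 | 2);
        if !sets_bit {
            continue;
        }
        let reg = ra(w);
        let end = (i + 1 + WINDOW).min(text.len());
        for (j, &s) in text[i + 1..end].iter().enumerate() {
            if opcd(s) == OP_STW && rs(s) == reg && imm16(s) == 4 {
                hits.push(base + 4 * (i + 1 + j) as u32);
                break;
            }
        }
    }
    hits
}
```

A pattern match like this cannot tell which struct `rY` points at, which is why a hit such as the stride-8 one at `0x82769b84` still has to be ruled out by hand or at runtime.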
|
||
|
||
### Decisive findings
|
||
1. **Static writer candidates** (5):
|
||
- `0x82173950` (sub_821737F0:bit-14, gated by `[r30+64]!=-1` AND XamUserGetSigninState ret-check)
|
||
- `0x82173e04` (sub_82173DC8 case-0xA:bit-15)
|
||
- `0x824d3ce8` (sub_824d3c78:bit-15, struct via `[parent+184]`)
|
||
- `0x824d3f24` (sub_824d3dc0:bit-14, struct via `[parent+184]`)
|
||
- `0x82769b84` (sub_82766db0:bit-15, struct stride 8 — false positive)
|
||
2. **Runtime: case-0xA fires once** at cycle 9183060 (PC 0x82173dfc), sets bit-15 of `[0x40ba9a80+4]`. Confirmed by EOR dump `[+0x0C]=0x000003E8` (case-0xA's subfic).
|
||
3. **sub_821737F0 work-path entered** at cycle 9183561 (lr=0x821737f8). Bit-15 cleared at 0x82173884. Bit-14 setter at 0x82173950 NEVER fires because at 0x821738E0, `cmpwi r3, -1; beq → 0x82173938` short-circuits (`r3=[r30+64]=0xFFFFFFFF`).
|
||
4. **r30 = `[0x828F48B0+0]` = `0x828F4070`** (singleton sub-object). EOR dump confirms `[0x828F4070+64]=0xFFFFFFFF`, initialized to -1 by `sub_821701c8` at 0x82170234. The only non-(-1) writer is `sub_82184318:0x82184374` (`bl 0x82456B58 (kernel handle creator); stw r3, 64(r30)`). Caller chain `sub_82184318 ← sub_82187768:0x821877bc ← sub_82187dd0:0x82187e78 ← sub_82183ca8:0x82183cd8 ← {sub_822919c8, sub_82186760, sub_821c88d0}`. **`sub_822919c8` is one of the audit-009 renderer-cluster L1 entry points that has zero non-call xrefs** — same γ-cluster blocked at audit-009/-016.
|
||
5. **bit-28 of `[0x828F4070+60]` IS set** at cycle 9224352 by `sub_821c4988:0x821c5450` — but 35,000 cycles AFTER case-9 fired. Also: bit-28 is a NEGATIVE gate at 0x821738F0 (`bne cr6, 0x82173938`) — bit-28 SET means NO bit-14. The positive gate is `[+64]!=-1`.
|
||
6. **Two orthogonal stubs uncovered (α tail)**:
|
||
- `XamUserGetSigninState` (xam.rs:48) is `stub_return_zero`. Even if β fixed, sub_821737F0's bit-14 deep-eval at 0x82173904-0x82173938 takes the no-bit-14 path in 2/3 sub-branches when ret==0. Also sub_822C2A80 at 0x822c2aac loops `XamUserGetSigninState(0..3)` searching for any signed-in user — broken. Canary `xam_user.cc:90-101` returns `SignedInLocally=1` for default profile.
|
||
|
||
### Bug class
|
||
**β-dominant + α-tail.** Primary β is structural — `[0x828F4070+64]==-1` because the ctor that fills it (`sub_82184318`) is in the same audit-009 renderer cluster that audit-016 also identified. Secondary α is XamUserGetSigninState=stub_return_zero (2 separate guest paths broken).
|
||
|
||
### Discipline gate
|
||
- Box 1: PARTIAL — α component named (XamUserGetSigninState) but not the dominant gate.
|
||
- Box 2: YES for α (5 LOC at `xam_user.cc:90-101`).
|
||
- Box 3: NO — β dominant, structural.
|
||
- Box 4-5: N/A.
|
||
|
||
Boxes 1+3 fail. Hand back per stop condition 1.
|
||
|
||
### Recommended next session (AUDIT-018)
|
||
- **Option A**: probe `sub_82184318, sub_82187768, sub_82187dd0, sub_82183ca8, sub_82186760, sub_821c88d0, sub_822919c8, sub_82456B58` at -n 500M to confirm the entire chain to `[singleton+64]` ctor is unreached. If all 8 fail to fire, this re-confirms γ-class structural blocker for the THIRD time (audit-009, -016, -017). Time to pivot strategy.
|
||
- **Option B**: canary-log diff during boot window 9.0M-9.3M cycles for any kernel call that writes a real handle to `0x828F4070+64`. Re-run `lutris lutris:rungameid/4` with kernel-call logging.
|
||
- **Option C** (cheap α): implement `XamUserGetSigninState` per canary (5 LOC). Will not fire cascade alone (β dominant) but is correct and unblocks sub_822C2A80.
|
||
- **Sharp 4-dim cascade prediction**: NEEDS FURTHER TRIAGE.
|
||
|
||
### Trace artifacts
|
||
`audit-runs/audit-017-state-bits-writer/probe{1..5}.log` + `.err` (probe.log: 13 lines, probe3.log: 133 lines incl. dumps, probe4.log: 7 lines, probe5.log: 3 lines).
|
||
|
||
---
|
||
|
||
### XamUserGetSigninState follow-up (post-AUDIT-017, master 7ed6192)
|
||
|
||
Landed inline as a small canary-mirror correctness fix. Branch `xam-user-signin-state/p0-canary-mirror`, no-ff merged.
|
||
|
||
- Impl returns `1` for user_index=0 (SignedInLocally), `0` otherwise. Mirrors canary `xam_user.cc:90-101`.
|
||
- Tests 599 → 600. Lockstep `instructions=100000012 → 100000006`, deterministic across 2 runs.
|
||
- **Cascade ripple**: `XamUserReadProfileSettings` now fires 2× (was canary-only). Per-AUDIT-017 prediction (α-tail correctness fix; β still dominant).
|
||
- Remaining canary-only kernel exports: `ExTerminateThread`, `KeReleaseSemaphore`. Down from 3 to 2.
|
||
- Renderer L1 reachability + parked-handle signal_attempts unchanged — β-class blocker `[0x828F4070+64]==-1` unmoved (audit-017's structural finding).
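The behaviour described above (state `1` for user index 0, `0` otherwise) and the guest-side search over slots 0..3 can be modelled in a few lines. This is a standalone sketch, not the export itself: the real implementation reads its argument from guest registers, and `signin_state` / `first_signed_in_user` are hypothetical names.

```rust
/// State values as used in the notes: 0 = not signed in, 1 = SignedInLocally.
const NOT_SIGNED_IN: u32 = 0;
const SIGNED_IN_LOCALLY: u32 = 1;

/// Only the default profile (user index 0) is reported as signed in.
fn signin_state(user_index: u32) -> u32 {
    if user_index == 0 {
        SIGNED_IN_LOCALLY
    } else {
        NOT_SIGNED_IN
    }
}

/// Models the guest loop that polls user slots 0 through 3 and stops at the
/// first one that reports any signed-in state.
fn first_signed_in_user() -> Option<u32> {
    (0..4).find(|&i| signin_state(i) != NOT_SIGNED_IN)
}
```

With the old `stub_return_zero` behaviour every slot reports `0`, so a search like this finds nobody; with the canary-mirror behaviour it stops at slot 0.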
|
||
|
||
## KRNBUG-AUDIT-018 — canary-log diff identifies α-class stub `KeResumeThread` (DIAGNOSTIC 2026-05-06)
|
||
|
||
**Status**: read-only diagnostic. No fix landed. Master HEAD `7ed6192` unchanged. Tests 600. Lockstep `instructions=100000006`.
|
||
|
||
### Method
|
||
Set-diff of kernel-call function names: ours (`audit-runs/audit-018-canary-diff/ours.log`, -n 500M) vs canary (`/home/fabi/xenia_canary_windows/xenia.log`, full boot to active rendering with `XamInputGetCapabilities` polling).
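The set-diff step itself is small once the export names have been pulled out of each log. A minimal sketch, assuming name extraction has already happened (the log line formats are not reproduced here) and using the hypothetical helper name `canary_only`:

```rust
use std::collections::BTreeSet;

/// Names that appear in `canary` but never in `ours`, sorted and deduplicated.
fn canary_only<'a>(ours: &[&'a str], canary: &[&'a str]) -> Vec<&'a str> {
    let seen: BTreeSet<&str> = ours.iter().copied().collect();
    let theirs: BTreeSet<&str> = canary.iter().copied().collect();
    theirs.difference(&seen).copied().collect()
}
```

A name-level diff only says whether an export was ever called; call counts and arguments, such as the repeated `KeReleaseSemaphore(828A3230, 1, 1, 0)`, still have to be read from the canary log directly.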
|
||
|
||
### Decisive findings
|
||
1. Function-name diff: only 2 calls present in canary, absent in ours: `ExTerminateThread`, `KeReleaseSemaphore` — both already on the audit-006 canary-only export queue.
|
||
2. **`KeReleaseSemaphore(828A3230, 1, 1, 0)`** is hammered by canary tid `F800006C` repeatedly (audio-render ticker). That thread is created via `ExCreateThread(..., entry=0x824D2878, ctx=0, flags=0x10000001)` and immediately followed by `ObReferenceObjectByHandle / KeSetBasePriorityThread / KeResumeThread / ObDereferenceObject`. Same pattern for entry `0x824D2940`.
|
||
3. In our run, both these threads are `Blocked(Suspended)` at end-of-run. Counters `KeResumeThread = 2` and `NtResumeThread = 6` match canary's call pattern.
|
||
4. **Root cause**: `crates/xenia-kernel/src/exports.rs:3658-3664` — `ke_resume_thread` is a no-op cookie-returner that ignores r3 and sets r3=0. Comment claims "real `NtResumeThread` below handles the handle-based path properly", but `KeResumeThread` is a separate export that takes a KTHREAD pointer (which our `ObReferenceObjectByHandle` cookies as the handle itself per `exports.rs:3787-3807`). The fix is to mirror `nt_resume_thread`: `find_by_handle(handle).resume_ref(r)`.
|
||
5. Cross-reference: tid=17 (entry=0x82170430, ctx=0x828F4070, the audit-017 listener struct) IS spawned and parks on event handle 0x15E4 — same long-known parked dispatcher waiter. Worker body reads `[r29+56] (=[0x828F40A8])` as its loop predicate (clarification of audit-017's "+64" claim). Until tids 9/10 actually run, the audio-side cascade never starts.
|
||
|
||
### Bug class
|
||
**α (named import stub_success on a load-bearing export)**. `KeResumeThread` is registered (canary `kImplemented`) but our impl is a stub_success no-op that fails to actually unsuspend.
|
||
|
||
### Discipline gate
|
||
- Box 1 (named bug class with concrete evidence): YES.
|
||
- Box 2 (narrow fix ~5 LOC): YES.
|
||
- Box 3 (sharp 4-dim cascade prediction): YES (see memory file).
|
||
- Box 4 (no renderer/GPU changes): YES.
|
||
- Box 5 (lockstep determinism preserved): expected — same pattern as XamUserGetSigninState landing.
|
||
|
||
**All 5 boxes pass — first time since IO-004.**
|
||
|
||
### Sharp 4-dim cascade prediction
|
||
- **A (thread liveness)**: tids 9, 10 leave Suspended; XAudio voice-render workers run.
|
||
- **B (kernel counters)**: `KeReleaseSemaphore` non-zero for first time. `NtSetEvent` rises. Likely new `XAudioSubmitRenderDriverFrame`.
|
||
- **C (canary-only exports)**: 2→1 (`KeReleaseSemaphore` resolved). Possibly new audio-path exports.
|
||
- **D (listener `[+64]`)**: hypothesis-only — IF audit-017's β-class blocker is downstream of audio init, `[0x828F4070+64]` becomes non-(-1) and renderer cascade unblocks. If not, γ-cluster is independent → pivot to memory-watch instrumentation on `[+64]`.
|
||
|
||
### Recommended next session (KRNBUG-IO-005 or KRNBUG-α-005)
|
||
Implement 5-LOC fix on branch `ke-resume-thread/p0-canary-mirror`:
|
||
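```rust
fn ke_resume_thread(ctx: &mut PpcContext, _mem: &GuestMemory, state: &mut KernelState) {
    // r3 carries the KTHREAD pointer, which `ObReferenceObjectByHandle` cookies as the handle itself.
    let handle = resolve_pseudo_handle(state, ctx.gpr[3] as u32);
    // Mirror `nt_resume_thread`: resume the thread if the handle resolves, otherwise return 0.
    let prev = state.scheduler.find_by_handle(handle).map(|r| state.scheduler.resume_ref(r)).unwrap_or(0);
    ctx.gpr[3] = prev;
}
```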
|
||
Lockstep ×2. Evaluate cascade. Tests 600→601 (add a `ke_resume_thread` unit test mirroring `nt_resume_thread`).
|
||
|
||
### Trace artifacts
|
||
- `audit-runs/audit-018-canary-diff/ours.log` (full kernel trace + final-state thread diagnostics)
|
||
- `audit-runs/audit-018-canary-diff/ours.stdout.log` (counters)
|
||
- Canary: `/home/fabi/xenia_canary_windows/xenia.log` (untouched)
|
||
|
||
## KRNBUG-KE-001 — Real `KeResumeThread` (LANDED 2026-05-06)
|
||
|
||
`crates/xenia-kernel/src/exports.rs:3658-3669` — replaced the no-op cookie-returner with a canary-mirror real impl per `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227` (`XObject::GetNativeObject<XThread>(...)->Resume()` → `STATUS_SUCCESS`, else `STATUS_INVALID_HANDLE`). Routes the KTHREAD-pointer-as-handle through `resolve_pseudo_handle` + `scheduler.find_by_handle` + `scheduler.resume_ref`, mirroring `nt_resume_thread`'s plumbing two functions below.
|
||
|
||
### Cascade-prediction scorecard (audit-018 → post-fix)
|
||
- **A — thread liveness (PASS)**: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) transition from `Blocked(Suspended)` → ran → now `Blocked(WaitAny)` on audio buffer-completion semaphores `0x828A3254` (handle 2190094932) / `0x828A3230` (handle 2190094896). Pre-fix they were Suspended at end-of-run; post-fix they execute their bodies and park on a downstream consumer wait.
|
||
- **B — counters (PARTIAL FAIL)**: `NtSetEvent 667→3334` (rises ~5×, audio frame-complete signaling). `KeResumeThread = 2` (now real). `NtResumeThread = 6`. **`KeReleaseSemaphore` still 0** (not in counters at all). **`XAudioSubmitRenderDriverFrame` still 0**. Workers ran prologue + parked on a downstream gate before reaching `KeReleaseSemaphore`.
|
||
- **C — canary-only delta (FAIL — predicted 2→1, actual 2→2)**: `ExTerminateThread` and `KeReleaseSemaphore` both still canary-only. The audio render-tick semaphore-release loop is gated by something downstream of the audio worker prologue.
|
||
- **D — γ-cluster blocker (FAIL)**: `--pc-probe=0x82184318,0x82184374` armed, neither fires. `--dump-addr=0x828F4070` armed, no DUMP lines emitted. Listener struct `[0x828F4070+64]` unchanged. `--trace-handles-focus` shows handles 0x1004/0x100c/0x1020/0x15e4 all still `signal_attempts=0`.
|
||
|
||
### Milestone status
|
||
- Renderer cluster cascade collapsed? **NO**.
|
||
- signal_attempts > 0 on parked handles? **NO**.
|
||
- `draws > 0`? **NO** (still 0; `swaps` still 2).
|
||
|
||
### Verification
|
||
- 600 → 601 tests (`cargo test --workspace --release` clean; new `ke_resume_thread_unblocks_suspended_worker` covers Suspended→Ready transition + INVALID_HANDLE branch).
|
||
- Lockstep determinism: `instructions=100000003 imports=987516` × 2 reruns identical.
|
||
- `swaps=2 draws=0` plateau intact.
|
||
- Goldens re-baselined: `sylpheed_n50m.json instructions 50000003→50000011, imports 407255→407247`. n2m unchanged. Oracle test passes.
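The two branches the new unit test covers (Suspended→Ready and the invalid-handle path) can be shown on a toy model. This is a sketch under stated assumptions, not the kernel code: `ToyScheduler` and `ThreadState` are hypothetical stand-ins, suspend counts are ignored, and only the two NTSTATUS values are real constants.

```rust
use std::collections::HashMap;

const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_INVALID_HANDLE: u32 = 0xC000_0008;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadState {
    Ready,
    Suspended,
}

/// Hypothetical stand-in for the scheduler: handle -> thread state.
struct ToyScheduler {
    threads: HashMap<u32, ThreadState>,
}

impl ToyScheduler {
    /// Resume the thread behind `handle`. Unknown handles report an error
    /// instead of silently succeeding, which is what a no-op stub would do.
    fn ke_resume_thread(&mut self, handle: u32) -> u32 {
        match self.threads.get_mut(&handle) {
            Some(state) => {
                *state = ThreadState::Ready;
                STATUS_SUCCESS
            }
            None => STATUS_INVALID_HANDLE,
        }
    }
}
```

The point of the model is the contrast with the old stub: a no-op that always returns success leaves the thread `Suspended` forever while giving the guest no indication that anything went wrong.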
|
||
|
||
### Bug class (post-fact)
|
||
α (load-bearing stub_success). The fix unsticks two threads but those threads then park on a downstream gate that's part of a separate bug class — the audio voice-render dispatch never reaches `KeReleaseSemaphore`/`XAudioSubmitRenderDriverFrame` because the consumer-side semaphore producer is itself gated by something else (likely the same γ-cluster that audit-009/-016/-017 narrowed: `[0x828F4070+64]==-1`).
|
||
|
||
### Recommended next session
|
||
Audit-019 — memory-watch instrumentation on `[0x828F4070+64]` (audit-017 Option B). With KE-001 landed, the discipline gate cleanly attributes the renderer plateau to the listener-struct field rather than to a stub upstream — narrows the search for the producer to whoever writes `[+64]` of the audit-017 dispatcher.
|
||
|
||
### Trace artifacts
|
||
- `audit-runs/post-ke-resume/lockstep_run{1,2}.json` (lockstep determinism)
|
||
- `audit-runs/post-ke-resume/run.{log,err}` (full 500M cascade verification)
|
||
- `audit-runs/post-ke-resume/probe.{log,err}` (γ-cluster pc-probe + dump-addr)
|
||
- `audit-runs/post-ke-resume/handles.{log,err}` (--trace-handles-focus)
|
||
|
||
|
||
## KRNBUG-AUDIT-023 — Canary memory-dump diff (READ-ONLY, 2026-05-06)
|
||
|
||
Path B per AUDIT-022 prep: temporarily patched canary (`xam_notify.cc` + `cpu_flags.{h,cc}`)
to add `DEFINE_string(memory_dump_path,...)` flag. On first `XamNotifyCreateListener_entry`
(mask=0x2F), pre-size file to 2 GiB then `Memory::Save` the entire 5-heap state into a
mmap'd file. 44 LOC, rebuilt Linux Debug clang++14 (~6 min), captured 216 MB dump. Patch
reverted post-capture (`git status` clean).
|
||
|
||
### Findings vs ours @ -n 50M
|
||
1. **0x828F4070 family (audit-017 hypothesized populator target)**: canary-at-first-listener
   is ALL ZEROS; ours has dispatcher data. **Cannot resolve audit-017** — canary's dump
   fired too early in init for [+64]≠-1 to have happened.
2. **0x828E1F08**: ours stores listener pointer (`0x40111890`); canary stores 0. Mechanism
   difference (canary uses host-side `KernelState::notify_listeners_` vector; ours stuffs
   guest-memory). Not an obvious bug.
3. **0x828F4838 +0x08**: canary has `"XEN\0" + handle 0xF8000034`; ours has zeros.
   New populator-effect lead — canary's xboxkrnl writes "XEN" magic + a kernel handle
   to this struct slot during init. Address sits inside the audit-016/017 cluster
   (`[0x828F48B0+0]=0x828F4070` chain).
4. **0x82124xxx area (audit-009 cluster L1 PCs as data)**: REFUTED as populator target.
   This is the static `.pdata` exception-handler table in the XEX image; ours has byte-identical
   contents. NOT a dynamic populator.
|
||
|
||
### Pre-existing canary bugs encountered
|
||
- `PosixMappedMemory::WrapFileDescriptor` mmaps existing file size without extending —
  v1 patch SIGBUS'd on first qword write; fixed with `std::filesystem::resize_file` pre-step.
- `XexInfoCache::Init` SIGBUS at line 1406 reading `GetHeader()->version` from mmap'd
  infocache. Worked around with `--disable_instruction_infocache=true`.
|
||
|
||
### Bug-class refinement
|
||
The audit-017 β-class hypothesis remains unresolved. Need a LATER trigger point in
canary to capture state when populator has run. New independent lead: `"XEN" + handle`
at 0x828F4840 in canary; missing in ours.
|
||
|
||
### Recommended next session
|
||
**AUDIT-024**: re-apply canary patch with delayed trigger (e.g., on XamNotifyCreateListener
call N≥5, or on first XAudioSubmitRenderDriverFrame, or on first NtSetEvent on a specific
guest event). Capture canary's STATE post-populator. Diff at 0x828F4070+64 directly.
Alternative: static-search canary's xboxkrnl source for the writer of "XEN\0" + handle
at 0x828F4840 — if found, that names the populator's CODE, not just its effect.
|
||
|
||
### Trace artifacts
|
||
- `audit-runs/audit-023-canary-diff/canary-memory.dump` (216 MB)
|
||
- `audit-runs/audit-023-canary-diff/canary.log` (canary stdout)
|
||
- `audit-runs/audit-023-canary-diff/canary-patch.diff` (re-applyable)
|
||
- `audit-runs/audit-023-canary-diff/parse_dump.py` (Memory::Save format walker)
|
||
- `audit-runs/audit-023-canary-diff/diff_canary_ours.py` (side-by-side diff)
|
||
- `audit-runs/audit-023-canary-diff/diff.txt` (concrete byte-level diffs)
|
||
- `audit-runs/audit-023-canary-diff/ours-{dump,extra,pdata}.{log,err}` (ours' --dump-addr)
|
||
|
||
## KRNBUG-AUDIT-024A — Canary memory-dump diff at delayed trigger (READ-ONLY, 2026-05-07)
|
||
|
||
Re-applied audit-023's pattern but moved the dump trigger to **first
`XAudioSubmitRenderDriverFrame_entry`** call (much later than first listener).
Patch: 39 LOC (cpu_flags hunk reused + new hook in `xboxkrnl_audio.cc`).
Build: incremental Debug, ~10 s after CMake-cache symlink fix.
Required preexisting workaround: `--disable_instruction_infocache=true`. Captured
260,659,200 byte dump (248.6 MiB) — slightly larger than audit-023's 216 MB,
consistent with deeper boot.

Canary log telemetry pre-dump confirms post-populator state:
`KeReleaseSemaphore(0x828A3230, 1, 1, 0)` firing repeatedly (the audio
buffer-completion semaphore — audit-018 prediction: producer is the audio render thread).
`VdSwap`, `VdRetrainEDRAM`, `XamInputGetCapabilities`, multiple texture loads firing.
|
||
|
||
### Findings — `[0x828F4070+64]` HYPOTHESIS FALSIFIED
|
||
|
||
`[0x828F40B0]` (=0x828F4070+64) at first `XAudioSubmitRenderDriverFrame`:
|
||
- **CANARY**: ALL ZEROS for at least 0x40 bytes
|
||
- **OURS @ -n 500M**: `ff ff ff ff` at offset 0 (audit-017's `-1` sentinel from sub_821701c8)
|
||
|
||
The audit-017 β-class hypothesis (`[0x828F4070+64]==-1` blocking bit-14 setter)
is now **directly falsified by canary observation**: in canary, this slot is
zero, NOT a non-(-1) handle. AUDIT-017's claim "only non-(-1) writer is
sub_82184318:0x82184374" was structurally correct *for our build*; in canary
the equivalent location remains untouched at the moment audio is already running.
The bit-14 gate at 0x821738E0 must therefore admit `[+64]==0` OR canary takes a
different control path entirely (likely the latter — different submitter chain
populates a different guest dispatcher slot, leading to the renderer-state-bits
write through a different path).
|
||
|
||
### Findings — `0x828F4838+0x08` "XEN\0 + 0xF8000034" divergence stable
|
||
|
||
Canary still has `"XEN\0"` magic + kernel handle `0xF8000034` at +0x08.
Ours still has zeros at +0x08-0x0F. **Stable across audit-023 (early)
and audit-024A (late) trigger points** — populator wrote this field
during early init, before listener-creation in audit-023. Confirms the
audit-022/023 lead is real, not transient.

Heap pointers and counts at `0x828F4838 +0x20..+0x60` populated in BOTH
canary (`0xBC36xxxx` heap) and ours (`0x4024xxxx` heap) — different
allocator state but structural equivalence.
|
||
|
||
### Findings — `0x828A3230` audio semaphore (canary only)
|
||
|
||
State quad `05 00 00 00 00 00 00 00`, `"XEN\0"` + handle `0xF8000070` at +0x08,
release-count = `01000000` at +0x14, plus chain at +0x18 / +0x28 with handles
`0xF8000080` / `0xF800007C` and a 64-bit value `0xBE628EDC1FCA7000` at +0x38
(callback ptr or last-completed timestamp).

In ours: `KeReleaseSemaphore=0` (still in canary-only export queue). Producer
(audit chain → `XAudioSubmitRenderDriverFrame` → audio system → this semaphore)
unreached at -n 500M.
|
||
|
||
### Bug-class re-classification
|
||
|
||
Drop β-class (`[+64]` poison) hypothesis. Reclassify as **γ-deep**: the gate
between audit-013's IO-004 reach (sub_82173DC8 dispatching) and the audio
producer chain firing is a multi-step renderer/audio init that fires
`XAudioSubmitRenderDriverFrame` in canary but never reaches it in ours.
|
||
|
||
### Sharp next-session prediction
|
||
|
||
(1) Per Sister-Session AUDIT-024B (parallel canary-source `"XEN\0"`-writer
static search): if 024B identifies the writer of `"XEN\0" + 0xF8000034`,
cross-reference with our canary-only kernel exports. The `"XEN" + handle`
pattern is the canonical type-tag signature emitted by `kernel/util/object_table.cc`
when a kernel object is committed to guest memory.

(2) Independent track: name the kernel call that fires
`XAudioSubmitRenderDriverFrame` in canary but not in ours. The chain we know
runs in canary post-IO-004 is roughly:
`XamNotifyCreateListener → renderer init → XAudio register → audio thread spawn → submit frames`.
Counters in our run: `XAudioRegisterRenderDriverClient=1` so registration ran,
`KeInitializeSemaphore=1` (likely the buffer-completion semaphore allocated),
but the audio thread that calls `XAudioSubmitRenderDriverFrame` never starts
feeding frames. Probe target: who reads the audio-system register-result and
starts feeding.
|
||
|
||
### Cascade prediction sharpness — 4 dim
|
||
|
||
If next-session lands a fix for the audio-thread-start gate:
|
||
- A: `XAudioSubmitRenderDriverFrame` count > 0
|
||
- B: `KeReleaseSemaphore` count > 0 (now non-canary-only)
|
||
- C: `[0x828A3230+0x14]` becomes 1 (release count)
|
||
- D: VdSwap > 2 expected ONLY if audio drives renderer pacing (unknown — open).
|
||
|
||
### Trace artifacts
|
||
- `audit-runs/audit-024a-canary-diff/canary-memory.dump` (260,659,200 bytes)
|
||
- `audit-runs/audit-024a-canary-diff/canary.log` (canary stdout)
|
||
- `audit-runs/audit-024a-canary-diff/canary-patch.diff` (re-applyable)
|
||
- `audit-runs/audit-024a-canary-diff/canary-state.txt` (parsed canary state at probe addrs)
|
||
- `audit-runs/audit-024a-canary-diff/canary-extra.txt` (extra addrs: 0x828A3230 etc.)
|
||
- `audit-runs/audit-024a-canary-diff/ours-dump.{log,err}` (ours --dump-addr at -n 500M)
|
||
- `audit-runs/audit-024a-canary-diff/diff.txt` (side-by-side comparison)
|
||
|
||
### Cleanup
|
||
Canary patch reverted (`git status` clean). Master xenia-rs HEAD `d9e40d3`
unchanged. `/home/fabi/xenia-canary` symlink retained for future CMake regen.
|
||
|
||
|
||
## KRNBUG-α-006 — `ensure_dispatcher_object` writes XObj signature + handle (LANDED, 2026-05-07)
|
||
|
||
Mirror of canary `XObject::StashHandle` (xobject.h:253-256). On first
guest-dispatcher adoption, stamp `+0x08` with `kXObjSignature` (`'X','E','N','\0'` =
`0x58454E00`) and `+0x0C` with the stash handle. Our shadow table is keyed
by guest pointer, so handle-to-stash = `ptr` itself. 7 LOC in impl, 27 LOC
in tests.
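The stamp amounts to two big-endian 32-bit stores. A minimal sketch on a plain byte buffer, assuming guest memory is addressed as a `&mut [u8]`; `stamp_dispatcher` is a hypothetical name and the real `ensure_dispatcher_object` goes through the kernel's guest-memory API instead:

```rust
/// 'X','E','N','\0' packed big-endian, as quoted in the entry above.
const XOBJ_SIGNATURE: u32 = 0x5845_4E00;

/// Stamp a dispatcher object that starts at `base` inside `mem`:
/// +0x08 receives the signature, +0x0C the stash handle (both big-endian).
fn stamp_dispatcher(mem: &mut [u8], base: usize, handle: u32) {
    mem[base + 0x08..base + 0x0C].copy_from_slice(&XOBJ_SIGNATURE.to_be_bytes());
    mem[base + 0x0C..base + 0x10].copy_from_slice(&handle.to_be_bytes());
}
```

Because the shadow table is keyed by guest pointer, the value passed as `handle` would be the object's own guest address, which matches the `+0x0C=0x828A3254` style entries seen in later dumps.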
|
||
|
||
Branch `xobj-stashhandle/p0-canary-mirror` merged --no-ff into master `de5a15e`.
Tests 604 → 605 (`ensure_dispatcher_object_stamps_xen_signature_and_handle`).
Lockstep deterministic across 2 reruns: `instructions=100000003 imports=987516`
(identical to pre-fix d9e40d3 — writeback is host-side, no guest-instruction
cost). `sylpheed_n50m` golden unchanged.

Cascade @ -n 500M halt-on-deadlock: NIL ripple. Worker count 20; KeReleaseSemaphore=0;
ExTerminateThread=0; XAudioSubmitRenderDriverFrame=0; NtSetEvent=3334; VdSwap=2 —
all match post-ke-resume baseline. At target address 0x828F4838 itself, +0x08
remains 00000000 because guest never invokes a Ke* function with that pointer
(adoption in canary at this address likely uses `SetNativePointer` lifecycle
which we don't traverse via `ensure_dispatcher_object`).

Per task brief: lands as canary-correctness restoration without sharp cascade
hypothesis. Audit-024A's hypothesis that the StashHandle stamp at 0x828F4838
gates audio init is **observationally falsified** post-fix. Trace
`audit-runs/post-stashhandle/dump-500m.log`.
|
||
|
||
|
||
## KRNBUG-AUDIT-025 — Audio thread-start gate identified (READ-ONLY, 2026-05-07)
|
||
|
||
Master HEAD at session start: `de5a15e` (post-Path-2 StashHandle merge).
|
||
|
||
### Question
|
||
|
||
Audit-024A established that `XAudioSubmitRenderDriverFrame=0` and
`KeReleaseSemaphore(0x828A3230)=0` in our run while canary fires both
repeatedly. Goal: identify the exact gate between successful
`XAudioRegisterRenderDriverClient` (both runtimes call it once with
identical return `0x41550000`) and the audio worker submitting frames.
|
||
|
||
### Static + canary-log decomposition
|
||
|
||
**Audio init in Sylpheed (sub_824D2C08, called once from sub_824D2FA8):**
|
||
1. `bl 0x824D6070` — alloc audio_system object on heap.
2. Inline DISPATCHER_HEADER write at `+0x150..+0x18A`: byte-1 to `0x828A3254`
   (auto-reset Event), byte-1 to `0x828A3244` (auto-reset Event), byte-5 (per
   `bl KeInitializeSemaphore` at +0x1A4 = 0x824D2DAC) to `0x828A3230`
   (Semaphore, count=0, limit=6).
3. `bl ExRegisterTitleTerminateNotification(0x828A3210, 1)` at +0x1F0 = 0x824D2DF8.
4. `bl ExCreateThread(entry=0x824D2878, ctx=0, flags=0x10000001)` — audio worker.
5. `KeSetBasePriorityThread(15)` + `KeResumeThread` on the worker.
6. `bl ExCreateThread(entry=0x824D2940, ctx=0, flags=0x20000001)` — second audio thread.
|
||
|
||
**Audio worker loop (entry 0x824D2878 — disassembled):**
|
||
```
LOOP_HEAD:
  r3 = 0x828A3254                                # event handle
  bl KeWaitForSingleObject(r3, 3, 1, 0, NULL)    # 0x824D28CC
  r3 = mem[0x828A3264]                           # = audio_system_obj ptr (heap)
  r11 = mem[r3+300]                              # audio_active flag
  if r11 != 0:
    bl sub_824D2108                              # process job
    bl sub_824D21F0
  else:                                          # shutdown
    r5 = mem[r3+304] - 1
    if r5 != 0:
      bl KeReleaseSemaphore(0x828A3230, r5, 1)   # 0x824D2904
    bl KeSetEvent(0x828A3244, 1, 0)
  if r11 != 0: goto LOOP_HEAD
  return
```
|
||
Wake source for `0x828A3254`: only **`sub_824D23B0`** (KeSetEvent at +0x54,
+0x4FC, +0x688 = 0x824D2404 / 0x824D28AC / 0x824D2A40). `sub_824D23B0` is the
audio job-submit method. **It also writes `[+300]=current_thread_handle`**
(at sub_824D23B0+0x678 = 0x824D2A28) so that the worker takes the job-process
branch instead of shutdown.
|
||
|
||
### Caller chain of sub_824D23B0
|
||
|
||
From `xrefs` table: only ONE static caller — `sub_824D2B08+0xE4 = 0x824D2BEC`.
But `sub_824D2B08` is the lightweight constructor (entry at 0x824D2B08, returns
at 0x824D2BD4 BEFORE 0x824D2BEC). The body containing the
`bl sub_824D23B0` at 0x824D2BEC is a SEPARATE function entry at `0x824D2BD8`
that the static analyzer didn't carve out — there are NO static call xrefs to
0x824D2BD8. **It is a virtual method invoked via the audio_system vtable**
(set in sub_824D2B08 at offset 0 of the audio object: `[r31+0] = 0x82006CF4`).
|
||
|
||
### Runtime probe (audit-025-audio-thread-start)
|
||
|
||
`--pc-probe` at 12 audio PCs + `--dump-addr` at 5 audio dispatcher addresses,
`-n 500M`, `--halt-on-deadlock`, NO `--xaudio-tick`.
|
||
|
||
**Probe fires (1 of 12):**
|
||
- `0x824D2DF8` (sub_824D2C08+0x1F0, ExRegisterTitleTerminate) tid=1 cycle=7,470,631 ✓
|
||
|
||
**Probes that DID NOT fire:**
|
||
- `0x824D23B0` (sub_824D23B0 entry) — never reached
- `0x824D2404` (KeSetEvent on 0x828A3254 — wakeup of worker) — never reached
- `0x824D28CC, 0x824D28D0` (worker wait) — never reached (probes fire on PC visit;
  tid 9 is BLOCKED at 0x824D28D0 from queueing-time, never gets scheduled-back)
- `0x824D290C, 0x824D291C, 0x824D2928, 0x824D2930` (worker shutdown/exit/loop) — never reached
- `0x824D2DAC` (KeInitializeSemaphore in init) — never reached *as PC visit*
  even though counter shows it fired (probe runs on prologue tick; the guest
  PC moves past 0x824D2DAC during the bl in the same prologue cycle without
  the check matching cleanly; not a behavior bug, probe limitation).
|
||
|
||
**Dispatcher dump shows correct DISPATCHER_HEADER structure:**
|
||
- `0x828A3254` Event sync: type=0x01, sig=0, +0x08="XEN\0", +0x0C=0x828A3254 (Path 2's stamp)
|
||
- `0x828A3230` Semaphore: type=0x05, count=0, limit=6, +0x08="XEN\0", +0x0C=0x828A3230
|
||
- `0x828A3244` Event sync: type=0x01, sig=0
|
||
- `mem[0x828A3264]=0x4250DEDC` — audio_system heap object pointer (set during init)
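The dump entries above can be read off mechanically from the first 16 bytes of each object. A minimal decoding sketch, assuming the usual NT-style `DISPATCHER_HEADER` layout (type byte at `+0x00`, signal state as a big-endian 32-bit value at `+0x04`) plus the `+0x08` / `+0x0C` stamp from α-006; `DispatcherView` and `decode_dispatcher` are hypothetical names:

```rust
/// Fields the dump entries report, decoded from the first 16 bytes of an object.
#[derive(Debug, PartialEq, Eq)]
struct DispatcherView {
    kind: u8,          // +0x00: 0x01 = event, 0x05 = semaphore in the dumps above
    signal_state: i32, // +0x04: signaled flag for events, count for semaphores
    has_xen_tag: bool, // +0x08: "XEN\0" signature present
    stash_handle: u32, // +0x0C: stashed handle (the guest pointer in our build)
}

fn decode_dispatcher(bytes: &[u8; 16]) -> DispatcherView {
    // Guest memory is big-endian, so every 32-bit field is read as BE.
    let be32 = |off: usize| {
        u32::from_be_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
    };
    DispatcherView {
        kind: bytes[0],
        signal_state: be32(4) as i32,
        has_xen_tag: &bytes[8..12] == b"XEN\0",
        stash_handle: be32(12),
    }
}
```

The semaphore limit is deliberately not decoded here, since its offset is not stated in these notes.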
|
||
|
||
**Thread states at deadlock:**
|
||
- tid 9 (entry 0x824D2878, the audio worker) — `Blocked(WaitAny [0x828A3254])` at pc=0x824D28D0, lr=0x824D28D0
|
||
- tid 10 (entry 0x824D2940) — Blocked similarly at pc=0x824d29X0 region
|
||
- 0x828A3254 has tid 9 in `waiters=[9]` but `signaled=false` and no signal_attempts
|
||
|
||
### Bug-class classification: γ-DEEP (vtable-driven indirection)
|
||
|
||
The audio init runs to completion: heap object allocated, dispatchers
initialized, worker spawned + resumed, ExRegisterTitleTerminate registered.
Worker is correctly parked on `0x828A3254` waiting for a job-submit signal.
**The job-submit method `sub_824D23B0` is reachable only via vtable lookup
on the audio_system object** — an indirect `mtctr r11; bctrl` after a
`lwz r11, 0(r30)`-style vtable load.

The caller of the vtable method must be a periodic frame-loop (per-frame audio
update). Static analysis shows it would be from the renderer/scenegraph — i.e.,
the same `0x82287000-0x82294000` cluster identified by AUDIT-009 as
**unreached**. AUDIT-016/017 already classified this cluster as γ-deep
(chicken-and-egg vtable-registry-not-populated).

**Conclusion**: the audio thread-start gate is *not* a missing kernel call.
It is the same γ-cluster blocker that has gated the renderer since AUDIT-009.
Fixing it has no β-class memory predicate — the indirection is via a vtable
slot in `[audio_obj+0]` whose containing dispatcher-table never gets registered
because the renderer's listener-init path never executes.
|
||
|
||
### Discipline gate
|
||
|
||
- Box 1 (canary citation): PASSES — canary `xenia/apu/audio_system.cc:202-237` +
  `xenia/kernel/xboxkrnl/xboxkrnl_audio.cc:56-82`. But canary's host
  audio worker is a *replacement* for the guest worker; the gate is purely
  guest-side here.
- Box 3 (probe-confirmed reachability): FAILS — sub_824D23B0 never fires.
- This is a diagnostic, no fix to apply.
|
||
|
||
### Sharp next-session direction
|
||
|
||
This audit closes the audio fork. The ledger has 3 paths forward:
|
||
|
||
(A) **Strategic pivot (recommended)**: stop chasing audio. The audio gate IS
the renderer gate. Concentrate on AUDIT-009's `0x82287000-0x82294000`
cluster's L1 callers and the listener-vtable registration that never
happens. Specifically AUDIT-017's hypothesis that the bit-14 setter at
0x82173950 is the gate, but with AUDIT-024A's falsification of `[+64]==-1`
as the blocker, redirect to: **find what canary writes into the
`0x40ba9a80` listener struct's vtable-pointer slot (`[+0]` in audit-016
parlance) and identify the writer in canary kernel source**. Path 2's
StashHandle fix landing means the dispatcher-side stamp is now done; the
next missing piece is which kernel call materializes the LISTENER's
vtable so the dispatch routine can actually run.

(B) **Audio-side workaround**: extend `try_inject_audio_callback` to fire
independently of the worker thread (i.e., bypass guest worker entirely
and call the registered XAudio callback PC directly from the kernel,
canary-style). Already explored under `--xaudio-tick`; regresses
swaps 2→1 (memory entry on KRNBUG-XAUDIO-PRODUCER-001). Not recommended.

(C) **Complete audio worker host-thread emulation**: mirror canary's host
`AudioSystem::WorkerThreadMain` in our kernel (semaphore.Release
`queued_frames` times on RegisterClient + drive callbacks from a host
thread). Larger refactor; risks breaking lockstep determinism unless
quantized to instruction-count.
|
||
|
||
### Trace artifacts
- `audit-runs/audit-025-audio-thread-start/probe.log` (CTOR-PROBE results + dispatcher dump)
- `audit-runs/audit-025-audio-thread-start/probe.err` (counters + thread states)

### Cleanup
No source modified. Master xenia-rs HEAD `de5a15e` unchanged.

---

## KRNBUG-AUDIT-027 — v40 heap memory diff vs canary (READ-ONLY, 2026-05-08)

Master HEAD at start/end: `e061e21`. NO source modified.

### Goal
Continuation of audit-026 (v80 elimination). Comprehensive dword-level
diff of canary's existing 248.6 MiB memory dump (audit-024A) vs
ours at v40000000 (1008 MiB span, 64 KiB pages). Looking for cluster L1
dispatch-table addresses.

### Method
- `--dump-section=0x40000000:0x3F000000:ours-v40.bin` -n 500M -> 60119
  committed pages, 1008 MiB.
- `extract_v40.py` (adapted from audit-026's extract_v80.py): canary
  v40 page count 16128, **committed = 90**.
- `diff_v40.py`: dword-level scan, A-list = canary 0x82xxxxxx-PC where
  ours differs, B-list inverse.

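The A-list/B-list scan described above can be sketched as follows. This is a minimal reconstruction and not the actual `diff_v40.py` (which is not reproduced in this log); the function name `pc_dword_diff`, its parameters, and the `[0x82000000, 0x83000000)` window are assumptions chosen to match the "0x82xxxxxx-PC" wording.

```python
import struct


def pc_dword_diff(canary: bytes, ours: bytes, base: int,
                  lo: int = 0x82000000, hi: int = 0x83000000):
    """Return (a_list, b_list) of (guest_addr, canary_dword, ours_dword).

    A-list: canary holds a value in [lo, hi) and ours differs.
    B-list: ours holds a value in [lo, hi) and canary differs.
    Hypothetical helper; the window [lo, hi) models "0x82xxxxxx".
    """
    n = min(len(canary), len(ours)) & ~3  # compare whole dwords only
    a_list, b_list = [], []
    for off in range(0, n, 4):
        # Guest memory is big-endian PowerPC, hence ">I".
        c = struct.unpack_from(">I", canary, off)[0]
        o = struct.unpack_from(">I", ours, off)[0]
        if c == o:
            continue
        if lo <= c < hi:
            a_list.append((base + off, c, o))
        if lo <= o < hi:
            b_list.append((base + off, c, o))
    return a_list, b_list
```

A dword can land in both lists when both sides hold different PC-range values at the same offset, so the two counts are not mutually exclusive.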
### Results
- A-list (canary-PC, ours differs): **536 entries**
- B-list (ours-PC, canary differs): **31947 entries**
- **Cluster L1 PC hits in A-list: 0** (broad 116-fn 0x82285000-0x82294000),
  **0** (narrow 6-fn `sub_822919C8`/`sub_82293448`/etc).
- Histogram top: `0x828f3xxx`(90), `0x8284dxxx`(78), `0x8284cxxx`(64),
  `0x82150xxx`(30), `0x828f4xxx`(23), `0x82882xxx`(20). All in
  .text/.data, NOT renderer cluster.
- Three vtable-shaped runs detected:
  - `0x40000770` length 32, header `00 09 00 0e | 00 01 10 00 | 40 00 01 c8 | 40 00 01 c8`
  - `0x400015a0` length 110, header `00 21 00 81 | 00 01 10 00 | 40 00 01 80 | 40 00 01 80`
  - `0x40000d90` length 20, `0x82882910`+0x20 stride

  All target `.text` heap-allocator handler thunks (`0x8284cxxx`/
  `0x8284dxxx`), not renderer dispatch.
- Listener struct at `0x40BA9A80`: canary page **uncommitted** in this
  dump; ours has the audit-016 listener content (`+0x2C=0x4024AC00`,
  `+0x3C=0x4024B3E0`, etc). This confirms canary's listener lives at a
  different heap address, not at `0x40BA9A80`.
- B-list tail discovery: `0x40211900..0x40211B50` in ours has 23
  consecutive function entries spaced 0x20 apart
  (`0x82183ae8, 0x82187e38, 0x8218cf10, ...`): **a function-pointer table
  our impl builds in v40 that canary builds elsewhere (likely physical heap)**.

### Bug-class classification
**Outcome (iii) per task brief: v40 ELIMINATED as dispatch-table
source.** Combined with audit-026 (v80 elim), two of four guest-virt
heap regions ruled out. Remaining surface = physical heap (0x20000000
span, 58458 commits in canary's dump = 228 MiB), v00 (256 MiB, 468
commits), or register-only constructed.

### Discipline gate
- Box 1: N/A (pure data audit).
- Box 3: N/A (no fix).

### Sharp next-session direction
- **Recommended: AUDIT-029 = extract canary PHYSICAL heap and diff**
  (same script, change selected heap to `physical`, 228 MiB surface).
  This is the largest non-static region and the most likely
  dispatch-table home given the two virt-heap eliminations.
- Alternative: **vtable-write-tap** instrumentation logging every
  `0x82xxxxxx` value our memory path writes to v40/physical heap.
  Side-steps the heap-pointer namespace divergence problem entirely.
- Or: **CPPBUG-AUDIT-001 backlog**: `nt_allocate_virtual_memory`
  silent-success plus `mm_allocate_physical_memory_ex` ignoring
  alignment/range/protect could be masking the dispatch-table
  writes upstream.

### Trace artifacts
- `audit-runs/audit-027-v40-mem-diff/canary-v40.bin` (1056964608 bytes)
- `audit-runs/audit-027-v40-mem-diff/ours-v40.bin` (1056964608 bytes)
- `audit-runs/audit-027-v40-mem-diff/extract_v40.py`, `diff_v40.py`
- `diff.txt` (536), `diff-b.txt` (31947), `histogram.txt`,
  `l1-hits.txt`, `tables.txt`, `anchors.txt`, `pages.txt`,
  `cluster_l1_pcs.txt` (116 fns from sylpheed.db), `ours.log`,
  `diff_run.log`.

### Cleanup
No source modified. Master xenia-rs HEAD `e061e21` unchanged.
Sister session 028 untouched.

---

## KRNBUG-AUDIT-028 — XNotify steady-state publisher audit (READ-ONLY, 2026-05-08)

### Goal
Determine whether canary delivers steady-state XNotify notifications
beyond the 4 startup IDs IO-004 wired, which would explain why our
main thread polls `XNotifyGetNext` 1.49M times without exit.

### Sources
- canary log: `audit-runs/audit-024a-canary-diff/canary.log` (17245 lines).
- canary source: `xenia-canary/src/xenia/`.

### Findings
- Canary log shows ONLY `XamNotifyCreateListener(0x2F)` at line 1347
  and `XNotifyPositionUI(0x0A)` at line 2018 in the entire 17245-line run.
- `XNotifyGetNext` is `kHighFrequency` (xam_notify.cc:96) so its
  per-call logging is suppressed; absence in log is expected, not
  evidence of zero calls.
- Of 34 `BroadcastNotification` publisher sites in canary across 11
  files, NONE fires every frame, every audio buffer, or on any
  implicit boot-time periodic schedule. All are event-driven from host UI,
  profile/XMP menu actions, or hardware hotplug edges.
- Canary's host-side controller-hotplug log message is NOT present
  in this run, so no `kXNotificationSystemInputDevicesChanged`
  fired (Sylpheed launched with controllers pre-connected).
- `VdSwap` appears once in canary's entire log, which means ZERO actual
  swap calls (the single line is just the export-table TOC at line 769).
  Our impl's swaps=2 is actually AHEAD of canary's frame counter.
- Canary IS in steady-state (audio-sema released 2224 times, GPU
  loading textures, `XamInputGetCapabilities` polled to log end).

### Outcome: β — XNotify queue is NOT the gate
Our impl's notification timeline matches canary byte-for-byte. The
1.49M `XNotifyGetNext` polls are dutiful idle polling, not a
missing-publisher symptom.

### Strategic pivot
The audio/render gate is still the γ-cluster from AUDIT-009/016/017/025:
the renderer's per-frame audio-update path (sub_824D23B0 invoked via
vtable on audio_system object at `[r31+0]=0x82006CF4`) is unreached
because the renderer cluster `0x82287000-0x82294000` is itself unreached.

### Recommended next session — AUDIT-029
Pivot to "what kernel call materializes the listener-dispatch table
so renderer can route per-frame audio":
1. Probe-set L1 callers of unreached cluster (AUDIT-009 PCs).
2. Static-grep canary for code that populates the `0x82006CF4`
   audio_system vtable at runtime, likely
   `XAudioRegisterRenderDriverClient` / `AudioSystem` init shim.
3. Diff that population path vs our impl.

Sharp 4-dim cascade prediction (provisional):
- A: one audit-009 cluster L1 PC fires.
- B: `KeReleaseSemaphore(0x828A3230)` 0 → many.
- C: `XAudioSubmitRenderDriverFrame` 0 → many.
- D: `VdSwap` count climbs.

### Trace artifacts
- Memory file: `project_xenia_rs_audit_028_steady_state_notify_2026_05_06.md`
- Audit dir: `audit-runs/audit-028-steady-state-notify/`

### Cleanup
No source modified. No commit. Master xenia-rs HEAD `e061e21` unchanged.

---

## KRNBUG-AUDIT-029 — physical-heap memory diff vs canary (READ-ONLY, 2026-05-08)

### Goal
Comprehensive byte-level diff between canary's physical heap (extracted
from audit-024A's `canary-memory.dump`) and our impl's putative physical
region. This is the LAST major guest-memory surface unaccounted for after
v00 (audit-024A), v40 (audit-027), v80 (audit-026), v90 (zero pages
committed).

### Method
1. Tried dumping our `0xA0000000:0x20000000` (uncached alias).
2. Tried dumping our `0xE0000000:0x20000000` (cached alias).
3. Tried dumping our `0x00000000:0x20000000` (raw physical addr).
4. Extracted canary's physical heap from dump via `extract_physical.py`
   (5th heap, 4096-byte pages, state at qword bits 60-61).
5. Walked all 0x82xxxxxx PC dwords on canary's physical heap and
   cross-referenced.

### Architectural finding (NEW)
**Our impl has no separate physical heap.** All three of our
alias dumps (`0xA0000000`, `0xE0000000`, `0x00000000`) returned
`0 committed pages`. `MmAllocatePhysicalMemoryEx` (exports.rs:644-676)
calls `state.heap_alloc()` (state.rs:702-720), which is a single bump
allocator at `heap_cursor` starting at `0x40000000` shared with
`NtAllocateVirtualMemory`. Canary, by contrast, has a dedicated
512 MiB physical pool (memory.cc:222-242) accessible via the
0xA0/0xC0/0xE0 aliases, each identity-mapped (`addr & 0x1FFF_FFFF`) to
host membase offset 0..0x20000000.

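The alias arithmetic stated above can be checked with a short sketch. It models only the `& 0x1FFF_FFFF` mapping and the three alias bases named in this section; the helper names are hypothetical and this is not canary's heap code.

```python
PHYS_MASK = 0x1FFF_FFFF  # 512 MiB physical window (0..0x20000000)
ALIAS_BASES = (0xA0000000, 0xC0000000, 0xE0000000)  # as listed in this audit


def is_physical_alias(guest_addr: int) -> bool:
    """True if the address falls in one of the 512 MiB alias windows."""
    return (guest_addr & 0xE0000000) in ALIAS_BASES


def phys_offset(guest_addr: int) -> int:
    """Offset into the physical pool for an aliased guest address."""
    return guest_addr & PHYS_MASK
```

Under this model, `0xA0001000` and `0xE0001000` both resolve to pool offset `0x1000`, while a `0x40000000`-based heap address is not an alias at all. That is consistent with all three alias dumps of our impl coming back empty.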
### Canary physical heap stats (extracted)
- File size: 0x20000000 (512 MiB), all-zero except 24.5 MiB of payload.
- Committed pages: **58458** (×4096 = ~228 MiB), much larger than
  audit-024A's `physical=48105` summary; trust this concrete value.
- Total parsed = 0xf895800 bytes == dump file size (clean walk).
- 0x82xxxxxx PC dword density: **28851** entries in 4467 4K pages
  spanning 536 64K-aligned regions.

### Diff results
- A-list (canary has PC, ours has zero): **28851 entries** (every PC
  dword is automatically a divergence since our region is empty).
- L1 PC hits, narrow (audit-009 hand-picked 6): **0 / 6**.
- L1 PC hits, broad (116-fn cluster): **2 / 116** (`sub_8228CC18` at
  phys 0x1330d620; `sub_8228A220` at phys 0x1351ef2c; both scalar,
  not part of any table).
- Audit-017 chain hits (`sub_82184318`, `sub_82184374`, `sub_82187768`,
  `sub_82187dd0`, `sub_82183ca8`, `sub_822919c8`, `sub_82186760`,
  `sub_821c88d0`): **0 / 8**.
- Top PC bucket: `0x82026000` × 12655 occurrences (likely a vtable
  pointer for a per-instance object array; `0x144x0000` regions show
  stride-0x38 entries with `0x820266a4` vtable slot).
- Consecutive PC-dword runs (≥4): **5 runs** total.
  - 232-dword run at phys `0x1e568f38`: XAM/UI dispatch table family
    (`0x824b0xxx-0x824b2xxx`, ~220 PCs in that family).
  - 9-dword run at `0x1e6290f0`.
  - Three 4-dword runs at `0x1c22c9b0`, `0x1ce24bc0`, `0x1ce254c0`.
- 64K-region PC density top: `0x144x0000` family (1300-1400 PCs each).

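The "consecutive PC-dword runs (≥4)" metric above can be computed along these lines. This is a sketch under assumptions, not the actual `diff_physical.py`: the function name, the `[0x82000000, 0x83000000)` window, and the dword-aligned scan are illustrative choices.

```python
import struct


def find_pc_runs(data: bytes, base: int = 0, min_len: int = 4,
                 lo: int = 0x82000000, hi: int = 0x83000000):
    """Return [(start_addr, length_in_dwords)] for runs of adjacent
    big-endian dwords whose values all fall in [lo, hi)."""
    runs = []
    start = None
    n = len(data) & ~3  # whole dwords only
    # Iterate one step past the end so a run touching EOF is closed.
    for off in range(0, n + 4, 4):
        is_pc = off < n and lo <= struct.unpack_from(">I", data, off)[0] < hi
        if is_pc:
            if start is None:
                start = off
        elif start is not None:
            length = (off - start) // 4
            if length >= min_len:
                runs.append((base + start, length))
            start = None
    return runs
```

A strided table such as the 0x20-stride one at `0x40211900` would not show up as a run here, because its PCs are separated by non-PC dwords; detecting those needs a separate stride-aware pass.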
### CONFIRMATION of audit-027 misplacement hypothesis
Our v40 table at `0x40211900..0x40211B50` (18 unique PCs, 0x20 stride,
`sub_82183ae8 ... sub_821c09d8`, audit-017 chain family) appears
verbatim on canary's physical heap at `0x1c32c910..0x1c32cb50`,
**identical 0x20 stride, identical 18 PCs, even the trailing dup of
`0x821c09d8`**. This proves the table is allocated via
`MmAllocatePhysicalMemoryEx` in canary; our impl correctly builds the
same table but at a different virtual address (because our allocator
is unified). The table location difference is benign; the table contents
are correct.

### Outcome: ζ — all four guest heaps eliminated
**No L1 PCs are stored in any heap-resident table** (the 2 broad-cluster
hits above are isolated scalars). Cluster L1 functions
(`sub_822919C8` etc.) are invoked exclusively via static `bl`
instructions in unreached parent code; they are NOT routed through
a runtime-built dispatch table. Audit-017 chain PCs are likewise
absent from all heap data.

This rules out the entire family of "kernel call materializes a
function-pointer table" hypotheses. The renderer cluster
0x82287000-0x82294000 is unreached because **its static caller
chain is not entered**, not because its dispatch table is not built.

Discipline gate: fails box 1 (no fix candidate this session).

### Strategic pivot — AUDIT-030 recommendation
All vtable/dispatch-table hypotheses across audits 010, 011, 012,
015, 016, 017, 026, 027, 029 are exhausted. The gate is **upstream
of any heap data structure**: it's a control-flow gate, not a
data-population gate.

Two viable next-step approaches:

**Option A (preferred): comparative-execution divergence trace.**
Instrument both runtimes to log a deterministic event stream
(e.g., `tid:pc:lr:opcode-class` per-N-instructions) and `diff` to
find the first divergent guest instruction. With lockstep
determinism on our side and `--memory_dump_path` already
patched into canary (audit-023/024), one more canary patch to
emit a periodic execution sample is feasible. Once the first
divergence is located, the kernel call (or guest computation)
that immediately preceded it names the bug class.

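Locating the first divergence between two sampled event streams, as Option A proposes, reduces to a positional comparison. The sketch below assumes each stream has already been parsed into comparable tuples such as `(tid, pc, lr)`; the tuple layout and the function name are hypothetical, since neither runtime emits such a stream yet.

```python
from itertools import zip_longest


def first_divergence(ours, canary):
    """Return (index, ours_event, canary_event) for the first position
    where the streams differ, or None if they are identical.

    A stream that ends early is reported with None on its side.
    """
    for idx, (a, b) in enumerate(zip_longest(ours, canary)):
        if a != b:
            return idx, a, b
    return None
```

This only works if both runtimes sample at the same deterministic points (for example every N guest instructions per thread); host-timing-driven samples would diverge immediately for uninteresting reasons.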
**Option B: focused canary trace of the audio-thread wake-source.**
Per audit-025, `sub_824D23B0` (the only `KeSetEvent(0x828A3254)`
caller) has zero static call-xrefs and is invoked only via
`[r31+0]=0x82006CF4` audio_system vtable. That vtable IS
populated in our impl (audit-026 confirmed byte-identical).
The caller must therefore be a per-frame renderer routine
already in our binary. A targeted canary log dump of the LR
on every entry to `sub_824D23B0` would name the caller.
Cross-reference with our PC trace to find which renderer-cluster
function fires in canary but not ours.

**Option C (background backlog only):** CPPBUG-AUDIT-001 items
(CRT abort, alignment-ignoring physical alloc, sync/eieio no-ops).

### Sharp prediction (provisional, low confidence)
The first divergence will be a control-flow branch in the
0x82200000-0x82290000 range whose predicate reads from a
guest memory location populated by an unreached or stub-success
kernel export. Most-likely candidates:
- A field on the audio_system object at `0x82006CF4` not yet
  initialized by us (audit-026 verified vtable; field bytes
  beyond may differ).
- A hardware-state poll that we stub out (e.g., GPU EDRAM-ready,
  DMA-channel-idle).
- A frame counter / vsync flag that canary advances differently.

### Trace artifacts
- Audit dir: `audit-runs/audit-029-physical-mem-diff/`
- `canary-physical.bin`: 512 MiB extracted heap (24.5 MiB non-zero)
- `ours-physical-A.bin`: 512 MiB, all zero (alias not mapped)
- `ours-physical-E.bin`: 512 MiB, all zero (alias not mapped)
- `ours-physical-flat.bin`: 512 MiB, all zero (no commits in 0..0x20000000)
- `extract_physical.py`: heap extractor
- `diff_physical.py`: one-sided PC enumeration script
- `diff.txt`, `histogram.txt`, `l1-hits.txt`, `audit017-hits.txt`,
  `v40table-hits.txt`, `tables.txt`, `pages.txt`, `pc-summary.txt`
- Memory file: `project_xenia_rs_audit_029_physical_mem_diff_2026_05_08.md`

### Cleanup
No source modified. No commit. Master xenia-rs HEAD `e061e21` unchanged.

## KRNBUG-AUDIT-031 — Audio worker wait-site canary trace (2026-05-08)

**READ-ONLY**. Re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC,
4 files); 4 sequential probe runs; canary patch reverted at session close.
Master HEAD `e061e21` unchanged.

### Method
- Probe `0x824D2878` (audio worker entry, sub_824D2878): 1 fire, lr=0xBCBCBCBC.
- Probe `0x824D28D0` (post-wait PC where ours parks): **54,128 fires** in
  ~5 min; canary's wait IS being woken on a hot loop.
- Probe `0x8284DDDC` (KeSetEvent guest thunk): 8906 fires; **wake source
  captured**: `tid=0100001C lr=0x824D2A44 r3=0x828A3254 r4=1`, i.e.
  `KeSetEvent(0x828A3254, 1, 0)` from PC `0x824D2A40`.
- Probe `0x824D23B0` (sub_824D23B0 entry per IDA): **0 fires**.

### Key finding — function-boundary mis-attribution corrected
AUDIT-025/-030's claim "sub_824D23B0 is the only wake-source and is never
entered" is half-correct. The IDA-DB function-record `sub_824D23B0`
(claimed `0x824D23B0..0x824D2878`) actually contains a SECOND function
prologue at `0x824D29F0` (`mfspr r12, LR; bl 0x825F0F88; stwu r1, -192(r1)`).
This second function `sub_824D29F0` is the real wake-source, not
sub_824D23B0. IDA's broken boundary inference lumped the two together.

### Static reachability of sub_824D29F0
- `0x824D6648 b 0x824D29F0` (kind=`j`, tail-jump from a 12-byte thunk at
  `0x824D6640` that loads `r3 = [0x828A3264]`).
- `0x824D6640` is referenced as DATA at `sub_824D2C08+0x374`
  (kind=`ref`, instruction=`addi`). PC `0x824D2F7C: addi r4, r10, 26176`
  loads `r4 = 0x824D6640`; the next instructions deref `[r31][68]`,
  load `vtable[7]` at `[[r3]+28]`, then `bcctrl 20,lt` to register the
  thunk as a callback on the audio-engine object.

So in canary: after `sub_824D2C08` registers the callback at +0x374,
some scheduler/dispatcher periodically invokes the thunk at `0x824D6640`,
which tail-jumps into `sub_824D29F0`, which sets event 0x828A3254 at
`+0x50`, waking the audio worker.

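The address arithmetic in this section can be re-derived mechanically. The sketch below models `addi` as `rA + sign_extend(SIMM)` truncated to 32 bits; the value `r10 = 0x824D0000` is an inference from the stated result (`r4 = 0x824D6640` with immediate 26176), not something captured in a trace.

```python
def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit immediate, as addi does with SIMM."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def addi(ra_value: int, simm: int) -> int:
    """32-bit result of `addi rD, rA, SIMM` for rA != r0."""
    return (ra_value + sign_extend16(simm)) & 0xFFFFFFFF


# Inferred upper half in r10 (assumption, see lead-in).
THUNK = addi(0x824D0000, 26176)
# Offsets quoted in the prose above.
ADDI_PC = 0x824D2C08 + 0x374
SET_EVENT_PC = 0x824D29F0 + 0x50
```

Because 26176 (`0x6640`) is below `0x8000`, no sign-extension correction is needed here; an immediate at or above `0x8000` would require the upper half loaded into `r10` to be one higher.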
### Our impl behavior (matches AUDIT-025 exactly)
`hw=4 idx=0 tid=9 state=Blocked(WaitAny { handles: [2190094932], deadline: None }) pc=0x824d28d0 lr=0x824d28d0`
where `2190094932 = 0x828A3254`. `sub_824D2C08` runs to completion in
ours (per AUDIT-025), so the registration step fires. The host-side
dispatch loop that should periodically invoke `0x824D6640` is the
unreached gate.

### Bug class
γ-deep, vtable-driven (refines AUDIT-025 with the correct downstream
witness). The dispatch loop is a per-frame audio update, most likely
in the unreached `0x82287000-0x82294000` cluster (AUDIT-009).

### Sharp prediction — AUDIT-032
1. Probe `0x824D6640` directly in canary (`--log_lr_on_pc=0x824D6640`).
   Capture lr, which names the dispatcher PC.
2. Probe `0x824D2F90` (the `bcctrl` callsite) to capture `r3` (the
   audio-engine "this") and `[r3+0]+28` (the vtable[7] entry being
   invoked). Static disasm of vtable[7] target identifies the
   register-callback implementation.
3. Walk the dispatcher PC's caller chain in our IDA DB; if it bottoms
   in unreached audit-009 cluster, the dispatch loop IS the renderer
   gate (audio gate IS renderer gate, named).
4. Cross-check: a fix that makes the dispatcher fire should make
   `sub_824D29F0` reachable in our impl, ending the deadlock.

### Trace artifacts
- Audit dir: `audit-runs/audit-031-wait-site/`
- `canary-0x824D2878.log`, `canary-0x824D28D0.log`,
  `canary-KeSetEvent.log`, `canary-sub23B0.log`
- Memory file: `project_xenia_rs_audit_031_audio_wait_site_2026_05_08.md`

### Cleanup
Canary patch reverted (`git status` clean in canary repo). Master
xenia-rs HEAD `e061e21` unchanged. No commit.

## KRNBUG-AUDIT-032 — Audio dispatcher LR capture at thunk 0x824D6640 (2026-05-08)

**READ-ONLY**. Re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC,
4 files); single 40-sec capture of `--log_lr_on_pc=0x824D6640`; canary patch
reverted at session close. Master HEAD `e061e21` unchanged.

### Capture
**7,875 fires** of `pc=0x824D6640`, all from a single host-flagged kernel
thread named **"Audio Worker"** (handle=`0100001C`, native=`467FC6C0`),
stack `700D0000-700F0000`. **LR is invariant `0xBCBCBCBC`**: canary's host
stack-fill sentinel value, NOT a guest PC. r3=`0x30063000` (driver context),
r4=0 first call / =1 thereafter, r5=`0x1800` (frame size 6144 bytes / 1536
stereo s16 samples), r6=`0xBDFBA600` (registered callback_arg).

Canary log lines:
```
d> F8000008 XAudioRegisterRenderDriverClient(701CF210(824D6640), BDFBA658(00000000))
K> 0100001C XThread::Execute thid 4 (handle=0100001C, 'Audio Worker (0100001C)', native=467FC6C0, <host>)
i> 0100001C TRACE-PC-LR pc=824D6640 lr=BCBCBCBC r3=30063000 r4=00000001 r5=00001800 r6=BDFBA600
```

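Counting fires and checking LR invariance from a capture like the one above can be sketched as follows. The `TRACE-PC-LR` line shape is taken from the sample log lines in this entry (it comes from the local `--log_lr_on_pc` patch, not stock canary); the function name and return shape are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "TRACE-PC-LR pc=824D6640 lr=BCBCBCBC r3=30063000 ..."
TRACE_RE = re.compile(
    r"TRACE-PC-LR pc=(?P<pc>[0-9A-Fa-f]{8}) lr=(?P<lr>[0-9A-Fa-f]{8})"
)


def summarize_trace(lines):
    """Return (fires_per_pc, lrs_per_pc) from an iterable of log lines."""
    fires = Counter()
    lrs = {}
    for line in lines:
        m = TRACE_RE.search(line)
        if m is None:
            continue  # not a trace line (export calls, thread starts, ...)
        pc = int(m["pc"], 16)
        fires[pc] += 1
        lrs.setdefault(pc, set()).add(int(m["lr"], 16))
    return fires, lrs
```

An LR set of exactly `{0xBCBCBCBC}` for a probed PC is the signature reported in this capture: every entry came from the host side rather than from a guest `bl`.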
### Mechanism — host-side, not guest
Per canary source `src/xenia/apu/audio_system.cc:84-159`:
1. `AudioSystem::Setup()` spawns an `XHostThread` named "Audio Worker"
   running `WorkerThreadMain()`.
2. Loop: `WaitAny(client_semaphores_)` → on wake, read
   `clients_[index].callback` and `wrapped_callback_arg` → call
   `processor_->Execute(worker_thread_state, client_callback, args)`.
3. The audio backend driver releases the per-client semaphore each time
   it consumes a frame of audio output.

The thunk `0x824D6640` is **invoked directly by the canary host emulator's
processor**; there is no guest call site. The PPC LR remains the host
stack-fill sentinel because the function is entered without a guest `bl`.

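The three-step loop above can be modeled in a few lines to make the control flow concrete. This is a toy single-client simulation under stated assumptions: `AudioClient`, `worker_main`, and the timeout are hypothetical names and choices, a plain Python callable stands in for `processor_->Execute`, and the multi-client `WaitAny` is reduced to one semaphore. It is not canary's implementation.

```python
import threading


class AudioClient:
    """One registered render-driver client (hypothetical model)."""

    def __init__(self, callback, callback_arg, queued_frames=6):
        self.callback = callback
        self.callback_arg = callback_arg
        # Starting the count at queued_frames is equivalent to releasing
        # the semaphore that many times at registration.
        self.semaphore = threading.Semaphore(queued_frames)


def worker_main(client, max_frames, wait_timeout=1.0):
    """Pump the client's callback once per semaphore release.

    Returns how many callbacks were delivered before the semaphore
    stayed empty for wait_timeout seconds or max_frames was reached.
    """
    pumped = 0
    while pumped < max_frames:
        if not client.semaphore.acquire(timeout=wait_timeout):
            break  # nothing released: the worker would stay parked
        client.callback(client.callback_arg)
        pumped += 1
    return pumped
```

In this model, once the initial `queued_frames` credits are consumed the worker delivers nothing further until the backend side calls `client.semaphore.release()`, which mirrors step 3.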
### Falsifies AUDIT-031 hypothesis
Audit-031 inferred that `0x824D6640` is registered as a vtable[7] callback
on the audio_system object and dispatched via per-frame guest bcctrl. This
is wrong. The `addi r4, r10, 26176` at `sub_824D2C08+0x374` (PC `0x824D2F7C`)
loads the PC `0x824D6640` as the **callback_ptr argument to
XAudioRegisterRenderDriverClient**: caller-side parameter setup, not vtable
registration. `XAudioRegisterRenderDriverClient` records the (callback, arg)
pair into the host-side `AudioSystem::clients_[]` table; the host worker
thread is what subsequently invokes the callback.

### Outcome
**δ + α composite** per task brief outcomes:
- δ confirmed: audit-031's "vtable[7] callback" inference is wrong.
- α partial: the "caller PC" we sought to walk up is canary's HOST C++,
  not guest code. There is no guest LR to walk; the divergence is entirely
  on the kernel-host boundary at `XAudioRegisterRenderDriverClient`.

### Our impl gap (probe-confirmed)
`crates/xenia-kernel/src/exports.rs:2705-2745`: registers the client into
our `state.xaudio` table (correct callback_pc=`0x824D6640`,
arg=`0x41E9DD5C`, returns driver=`0x41550000`) but **does not spawn a
host-side worker thread** to pump the callback. No semaphore-release loop
mirrors canary's `client_semaphore->Release(queued_frames_, ...)`.

Probe fires at -n 500M (`--pc-probe=0x824D6640,0x824D29F0,...` AND
`--branch-probe=...`): **0 fires for both PCs**. tid=9 parks at
`pc=0x824D28D0` waiting on event `0x828A3254`; tid=10 parks at
`pc=0x824D2990` waiting on semaphore `0x828A3230` (count=0/limit=6).

### Bug class & sharp prediction
**Class**: δ-α composite, i.e. the host-side AudioSystem worker thread is
missing entirely.

**Sharp cascade prediction** for fix session (audio-host-pump):
- A: tid=9 leaves `Blocked(WaitAny [0x828A3254])` on the FIRST callback
  invocation (sub_824D29F0 calls `KeSetEvent(0x828A3254, 1, 0)`).
- B: tid=10 leaves `Blocked(WaitAny [0x828A3230])` on next sema release
  inside sub_824D29F0.
- C: `XAudioSubmitRenderDriverFrame` count rises from 0.
- D: `KeReleaseSemaphore` becomes non-zero (canary-only export landed).
- E: open. Does this unblock a non-audio consumer? Tid=10's parking on
  `limit=6` semaphore (canary's `queued_frames_=6`) suggests audio frame
  queue is **isolated**. So fix likely resolves audio path but **NOT**
  the audit-009 renderer cluster.

The audio gate is **NOT** the renderer gate (revising audit-025's "audio
gate IS the renderer gate" claim). Separate stalls sharing only the
"host pump missing" symptom.

### Trace artifacts
- Audit dir: `audit-runs/audit-032-dispatcher-lr/`
- `canary-patch.diff` (saved before revert)
- `probe.{log,err}` (our impl, -n 500M)
- `probe-sanity.{log,err}` (-n 50M)
- `branchprobe.{log,err}` (branch-probe verification)
- `/tmp/audit-032-canary.log` (canary capture, 35,942 lines, 7,875 LR fires)
- Memory file: `project_xenia_rs_audit_032_dispatcher_lr_2026_05_08.md`

### Recommended next session
Implement host-side audio worker per canary `apu/audio_system.cc`. Est.
60-120 LOC. Predicted to unblock audio path (tids 9, 10) and add
canary-only kernel exports (KeReleaseSemaphore, possibly
XAudioSubmitRenderDriverFrame). **Won't fix the audit-009 renderer cluster
(separate γ-class blocker)**. Audit-025's strategic-pivot to renderer
cluster L1 callers REMAINS priority for swaps=2→draws>0 progression; the
audio fix is necessary cleanup of canary-only exports.

### Cleanup
Canary patch reverted (`git status` clean in canary repo). Master
xenia-rs HEAD `e061e21` unchanged. No commit.

## VERIFY-A — Static-reachability soundness check via canary PC trace (2026-05-08)

**READ-ONLY**. Re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC,
4 files). Probed 12 distinct PCs from the audit-009 unreachable cluster
(`0x82285000-0x82294000`) sequentially in canary; canary patch reverted at
session close. Master HEAD `e061e21` unchanged.

### Hypothesis being tested
Static reachability via `xrefs.kind='call'` BFS from `entry_point=0x824AB748`
in `sylpheed.db` claims 112/116 functions in cluster `0x82285000-0x82294000`
are unreachable. xrefs.kind='call' does NOT capture indirect dispatch
(vtables, function pointers). If canary reaches these PCs via indirect
dispatch, the audit-009/-016/-017/-020/-021/-029 framing is wrong.

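The call-only BFS under test, and the blind spot it has by construction, can be illustrated on a tiny in-memory graph. The `(caller, target, kind)` tuple layout is an assumption made for this sketch; the real `sylpheed.db` schema is not shown in this log.

```python
from collections import deque


def reachable_via_calls(xrefs, entry):
    """Return the set of functions reachable from entry following only
    xrefs whose kind is 'call' (direct bl edges).

    xrefs: iterable of (caller_fn, target_fn, kind) tuples (hypothetical
    layout). Edges of kind 'ref' or 'j' are deliberately ignored, which
    is exactly why indirect dispatch is invisible to this analysis.
    """
    graph = {}
    for caller, target, kind in xrefs:
        if kind == "call":
            graph.setdefault(caller, []).append(target)
    seen = {entry}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        for callee in graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen
```

A function whose address is only ever taken as data (kind `ref`) is reported unreachable here even if it runs at runtime, so the dynamic probes below are the appropriate cross-check.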
### Method
- Build: Debug variant, `xenia-canary/build/bin/Linux/Debug/xenia_canary`
- Args: `--log_level=3 --disable_instruction_infocache=true --log_lr_on_pc=PC --headless=true`
- Per probe: ~35 sec runtime, then SIGTERM/SIGKILL.
- Sanity check: `--log_lr_on_pc=0x824D28D0` produced 5683 fires (matches
  audit-031's 54128/5min ratio), so the trace mechanism is functional in this build.
- Per probe: also recorded `KeReleaseSemaphore` count (audio loop liveness
  proxy); each probe ran with 5,600-5,800 KeRelSem calls during the window.

### Probe results (PC → fires → cluster region)
| PC          | fires | source            | reachable via call-BFS? |
|-------------|-------|-------------------|-------------------------|
| 0x822919C8  | 0     | audit-009 narrow  | no                      |
| 0x82293448  | 0     | audit-009 narrow  | no                      |
| 0x82288028  | 0     | audit-009 narrow  | no                      |
| 0x82292D80  | 0     | audit-009 narrow  | no                      |
| 0x822851E0  | 0     | audit-009 narrow  | no                      |
| 0x82286BC8  | 0     | audit-009 narrow  | no                      |
| 0x82285C78  | 0     | broader cluster   | no                      |
| 0x82285DD0  | 0     | broader cluster   | no                      |
| 0x82286118  | 0     | broader cluster   | no                      |
| 0x8228A140  | 0     | broader cluster   | no                      |
| 0x8228CAF8  | 0     | broader cluster   | no                      |
| 0x8228E688  | 0     | broader cluster   | no                      |
| 0x824D28D0  | 5683  | sanity-check      | reached (audit-031)     |

### Cross-validation against sylpheed.db
- 116 functions live in `0x82285000-0x82294000` per `functions` table.
- 4/116 reached via call-BFS from entry; 112/116 unreached.
- 12 of those 112 unreached PCs probed; 0 fires in canary across ~6 min
  of cumulative wall-clock probe time.

### Bug-class implication
Outcome (i): **static reachability claim is sound**. The 112-function
"unreachable" cluster IS unreachable in canary too; the BFS conclusion is
not an artifact of a too-narrow search. Indirect-dispatch reachability
misses (the hypothesized failure mode) are NOT happening for this cluster.

### What this rules out / does not rule out
- Rules out: "indirect dispatch through audio vtables reaches this cluster
  in canary, but our static analysis missed it." Would have manifested as
  >=1 PC firing.
- Rules in (consistent): the audit-031 finding that the audio dispatch
  loop registers `0x824D6640` as a callback but the dispatcher itself
  lives in unreached territory. Both canary and ours fail to reach the
  cluster via the static-call graph; canary reaches it via a DIFFERENT
  vtable/dispatch entry that this 12-PC sample didn't catch.
- Does not rule out: that SOME parts of the 42-function broader closed
  island could be reached in canary (sample size 12/112 = ~10.7%
  coverage). A full sweep would harden the claim, but cost is ~75 min
  cumulative at ~35 sec per probe.

### Cumulative-coverage caveat
Probes are independent, so running them sequentially does NOT prove
non-reachability across the whole 5-min audit-031 envelope. Each probe
ran ~35 sec. Audit-031's 5-min run captured 54128 fires of 0x824D28D0
(rate ≈180/sec). Over our 35-sec window, expected fires for a similar
hot-loop entry would be ≈6300. Zero fires is decisive for hot-loops; a
genuinely cold-but-reachable PC (e.g. fires once at boot) might not have
been captured if it fires in a window outside our trigger envelope.
Mitigation: each probe was started fresh at canary launch, so any
boot-time fire would be captured.

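The rate estimate above is simple enough to re-derive. The sketch below treats audit-031's "~5 min" as exactly 300 seconds, which is an assumption; the helper name is hypothetical.

```python
def expected_fires(ref_fires: int, ref_seconds: float, window_seconds: float):
    """Scale a reference fire count to a shorter probe window.

    Returns (fires_per_second, expected_fires_in_window).
    """
    rate = ref_fires / ref_seconds
    return rate, rate * window_seconds


# audit-031 reference: 54128 fires over ~5 min; VERIFY-A window: ~35 sec.
RATE, EXPECTED = expected_fires(54128, 5 * 60, 35)
# Observed sanity-probe rate in this session: 5683 fires over ~35 sec.
OBSERVED_RATE = 5683 / 35
```

The observed sanity-probe rate comes out somewhat below the audit-031 rate, which is unsurprising given that both durations are approximate; either way a hot-loop PC would produce thousands of fires per probe, so a count of zero is not a sampling artifact.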
### Reading-error impact
This verification PASSES. The 10-error reading-error ledger does not
include the audit-009 reachability claim. No reattribution required.

### Recommendation
- Outcome (i) per task brief: no immediate action required on the audit
  campaign; static reachability is sound for this cluster sample.
- The reading-error ledger separately motivates the analysis-toolset
  overhaul (per user's earlier instruction) but that is a separate
  planning track.
- Follow-up if desired: full 112-PC sweep (~75 min cumulative). Optional
  hardening; the 12-PC sample with 0/12 hits gives a Bayesian posterior
  that the cluster is genuinely cold in canary at this boot phase.

### Trace artifacts
- Audit dir: `audit-runs/verify-A-static-reachability/`
- 13 probe-*.log files (12 cluster + 1 sanity)
- Memory file: `project_xenia_rs_verify_A_canary_pc_trace_2026_05_08.md`

### Cleanup
Canary patch reverted (`git status` clean in canary repo). Master
xenia-rs HEAD `e061e21` unchanged. No commit.

## KRNBUG-AUDIT-033 — UI/save-game subsystem entry-chain divergence probe (READ-ONLY, 2026-05-08)

### Setup
- Re-applied 30-LOC `--log_lr_on_pc` canary patch (4 files, see audit-030
  diff). Built `xenia_canary` Debug variant explicitly via
  `ninja -f build-Debug.ninja` (Checked variant has runtime code-cache
  allocation issues that block boot).
- Probed 8 PCs in canary (50s wall, `--disable_instruction_infocache=true`):
  Tier 1 cluster externals: `0x8228A628`, `0x8228E138`, `0x8228E498`;
  Tier 2 callers: `0x82172524`, `0x82175810`, `0x8217EB78`;
  Tier 3 CMessageBridge sites: `0x821A6CF0`, `0x821A8578`.
- xenia-rs `--pc-probe` of same 8 PCs at -n 500_000_000 (master HEAD
  `9028021`).

### Canary fire counts
| PC | Tier | Canary fires | LRs |
|----|------|--------------|-----|
| 0x8228A628 | T1 | 0 | - |
| 0x8228E138 | T1 | 2 | 0x82172BF8 (in sub_82172BA0) |
| 0x8228E498 | T1 | 28 | 0x82451E78, 0x82174730 |
| 0x82172524 | T2 | 0 | - |
| 0x82175810 | T2 | 0 | - |
| 0x8217EB78 | T2 | 0 | - |
| 0x821A6CF0 | T3 | 0 | - |
| 0x821A8578 | T3 | 0 | - |

### xenia-rs fire counts (CTOR-PROBE)
| PC | Ours fires | LR |
|----|------------|-----|
| 0x8228E138 | 1 | 0x82172BF8 (in sub_82172BA0) |
| 0x8228E498 | 62 | 0x82451E78 (in sub_82451E20) |
| (others) | 0 | - |

### Convergence finding
**Both implementations enter the same 2 cluster externals via the same
LRs** (canary additionally records a second LR, 0x82174730, for
0x8228E498). sub_82172BA0 → sub_8228E138 (boot init), sub_82451E20 →
sub_8228E498 (init array, 28 fires canary / 62 fires ours). Tier 2 +
Tier 3 functions (`sub_82172524`, `sub_82175810`, `sub_8217EB78`,
`sub_821A6CF0`, `sub_821A8578`) are 0-fires in canary at the 50s boot
horizon, so they are NOT activated in canary either. The audit-prompt
hypothesis that these caller paths fire in canary is FALSIFIED for
Tier 2+3 within the 50s envelope.

Frame walk from our impl's CTOR-PROBE for 0x8228E498 yields a
call chain: sub_82451E20 ← sub_82450720 ← sub_82450638 ←
sub_821CB968 ← sub_821CD458 ← sub_821CBEA8 ← sub_821CECF0 ←
sub_821C4988 — all reached.

### Bug-class classification
**Outcome (γ)** per task brief: "Both reach the same PCs up to bcctrl
through cluster vtable; the divergence is at the indirect-dispatch
level." Specifically: at the 50s boot horizon, canary itself doesn't
penetrate deeper into the UI/save-game cluster than our impl does.
Tier 1 entries `sub_8228E138` and `sub_8228E498` are reached by both;
the cluster's full activation (mission select, save-game UI) requires
a boot-phase further than this probe envelope captures.

### Per-PC quantitative divergence
- `0x8228E138`: ours fires 1× at cycle 9191803 (very late), canary fires
2× — minor frequency divergence, both via sub_82172BA0. Cause likely
a duplicate post-boot reentry that ours misses.
- `0x8228E498`: ours fires 62× across cycles 104K–249K, canary fires 28×
across 50s wall — ours busy-loops sub_82451E20 more aggressively
(likely an array ctor dispatch). May indicate canary breaks out of the
loop early via a state ours doesn't reach.

### Discipline gate
- Box 1: probe data captured both sides — PASS.
- Box 2: canary fires Tier 1 entries (2 of 3) — PARTIAL.
- Box 3: cross-impl LR mirror — PASS (LRs match).
- Box 4: bug class = γ — does not gate to fix; M5.5 prerequisite.
- Box 5: no fix this session per task brief — PASS.

### Recommended next session
- **(γ) M5.5 prerequisite**: schedule "this-flow vptr resolution" as
next analyzer milestone — without it, indirect-dispatch reachability
cannot be modeled. Until M5.5 lands, top-down probing inside the
cluster is blind.
- **Alternative pivot**: probe the 62-fires-vs-28-fires divergence at
`sub_82451E20` more deeply. Probe `sub_82450720` / `sub_82450638` /
`sub_821CB968` (frame chain captured). One of these exits the loop
early in canary; that exit gate IS the divergence.
- **Alternative pivot 2**: longer canary trace (5-10 min Lutris-launched
Windows build) to confirm Tier 2+3 PCs activate post-boot. The 50s
Linux probe envelope is too short for "press-A-to-continue" / intro
video boundary.

### Trace artifacts
- Audit dir: `audit-runs/audit-033-ui-entry-chain/`
- 8 canary-0x*.log probe files (Tier 1+2+3)
- ours.log (CTOR-PROBE captures), ours.err (kernel-call counters)

### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs
master HEAD `9028021` unchanged. No commit.

## KRNBUG-AUDIT-034 — Frame-chain divergence + Tier 2/3 horizon (READ-ONLY, 2026-05-09)

**Status**: open. Sister of AUDIT-033. Master `9028021` unchanged. Tests 640.
Lockstep instructions=100000003. Subsystem: front-end UI / save-game /
mission-select / HUD (NOT renderer).

### Phase A — frame-chain firing-rate matrix
Canary patch (audit-030 30-LOC) re-applied; reverted at session close.
Probed 8 PCs in canary 50s wall + ours -n 500M (~8s guest). The
divergence column normalizes each count by its window (fires per second,
ours ÷ canary), which is why equal raw counts still show about 6.3×:

| PC | canary 50s | ours -n 500M | rate divergence (ours ÷ canary) |
|----|---:|---:|---:|
| sub_821C4988 | 1 | 1 | 6.3× |
| sub_821CECF0 | 2 | 2 | 6.3× |
| sub_821CBEA8 | 7 | 7 | 6.3× |
| sub_821CD458 | 7 | 7 | 6.3× |
| sub_821CB968 | 14 | 14 | 6.3× |
| sub_82450638 | 14 | 14 | 6.3× |
| sub_82450720 | 24 | 16 | 4.2× |
| sub_82451E20 | 90 | 80 | 5.5× |

**Loop-exit-divergence located**: sub_82450720+0x160..+0x1F4
(PC 0x82450880..0x82450914). 5-iteration loop bounded by `r25 < 5`.
- Ours: 5/5 iterations (80/16=5.00) — never early-exits.
- Canary: avg 3.75/5 (90/24=3.75) — exits via 0x82450904 `bne 0x8245092C`.
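
The divergence figures and the per-call loop averages can be re-derived from the raw counts. The sketch below assumes the divergence value is a per-second rate ratio normalized by the 50 s canary wall window and by the approximate 8 s guest window quoted above; the helper names are illustrative and exist in neither codebase.

```python
# Raw fire counts from the Phase A table: (canary over 50 s wall, ours at -n 500M).
COUNTS = {
    "sub_821C4988": (1, 1),
    "sub_821CECF0": (2, 2),
    "sub_821CBEA8": (7, 7),
    "sub_821CD458": (7, 7),
    "sub_821CB968": (14, 14),
    "sub_82450638": (14, 14),
    "sub_82450720": (24, 16),
    "sub_82451E20": (90, 80),
}

CANARY_SECONDS = 50.0  # canary wall window
OURS_SECONDS = 8.0     # assumption: "~8s guest" taken as exactly 8 s


def rate_divergence(canary_fires: int, ours_fires: int) -> float:
    """Per-second fire rate of ours divided by that of canary."""
    return (ours_fires / OURS_SECONDS) / (canary_fires / CANARY_SECONDS)


def iterations_per_call(inner_fires: int, outer_fires: int) -> float:
    """Average fires of the inner callee per call of the enclosing function."""
    return inner_fires / outer_fires


if __name__ == "__main__":
    for name, (canary, ours) in COUNTS.items():
        print(f"{name}: {rate_divergence(canary, ours):.2f}x")
    inner, outer = COUNTS["sub_82451E20"], COUNTS["sub_82450720"]
    print("canary iterations/call:", iterations_per_call(inner[0], outer[0]))
    print("ours iterations/call:", iterations_per_call(inner[1], outer[1]))
```

Because the guest window is only approximate, the computed ratios agree with the table to within rounding rather than exactly. The iteration averages (5.00 and 3.75) do not depend on the window at all.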

**Exit predicate**: `[sub_82451E20_out+0] == r30-12 AND [+4] == [r30+0]+[r30+4]`.
Data source = 5×20-byte slot table at `r26+108..207` (r26 = sub_82450720
arg1 = container struct). The predicate is fed by sub_82451E20's inner
loop, which calls Tier-1 cluster sub_8228E498 to dereference
`[working_key->vptr][32]`.

**Bug class**: β-class (data-state divergence) with γ-deep entry
(sub_821C4988 = 0 static call xrefs → vtable-driven). The 6.3× upstream
amplification is uniform from L0..L5 (entry frequency), and the L7 5-loop
shows ours never triggers the early-exit data-match.

### Phase B — Tier 2/3 horizon (300s canary)
Probe set: 0x82172524, 0x82175810, 0x8217EB78, 0x821A6CF0, 0x821A8578.
**ALL 5 PCs = 0 fires at 300s in canary**. Cluster activation is even
deeper than this 5-min Linux Debug horizon. Linux Debug canary trajectory
matches Lutris Windows up to frame 42 (per RECONCILE-A); 300s ≈ early-boot
pre-intro only. May need Lutris Windows trace OR upstream probing OR
non-time-based trigger to reach Tier 2/3 activation.

### Recommended next session

**Option 1 (preferred)**: AUDIT-035 = mem-watch r26+108..207 for one
captured r26 value (capture via extended pc-probe of sub_82450720) →
identify writer in canary that ours misses. The slot table populator
is the gate to the early-exit path.

**Option 2**: schedule M5.5 (alias-aware vtable dispatch resolver) as next
analyzer milestone — sub_821C4988 has 0 static call xrefs and is the
chain entry; M5.5 would name the trigger.

**Option 3**: probe sub_8228E498's output `[r3+0][32]` value directly via
extended `--pc-probe` (capture vptr-at-+32 dereferenced value) — name what
the predicate compares against, then mem-watch its source.

### Trace artifacts
- `audit-runs/audit-034-frame-chain/canary-0x*.log` — 8 50s logs + 1 300s
preserved log + 5 Phase B 300s logs
- `audit-runs/audit-034-frame-chain/ours.log` (8-PC pc-probe at -n 500M)
- `audit-runs/audit-034-frame-chain/scripts/probe-canary*.sh`

### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs
master HEAD `9028021` unchanged. No commit.

## KRNBUG-AUDIT-035 — Slot table byte-level diff at sub_82450720 (READ-ONLY, 2026-05-09)

### Background
Continuation of AUDIT-034. Disasm verified slot table at r26+108, 5×20=100
bytes (loop body PC 0x82450880..0x82450914). Goal: byte-level diff of the
5-slot table contents between canary and ours at the same call site.

### Canary patch (extended)
Re-applied audit-030 30-LOC patch + extended TrapLogLR helper (+19 LOC) to
also log r26 and dump the 5×20-byte slot table from r3+108. The dump reads
r3 rather than r26 because the probe fires at the entry PC 0x82450720,
before the function's `mr r26,r3` prologue has copied r3 into r26.
Total +49 LOC across 4 files; under the 80-LOC budget. Build succeeded. Patch
reverted at session close; canary `git status` clean.

### Captured slot tables (final state)

Both runtimes converge on r3=r26=0x828F3B68 at sub_82450720 entry; slot table
base = 0x828F3BD4. 22 canary entries captured ~30s wall.

| Slot | addr | Canary (last entry) | Ours (-n 500M) |
|------|------|---------------------|----------------|
| 0 | 0x828F3BD4 | `00000000 00000000 00000000 00000000 00000000` | (same — all zero) |
| 1 | 0x828F3BE8 | `00000000 00000000 00000000 BC3654C0 00000008` | `00000000 00000000 00000000 4024A240 00000008` |
| 2 | 0x828F3BFC | `00000000 00000000 00000000 BC366080 00000008` | `00000000 00000000 00000000 4024AEE0 00000008` |
| 3 | 0x828F3C10 | `00000002 00000005 00000000 00000000 00000000` | `00000000 00000000 00000000 00000000 00000000` |
| 4 | 0x828F3C24 | `00000000 00000000 00000000 BC365520 00000008` | `00000000 00000000 00000000 4024A300 00000008` |

### Diff summary

- Slots 1, 2, 4: same shape (zeros + heap-pointer + size 8) but pointers
diverge by **heap region** — canary `BC3xxxxx` (physical heap), ours
`4024xxxx` (v40 bump heap). Same divergence noted in audit-027/029.
- Slot 3: canary [+0]=2, [+4]=5 (counter pair); ours [+0]=0, [+4]=0. Slot 3
is dynamic (a push/pop counter); our writers fire at a higher rate.
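
The per-slot comparison can be reproduced mechanically from the two table columns. A minimal sketch, assuming each slot is five big-endian 32-bit words as the dump format suggests; `parse_slot` and `differing_dwords` are illustrative helper names, not tools from either codebase.

```python
import struct

# Final-state slot contents copied from the captured slot table
# (five big-endian dwords per 20-byte slot).
CANARY = [
    "00000000 00000000 00000000 00000000 00000000",
    "00000000 00000000 00000000 BC3654C0 00000008",
    "00000000 00000000 00000000 BC366080 00000008",
    "00000002 00000005 00000000 00000000 00000000",
    "00000000 00000000 00000000 BC365520 00000008",
]
OURS = [
    "00000000 00000000 00000000 00000000 00000000",
    "00000000 00000000 00000000 4024A240 00000008",
    "00000000 00000000 00000000 4024AEE0 00000008",
    "00000000 00000000 00000000 00000000 00000000",
    "00000000 00000000 00000000 4024A300 00000008",
]


def parse_slot(text: str) -> tuple:
    """Decode one 20-byte slot into five big-endian u32 values."""
    return struct.unpack(">5I", bytes.fromhex(text.replace(" ", "")))


def differing_dwords(a: str, b: str) -> list:
    """Return the dword indices (0..4) at which two slots differ."""
    pairs = zip(parse_slot(a), parse_slot(b))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


diff = [differing_dwords(c, o) for c, o in zip(CANARY, OURS)]
for slot, indices in enumerate(diff):
    print(f"slot {slot}: differing dword indices {indices}")
```

Slots 1, 2 and 4 differ only at dword index 3 (the heap pointer), while slot 3 differs at indices 0 and 1 (the counter pair), which matches the two bullets above.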

### Writer identification (1066 ours mem-watch hits on slot 3)
PCs: 0x82450c08, 0x82450c40, 0x82450c4c, 0x82450c3c (sub_82450bc4 chain),
0x822f8b20 (counter inc), 0x82323364 (index update), 0x8231eee8 (init).
Slot 3 [+4] cycles 0..0xB in ours vs 0..5 in canary's window. Ours over-pushes.

### Reading — ε-class heap-region mismatch

The slot table populates IDENTICALLY in shape across both runtimes. The
predicate at PC 0x82450904 fails because of a cross-table inconsistency:
the **lookup table** that sub_82451E20 walks (via Tier-1 cluster external
sub_8228E498's `[r3+0][32]`) holds physical-heap pointers on canary and
v40 pointers on ours, while the slot-table writers on the **other** side
of the comparison push pointers from a different allocator state. This
per-element cross-reference inconsistency causes the predicate to never
match in ours during iterations 1-2; it falls through to slot 4
(self-referential default) only. Bug class **ε — heap-region-mismatch
propagating through dual-data-structure consistency check**.

### Sharp 4-dim cascade prediction

- A: implement physical-heap separation (CPPBUG-AUDIT-001) so
mm_allocate_physical_memory_ex / nt_allocate_virtual_memory return distinct
0xBC3xxxxx region.
- B: sub_8228E498's vptr-table contains 0xBC3xxxxx, slot-table writers push
0xBC3xxxxx — same heap region.
- C: predicate at 0x82450904 matches at iter 1-2, sub_82450720 returns 1,
sub_82450638 second-call frequency normalizes (~10× per L5 entry).
- D: cluster activation MAY clear (`draws > 0` cascade UNKNOWN until B-C
observed).

### Falsification of audit-034
"Different positions in the 5-slot table" — falsified. Matching slot indices
(1, 2, 4) are populated identically in shape. Mismatch is in the VALUE of
the heap pointer, not its slot position.

### Trace artifacts
- `audit-runs/audit-035-slot-table/canary-0x82450720-fix.log` (132 lines, 22 entries)
- `audit-runs/audit-035-slot-table/ours-lrtrace.jsonl` (16 entries)
- `audit-runs/audit-035-slot-table/ours-dump-stdout.log` (slot table at end-of-run)
- `audit-runs/audit-035-slot-table/ours-memwatch-slot3.log` (1066 writers)

### Recommended AUDIT-036
1. Land physical-heap separation; re-run AUDIT-035 trace to verify slot
pointers shift to 0xBC3xxxxx and predicate early-exits.
2. Or probe sub_8228E498 in both runtimes to capture `[r3+0][32]` value
and confirm cross-table heap divergence.

### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs
master HEAD `9028021` unchanged. No commit.

## KRNBUG-AUDIT-036 — `[[r3+0]+32]` predicate hypothesis test (READ-ONLY, 2026-05-09)

### Validation goal
Direct hypothesis test of audit-035's heap-region narrative. Capture
`[[r3+0]+32]` at sub_8228E498 in both canary and ours; CONFIRMED if both are
heap-region-divergent pointers (0xBC3xxxxx vs 0x4024xxxx); REFUTED otherwise.

### Disasm correction
sub_8228E498 is NOT a vtable[8] dispatcher. It's a deque/segmented-array
iterator deref returning element_address in r3:
- `[r3+0]` = header*; `[r3+4]` = packed (chunk_idx, sub_idx)
- `[header+4]` = segment_table; `[header+8]` = chunk_count
- `r3 = segment_table[chunk_idx] + sub_offset` ; `blr`
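
The three bullets describe a segmented-array lookup, which can be modelled in a few lines. This is only a sketch: the notes do not say how `(chunk_idx, sub_idx)` are packed or how large an element is, so `SUB_BITS` and `ELEM_SIZE` below are hypothetical placeholders, `GuestMemory` is a toy stand-in for guest RAM, and every address in the example is synthetic.

```python
SUB_BITS = 2   # hypothetical: bit width of sub_idx inside the packed word
ELEM_SIZE = 4  # hypothetical: bytes per element


class GuestMemory:
    """Toy sparse guest memory holding 32-bit words keyed by address."""

    def __init__(self):
        self._words = {}

    def write_u32(self, addr: int, value: int) -> None:
        self._words[addr] = value & 0xFFFFFFFF

    def read_u32(self, addr: int) -> int:
        return self._words[addr]


def iterator_deref(mem: GuestMemory, it_addr: int) -> int:
    """Return the element address an iterator at it_addr refers to."""
    header = mem.read_u32(it_addr)            # [r3+0] = header*
    packed = mem.read_u32(it_addr + 4)        # [r3+4] = packed (chunk_idx, sub_idx)
    segment_table = mem.read_u32(header + 4)  # [header+4] = segment_table
    # [header+8] (chunk_count) is not used by this sketch.
    chunk_idx = packed >> SUB_BITS
    sub_idx = packed & ((1 << SUB_BITS) - 1)
    chunk_base = mem.read_u32(segment_table + 4 * chunk_idx)
    return (chunk_base + sub_idx * ELEM_SIZE) & 0xFFFFFFFF


# Synthetic example: iterator at 0x1000 pointing at chunk 1, sub-index 3.
mem = GuestMemory()
mem.write_u32(0x1000, 0x2000)                 # iterator -> header
mem.write_u32(0x1004, (1 << SUB_BITS) | 3)    # packed position
mem.write_u32(0x2004, 0x3000)                 # header -> segment_table
mem.write_u32(0x2008, 2)                      # chunk_count
mem.write_u32(0x3000, 0x5000)                 # segment_table[0]
mem.write_u32(0x3004, 0x6000)                 # segment_table[1]
element = iterator_deref(mem, 0x1000)
print(hex(element))
```

The point of the model is that the function only computes an address; it never touches offset 32 of anything, so the `[+32]` read has to live in the caller.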

The `[+32]` deref happens in the CALLER `sub_82451E20` at PC 0x82451E78
(LR), reading the returned element's `[+0]` and then `[+32]` as predicate
target compared against r28 (= caller's r6, 3rd arg).

### Canary patch — 49 LOC, reverted
Re-applied audit-030 base + extended TrapLogLR to log r3, r28, dereference
`[r3+0]` (key), and dump 64 bytes (16 u32 lanes + ASCII) at the key.
Build via ninja Debug; reverted via `git checkout -- src/` at session
close; canary `git status` clean.

### Captured values

**Canary** (PC=0x82451E78, ~36 fires at 30s):
- r3 (returned element) = 0xBC22CA20 / 0xBC22CA24 (physical heap)
- `[r3+0]` (key) = 0xBC65D018 / 0xBC65D140 / 0xBC65D1C0 / 0xBC65D240 / 0xBC65D340 / 0xBC65D400 / 0xBC65D540
- Key struct (key=0xBC65D1C0): `F80000B8 0 0 3 0 0 0 0 BC65D018 BC65D140 0 BC65D034 0 0 1 0`
- ASCII: `'.................................e...e.@.....e.4................'`
- **`[[r3+0]+32]` = 0xBC65D018 / 0xBC65D2D8 / 0xBC65CFD8 / 0xBC65D118 / 0xBC65D198 / 0xBC65D398** — phys-heap pointers, range 0xBC65xxxx

**Ours** (PC=0x8228E498 + dump-addr at returned r3, ~62 fires at 500M):
- r3 (returned element) = 0x401119B0..0x401119BC (v40 bump heap)
- `[r3+0]` (key) = 0x40542300 / 0x40542340 / 0x40542400 / 0x405424C0
- Key bytes at 0x40542300:
```
+0x00 "game:\hidden\Resource3D\Common.x"
+0x10 "ource3D\Common.xpr\0\..."
+0x20: 70 72 00 5c (= "pr\0\\")
```
- **`[[r3+0]+32]` = 0x7072005C** (mid-string text "pr\0\\")
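
The two captured values follow directly from reading a big-endian dword at offset 32 of each record. The sketch below replays that read on the bytes quoted above; `predicate_target` is an illustrative name, and only the first 36 bytes of the record in ours are modelled because the dump shows nothing reliable beyond them.

```python
import struct

# Ours: record bytes as dumped at 0x40542300 (filename stored inline).
ours_record = b"game:\\hidden\\Resource3D\\Common.xpr\x00\\"

# Canary: the 16 dwords captured for key 0xBC65D1C0.
canary_dwords = [
    0xF80000B8, 0, 0, 3, 0, 0, 0, 0,
    0xBC65D018, 0xBC65D140, 0, 0xBC65D034, 0, 0, 1, 0,
]
canary_record = struct.pack(">16I", *canary_dwords)


def predicate_target(record: bytes) -> int:
    """Big-endian dword at record offset 32, i.e. `[[r3+0]+32]`."""
    return struct.unpack_from(">I", record, 32)[0]


ours_target = predicate_target(ours_record)
canary_target = predicate_target(canary_record)
print(f"ours   [+32] = 0x{ours_target:08X}")    # 0x7072005C
print(f"canary [+32] = 0x{canary_target:08X}")  # 0xBC65D018
```

Offset 32 lands on the `pr\0\` tail of `Common.xpr` in ours, while in canary it lands on the first of the three sub-pointer fields.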

### Verdict — REFUTED-AS-STATED, stronger η-class divergence found

Audit-035's strict prediction "ours's `[[r3+0]+32]` is in 0x4024xxxx" is
REFUTED. Ours's value is `0x7072005C` — literal filename text bytes, not
a heap pointer.

But the deeper divergence is even worse than the heap-region narrative
suggested: the records held by the container have **fundamentally
different layouts**. Canary's `[r3+0]` points to a 16-dword pointer-bearing
struct with phys-heap sub-pointers at offsets 32/36/44. Ours's `[r3+0]`
points to a struct that begins with the inline filename string, so offset
32 falls inside the string text. The predicate
`r28 == [[r3+0]+32]` therefore COMPARES STACK POINTERS (r28) against
INLINE STRING TEXT in ours — a comparison that can never succeed.

Bug class **η — record-layout divergence** (NEW class). Distinct from
audit-035's "heap region" axis; the populator for these records writes
DIFFERENT struct shapes in ours vs canary.

### Cascade implication

The `swaps>2 / draws>0` plateau is gated by THIS predicate failing on
EVERY iteration in ours's main loop body. Even if physical-heap
separation (CPPBUG-AUDIT-001) landed, the records would still hold inline
strings, so the predicate would still fail.

### Recommendation — DO NOT proceed with physical-heap separation as audit-037

Audit-037 should NOT be the heap-split fix. Instead:
**Audit-037 = identify the record populator(s)** that build the container
elements at `0x401119B0+` (ours) vs `0xBC22CA20+` (canary). The populator
writes the struct at `[r3+0]`. Likely path:
1. mem-watch on `0x40542300+0x20` (the predicate target offset) to find
the writer PC and LR in ours.
2. Disasm the writer's caller chain.
3. Re-apply audit-030 patch in canary, probe the equivalent PC, compare
the populator's ctor / load path.
4. The two populators should diverge at a static-init or resource-loader
function — that divergence is the audit-037 root cause.

### Sharp 4-dim cascade prediction (post-fix at populator)

- A: ours's `[0x40542300+0x20]` becomes a phys-style pointer (matches
canary's record layout)
- B: predicate `r28 == [[r3+0]+32]` matches at least once during boot
- C: sub_82451E20 inner loop exits via the `bne` branch, not via end-iter
- D: cluster `0x82285000-0x82294000` external-entry probes (audit-033)
show new fires — front-end UI activation begins

### Falsification of audit-035

"`[[r3+0]+32]` is a heap-region-divergent pointer" — REFUTED. Ours's value
is mid-string text bytes (0x7072005C). Heap-region divergence is real for
the container element pointers themselves (0xBC22CA20 vs 0x401119B0) but
the predicate failure mechanism is record-layout, not heap-region.

### Trace artifacts
- `audit-runs/audit-036-vptr-deref/canary.log` — initial 30s canary at PC=0x8228E498
- `audit-runs/audit-036-vptr-deref/canary-callsite.log` — extended canary at PC=0x82451E78
- `audit-runs/audit-036-vptr-deref/ours.log` — pc-probe at 0x8228E498 (62 fires)
- `audit-runs/audit-036-vptr-deref/ours-exit.log` — branch-probe at 0x82451E78 (returned r3)
- `audit-runs/audit-036-vptr-deref/ours-final.log` — dump-addr at element + key targets

### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs
master HEAD `9028021` unchanged. No commit. Tests 640.

### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested with sharp pre-prediction
2. Canary patch reverted at session close, git clean
3. xenia-rs source unmodified, no commit
4. Single-step (validation only, no fix attempt)
5. Trace files saved per audit dir convention

## TRACK-1-VERIFY — Cache-fix record-layout verification (READ-ONLY, 2026-05-09)

### Validation goal
Direct verification of cascade dimension A from audit-038. Audit-038 landed
the cache fix (`cache:/*` paths persist via `/tmp/xenia-rs-cache-<pid>-<seq>/`);
sub_82459D18, sub_8245D230, 0x82450904 were silenced from "many fires" to
zero. The unmeasured dimension was record-layout: did the fix flip the
record at 0x40542300 from inline-string (audit-037 pre-fix shape) to
canary-shape pointer-bearing (handle@+0=0xF80000B8, sub-pointers
@+32/+36/+44)?

### Method (read-only, no source mods, no commit)
1. Probe sub_8228E498 (deque iterator deref returning element_address)
at -n 500M to find current record-base addresses. **Result: 0 fires**.
The cache fix silenced the cache-miss path; sub_8228E498 is downstream
of that path and now never executes.
2. Fallback: dump audit-037 record bases via
`--dump-addr=0x40542300,0x40542340,0x40542400,0x405424C0` (master
d8766c6, post-fix). Plus extended-range dump
0x40542100..0x40542800 to look for any pointer-shaped records nearby.
3. Cross-reference canary record shape from audit-037's canary probe of
0x82450b68 — canary populates filenames via
`RtlInitAnsiString(BC365xxx, "game:\\hidden\\Resource3D\\…")` separately
from the per-file struct at 0xBC65xxxx (struct holds pointers).

### Captured values (post-fix, master d8766c6)

**0x40542300** — IDENTICAL to audit-037 pre-fix:
```
+0x00: "game:\hidden\Res"
+0x10: "ource3D\Common.x"
+0x20: 70 72 00 5c 93 9a 9d cc ... (be32=7072005c)
+0x30: ...69 d8 e4 5c c2 95 ea d8...
```
+0x20 dword = **0x7072005C** ("pr\0\\" text bytes), unchanged.

**0x40542340** — descriptor-shape, header pointers + inline filename text:
```
+0x00: 40 54 28 80 ... | be32=40542880 (next-record ptr)
+0x40: "...dden@T#." (continuation of inline filename)
+0x50: "ource3D\Comm..."
```

**0x40542400** — descriptor-shape with offsets at +0x40 ("@T&.@T..@T%@_TIT"):
```
+0x00: 40 54 24 80 (be32=40542480 ptr)
+0x40: 40 54 26 00 40 54 1e c0 40 54 25 40 5f 54 49 54
```

**0x405424c0** — pointer-bearing PARTIAL but filename still inlined at +0x44:
```
+0x00: 40 54 25 80 (be32=40542580 ptr)
+0x20: 40 54 1e d8 ... 40 54 1e f4 (be32=40541ed8, 40541ef4 — pointers)
+0x40: 40 54 23 40 ":\hidden\Res"
+0x50: "ource3D\ptc_pack"
+0x60: ".xpr\0..."
```
+0x20 dword = **0x40541ED8** (pointer in v40 range). Filename "ptc_pack.xpr"
still inlined at +0x44.

### Verdict — Cascade Dimension A: FAIL

Cache fix (audit-038) DID NOT flip record layout to canary-shape:

- 0x40542300: inline-string layout fully unchanged. +0x20 = 0x7072005C
(text), IDENTICAL to audit-037 pre-fix.
- 0x405424c0 has descriptor-shape pointers at +0x20 / +0x2C
(0x40541ED8 / 0x40541EF4) but **the filename is still inlined at +0x44**
rather than externalized to a separate `RtlInitAnsiString`-allocated
ANSI-string heap.
- No record begins with the canary 0xF80000B8 handle. No record contains
BC65xxxx-equivalent sub-pointers. The transformation step that should
externalize filenames into ANSI-string heap before the pointer-bearing
record stage is NOT running in our impl.
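
The verdict rests on telling text bytes apart from pointers in the dword at +0x20. A rough triage heuristic is sketched below. It assumes, from the values observed in these audits only, that a top byte of `0xBC` marks canary's physical heap and `0x40` marks the v40 bump heap; `classify_dword` is a hypothetical helper, and the address checks deliberately run first because v40 pointers such as `40 54 26 00` are also printable ASCII.

```python
def classify_dword(value: int) -> str:
    """Heuristically label a big-endian dword read from a record field."""
    top = (value >> 24) & 0xFF
    if top == 0xBC:
        return "physical-heap pointer"   # assumption: 0xBCxxxxxx region
    if top == 0x40:
        return "v40 pointer"             # assumption: 0x40xxxxxx region
    raw = value.to_bytes(4, "big")
    printable_or_nul = all(b == 0 or 0x20 <= b <= 0x7E for b in raw)
    if printable_or_nul and any(raw):
        return "inline text"
    return "other"


# Dwords observed at +0x20 of the dumped records.
OBSERVED = {
    "ours 0x40542300": 0x7072005C,
    "ours 0x405424C0": 0x40541ED8,
    "canary key 0xBC65D1C0": 0xBC65D018,
}
labels = {name: classify_dword(v) for name, v in OBSERVED.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

The heuristic can misfire on text that happens to start with `@` (0x40), so it is only a first-pass filter for deciding which records deserve a manual look.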

### Mechanism

Canary's record-population path:
1. `RtlInitAnsiString(heap_alloc, "game:\\hidden\\Resource3D\\Common.xpr")`
allocates the literal on a separate heap (BC365xxx range).
2. The per-file record at BC65xxxx receives a POINTER to that string.
3. `[[r3+0]+32]` then dereferences cleanly to BC65xxxx neighbours
(handle/sub-pointer fields).

Our impl's record-population path:
1. The literal "game:\\hidden\\Resource3D\\Common.xpr" is written DIRECTLY
into the per-file record at +0x00 (or +0x44 for some records).
2. There is no separate ANSI-string allocation. No pointer indirection.
3. `[[r3+0]+32]` reads inline filename text bytes (0x7072005C "pr\0\\")
instead of a pointer.

The audit-038 cache fix made `cache:/*` paths persist on real disk, which
silenced the cache-miss restore loop. But the populator that turns a
filename literal into either an ANSI-heap pointer (canary) or an
inline-record-prefix (ours) is a DIFFERENT mechanism — sibling to or
upstream of cache machinery.

### Cascade implication

The `swaps>2 / draws>0` plateau and the cluster L1 unreached state are
both still gated by this layout divergence. Even with the cache fix
landed, the predicate `r28 == [[r3+0]+32]` STILL compares stack pointers
against inline filename text bytes — a comparison that cannot succeed.
Sister Track 2's extended-horizon canary trace becomes the load-bearing
diagnostic: if cluster L1 fires in canary at e.g. T+30s, then this
transformation-step fix is the next concrete target.

### Recommendation — Track 1 next moves

- **Option A (preferred)** — trace `RtlInitAnsiString` callers in our impl
vs canary on the `game:/dat:/cache:` prefix family; find which path
doesn't fire in our impl. The missing path is the populator divergence.
- **Option B** — mem-watch `0x40542300+0x20` (= 0x40542320) to capture the
writer's PC + LR in our impl; the writer's function should diverge from
canary's equivalent at a static-init / resource-loader site.
- **Option C** — wait for sister Track 2's findings before declaring
transformation-step missing; rule out timing/horizon as a confound.
- **Option D** — KRNBUG entry: audit `RtlInitAnsiString` (and adjacent
string-init paths) for prefix branching. If our impl folds all prefixes
into the same handler but canary branches, that's the bug.

### Lockstep determinism preserved

`instructions=500000019, imports=5629636, swaps=2, VdSwap=2`. Stable.

### Trace artifacts
- `audit-runs/audit-039-track-1-verify/probe-element.{out,log}` —
pc-probe sub_8228E498 (0 fires) + 4 record dumps
- `audit-runs/audit-039-track-1-verify/dump-extended.{out,log}` —
extended-range dump 0x40542100..0x40542800

### Cleanup
xenia-rs source unmodified. No commit. No canary touch. Sister Track 2
running parallel against xenia-canary; not touched. Master HEAD
`d8766c6`. Tests 645.

### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested with sharp pre-prediction (cascade dim A)
2. No canary patch (read-only on our side only)
3. xenia-rs source unmodified, no commit
4. Single-step (verification only, no fix attempt)
5. Trace files saved per audit dir convention

## TRACK-2-EXTENDED — Extended-horizon canary trace for cluster activation (READ-ONLY, 2026-05-09)

### Question
At 10–15 min wallclock (2–3× longer than audit-034 Phase B's 5 min), does
Linux Debug canary EVER reach the audit-009 cluster's Tier-2 callers
(`sub_82172524`, `sub_82175810`, `sub_8217EB78`) — and through them the
cluster's L1 entries? If YES → capture LR (caller PC) → name the
activation gate. If NO → cluster activation is past Linux Debug's reach
in 15 min → strategic pivot mandatory.

### Method (canary patch + revert; no xenia-rs touch)
1. Re-applied audit-030 `--log_lr_on_pc` patch (30 LOC across 4 files)
to xenia-canary HEAD `6de80dffe`. Build via
`ninja -f build-Debug.ninja xenia_canary`. Mandatory
`--disable_instruction_infocache=true`.
2. Probed 3 Tier-2 PCs serially (single PC at a time per audit-031
constraint), 15-min wallclock each:
   - `0x82172524` — actual run 22 min (timeout(1) didn't enforce 900s
   cleanly until force-kill)
   - `0x82175810` — 15 min
   - `0x8217EB78` — 15 min (force-killed at +3s post-timeout)
3. Compressed plan per task brief: skip Tier-1 (3 PCs) + L1 (6 PCs) when
Tier-2 = 0× — they are downstream consequences of Tier-2 firing.
4. Trace marker `TRACE-PC-LR pc=… lr=… r3..r6,r31`.
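
Counting fires per (PC, LR) pair in a probe log is a small text-processing job. The sketch below is an assumption-heavy illustration: the exact field rendering of the `TRACE-PC-LR` marker is elided in these notes, so the regex presumes bare or `0x`-prefixed hex after `pc=` and `lr=`, the sample lines are synthetic, and `count_fires` is a hypothetical helper rather than part of the canary patch.

```python
import re
from collections import Counter

# Assumed marker shape: "TRACE-PC-LR pc=<hex> lr=<hex> ...".
TRACE_RE = re.compile(
    r"TRACE-PC-LR\s+pc=(?:0x)?([0-9A-Fa-f]+)\s+lr=(?:0x)?([0-9A-Fa-f]+)"
)


def count_fires(lines) -> Counter:
    """Count trace-marker hits keyed by (pc, lr)."""
    fires = Counter()
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            pc = int(match.group(1), 16)
            lr = int(match.group(2), 16)
            fires[(pc, lr)] += 1
    return fires


# Synthetic log lines for illustration only.
sample = [
    "TRACE-PC-LR pc=8228E498 lr=82451E78 r3=00000000",
    "unrelated kernel-call line",
    "TRACE-PC-LR pc=0x8228E498 lr=0x82451E78 r3=00000001",
    "TRACE-PC-LR pc=8228E138 lr=82172BF8 r3=00000002",
]
fires = count_fires(sample)
for (pc, lr), n in sorted(fires.items()):
    print(f"pc=0x{pc:08X} lr=0x{lr:08X} fires={n}")
```

For the multi-megabyte logs produced by these runs, passing an open file object as `lines` keeps memory use flat, since the function only iterates once.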

### Result Table
| Tier | PC | Horizon | Hits | LR | Notes |
|------|-------------|---------|------|----|-------|
| T2-A | 0x82172524 | 22 min | **0** | — | Steady-state idle: 240k KeReleaseSemaphore / 75k texture-load / VdRetrainEDRAM loop |
| T2-B | 0x82175810 | 15 min | **0** | — | Steady-state idle (same kernel-call mix) |
| T2-C | 0x8217EB78 | 15 min | **0** | — | Steady-state idle (same kernel-call mix) |

Total ~52 min canary CPU. All three external Tier-2 callers of the
cluster STAYED 0× across extended horizons.

### Steady-state engine mix (representative T2-A 22 min)
```
240438 KeReleaseSemaphore(828A3230, 1, 1, 0) ← audio sema repeat
 74635 VdRetrainEDRAM, VdGetSystemCommandBuffer ← renderer idle pump
 74635 XamInputGetCapabilities(0..3) ← input poll
   432 Removed; 396 Added; 381 NtStatusToDosError
```
Identical mix in T2-B, T2-C. Engine is alive at the kernel-call level
but does not advance through the front-end-UI / save-game state
machine across 3× the previously-tested wallclock.

### Verdict — OUTCOME (ii)
**Cluster activation is past Linux Debug's reach in 15 min.** Per task
brief Step 3 outcome (ii). Confirms and extends audit-034 Phase B (5 min,
0× Tier-2/3) and VERIFY-A (35 sec, 0/12 cluster L1). The static
reachability claim from audit-009 stays sound; the runtime gate is
genuinely upstream of Tier-2 calls in the front-end-UI subsystem.

### Strategic implication
RECONCILE-B's host-presenter caveat dominates: Vulkan/XCB on Linux fails
to display intro video; user confirmed Weston also shows black; the
front-end-UI state machine never advances past the post-intro
state-transition that Tier-2 callers gate on. Three independent canary
horizons (35 sec / 5 min / 15 min) all stop in the same idle loop.

**15-min Linux Debug canary cannot witness the cluster activation event
on this host.** Continued probing at higher horizons on Linux is unlikely
to yield. Two pivots open:

- **Pivot A — Lutris Windows canary instrumentation.** Re-port
`--log_lr_on_pc` to a Windows build and probe Tier-2 there. Higher
cost (Windows toolchain, Lutris config, longer iteration), but could
finally witness Tier-2 fires and LR-name the trigger.
- **Pivot B — Static-only.** Drop runtime probing on this side; lean on
M5.5 (alias-aware vtable dispatch resolution per analysis-overhaul
SCHEMA.md) to statically name the gate function in xenia-rs's IDA DB,
then probe THAT function in our impl + canary-Linux at 5-min horizon.

**Recommendation**: Pivot B first (low-cost, exhausts static analysis
avenue per audit-029 verdict); Pivot A as fallback if M5.5 doesn't reach
a probeable witness.

### Sister-session coordination (Track 1)
Track 1 (cache-fix record-layout verification) verdict on cascade
dimension A: **FAIL** — audit-038 cache fix did NOT flip record layout
to canary-shape. Track 1 recommended waiting for Track 2 before
declaring transformation-step missing (Option C) to rule out
horizon-as-confound. Track 2 now rules that out: 15-min horizon does
not move the needle. **Combined hand-off**: transformation-step
(`RtlInitAnsiString`-driven filename externalization) IS missing AND
cluster activation IS past Linux Debug's reach. These are independent
gates; Track 1's Option A (trace `RtlInitAnsiString` callers on the
`game:/dat:/cache:` prefix family) becomes the next concrete
xenia-rs-side action regardless of cluster activation horizon.

### Falsifications
- Audit-034 Phase B's "5 min may be too short" caveat is closed: 15 min
doesn't reach Tier-2 either.
- Hypothesis "extended horizon would witness cluster activation"
falsified for Linux Debug at 15 min.

### Trace
`audit-runs/audit-039-track-2-extended-canary/`:
- `canary-0x82172524.{log,err}` — 77 MB log, 0 fires, 22-min wall
- `canary-0x82175810.{log,err}` — 52 MB log, 0 fires, 15-min wall
- `canary-0x8217EB78.{log,err}` — 55 MB log, 0 fires, 15-min wall

### Cleanup
Canary patch reverted (`cd xenia-canary && git status` → clean,
HEAD `6de80dffe` unchanged). xenia-rs source unmodified, no commit,
no push. Sister Track 1's territory untouched.

### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested with sharp pre-prediction (Tier-2 fires
→ LR-names gate; 0 fires → outcome ii).
2. Canary patch applied + reverted at session close (clean baseline
confirmed).
3. xenia-rs source unmodified, no commit.
4. Single-step (verification only, no fix attempt).
5. Trace files saved per audit dir convention.

## KRNBUG-AUDIT-040 — record ctor input divergence at sub_8244FC90 (READ-ONLY, 2026-05-09)

### Goal
Per audit-037 sub_8244FC90 fires identically in canary + ours but produces
different record layouts. Identify the divergent INPUT (which arg register
holds different content). Trace the upstream caller that supplies it.

### Canary patch — 56 LOC, reverted
Re-applied audit-030 base + extended TrapLogLR to log r3..r10 + r28..r31 +
LR + 32-byte hex dump from `*r4` and `*r5`. Build via
`ninja -f build-Debug.ninja xenia_canary`. Reverted via
`git checkout -- src/` at session close; canary `git status` clean
(HEAD `6de80dffe` unchanged).

### Calling convention (sub_8244FC90)
- r3 = dest record (alloc'd by caller via `operator new`)
- **r4 = source struct ptr (28 bytes; memcpy'd to dest+0x3C via 7-dword loop)**
- r5 = secondary "this" (vtable in canary)
- r6/r7 = scalar args
|
||
|
||
### Concrete register values (representative fire 2 of 33 canary / 8 ours)
| reg | canary | ours |
|-----|--------|------|
| r3 | `BC65D440` | `405420C0` |
| **r4** | **`BC79C9EC`** | **`406819EC`** |
| r5 | `BC65D2C0` | `40542100` |
| LR | `82450440` | `82450440` (= `sub_824503A0+0xA0`) |

### Source-struct content at `*r4` (the load-bearing memcpy region)
| word | canary | ours | diff |
|------|--------|------|------|
| +0 | **`F80000DC`** | **`00001454`** | **HANDLE-NAMESPACE** |
| +4 | `0` | `0` | same |
| +8 | `0` | `2` | DIFFERENT |
| +12 | `3` | `3` | same |
| +16 | `0` | `0xC` | DIFFERENT |
| +20 | `0xC` | `0xC` | same |
| +24 | `0` | `0` | same |

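The word-by-word comparison in the table can be reproduced mechanically. The sketch below is only an illustration: the dword values are transcribed from the table, while the helper name `diff_dwords` is hypothetical and not part of any audit tooling.

```python
# Dwords transcribed from the table above (offsets +0 .. +24).
CANARY = [0xF80000DC, 0x0, 0x0, 0x3, 0x0, 0xC, 0x0]
OURS = [0x00001454, 0x0, 0x2, 0x3, 0xC, 0xC, 0x0]


def diff_dwords(a, b):
    """Return (byte_offset, a_word, b_word) for each dword that differs."""
    if len(a) != len(b):
        raise ValueError("structs must have the same number of dwords")
    return [(i * 4, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for offset, canary_word, ours_word in diff_dwords(CANARY, OURS):
        print(f"+{offset}: canary={canary_word:08X} ours={ours_word:08X}")
```

Run against the 28-byte source struct, this flags offsets +0, +8 and +16, matching the three non-"same" rows.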
### Upstream caller — divergent dword origin

Backtrace: sub_8244FC90 ← sub_824503A0 ← sub_824528A8 ← sub_822DFBC8 ←
sub_822DFC74 (the producer).

In **sub_822DFC74**:
```
0x822DFC8C-90  bl 0x824A9F18      ; r3=r4=r5=r6=0 — calls NtCreateEvent
0x822DFC94     r4 = r3            (event handle returned)
0x822DFC98-9C  bl 0x821820B0      ; stw r4, 0(r1+80)
0x822DFCA0     lwz r11, 80(r1)    ; r11 = handle
0x822DFCB8     stw r11, 44(r31)   ; *** [this+44] = NtCreateEvent handle ***
0x822DFCC4     bl sub_822DFBC8    ; vtable[7] dispatcher reads [this+44]
```

`sub_824A9F18` is a wrapper around **`NtCreateEvent`** (xboxkrnl.exe ord
209, thunk `0x8284DF1C`). The OUT handle is what diverges:
- canary: `NtCreateEvent` → `0xF80000DC` (kernel-region pseudo-handle,
  XObject namespace)
- ours: `NtCreateEvent` → `0x00001454` (small-int handle ID,
  KernelState::handle_table namespace)

Both runtimes call NtCreateEvent 395× during boot; both succeed. The
divergence is purely **handle-value namespace cosmetics**.

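When scanning traces from both runtimes it can help to tag each handle with the namespace it appears to come from. The following is a heuristic sketch: the `0xF8` top-byte test is inferred only from the handle values observed in these audits, not from a documented ABI, and both function names are hypothetical.

```python
def handle_namespace(handle):
    """Classify a 32-bit handle by the value ranges observed in these audits.

    Assumption: a top byte of 0xF8 marks a canary-style pseudo-handle;
    anything else is treated as a small-integer ID.
    """
    handle &= 0xFFFFFFFF
    if handle >> 24 == 0xF8:
        return "pseudo-handle (0xF8xxxxxx)"
    return "small-int ID"


def namespaces_differ(a, b):
    """True when two handles fall into different namespaces."""
    return handle_namespace(a) != handle_namespace(b)
```

A dword diff whose two sides classify differently (such as `0xF80000DC` vs `0x00001454`) can then be reported as a namespace difference rather than a data difference.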
### Bug-class refinement

**δ-namespace** (handle representation divergence; benign unless
downstream code interprets handle bits semantically). NOT a logic bug
in our code path — both impls correctly route the handle through
`WaitForSingleObject(handle, INFINITE)` at PC `0x822DFC34`.

The audit-037 framing of "canary records hold pointer-bearing structs
while ours holds inline-string structs" is partially incorrect:
- The 28 bytes copied at sub_8244FC90 (record `+0x3C..+0x57`) ARE
  different in the handle slot, but only by namespace.
- The "filename text starting at +0" lives at a DIFFERENT region of the
  dest record (+0x40+ in our `0x40542100` dump shows
  `40541F80 40542000 745c4750 ... LE.pak\0eng\p`) — written by
  `bl 0x822F8A70` / `bl 0x82150030` AFTER sub_8244FC90 returns.

### Recommended audit-041 (sharp prediction)

**Two parallel options:**
1. **DOWNSTREAM-USE PROBE (preferred)**: probe PC `0x822DFC34`
   (`bl 0x824AA330` wait site) in BOTH runtimes. Capture r3 (handle being
   waited on) and verify the wait completes. If canary's wait completes but
   ours doesn't, audit-041 is signaler-missing (trace which kernel call
   signals canary's `0xF80000DC`). If canary's wait ALSO doesn't
   complete, the namespace finding is benign and the gate is upstream
   of the wait (RDX search-criteria producer).
2. **AUDIT-037 RE-VERIFICATION**: dump 128 bytes from canary's r3 and
   ours' r3 AT THE EXIT of sub_8244FC90 (not at session-end). If the
   filename text is written by sub_824503A0+0x478 callees
   (sub_822F8A70 / sub_82150030), those are the real audit-041 targets.

### Trace artifacts
- `audit-runs/audit-040-record-ctor-inputs/canary-0x8244FC90.log` (33 fires)
- `audit-runs/audit-040-record-ctor-inputs/ours-lrtrace.jsonl` (8 fires)
- `audit-runs/audit-040-record-ctor-inputs/ours-dump.log` (10 dump-addr)
- `audit-runs/audit-040-record-ctor-inputs/canary-patch.diff` (notes)

### Cleanup
Canary patch reverted (clean baseline confirmed; HEAD `6de80dffe`
unchanged). xenia-rs source unmodified, no commit, master HEAD
`d8766c6` unchanged. Tests 645.

### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested (extracted divergent input arg + named
   upstream producer NtCreateEvent).
2. Canary patch applied + reverted at session close.
3. xenia-rs source unmodified, no commit.
4. Single-step (data-gathering only, no fix attempt).
5. Trace files saved per audit dir convention.

## KRNBUG-AUDIT-041 — wait-site signaler determination (READ-ONLY, 2026-05-09)

Re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC, 4 files);
reverted at session close (canary HEAD `6de80dffe` clean).

**Wait site**: PC `0x822DFC34` `bl 0x824AA330` (KeWaitForSingleObject
wrapper, INFINITE timeout) inside sub_822DFBC8. Wait loops on r3=0x102
(STATUS_TIMEOUT) and on `[r31+52]==3`. The containing function is
called directly from sub_822DFC74, the site of audit-040's NtCreateEvent
(`bl sub_822DFBC8` at `0x822DFCC4`); the handle flowing into r3 is the
OUT handle from that create.

**Wait completion ratio (30s canary trace; 500M-instr ours)**:

| Runtime | bl/pre-bl | post-bl | completes |
|---------|-----------|---------|-----------|
| canary | 9 | 9 | 100% |
| ours | 7 | 6 | **6/7 ≈ 86%** |

The 7th wait in ours stalls. **Stalled handle = `0x00001454`**
(audit-040 family). Probing the `bl` PC `0x822DFC34` itself returns 0
fires in our HIR (`bl` is a control-flow terminator, so the probe is
elided); the pre-bl instruction `0x822DFC30 addi r4,r0,-1` fires 7×
and is used as the fair comparison point. The 7th pre-bl fire
(cycle 48,849) has no matching post-bl.

**Outcome (i) confirmed**: handle-namespace divergence is
**load-bearing**.

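The completion ratio above comes from pairing pre-`bl` fires with post-`bl` fires. A minimal sketch of that pairing follows, assuming each fire has already been reduced to a cycle number; the function name is hypothetical, and the only cycle value taken from the audit is 48,849.

```python
def wait_completion(pre_bl_cycles, post_bl_cycles):
    """Pair each pre-bl fire with the next unused, later post-bl fire.

    Returns (completed, total, unmatched_pre_cycles).
    """
    post = sorted(post_bl_cycles)
    completed = 0
    unmatched = []
    j = 0
    for pre in sorted(pre_bl_cycles):
        # Skip post-bl fires that precede this wait entry.
        while j < len(post) and post[j] <= pre:
            j += 1
        if j < len(post):
            completed += 1
            j += 1  # consume the matched return
        else:
            unmatched.append(pre)
    return completed, len(pre_bl_cycles), unmatched
```

With seven pre-`bl` fires and six post-`bl` fires where the last pre-`bl` fire has no later return, the result is `(6, 7, [<cycle of the stalled wait>])`, i.e. 6/7 ≈ 0.857. Note that this pairing only observes the guest-side resume PC, so an unmatched entry means "no post-`bl` fire was recorded", which is weaker than "the wait never completed".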
**Signaler identified**: probed canary KeSetEvent (0x8284DDDC, 20588
fires, 0 on F80000CC/C0 — takes KEVENT*, not handle) and NtSetEvent
(0x8284DF5C, 9245 fires, **2 on F80000CC/C0**). Both fires LR=0x824AA304
inside wrapper sub_824AA2F0 (89 static callers). **Signaler =
NtSetEvent** (xboxkrnl ord 246).

**Cross-check ours**: NtSetEvent at 0x8284DF5C fires 3334 times in ours;
**1 fire on `r3=0x00001454`** at cycle 3,519,453 (after the stall at
cycle 48,849). So signaler IS firing — bug is NOT pure
signaler-missing.

**Bug class refinement (provisional)**: δ-namespace AND δ-wakeup. The
signal exists but doesn't wake the waiter. Candidate causes:

- Handle table recycles slot 0x1454 between create-epochs in our impl
  (so signal hits a *different* KEVENT than wait registered for).
- KeSetEvent / wait-queue machinery has a missed-wake
  (signal-before-wait race ruled out: signal at 3.5M is AFTER wait at 48,849).

**Recommended audit-042** (autonomous, two-track):

1. Probe sub_824AA2F0 entry; capture LR + r31 per fire on r3=0x1454.
   Names the actual signaler caller chain.
2. Dump handle table state for slot 0x1454 at cycles 48,849 (wait) and
   3,519,453 (signal). If different KEVENT pointers → handle aliasing
   bug in our `xenia_kernel::handle_table` (slab recycle between
   NtCreate/NtClose). If same → bug in `KeSetEvent` / wait-queue.

Both fixes ≤60 LOC. xenia-rs source unmodified, no commit, master
HEAD `d8766c6` unchanged. Tests 645. Trace
`audit-runs/audit-041-wait-site/`.

### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested (wait-completion-ratio canary vs ours).
2. Canary patch applied + reverted at session close.
3. xenia-rs source unmodified, no commit.
4. Single-step (data-gathering only, no fix attempt).
5. Trace files saved per audit dir convention.

## KRNBUG-AUDIT-042 — handle 0x1454 lifecycle disambiguation (READ-ONLY, 2026-05-09)

Re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC, 4 files);
reverted at session close (canary HEAD `6de80dffe` clean). xenia-rs
master `d8766c6` unchanged. Tests 645.

**Goal**: disambiguate audit-041 root cause (A) handle-recycling vs
(B) wakeup-plumbing for handle 0x1454's missed wakeup.

**Method**: ours via `--trace-handles-focus=0x1454` (existing
audit.rs infrastructure); canary via `--log_lr_on_pc=0x8284DF1C`
(NtCreateEvent thunk, ord 209) + cross-reference to audit-041's
existing `canary-bl-0x822DFC34.log` containing canary's
`Added handle:/Removed handle:` lifecycle markers.

### Allocator architecture (decisive structural finding)

`KernelState::alloc_handle` (state.rs:588-593) is a **monotonic
atomic counter** initialized to `0x1000`, advanced via
`fetch_add(4)`. **Bump-only — no recycling, ever.** `nt_close`
(exports.rs:1869) decrements refcount and removes the object from
`state.objects`, but **NEVER returns the handle ID to the pool**.

This makes root cause (A) — handle-recycling — **structurally
impossible in ours**.

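The structural argument can be illustrated with two toy allocators. This is a model only, not the code in `state.rs` or in canary: `BumpHandles` mirrors the described behaviour (start `0x1000`, step 4, closed IDs never reissued), while `FreeListHandles`, its `0xF8000000` starting value, and `reuses_ids` are hypothetical and exist purely for contrast.

```python
class BumpHandles:
    """Bump-only model: IDs start at 0x1000, advance by 4, never reused."""

    def __init__(self, start=0x1000, step=4):
        self._next = start
        self._step = step

    def alloc(self):
        handle = self._next
        self._next += self._step
        return handle

    def close(self, handle):
        # The ID is deliberately NOT returned to any pool.
        pass


class FreeListHandles:
    """Hypothetical free-list model: closed IDs are handed out again."""

    def __init__(self, start=0xF8000000, step=4):
        self._next = start
        self._step = step
        self._free = []

    def alloc(self):
        if self._free:
            return self._free.pop()
        handle = self._next
        self._next += self._step
        return handle

    def close(self, handle):
        self._free.append(handle)


def reuses_ids(allocator, rounds=5):
    """Alloc/close repeatedly and report whether any ID is issued twice."""
    seen = set()
    for _ in range(rounds):
        handle = allocator.alloc()
        if handle in seen:
            return True
        seen.add(handle)
        allocator.close(handle)
    return False
```

Under the bump-only model a create/close loop never sees the same ID twice, so a signal on a given ID cannot land on a later object that happens to share the value; under the free-list model the second allocation already repeats the first ID.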
### Handle 0x1454 lifecycle in ours (`-n 500M`, two reruns identical)

```
created: cycle=0 tid=13 lr=0x824a9f6c src=NtCreateEvent kind=Event/Manual
  stack: lr=0x822dfc94 (caller — audit-041's sub_822DFC74)
         ← 0x822e0344 ← 0x822d2ca4 ← 0x822de768 ← 0x821c4b1c
wait:    cycle=0 tid=13 lr=0x824ac578 src=do_wait_single
signal:  cycle=0 tid=5  lr=0x824aa304 src=NtSetEvent
wake:    cycle=0 tid=5  src=wake_eligible_waiters/auto
final:   waiters=0 signaled=true signal_attempts=1 waits=1 wakes=1
```

(`cycle=0` is a separate, pre-existing audit-instrumentation gap
— `KernelState::audit_entry` reads `scheduler.ctx(0).timebase`
which is 0 in this build. Counts/ordering still authoritative
because rings are append-only.)

**Single create, single wait, single signal, single wake — fully
consumed.** Handle 0x1454 is **NOT stuck** at end-of-run in this
audit. The end-of-run "Handle waiter lists" section names the
actually-stalled handles: `0x1004 0x1020 0x1544 0x1578 0x10a0 0x12ac 0x1040 ...`
— all `<NO_SIGNALS_DESPITE_WAITS>`. **0x1454 is
not among them.**

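The "fully consumed" reading above amounts to counting lifecycle events per handle. The sketch below assumes the focus log uses the `kind: key=value ...` line shape of the excerpt, which may not match the real `--trace-handles-focus` output byte for byte; `lifecycle_counts` and `fully_consumed` are hypothetical helpers, and the "consumed" criterion is this sketch's own definition.

```python
from collections import Counter

# Excerpt reproduced from the lifecycle block above (stack lines omitted).
LOG = """\
created: cycle=0 tid=13 lr=0x824a9f6c src=NtCreateEvent kind=Event/Manual
wait:    cycle=0 tid=13 lr=0x824ac578 src=do_wait_single
signal:  cycle=0 tid=5  lr=0x824aa304 src=NtSetEvent
wake:    cycle=0 tid=5  src=wake_eligible_waiters/auto
"""

EVENT_KINDS = {"created", "wait", "signal", "wake"}


def lifecycle_counts(text):
    """Count lifecycle events by the label before the first colon."""
    counts = Counter()
    for line in text.splitlines():
        head, sep, _rest = line.partition(":")
        if sep and head.strip() in EVENT_KINDS:
            counts[head.strip()] += 1
    return counts


def fully_consumed(counts):
    """Assumed criterion: one create, every wait woken, at least one signal."""
    return (
        counts["created"] == 1
        and counts["wait"] == counts["wake"]
        and counts["signal"] >= 1
    )
```

Because every entry reports `cycle=0`, the counts and the append order are the only usable signals here; the timestamps cannot be used to order events.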
### Handle 0xF80000CC family lifecycle in canary

From audit-041's `canary-bl-0x822DFC34.log` (debug-helper output
around `ObjectTable::Add/Release`):

```
Added handle:F80000CC for XObject          (ctor — fresh KEVENT slot)
NtDuplicateObject(F80000CC, ...) × 3       (handle-table dup)
TRACE-PC-LR pc=822DFC34 r3=F80000CC        (wait fires on live KEVENT)
NtClose(F80000CC)                          (after wait completes)
Removed handle:F80000CC for XEvent         (slot freed)
Added handle:F80000CC for XEvent           (NEW KEVENT, SAME SLOT VALUE)
NtClose(F80000CC) → Removed → Added × 4 more iterations
```

**Canary RECYCLES handle slots heavily**: `F8000098` reused 130×,
`F80000D0` 95×, `F80000DC` 71×, `F80000C0` 10×, `F80000CC` 5× in a
single 30s window. Canary's `ObjectTable::AllocateHandle` (per
`xobject.cc/object_table.cc`) is a slab/free-list allocator; ours
is bump-only.

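Reuse counts like the ones above can be derived by counting `Added handle:` markers per handle value. A minimal sketch, assuming the marker text matches the excerpt (`Added handle:XXXXXXXX`); the function names are hypothetical and the sample lines in the usage note are illustrative, not copied from the log.

```python
import re
from collections import Counter

# Matches the "Added handle:F80000CC" marker shown in the excerpt above.
ADDED = re.compile(r"Added handle:([0-9A-Fa-f]{8})")


def added_counts(lines):
    """Count how many times each handle value was added to the table."""
    return Counter(
        match.group(1).upper()
        for line in lines
        for match in ADDED.finditer(line)
    )


def recycled(lines):
    """Handles added more than once, i.e. slot values that were reused."""
    return {h: n for h, n in added_counts(lines).items() if n > 1}
```

For example, feeding it two `Added handle:F80000CC` lines and one `Added handle:F8000098` line yields `{"F80000CC": 2}`. Applied to a bump-only allocator's log, the result would be empty by construction.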
### Decisive disambiguation

| | ours | canary |
|---|---|---|
| handle 0x1454 NtCreateEvent fires | **1** | N/A (different namespace) |
| handle 0xF80000CC `Added handle:` | N/A | **5+** within 30s |
| recycling? | **NO** (structurally impossible) | **YES** (slab) |
| audit-041 stall handle 0x1454 | wait+signal+wake recorded in `--trace-handles-focus` rerun | — |

**Verdict: ROOT CAUSE IS NOT (A) HANDLE-RECYCLING.**

Sub-conclusion on audit-041's premise: under
`--trace-handles-focus=0x1454 -n 500M`, handle 0x1454's wait+signal
DO complete (1 wake recorded). audit-041's "wait NEVER returns"
inference came from `--lr-trace`-only data (post-bl missing for the
7th iteration); but `--quiet` suppressed the end-of-run audit dump
in audit-041, so the wait-completion was never directly verified.
The lr-trace miss can be explained by: lr-trace records the
**guest-side resume PC after `bl`**; if KeWaitForSingleObject's
return path bypasses that PC (e.g., direct context-restore on
wake), the post-bl trace doesn't fire even though the wait
completes. **audit-041's load-bearing premise is provisionally
falsified for handle 0x1454 specifically.**

### Real wedge points

The actual stalled handles per this run's end-of-run dump:
- `0x1004` Event/Manual (tid=11 parked, 0 signals)
- `0x1020` Event/Manual (tid=3 parked, 0 signals)
- `0x1040` Event/Auto (tid=5 parked via WaitMultiple, 0 signals)
- `0x1544` Event/Manual (tid=17 parked, 0 signals)
- `0x1578` Event/Auto (tid=19 parked, 0 signals)
- `0x12ac` Semaphore (tid=14, 15 parked, 0 signals)
- `0x10a0` Event/Auto (tid=6, 0 signals) + paired `0x10a4` Semaphore

All carry `<NO_SIGNALS_DESPITE_WAITS>`. These are γ-class
missing-signaler candidates — distinct from 0x1454.

### Bug-class refinement

**δ-wakeup ruled out** for 0x1454 (wake DID fire). **δ-namespace
ruled out** (single create, no aliasing). **The wedge is on a
different handle set** — needs re-pivot.

### Sharp 4-dim cascade prediction (for any audit-043 fix targeting the *real* stalled handles, e.g. `0x1004` or `0x10a0`)

- **A**: handle 0x1004's `signal_attempts` goes 0 → ≥1 (signaler
  named; Ke/Nt SetEvent or KeReleaseSemaphore reaches it).
- **B**: tid=11 transitions out of `Blocked(WaitAny [4100])` to
  Ready/Exited; thread-list shrinks by ≥1 stalled thread.
- **C**: dependent waiters (any handle whose creator/signaler is
  gated by tid=11) start firing — measurable as `<NO_SIGNALS>`
  count drops by ≥2 across the trail set.
- **D**: `swaps` advances past 2 OR `draws` flips from 0 to >0.
  *Probability*: lower (γ-cluster activation is the audit-009
  plateau; multiple gates must fall, only one is being addressed).

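The four predictions can be scored mechanically from before/after run summaries. The sketch below is an assumption-laden illustration: the dictionary keys (`signal_attempts`, `stalled_threads`, `no_signals`, `swaps`, `draws`) are hypothetical names for the metrics mentioned in A through D, not fields of any existing report format.

```python
def cascade_outcome(before, after):
    """Evaluate predictions A-D from two run summaries (dicts of ints)."""
    return {
        # A: the target handle receives at least one signal attempt.
        "A": before["signal_attempts"] == 0 and after["signal_attempts"] >= 1,
        # B: at least one fewer stalled thread.
        "B": after["stalled_threads"] <= before["stalled_threads"] - 1,
        # C: the no-signals handle count drops by two or more.
        "C": after["no_signals"] <= before["no_signals"] - 2,
        # D: swaps advances past 2, or draws flips from 0 to > 0.
        "D": after["swaps"] > 2 or (before["draws"] == 0 and after["draws"] > 0),
    }
```

Scoring each dimension separately keeps a partial result (for example A through C true with D still false) visible rather than collapsing it into a single pass/fail.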
### Recommended audit-043

**Pivot**: re-target audit on the **actually-stalled** handles per
this session's end-of-run dump. Ranked by likely impact:

1. **`0x10a0` Event/Auto + `0x10a4` Semaphore on tid=6** —
   Event+Semaphore pair is canonical "worker waits for job;
   producer hasn't run." Trace tid=6's entry PC and producer chain.
2. **`0x12ac` Semaphore (2 waiters: tid=14, tid=15)** — semaphore
   never released; `KeReleaseSemaphore` source is the target.
3. **`0x1004` Event/Manual on tid=11** — earliest-created stalled
   handle. Its non-signaling caller chain is the bottom-up gate.

For each: run with `--trace-handles-focus=<handle>` to capture the
created stack and identify the producer-side function. Canary
cross-trace via `--log_lr_on_pc=0x8284DF5C` (NtSetEvent) or
`0x8284DDDC` (KeSetEvent) filtering for the equivalent canary
handle at that PC + LR signature. Patch budget unchanged (≤60 LOC).

**Bug class for audit-043**: **γ (missing signaler)** — primary
candidate. **NOT δ-namespace, NOT δ-wakeup.** The handle-namespace
divergence (audit-040) appears to be benign per this audit's
finding that 0x1454 actually completes. The real stalled handles
are γ-class (signaler-missing on a *different* event/semaphore).

### Discipline gate (5/5 PASS)
1. Hypothesis explicitly tested (recycling vs plumbing for 0x1454).
2. Canary patch applied (30 LOC) + reverted at session close.
3. xenia-rs source unmodified, no commit.
4. Single-step (data-gathering only, no fix attempt).
5. Trace files saved:
   `audit-runs/audit-042-handle-lifecycle/{probe.log, probe-run2.log, canary-create-0x8284DF1C.log}`
   (~11.5 MB) + cross-ref of audit-041's existing
   `canary-bl-0x822DFC34.log`.

### Status
- Tests: 645 (unchanged).
- Lockstep: instructions=100000004 unchanged (no source mods).
- Master HEAD: `d8766c6` (unchanged).
- Canary HEAD: `6de80dffe` (clean, post-revert).

---

## KRNBUG-AUDIT-043 — record +0x00 writer, allocator-VA divergence (READ-ONLY, 2026-05-09)

**Status**: READ-ONLY single-step. Master `d8766c6` unchanged, canary patch reverted. Tests 645 unchanged.

### Goal

Identify the writer of `+0x00` at records `0x40542300/0x40542340/0x40542400/0x405424c0` in our impl. Audit-039 reported ours has `0x67616D65` ("game" inline) while canary has `0xF80000B8` (kernel handle) — claimed to be the most fundamental layout divergence.

### Method

Mem-watch on `+0x00,+0x04` of all 4 records (`-n 500_000_000`). Group writers by (PC, LR). Look up containing functions in `sylpheed.db`. Disasm + caller chain. Apply audit-030 LR-trace patch to canary; probe writer PC `0x825F1080` (memcpy) and pool-init PC `0x82152728`.

### Findings

**The writer of `0x67616D65` is `memcpy` at `pc=0x825F1080`, called from `memcpy_s` (`sub_825ED588`, return = `lr=0x825ED608`)**, invoked from `std::basic_string::reserve_then_assign` (`sub_8216E138+0xC8`). 16 fires across 4 records.

**The records are NOT layout records** — they are 64-byte slots in a Sylpheed-managed pool allocator:
- `sub_821505D8` (called from `sub_8280C42C`) allocates ~58 MB via `sub_824A8858` (size `0x03A723D0`, type `0x20000004`).
- `sub_82152570` builds 4 free-list buckets; `sub_82152728` chains 64-byte slots over a 1.25 MB span.
- Slot-size table at `sub_821505D8+0x10`: 4, 16, 32, 64, 96, 128, 160, 192, 256.
- The "filenames" land in 64-byte slots when a Sylpheed `std::string` is heap-promoted from SSO (capacity ≥ 0x10).

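The "group writers by (PC, LR)" step from the method reduces to a small aggregation. The sketch assumes the mem-watch log has already been parsed into `(pc, lr, addr)` tuples; the parsing itself is not shown because the log format is not reproduced here, and both function names are hypothetical.

```python
from collections import Counter


def group_writers(records):
    """Count writes per (pc, lr) pair from (pc, lr, addr) tuples."""
    return Counter((pc, lr) for pc, lr, _addr in records)


def top_writer(records):
    """Return ((pc, lr), count) for the most frequent writer, or None."""
    ranked = group_writers(records).most_common(1)
    return ranked[0] if ranked else None
```

For the finding above, all 16 fires would collapse into a single key, `(0x825F1080, 0x825ED608)`, which is what identifies `memcpy` reached through `memcpy_s` as the sole writer.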
### Canary cross-trace

Probed `pc=0x825F1080` in canary (audit-030 `--log_lr_on_pc` patch reapplied):
- 94,945 fires in 25s. **Zero hits to `0x40542xxx`**. Destinations distribute over `0x705Dxxxx` (76674), `0x7033xxxx` (6642), `0xBC36xxxx` (1211), etc.
- Top LR `0x824AB1D4` (84,400×, an alloc-related path absent in our trace).
- Canary's matching `LR=0x825ED608` (memcpy_s caller) fires 1,782× — **none target `0x40542xxx`**.

Pool-init `pc=0x82152728` in canary fires once with `r3=0xBC32C880` — **canary's 58 MB pool BASE = `0xBC32C880`**; ours' is `~0x40541xxx`.

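The destination distribution above is a histogram over the upper 16 bits of each memcpy destination, plus a range check for the `0x40542xxx` window. A minimal sketch, assuming the destination addresses have already been extracted from the canary trace into a list of integers; the function names are hypothetical.

```python
from collections import Counter


def bucket_by_prefix(addrs, shift=16):
    """Histogram addresses by their upper bits (default: top 16 of 32)."""
    return Counter(addr >> shift for addr in addrs)


def hits_in_range(addrs, lo, hi):
    """Number of addresses in the half-open range [lo, hi)."""
    return sum(lo <= addr < hi for addr in addrs)
```

The "zero hits to `0x40542xxx`" claim corresponds to `hits_in_range(dests, 0x40542000, 0x40543000) == 0` over the full set of destinations.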
### Bug-class refinement

**Audit-039's "0xF80000B8 vs 'game'" is a VA-equality fallacy.** The same guest VA `0x40542300` backs *different live data* in the two emulators because their host-side allocators return different VAs for the same `sub_824A8858` call. Ours: 64-byte std::string heap buffer. Canary: a kernel-handle / NotifyListener slot at *its* unrelated VA.

**Class = ε (host-allocator address-space divergence)**, not a guest-write bug. There is no missing/wrong write at `+0x00` in our impl.

**Reading-error ledger update**: 12th entry — *VA-equality fallacy across emulators*. Comparing memory contents at identical guest VAs assumes both allocators return the same VA for the same logical allocation; Sylpheed's pool factory makes this assumption false in general. Future audits comparing two emulators' guest memory must compare *logical allocations* (resolved through the producing allocator), not raw VAs.

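One way to compare logical allocations instead of raw VAs is to rebase each address against its own emulator's pool base before comparing. The sketch below assumes the guest pool builder lays slots out identically relative to the base in both runtimes, which this audit has not verified. Canary's base is taken from the pool-init probe; ours is known only approximately (`~0x40541xxx`), so it is left as a parameter to be read from a trace. The function names are hypothetical.

```python
CANARY_POOL_BASE = 0xBC32C880  # r3 at pool-init pc=0x82152728 in canary


def pool_offset(va, pool_base):
    """Offset of a guest VA inside a pool; negative means below the pool."""
    return va - pool_base


def same_logical_slot(va_a, base_a, va_b, base_b):
    """True when two VAs sit at the same offset within their own pools."""
    offset_a = pool_offset(va_a, base_a)
    offset_b = pool_offset(va_b, base_b)
    return offset_a >= 0 and offset_a == offset_b
```

Under this scheme, equal raw VAs from the two emulators compare as different slots unless the two pool bases also happen to coincide, which is the point of the ledger entry.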
### Recommended audit-044

Drop the "record at 0x40542300+" line of investigation entirely.

Re-pivot to audit-042's actually-stalled-handle plan:
1. `0x10A0` Event/Auto + `0x10A4` Semaphore on tid=6 — producer chain.
2. `0x12AC` Semaphore (tid=14, tid=15 waiters) — `KeReleaseSemaphore` source.
3. `0x1004` Event/Manual on tid=11 — earliest-created stalled handle.

Each: `--trace-handles-focus=<H>` for create/wait stack; canary cross-trace via `--log_lr_on_pc=0x8284DF5C` (NtSetEvent) or `0x8284DDDC` (KeSetEvent) on equivalent handle.

**Bug class for audit-044**: **γ (missing signaler)** — same target as audit-042's recommendation; audit-043 did not move the cluster, but eliminated a false-positive line of investigation.

### Discipline gate (5/5 PASS)

1. Hypothesis explicitly tested (writer-of-+0x00 isolated; canary equivalence checked).
2. Canary patch applied (30 LOC audit-030 base) + reverted at session close (`git status` clean, config TOML restored to `log_lr_on_pc = 0`).
3. xenia-rs source unmodified, no commit.
4. Single-step (data-gathering only).
5. Trace files saved: `audit-runs/audit-043-record-zero-offset/{mem-watch.log, mem-watch.stdout, canary-825f1080-traces.txt.gz (95k LR records), audit-043-canary-poolinit.log}`.

### Status

- Tests: 645 (unchanged).
- Lockstep: instructions=100000004 (unchanged).
- Master HEAD: `d8766c6` (unchanged).
- Canary HEAD: `6de80dffe` post-revert clean.