PPCBUG-640: For BO=16 (bdnz: decrement CTR, branch if non-zero, ignore CR)
and BO=18 (bdz: same with branch-if-zero), `fmt_bc` fell through to the
`if decr` block and computed `cond_name_opt` from the don't-care BI=0 /
cond_true=false pair, yielding `Some("ge")`. The output was therefore
`bdnzge` / `bdzge` — a CTR-only branch with a spurious CR-derived suffix.
PPCBUG-650 (companion): the golden fixture pinned the wrong output, so
the regression had no detection signal until now.
`fmt_bclr` already had the correct `if decr && uncond` guard at line 872
producing `bdnzlr` / `bdzlr`. `fmt_bc` lacked the equivalent.
Fix: gate the condition string on `!uncond` inside the `if decr` block.
For BO=16/18 (uncond bit set), the condition suffix is now empty.
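The following simplified model of the BO decode behind `fmt_bc` shows where the `!uncond` gate sits. It assumes the standard BO bit layout: 0x10 means ignore CR, 0x08 means branch if the bit is true, 0x04 means do not decrement, and 0x02 means branch when CTR == 0. `bc_ext_mnemonic` and `cond_suffix` are hypothetical names. Operands, hint bits and link/absolute suffixes are left out.

```rust
/// CR-bit suffix for BI's position in its CR field (lt/gt/eq/so),
/// negated when the branch is taken on a clear bit.
fn cond_suffix(bi: u8, cond_true: bool) -> &'static str {
    match (bi & 3, cond_true) {
        (0, true) => "lt",
        (0, false) => "ge",
        (1, true) => "gt",
        (1, false) => "le",
        (2, true) => "eq",
        (2, false) => "ne",
        (_, true) => "so",
        (_, false) => "ns",
    }
}

/// Hypothetical, simplified `bc` extended-mnemonic picker.
fn bc_ext_mnemonic(bo: u8, bi: u8) -> Option<String> {
    let uncond = bo & 0x10 != 0; // BO[0]: do not test the CR bit
    let cond_true = bo & 0x08 != 0; // BO[1]: branch when the CR bit is set
    let decr = bo & 0x04 == 0; // BO[2] clear: decrement CTR
    let ctr_zero = bo & 0x02 != 0; // BO[3]: branch when CTR == 0
    if decr {
        let base = if ctr_zero { "bdz" } else { "bdnz" };
        // The fix: BI and cond_true are don't-cares when the CR test is off.
        let suffix = if uncond { "" } else { cond_suffix(bi, cond_true) };
        Some(format!("{base}{suffix}"))
    } else if !uncond {
        Some(format!("b{}", cond_suffix(bi, cond_true)))
    } else {
        None // BO=20: branch always, formatted elsewhere
    }
}
```

With BO=0 the CR bit is still tested, so `bdnzge` remains the correct output there.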
Tests: extended_mnemonics.json fixture rows for bdnz/bdz now expect the
correct `ext_mnemonic: "bdnz"` / `"bdz"`.
Impact: every analysis-DB query for `bdnz` loops (common in pixel-shader
and vertex processing) returned zero rows, because the matches were stored
as `bdnzge`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PPCBUG-053: bcx and bclrx tested `ctx.ctr != 0` against the full 64-bit
register, but the Xbox 360 ABI runs CTR as a 32-bit counter (canary
explicitly truncates: `f.Truncate(ctr, INT32_TYPE)`). When upstream 64-bit
GPR pollution flowed through `mtspr CTR, rN`, the upper 32 bits stayed
non-zero after the low 32 bits reached zero; bdnz then looped past the
intended 32-bit zero point because the 64-bit comparison still saw the
high bits.
PPCBUG-054: `mtspr CTR` writeback stored the full 64-bit GPR value,
leaving open the path that fed PPCBUG-053. Defensive truncation
prevents CTR from ever acquiring non-zero upper 32 bits, regardless of
where the GPR pollution originates.
Fixes:
- interpreter.rs:849, 879: ctr_ok now uses `(ctx.ctr as u32) != 0`
- interpreter.rs:1523: mtspr CTR writes `val as u32 as u64`
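Here is a minimal sketch of the two fixes. It assumes a hypothetical `Ctx` that holds only CTR. `bdnz_taken` and `mtctr` are illustrative names, not interpreter functions.

```rust
/// Hypothetical stand-in for the CTR part of PpcContext.
struct Ctx {
    ctr: u64,
}

/// bdnz step: decrement CTR, then report whether the branch is taken.
/// Only the low 32 bits take part in the zero test (PPCBUG-053).
fn bdnz_taken(ctx: &mut Ctx) -> bool {
    ctx.ctr = ctx.ctr.wrapping_sub(1);
    (ctx.ctr as u32) != 0
}

/// mtspr CTR writeback: drop the GPR's upper 32 bits (PPCBUG-054).
fn mtctr(ctx: &mut Ctx, gpr: u64) {
    ctx.ctr = gpr as u32 as u64;
}
```

The decrement itself stays 64-bit. Only the comparison narrows, which is why CTR=0x0000_0001_0000_0001 decrements to 0x0000_0001_0000_0000 and still exits.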
Tests:
- bcx_bdnz_uses_32bit_ctr_compare: bdnz with CTR=0x0000_0001_0000_0001
decrements to 0x0000_0001_0000_0000 and exits (low 32 bits = 0).
- bclrx_uses_32bit_ctr_compare: same coverage for bdnzlr.
- mtspr_ctr_truncates_to_32_bits: gpr=0xFFFF_FFFF_8000_0001 → ctr=0x8000_0001.
Coupled fix per the audit: PPCBUG-053 and PPCBUG-054 land together because
either alone is necessary but not sufficient: the truncation prevents new
pollution, and the 32-bit compare protects against any pollution that slipped
in via routes other than mtspr (e.g. mfctr-mtctr roundtrips).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PPCBUG-424: vmaddfp128 computed VA×VB+VD instead of ISA-mandated VA×VD+VB.
PPCBUG-425: vmaddcfp128 computed VD×VB+VA instead of ISA-mandated VA×VD+VB.
Root cause, discovered while writing the operand-order regression tests:
va128() was extracting PPC bits 6-10 (the same field as vd128's low 5 bits),
not PPC bits 11-15 where VA lives in VX128 form. va128() therefore silently
returned VD's low bits. The difference only shows when VA != VD, so the
operand swap was invisible in the existing denorm-flush test (which used
VA == VD == v2).
Fixes in this commit:
- decoder.rs: va128() now extracts PPC bits 11-15 (host bits 20-16) + bit29.
The vmx128_va128_uses_bit29 test encoding updated to match the correct field.
- interpreter.rs: vmaddfp128 changed from ai.mul_add(bi,di) to ai.mul_add(di,bi)
(VA×VD+VB). vmaddcfp128 changed from di.mul_add(bi,ai) to ai.mul_add(di,bi).
vmaddfp128_flushes_denormal_inputs redesigned with distinct VA/VD/VB registers
(v1/v2/v3) so the flush test is independent of the accessor fix.
New vmaddfp128_operand_order_va_times_vd_plus_vb and
vmaddcfp128_operand_order_va_times_vd_plus_vb tests verify 2×3+10=16.
- disasm_goldens.rs + vmx128_registers.json: vmaddfp128/vmaddcfp128/vnmsubfp128
golden raws updated to properly encode VA at PPC bits 11-15 (new raws:
0x146328D4 / 0x14632914 / 0x14632954). vperm128 / vsrw128 golden operands
updated to reflect correct VA extraction (v4 instead of v3/v0).
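Here is the corrected arithmetic, sketched lane-wise. The sketch assumes plain `[f32; 4]` vectors, and `vmaddfp128` is a hypothetical free function. Denormal flushing and VSCR handling are omitted. After the fix, vmaddcfp128 uses the same VA×VD+VB form.

```rust
/// Lane-wise VD = VA × VD + VB.
fn vmaddfp128(va: [f32; 4], vd: [f32; 4], vb: [f32; 4]) -> [f32; 4] {
    // f32::mul_add(self, a, b) computes self * a + b with one rounding,
    // so the receiver and the first argument are the two multiplicands.
    std::array::from_fn(|i| va[i].mul_add(vd[i], vb[i]))
}
```

With VA=2, VD=3 and VB=10 this yields 16. The pre-fix `ai.mul_add(bi, di)` order instead yields 2×10+3 = 23.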
Affects all VMX128 binary ops that call va128(): vaddfp128, vsubfp128,
vmulfp128, vmaddfp128, vmaddcfp128, vnmsubfp128, vperm128, vsrw128 etc.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
stvewx128 was aligning EA to 16 bytes and writing all 16 bytes of the
vector, corrupting 12 adjacent bytes on every call. ISA semantics:
word-align EA, extract word lane (EA & 0xF) >> 2, write 4 bytes only.
The non-128 stvewx was already correct; stvewx128 was never updated.
Mirror the stvewx body with instr.vs128() substituted for instr.rs().
The invalidate_for_write call from P1 now covers the correct word-aligned
EA rather than the over-wide 16-byte range.
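Here is a sketch of the word-lane store. It assumes guest memory is a flat big-endian byte slice and that a vector is `[u32; 4]` with lane 0 at the lowest address. The `stvewx` function is hypothetical.

```rust
/// Store one word of `vs` at the word-aligned EA; nothing else is written.
fn stvewx(mem: &mut [u8], ea: u64, vs: [u32; 4]) {
    let ea = ea & !3; // word-align the effective address
    let lane = ((ea & 0xF) >> 2) as usize; // vector word that maps to EA
    let at = ea as usize;
    // Exactly 4 bytes; the other 12 bytes of the quadword stay untouched.
    mem[at..at + 4].copy_from_slice(&vs[lane].to_be_bytes());
}
```

For EA=0x16, the aligned address is 0x14 and the lane is 1, so only bytes 0x14..0x18 change.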
interpreter.rs: stvewx128 arm (~line 2984)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vpkd3d128 was storing the pack codec output directly into vd128 without
applying the MakePermuteMask permutation that merges the packed scalar(s)
into the previous register value according to pack (slot layout) and shift
(destination lane offset).
PPCBUG-363: vpkd3d128 was missing the post-pack lane-placement step.
PPCBUG-369: vpkd3d128 pack field not extracted; pack=0 still worked
(identity), but pack=1/2/3 always wrote raw out instead of blending.
Fix: extract `pack = uimm & 3` and `shift = instr.vx128_4_z()` from the
VX128_4 IMM and z fields. For pack==0 (identity) store out directly as
before. For pack 1-3, read the existing vd128 value and select 4 u32
words from {prev, out} using the 3×4 static permutation tables from
canary ppc_emit_altivec.cc:2126-2188.
Tables derived from canary MakePermuteMask(r0,l0,…r3,l3):
pack=1 (VPACK_32): out[3] placed at lane (3-shift), prev elsewhere
pack=2 (64-bit): out[2..3] placed at lanes (2-shift)..(3-shift)
pack=3 (64-bit): same as pack=2 except shift=3 → out[2] at lane 3
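The placement rules above can be written directly as a function. This hypothetical `place_packed` derives each case from the listed rules rather than from the static tables the commit uses. It assumes `shift` is the 2-bit z value, `prev` is the old VD and `out` is the pack-codec result.

```rust
fn place_packed(pack: u32, shift: u32, prev: [u32; 4], out: [u32; 4]) -> [u32; 4] {
    if pack == 0 {
        return out; // identity: store the codec output unchanged
    }
    let mut res = prev;
    let shift = shift as i32;
    match pack {
        // VPACK_32: out[3] lands at lane (3 - shift).
        1 => res[(3 - shift) as usize] = out[3],
        // pack=3, shift=3 is the one entry that differs from pack=2.
        3 if shift == 3 => res[3] = out[2],
        // 64-bit packs: out[2..=3] land at lanes (2 - shift)..=(3 - shift);
        // a lane that would fall below 0 is dropped.
        2 | 3 => {
            for k in 2..4i32 {
                let lane = k - shift;
                if lane >= 0 {
                    res[lane as usize] = out[k as usize];
                }
            }
        }
        _ => unreachable!("pack is a 2-bit field"),
    }
    res
}
```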
Tests: vpkd3d128_pack0_legacy_unchanged, vpkd3d128_pack1_shift0_d3d_vertex_pack,
vpkd3d128_pack1_shift3_puts_out3_at_lane0
interpreter.rs: vpkd3d128 arm (~line 3999)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PPCBUG-565: Add vx128_5_sh() to decoder.rs — 4-bit shift at PPC bits
22-25 (host bits 6-9). The correct MSB is at PPC bit 22 (host bit 9).
PPCBUG-361: vsldoi128 was reading the SH MSB from host bit 4 (PPC bit
27, reserved) instead of host bit 9 (PPC bit 22). All shift amounts >= 8
decoded incorrectly (e.g. shift=8 executed as shift=0). Replace the
inline bit-shuffle with instr.vx128_5_sh().
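Here is a sketch of the corrected accessor. It assumes a hypothetical `ppc_field` helper that converts IBM bit numbering (bit 0 = MSB) into host shifts. PPC bit n is host bit 31 - n.

```rust
/// Extract PPC bits `first..=last` (bit 0 is the MSB).
fn ppc_field(raw: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (raw >> (31 - last)) & (u32::MAX >> (32 - width))
}

/// VX128_5 SH: PPC bits 22-25, i.e. host bit 9 (MSB) down to host bit 6.
fn vx128_5_sh(raw: u32) -> u32 {
    ppc_field(raw, 22, 25)
}
```

An encoding with SH=8 sets only host bit 9. Reading the MSB from host bit 4 instead gives 0, which reproduces the "shift=8 executed as shift=0" symptom.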
Also fix vx128_p_perm_assembles_correctly test: replace nonexistent
DecodedInstr::from_raw() calls with struct literal construction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PPCBUG-563: Add vx128_4_imm() (PPC bits 11-15) and vx128_4_z() (PPC bits
24-25) accessors to decoder.rs for VX128_4-form instructions.
PPCBUG-315: vrlimi128 was reading z from host bits 16-17 (a subset of IMM)
and mask from host bits 2-5 (a reserved/XO region). Replace with the
correct accessors: z is the word-rotation amount (0-3); IMM is the 5-bit
field that carries the per-lane blend mask.
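Both accessors reduce to host-bit shifts. This sketch assumes a raw 32-bit instruction word. It includes the pre-fix z read under the hypothetical name `z_prefix_bug` to show how it aliases IMM.

```rust
/// VX128_4 IMM: PPC bits 11-15 = host bits 16-20.
fn vx128_4_imm(raw: u32) -> u32 {
    (raw >> 16) & 0x1F
}

/// VX128_4 z: PPC bits 24-25 = host bits 6-7.
fn vx128_4_z(raw: u32) -> u32 {
    (raw >> 6) & 0x3
}

/// Pre-fix z read (host bits 16-17): just the low two bits of IMM.
fn z_prefix_bug(raw: u32) -> u32 {
    (raw >> 16) & 0x3
}
```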
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PPCBUG-562: Add vc_rc_bit() (PPC bit 21) and vx128r_rc_bit() (PPC bit 27)
to decoder.rs. The generic rc_bit() reads host bit 0 (PPC bit 31); all vcmp
XO values are even, so that bit is always 0 and CR6 was never updated.
PPCBUG-275/276/420/421: Replace rc_bit() with vc_rc_bit() at all 8 pure
VC-form vcmp arms (vcmpequb, vcmpequh, vcmpgtub, vcmpgtsb, vcmpgtuh,
vcmpgtsh, vcmpgtuw, vcmpgtsw) and with the correct per-form accessor at
the 4 combined arms (vcmpeqfp|128, vcmpgefp|128, vcmpgtfp|128,
vcmpequw|128) and vcmpbfp|128.
PPCBUG-422: VX128_R-form 128-variants in combined arms now use
vx128r_rc_bit() instead of vc_rc_bit().
PPCBUG-423/600: Add 5 dot-form key entries to decode_op6 so
vcmp*fp128./vcmpequw128. decode as the correct opcode instead of Invalid.
Uses a 5-bit key (bits22-24 + bit25 + bit27) for dot-forms to avoid
aliasing against the shift/merge group (which sets bit25=1 when bit27=1).
Interpreter uses vx128r_rc_bit() to conditionally update CR6.
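Here is a sketch of the VC-form Rc accessor and the CR6 rule for the ordinary compare dot forms. It assumes a raw instruction word. vcmpbfp. sets CR6 differently and is not modelled, and `cr6_from_lanes` is a hypothetical name.

```rust
/// VC-form Rc: PPC bit 21 = host bit 10.
fn vc_rc_bit(raw: u32) -> bool {
    (raw >> 10) & 1 != 0
}

/// Generic Rc: PPC bit 31 = host bit 0 (always 0 for vcmp, since XO is even).
fn rc_bit(raw: u32) -> bool {
    raw & 1 != 0
}

/// CR6 nibble (LT GT EQ SO): LT = every lane true, EQ = every lane false.
fn cr6_from_lanes(lanes: [u32; 4]) -> u8 {
    let all_true = lanes.iter().all(|&l| l == u32::MAX);
    let all_false = lanes.iter().all(|&l| l == 0);
    ((all_true as u8) << 3) | ((all_false as u8) << 1)
}
```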
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PPCBUG-561: Add DecodedInstr::mb_md() to decoder.rs — the correct MD-form
6-bit mask-begin reconstruction (MB[4:0] at PPC bits 21-25, MB[5] at PPC
bit 26). The disassembler already had the correct local formula; this
promotes it to a single source of truth on DecodedInstr.
PPCBUG-046: All 6 doubleword-rotate arms (rldicl, rldicr, rldic, rldimi,
rldcl, rldcr) inlined "(instr.mb() << 1) | ((instr.raw >> 1) & 1)" which
reads SH5 (host bit 1) instead of MB5 (host bit 5). For the canonical
"clrldi r3, r4, 32" zero-extend idiom (mb=32 → MB5=1, MB[4:0]=0), the
wrong formula produced mb=0, turning the zero-extension into a plain
register copy and leaving the upper 32 bits of the GPR polluted. Replace
all 6 sites with instr.mb_md().
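Here is a sketch of the MD-form mask-begin field and rldicl. It assumes raw 32-bit instruction words and that `instr.mb()` returns PPC bits 21-25. `mb_md`, `mb_prefix_bug` and `rldicl` are free functions written for illustration. 0x7883_0020 is the encoding of `clrldi r3, r4, 32`.

```rust
/// MD-form mask-begin: MB[4:0] from PPC bits 21-25 (host bits 6-10),
/// MB[5] from PPC bit 26 (host bit 5).
fn mb_md(raw: u32) -> u32 {
    let low = (raw >> 6) & 0x1F;
    let high = (raw >> 5) & 1;
    (high << 5) | low
}

/// Pre-fix inline formula: PPC bits 21-25 shifted up, OR'd with SH5 (host bit 1).
fn mb_prefix_bug(raw: u32) -> u32 {
    (((raw >> 6) & 0x1F) << 1) | ((raw >> 1) & 1)
}

/// rldicl: rotate left by `sh`, then clear the `mb` most-significant bits.
fn rldicl(rs: u64, sh: u32, mb: u32) -> u64 {
    rs.rotate_left(sh) & (u64::MAX >> mb)
}
```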
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PPCBUG-040: decoder.rs sh64() assembled the XS-form shift amount as
(SH[4:0] << 1) | SH[5] instead of (SH[5] << 5) | SH[4:0]. Every
`sradi` with shift N ∈ 1..=62 executed with a completely wrong shift
count (e.g. shift=32 executed as shift=1).
PPCBUG-560: disasm_goldens.rs rldicl() test helper was encoding sh[5:1]
at PPC bits 16-20 and sh[0] at PPC bit 30 — exactly backwards. The wrong
encoder and wrong decoder cancelled out, hiding PPCBUG-040 from tests.
Fix both together so tests validate ISA-correct encodings.
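A round-trip check makes the cancellation concrete. The encoder/decoder pairs below are hypothetical free functions that mirror the before and after field layouts. SH[4:0] sits at PPC bits 16-20 (host 11-15) and SH[5] at PPC bit 30 (host 1).

```rust
/// ISA-correct XS-form decode: (SH[5] << 5) | SH[4:0].
fn sh64(raw: u32) -> u32 {
    (((raw >> 1) & 1) << 5) | ((raw >> 11) & 0x1F)
}

/// Pre-fix decode: (SH[4:0] << 1) | SH[5].
fn sh64_buggy(raw: u32) -> u32 {
    (((raw >> 11) & 0x1F) << 1) | ((raw >> 1) & 1)
}

/// ISA-correct encode of a 6-bit shift.
fn encode_sh(sh: u32) -> u32 {
    ((sh & 0x1F) << 11) | (((sh >> 5) & 1) << 1)
}

/// Pre-fix test encoder: sh[5:1] at PPC 16-20, sh[0] at PPC 30.
fn encode_sh_buggy(sh: u32) -> u32 {
    (((sh >> 1) & 0x1F) << 11) | ((sh & 1) << 1)
}
```

Each pair round-trips every shift 0..=63 on its own. Only the cross pairing exposes the bug, for example by decoding a real shift of 32 as 1.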
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PPCBUG-160 partial: stswi's single invalidate_for_write(ea) only covered
the first cache line; with nb up to 32, the write span can cross a 128-byte
line boundary. Replace with two-call guard:
first_line = ea & !RESERVATION_MASK
last_line = ea.wrapping_add(nb - 1) & !RESERVATION_MASK
invalidate first; if last != first, invalidate last.
PPCBUG-160 partial: stswx had the same single-call gap; nb from XER[0:6]
can be up to 127 bytes. Same two-call guard applied; wrapped in `if nb > 0`
to guard against nb==0 underflow (XER TBC field is 0 when no bytes to store).
dcbz: zeroes 32 bytes at a 32-byte-aligned EA — touches exactly one 128-byte
cache line; add canonical single-call invalidate guard (was entirely missing).
dcbz128: zeroes 128 bytes at a 128-byte-aligned EA — one full reservation
line; add canonical single-call invalidate guard (was entirely missing).
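Here is a sketch of the two-call guard. It assumes `RESERVATION_MASK` is 127 (128-byte lines) and uses a callback in place of `invalidate_for_write`. It only handles spans of at most 128 bytes, which covers stswi (up to 32) and stswx (up to 127). The caller is assumed to have mapped stswi's nb=0 encoding to 32.

```rust
/// Assumed reservation-line offset mask (128-byte lines).
const RESERVATION_MASK: u64 = 127;

/// Invalidate each line touched by an `nb`-byte store at `ea` (nb <= 128).
fn invalidate_span(ea: u64, nb: u64, mut invalidate: impl FnMut(u64)) {
    if nb == 0 {
        return; // stswx with a zero XER byte count stores nothing
    }
    let first_line = ea & !RESERVATION_MASK;
    let last_line = ea.wrapping_add(nb - 1) & !RESERVATION_MASK;
    invalidate(first_line);
    if last_line != first_line {
        invalidate(last_line);
    }
}
```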
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds doc comments above lwarx/ldarx/stwcx./stdcx. clarifying that the
legacy per-ctx reservation path is only correct in strict lockstep
(single host thread); under --parallel the M3 scheduler must enable
the cross-thread ReservationTable before spawning a second host thread.
A debug_assert fires in the legacy stwcx./stdcx. branch if a
non-primary HW slot (hw_id != 0) takes that path — surfacing
ReservationTable-disabled misconfiguration early in debug builds.
Note: the primary slot (hw_id==0) racing other parallel slots is
not caught by the assert; that case requires the table to be enabled.
Affected:
PPCBUG-108 legacy per-ctx reservation path cannot invalidate
cross-thread; informational — no behavioral change
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Track lwarx vs ldarx reservation width in PpcContext as a u8 (4 = word,
8 = doubleword, 0 = none). stwcx. requires width==4; stdcx. requires
width==8. Cross-width pairs (lwarx + stdcx., ldarx + stwcx.) now fail
deterministically with CR0.EQ=0 instead of spuriously succeeding.
The width is held per-thread; the cross-thread reservation table keeps
its existing slot encoding because each host thread consults its own
ctx.reservation_width before committing.
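Here is a per-thread model of the width check. It assumes a hypothetical `Ctx` that holds only the reservation fields, and it reduces address matching to exact equality.

```rust
#[derive(Default)]
struct Ctx {
    reservation_addr: u64,
    reservation_width: u8, // 0 = none, 4 = word (lwarx), 8 = doubleword (ldarx)
}

/// lwarx passes width 4, ldarx passes width 8.
fn load_reserve(ctx: &mut Ctx, ea: u64, width: u8) {
    ctx.reservation_addr = ea;
    ctx.reservation_width = width;
}

/// stwcx. passes 4, stdcx. passes 8; the result is the CR0.EQ value.
fn store_conditional(ctx: &mut Ctx, ea: u64, width: u8) -> bool {
    let ok = ctx.reservation_width == width && ctx.reservation_addr == ea;
    ctx.reservation_width = 0; // every st*cx. consumes the reservation
    ok
}
```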
Affected:
PPCBUG-151 stwcx./stdcx. shared the same reservation slot without
width discriminator; cross-width commits silently succeeded
Tests: lwarx_then_stdcx_cross_width_fails,
ldarx_then_stwcx_cross_width_fails
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continuation of the PPCBUG-107 cascade sweep. All 16 VMX store opcodes
(stvx/stvxl, stvebx/stvehx/stvewx, stvlx/stvlxl, stvrx/stvrxl, plus the 128
variants where they exist)
now invalidate the reservation table before writing.
stvlx/stvrx partial-vector stores can write at non-16-byte-aligned EAs;
they invalidate both potentially-touched cache lines.
stvewx128 currently writes 16 bytes at the wrong EA scope (PPCBUG-510);
the invalidate guard fires at that over-wide EA today and will narrow
automatically when PPCBUG-510 is fixed in P3.
Affected:
PPCBUG-511 stvx, stvx128, stvxl, stvxl128
PPCBUG-512 stvebx, stvehx, stvewx, stvewx128
PPCBUG-513 stvlx, stvlx128, stvlxl, stvlxl128
PPCBUG-514 stvrx, stvrx128, stvrxl, stvrxl128
Tests: lwarx_then_plain_stvx_invalidates_reservation,
lwarx_then_plain_stvlx_invalidates_reservation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continuation of the PPCBUG-107 cascade sweep. stmw/stswi/stswx (multiple
and string stores) and the 9 floating-point stores now invalidate the
reservation table before writing.
stmw can span two cache lines when the writeback range crosses a line
boundary; the guard iterates over all touched lines, so a multi-line store
gets the same atomicity guarantee as a single-line store.
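For stmw, the touched lines can be enumerated generically. This sketch assumes 128-byte lines, a non-zero length and no address wraparound. `touched_lines` and `stmw_len` are hypothetical helpers.

```rust
/// Assumed reservation granule: 128-byte lines.
const LINE: u64 = 128;

/// Start address of every line overlapped by a `len`-byte store at `ea`.
fn touched_lines(ea: u64, len: u64) -> impl Iterator<Item = u64> {
    let first = ea & !(LINE - 1);
    let last = (ea + len - 1) & !(LINE - 1);
    (first..=last).step_by(LINE as usize)
}

/// stmw rS stores rS..=r31, i.e. (32 - rS) words.
fn stmw_len(rs: u32) -> u64 {
    u64::from(32 - rs) * 4
}
```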
Affected:
PPCBUG-160 3 multiple/string stores: stmw, stswi, stswx
PPCBUG-167 9 FP stores: stfs, stfsu, stfsx, stfsux,
stfd, stfdu, stfdx, stfdux, stfiwx
Tests: lwarx_then_plain_stmw_spans_two_lines_and_invalidates,
lwarx_then_plain_stfd_invalidates_reservation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continuation of the PPCBUG-107 cascade sweep (batch 1: word stores landed
in 4538fa9). Plain stb/stbu/stbx/stbux, sth/sthu/sthx/sthux/sthbrx, and
std/stdu/stdx/stdux/stdbrx now invalidate the reservation table before
writing, so cross-thread lwarx/stwcx. atomicity holds when these widths
are written by another host thread.
Affected:
PPCBUG-130 9 byte/halfword stores missing invalidate_for_write
stb, stbu, stbx, stbux, sth, sthu, sthx, sthux, sthbrx
PPCBUG-150 5 doubleword stores missing invalidate_for_write
std, stdu, stdx, stdux, stdbrx
Tests: lwarx_then_plain_stb_invalidates_reservation,
lwarx_then_plain_std_invalidates_reservation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Word stores (stw, stwu, stwx, stwux, stwbrx) now invalidate the
reservation table for the target line before writing. Without this,
plain stores by other host threads silently fail to clear reservations
held by lwarx, causing stwcx. to spuriously succeed under --parallel.
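Here is a single-threaded model of the contract these stores now honour. It assumes 128-byte reservation granules and one reserved line per HW thread. This is not the project's ReservationTable, which must also be shareable between host threads. Only the name `invalidate_for_write` comes from the commit; `reserve` and `store_conditional` are hypothetical.

```rust
/// Assumed reservation granule mask (128-byte lines).
const RESERVATION_MASK: u64 = 127;

fn line_of(ea: u64) -> u64 {
    ea & !RESERVATION_MASK
}

/// One optional reserved line per HW thread.
struct ReservationTable {
    slots: Vec<Option<u64>>,
}

impl ReservationTable {
    fn new(hw_threads: usize) -> Self {
        Self { slots: vec![None; hw_threads] }
    }

    /// lwarx / ldarx.
    fn reserve(&mut self, hw_id: usize, ea: u64) {
        self.slots[hw_id] = Some(line_of(ea));
    }

    /// Called by every plain store before it writes memory.
    fn invalidate_for_write(&mut self, ea: u64) {
        let line = line_of(ea);
        for slot in &mut self.slots {
            if *slot == Some(line) {
                *slot = None;
            }
        }
    }

    /// stwcx. / stdcx.: succeeds only if this thread still holds the line.
    fn store_conditional(&mut self, hw_id: usize, ea: u64) -> bool {
        let ok = self.slots[hw_id] == Some(line_of(ea));
        self.slots[hw_id] = None;
        if ok {
            // A successful conditional store is itself a write.
            self.invalidate_for_write(ea);
        }
        ok
    }
}
```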
Affected:
PPCBUG-107 ReservationTable::invalidate_for_write never called from any store
PPCBUG-140 stw missing invalidate_for_write (interpreter.rs:1183)
PPCBUG-141 stwu missing invalidate_for_write (interpreter.rs:1189)
PPCBUG-142 stwx missing invalidate_for_write (interpreter.rs:1195)
PPCBUG-143 stwux missing invalidate_for_write (interpreter.rs:1201)
PPCBUG-144 stwbrx missing invalidate_for_write (interpreter.rs:1568)
Tests: lwarx_then_plain_stw_invalidates_reservation,
lwarx_then_stwcx_succeeds_without_intervening_store
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Split the monolithic interpreter into cohesive modules: dedicated
decoder (decoder.rs) producing 8-byte DecodedInstr; opcode tables
(opcode.rs); explicit traps (trap.rs); FPSCR helpers (fpscr.rs);
overflow/carry helpers (overflow.rs); a 4 KiB-page-versioned decode
cache and basic-block cache (block_cache.rs); and a full VMX/VMX128
implementation (vmx.rs) covering AltiVec + Xenon's 128-bit extensions.
Add the parallel-execution substrate behind --parallel: a 7-party
phaser (phaser.rs) for round-based barrier sync, ReservationTable
(reservation.rs) for guest LL/SC, and the per-HW-thread scheduler
core (scheduler.rs) that owns ThreadRefs, runqueues, and pending IRQs.
Disassembler is now the single source of truth: disasm.rs gains the
full base + extended + VMX128 mnemonic set, with golden JSON fixtures
and a disasm_goldens test suite. Add a criterion-style interpreter
bench. context.rs grows the per-thread state the new modules need
(reservation slot, FPSCR, vector regs).
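Round-based barrier synchronisation of the kind the phaser provides can be approximated with `std::sync::Barrier`. This is a stand-in, not phaser.rs. It assumes the 7 parties are the 6 Xenon HW threads plus one coordinator.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Assumption: 6 HW threads + 1 coordinator.
const PARTIES: usize = 7;

/// Runs `rounds` lockstep rounds and returns the total units of work done.
fn run_rounds(rounds: usize) -> usize {
    let barrier = Arc::new(Barrier::new(PARTIES));
    let executed = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..PARTIES)
        .map(|_| {
            let barrier = Arc::clone(&barrier);
            let executed = Arc::clone(&executed);
            thread::spawn(move || {
                for round in 0..rounds {
                    executed.fetch_add(1, Ordering::SeqCst); // this party's work
                    barrier.wait(); // nobody starts round+1 until all finish round
                    // Every party's work for this round is now visible.
                    assert!(executed.load(Ordering::SeqCst) >= PARTIES * (round + 1));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    executed.load(Ordering::SeqCst)
}
```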
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rust reimplementation of the xenia Xbox 360 emulator targeting reverse-
engineering and preservation, initially scoped to Project Sylpheed.
Includes:
- XEX2 loader (LZX decompression, AES decryption, PE parsing)
- XISO / XGD2 disc image VFS
- PPC interpreter with 200+ opcodes and VMX128 decoding
- Static analyzer: functions, cross-references, labels, asm + SQLite output
- HLE kernel covering the xboxkrnl/xam subset used by Sylpheed init
- Debugger with in-memory and SQLite-backed execution tracing
- `xenia-rs` CLI with extract/dis/exec commands that produce cumulative,
superset SQLite databases and opt-in instruction/import/branch traces
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>