xenia-rs

fabi/xenia-rs

Commit Graph

Author

SHA1

Message

Date

MechaCat02

6fe2cbf251

fix(cpu): PPCBUG-426/427/433 single-FMA vnmsubfp + vctsxs NaN saturation

Phase 5 batch 6 (5f): saturation and FMA-rounding fixes.

- PPCBUG-426 vnmsubfp: was `bi - ai * ci` (two rounding steps); now
  `-ai.mul_add(ci, -bi)`, which is mathematically equivalent (= bi - ai*ci)
  but rounds only once, as the ISA specifies for a fused multiply-add.
- PPCBUG-427 vnmsubfp128: same single-FMA fix.
- PPCBUG-433 vctsxs / vcfpsxws128 NaN saturation: AltiVec ISA saturates
  NaN to INT_MIN (0x80000000); xenia returned 0. The vctuxs (unsigned)
  NaN→0 is correct per ISA.
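
The two fixes above can be illustrated per lane with a minimal sketch. The function names are hypothetical and are not taken from the xenia-rs source; the UIMM scale factor that vctsxs applies before converting is omitted. The sketch relies on two documented Rust behaviours: `f32::mul_add` rounds once, and an `as` cast from float to integer saturates but maps NaN to 0, so the NaN case needs an explicit branch.

```rust
/// Hypothetical per-lane helper: computes b - a*c with a single rounding step.
/// `-a.mul_add(c, -b)` parses as `-(a.mul_add(c, -b))`, i.e. -(a*c - b).
fn nmsub_fused(a: f32, b: f32, c: f32) -> f32 {
    -a.mul_add(c, -b)
}

/// The pre-fix shape: the product is rounded first, then the subtraction is
/// rounded again.
fn nmsub_unfused(a: f32, b: f32, c: f32) -> f32 {
    b - a * c
}

/// Hypothetical per-lane signed conversion with saturation.
/// NaN maps to i32::MIN (0x80000000), following the commit's description.
/// Everything else truncates toward zero and clamps to the i32 range,
/// which is what Rust's `as` cast already does.
fn cvt_sat_i32(x: f32) -> i32 {
    if x.is_nan() {
        i32::MIN
    } else {
        x as i32
    }
}
```

With a = c = 1 + 2^-12 and b = 1 + 2^-11, the exact product is 1 + 2^-11 + 2^-24. The two-step version rounds the product to 1 + 2^-11 and returns 0, while the fused version keeps the 2^-24 term and returns -2^-24.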

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-02 12:31:10 +02:00

MechaCat02

c36cca14f9

xenia-cpu: VMX128, FPSCR, decoder split, scheduler, decode/block caches

Split the monolithic interpreter into cohesive modules: dedicated
decoder (decoder.rs) producing 8-byte DecodedInstr; opcode tables
(opcode.rs); explicit traps (trap.rs); FPSCR helpers (fpscr.rs);
overflow/carry helpers (overflow.rs); a 4 KiB-page-versioned decode
cache and basic-block cache (block_cache.rs); and a full VMX/VMX128
implementation (vmx.rs) covering AltiVec + Xenon's 128-bit extensions.
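
One way a page-versioned decode cache can work is sketched below. This is an illustration under assumptions, not the code in block_cache.rs: the type and method names are hypothetical, and a plain `u64` stands in for the 8-byte DecodedInstr. Each 4 KiB page carries a version counter that is bumped on writes, and a cached entry is only served if the version it was decoded under still matches.

```rust
use std::collections::HashMap;

/// 4 KiB pages: the page index is the address shifted right by 12 bits.
const PAGE_SHIFT: u32 = 12;

/// Hypothetical decode cache keyed by guest PC.
#[derive(Default)]
struct DecodeCacheSketch {
    /// page index -> current version (pages never written are version 0)
    page_versions: HashMap<u32, u64>,
    /// pc -> (page version at decode time, decoded payload)
    entries: HashMap<u32, (u64, u64)>,
}

impl DecodeCacheSketch {
    fn version(&self, addr: u32) -> u64 {
        *self.page_versions.get(&(addr >> PAGE_SHIFT)).unwrap_or(&0)
    }

    /// Called on a guest write: bumps the version of the page containing `addr`.
    fn note_write(&mut self, addr: u32) {
        *self.page_versions.entry(addr >> PAGE_SHIFT).or_insert(0) += 1;
    }

    /// Stores a decoded payload tagged with the page's current version.
    fn insert(&mut self, pc: u32, decoded: u64) {
        let v = self.version(pc);
        self.entries.insert(pc, (v, decoded));
    }

    /// Returns the payload only if its page has not been written since decode.
    fn lookup(&self, pc: u32) -> Option<u64> {
        match self.entries.get(&pc) {
            Some(&(v, d)) if v == self.version(pc) => Some(d),
            _ => None,
        }
    }
}
```

Stale entries are never scanned or evicted eagerly in this sketch; a write costs one counter increment, and the mismatch is detected lazily at lookup time.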

Add the parallel-execution substrate behind --parallel: a 7-party
phaser (phaser.rs) for round-based barrier sync, ReservationTable
(reservation.rs) for guest LL/SC, and the per-HW-thread scheduler
core (scheduler.rs) that owns ThreadRefs, runqueues, and pending IRQs.
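
The guest LL/SC idea can be sketched as follows. This is a simplified, single-owner model under stated assumptions, not the ReservationTable in reservation.rs: the names are hypothetical, the 128-byte granule size is an assumption, and real code shared between host threads would need synchronization that is omitted here.

```rust
use std::collections::HashMap;

/// Assumed reservation granule size in bytes (not taken from the commit).
const GRANULE: u32 = 128;

/// Hypothetical table: hardware-thread id -> granule it currently reserves.
#[derive(Default)]
struct ReservationSketch {
    held: HashMap<usize, u32>,
}

impl ReservationSketch {
    fn granule(addr: u32) -> u32 {
        addr & !(GRANULE - 1)
    }

    /// Load-linked (lwarx-style): the thread reserves the granule, replacing
    /// any reservation it held before.
    fn reserve(&mut self, thread: usize, addr: u32) {
        self.held.insert(thread, Self::granule(addr));
    }

    /// Any store to a granule cancels every reservation on that granule.
    fn notify_store(&mut self, addr: u32) {
        let g = Self::granule(addr);
        self.held.retain(|_, reserved| *reserved != g);
    }

    /// Store-conditional (stwcx-style): succeeds only if the thread still
    /// holds a reservation on the granule. Either way the thread's
    /// reservation is gone afterwards.
    fn store_conditional(&mut self, thread: usize, addr: u32) -> bool {
        let g = Self::granule(addr);
        if self.held.get(&thread) == Some(&g) {
            // A successful store invalidates all reservations on the
            // granule, including the caller's own.
            self.notify_store(addr);
            true
        } else {
            self.held.remove(&thread);
            false
        }
    }
}
```

When two threads reserve the same granule, the first store-conditional wins and the second fails, which is the property guest atomics depend on.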

Disassembler is now the single source of truth: disasm.rs gains the
full base + extended + VMX128 mnemonic set, with golden JSON fixtures
and a disasm_goldens test suite. Add a criterion-style interpreter
bench. context.rs grows the per-thread state the new modules need
(reservation slot, FPSCR, vector regs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 16:27:43 +02:00

2 Commits