xenia-rs/migration/project-root/ppc-manual/vmx/vaddfp.md
MechaCat02 e6d43a23ac chore: add migration/ bundle for cross-machine setup
2026-05-10 21:38:38 +02:00


vaddfp — Vector Add Floating Point

Category: VMX (Altivec) · Form: VX · Opcode: 0x1000000a

Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
|---|---|---|---|
| vaddfp | vaddfp | | Vector Add Floating Point |
| vaddfp128 | vaddfp128 | | Vector128 Add Floating Point |

Syntax

vaddfp [VD], [VA], [VB]
vaddfp128 [VD], [VA], [VB]

Encoding

vaddfp — form VX

  • Opcode word: 0x1000000a
  • Primary opcode (bits 0–5): 4
  • Extended opcode: 10
  • Synchronising: no

| Bits | Field | Meaning |
|---|---|---|
| 0–5 | OPCD | primary opcode (4) |
| 6–10 | VRT/VD | destination vector register |
| 11–15 | VRA/VA | source A vector register |
| 16–20 | VRB/VB | source B vector register |
| 21–31 | XO | extended opcode (11 bits) |
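
The field layout above can be turned into a small decoder. The following is a minimal sketch, not xenia-rs code: the names `ppc_bits`, `VxFields`, `decode_vx`, and `is_vaddfp` are hypothetical, and the only facts it relies on are the bit ranges in the table and PowerPC's MSB-first bit numbering (bit 0 is the most significant bit of the word).

```rust
/// Fields of a VX-form instruction word, as listed in the encoding table.
#[derive(Debug, PartialEq, Eq)]
pub struct VxFields {
    pub opcd: u32,
    pub vd: u32,
    pub va: u32,
    pub vb: u32,
    pub xo: u32,
}

/// Extract PowerPC bits `first..=last` from a 32-bit word.
/// PowerPC numbers bits from the MSB, so bit 31 is the least significant.
/// Only valid for fields narrower than 32 bits.
pub fn ppc_bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    let shift = 31 - last;
    (word >> shift) & ((1u32 << width) - 1)
}

/// Split a VX-form word into its five fields.
pub fn decode_vx(word: u32) -> VxFields {
    VxFields {
        opcd: ppc_bits(word, 0, 5),
        vd: ppc_bits(word, 6, 10),
        va: ppc_bits(word, 11, 15),
        vb: ppc_bits(word, 16, 20),
        xo: ppc_bits(word, 21, 31),
    }
}

/// True when the word has primary opcode 4 and extended opcode 10.
pub fn is_vaddfp(word: u32) -> bool {
    let f = decode_vx(word);
    f.opcd == 4 && f.xo == 10
}
```

For example, `vaddfp v3, v3, v4` assembles to the opcode word 0x1000000a with 3, 3, and 4 placed in the VD, VA, and VB fields, which gives 0x1063200A; `decode_vx` recovers those register numbers from it.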

vaddfp128 — form VX128

  • Opcode word: 0x14000010
  • Primary opcode (bits 0–5): 5
  • Extended opcode: 16
  • Synchronising: no

| Bits | Field | Meaning |
|---|---|---|
| 0–5 | OPCD | primary opcode (4 or 5) |
| 6–10 | VD128l | destination low 5 bits |
| 11–15 | VA128l | source A low 5 bits |
| 16–20 | VB128l | source B low 5 bits |
| 21 | VA128H | source A high bit |
| 22 | | reserved |
| 23–25 | VC | optional VC / XO sub-field |
| 26 | VA128h | source A middle bit |
| 27 | | reserved |
| 28–29 | VD128h | destination high 2 bits |
| 30–31 | VB128h | source B high 2 bits |
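
The split register fields can be reassembled into 7-bit register numbers as sketched below. This is an illustration under stated assumptions rather than the xenia-rs decoder: `ppc_bits`, `Vx128Regs`, and `decode_vx128_regs` are hypothetical names, and the combination order (low 5 bits, then the "middle" bit, then the "high" bit or bits) is inferred from the labels in the table.

```rust
/// Register numbers (0..=127) recovered from a VX128-form word.
#[derive(Debug, PartialEq, Eq)]
pub struct Vx128Regs {
    pub vd: u32,
    pub va: u32,
    pub vb: u32,
}

/// Extract PowerPC bits `first..=last` (bit 0 is the MSB) from a 32-bit word.
/// Only valid for fields narrower than 32 bits.
pub fn ppc_bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    let shift = 31 - last;
    (word >> shift) & ((1u32 << width) - 1)
}

/// Reassemble the 7-bit VD/VA/VB register numbers from their split fields.
pub fn decode_vx128_regs(word: u32) -> Vx128Regs {
    // VD = VD128l (bits 6-10) plus VD128h (bits 28-29) as bits 5-6.
    let vd = ppc_bits(word, 6, 10) | (ppc_bits(word, 28, 29) << 5);
    // VA = VA128l (bits 11-15), VA128h (bit 26) as bit 5, VA128H (bit 21) as bit 6.
    let va = ppc_bits(word, 11, 15)
        | (ppc_bits(word, 26, 26) << 5)
        | (ppc_bits(word, 21, 21) << 6);
    // VB = VB128l (bits 16-20) plus VB128h (bits 30-31) as bits 5-6.
    let vb = ppc_bits(word, 16, 20) | (ppc_bits(word, 30, 31) << 5);
    Vx128Regs { vd, va, vb }
}
```

With this layout, the bare opcode word 0x14000010 decodes to registers 0, 0, 0, and setting every register-field bit yields register 127 for all three operands.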

Operands

| Field | Role | Description |
|---|---|---|
| VA | vaddfp: read; vaddfp128: read | Source A vector register. |
| VB | vaddfp: read; vaddfp128: read | Source B vector register. |
| VD | vaddfp: write; vaddfp128: write | Destination vector register. |

Register Effects

vaddfp

  • Reads (always): VA, VB
  • Reads (conditional): none
  • Writes (always): VD
  • Writes (conditional): none

vaddfp128

  • Reads (always): VA, VB
  • Reads (conditional): none
  • Writes (always): VD
  • Writes (conditional): none

Status-Register Effects

No condition-register or status-register effects.

Operation (pseudocode)

for each 32-bit float lane i in 0..3:
    VD[i] <- VA[i] + VB[i]

C Translation Example

/* vaddfp VD, VA, VB — lane-wise float add                         */
for (int i = 0; i < 4; ++i) v[insn.VD].f[i] = v[insn.VA].f[i] + v[insn.VB].f[i];

Implementation References

vaddfp

xenia-rs interpreter body (frozen snapshot)
        PpcOpcode::vaddfp => {
            // PPCBUG-435: VSCR.NJ=1 (Xbox 360 always boots with this set) requires
            // flush-to-zero on subnormal inputs and outputs. Canary VMX float
            // arithmetic flushes denormals unconditionally.
            let a = ctx.vr[instr.ra()].as_f32x4();
            let b = ctx.vr[instr.rb()].as_f32x4();
            let mut r = [0f32; 4];
            for i in 0..4 {
                let ai = vmx::flush_denorm(a[i]);
                let bi = vmx::flush_denorm(b[i]);
                r[i] = vmx::flush_denorm(ai + bi);
            }
            ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
            ctx.pc += 4;
        }
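
The snapshot above calls `vmx::flush_denorm`, whose body is not shown on this page. The sketch below is one plausible shape for such a helper, assuming it replaces any subnormal value with a zero of the same sign; both that sign-preserving behaviour and the standalone `vaddfp_lanes` wrapper are assumptions made for illustration, not the actual xenia-rs code.

```rust
/// Hypothetical stand-in for `vmx::flush_denorm` (assumed behaviour:
/// subnormals become a zero of the same sign, everything else is unchanged).
pub fn flush_denorm(x: f32) -> f32 {
    let bits = x.to_bits();
    // An all-zero exponent field means the value is +/-0 or subnormal.
    if bits & 0x7F80_0000 == 0 {
        // Keep only the sign bit.
        f32::from_bits(bits & 0x8000_0000)
    } else {
        x
    }
}

/// Lane-wise add with inputs and outputs flushed, mirroring the loop
/// in the interpreter snapshot but on plain arrays.
pub fn vaddfp_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = flush_denorm(a[i]);
        let bi = flush_denorm(b[i]);
        r[i] = flush_denorm(ai + bi);
    }
    r
}
```

Flushing the result as well as the inputs matters: two normal inputs can still produce a subnormal sum (for example 2^-126 plus -1.5 × 2^-126), which the output flush turns into a signed zero.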

vaddfp128

xenia-rs interpreter body (frozen snapshot)
        PpcOpcode::vaddfp128 => {
            // PPCBUG-435: same as vaddfp.
            let a = ctx.vr[instr.va128()].as_f32x4();
            let b = ctx.vr[instr.vb128()].as_f32x4();
            let mut r = [0f32; 4];
            for i in 0..4 {
                let ai = vmx::flush_denorm(a[i]);
                let bi = vmx::flush_denorm(b[i]);
                r[i] = vmx::flush_denorm(ai + bi);
            }
            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
            ctx.pc += 4;
        }

Extended Pseudocode

; Four independent lane-wise IEEE-754 single-precision adds
for i in 0..3:
    VD[i] <- VA[i] + VB[i]                       ; binary32, rounded to nearest

; No FPSCR update (VMX uses VSCR, which only has NJ / SAT — and vaddfp doesn't saturate)

Special Cases & Edge Conditions

  • Lane indexing is big-endian. Lane 0 is the most significant 4 bytes of the 128-bit register (the word that lands at the lowest byte offset after a stvx). Xenia's Vec128::as_f32x4() already returns lanes in PPC order on x86-64. When writing C that manipulates individual lanes, treat v.f[0] as bytes 0..3 of the big-endian layout.
  • Flush-denormals ("NJ") mode. Altivec is independent of FPSCR: it has its own VSCR with two defined bits (NJ, non-Java mode, and SAT, sticky saturation). VMX float operations honour VSCR[NJ]: when it is set (the Xenon boot default), denormal inputs and outputs are flushed to zero. This contrasts with the scalar FPU, which is controlled by its own non-IEEE bit (FPSCR[NI]). Xenia sets NJ = 1 at context creation (context.rs).
  • No exception, no trap. Altivec floats never raise exceptions. NaN inputs produce NaN outputs; (+∞) + (−∞) yields a NaN; there is no VXISI-style status bit. VSCR[SAT] is not touched by vaddfp (it records saturation in integer ops, not float arithmetic).
  • Four independent lanes. Each lane's operation is unaffected by the others. Aliasing between VA, VB, and VD is legal and common (vaddfp v3, v3, v4).
  • VMX128 sibling (vaddfp128). Semantics identical; only the register encoding differs. VMX128 uses a 7-bit register number for each source and for the destination, built from two or three non-contiguous bit fields (see categories/vmx128.md). Any register combination encodable in the 32-register VX form is also encodable in the VMX128 form, so compilers could pick whichever form reached the needed register range.
  • On x86-64 hosts. A natural compilation uses _mm_add_ps or AVX vaddps. Because the add is lane-wise, the result is correct under any consistent lane mapping: PPC lane 0 lands in x86 lane 3 only if the whole 128-bit value is byte-reversed on load/store, and the same permutation then applies to both sources and the destination. With xenia's _be memory helpers, _mm_add_ps gives the right per-lane result.

Related Instructions

  • vsubfp: lane-wise float subtract.
  • vmaddfp: lane-wise (VA × VC) + VB (fused multiply-add with single rounding).
  • vnmsubfp: lane-wise −((VA × VC) − VB).
  • vmaxfp, vminfp: IEEE-754-aware max/min (NaN propagation).
  • vcmpeqfp, vcmpgtfp, vcmpgefp, vcmpbfp: compares producing per-lane all-ones / all-zero masks.
  • vrfin, vrfim, vrfip, vrfiz: round to integer (to-nearest / down / up / toward-zero).
  • vmulfp: xenia's helper; not a native Altivec op, included for convenience. Games on real hardware use vmaddfp v, va, vc, v0_zero instead.
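
The lane-ordering and special-value notes under Special Cases can be illustrated with a small host-side sketch. It assumes guest vectors are held as 16 bytes in big-endian memory order; the helper names `lanes_from_be_bytes`, `lanes_to_be_bytes`, and `add_lanes` are hypothetical and are not xenia's `_be` helpers or `Vec128` API. Denormal flushing is deliberately left out so the plain IEEE-754 behaviour is visible.

```rust
/// Interpret 16 bytes in guest (big-endian) memory order as four f32 lanes.
/// Lane 0 comes from bytes 0..4, the lowest byte offsets.
pub fn lanes_from_be_bytes(bytes: [u8; 16]) -> [f32; 4] {
    let mut lanes = [0f32; 4];
    for i in 0..4 {
        let mut word = [0u8; 4];
        word.copy_from_slice(&bytes[i * 4..i * 4 + 4]);
        lanes[i] = f32::from_be_bytes(word);
    }
    lanes
}

/// Write four lanes back out in guest (big-endian) memory order.
pub fn lanes_to_be_bytes(lanes: [f32; 4]) -> [u8; 16] {
    let mut bytes = [0u8; 16];
    for i in 0..4 {
        bytes[i * 4..i * 4 + 4].copy_from_slice(&lanes[i].to_be_bytes());
    }
    bytes
}

/// Plain lane-wise IEEE-754 add; each lane is independent of the others.
pub fn add_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut r = [0f32; 4];
    for i in 0..4 {
        r[i] = a[i] + b[i];
    }
    r
}
```

Because the conversion is done per 32-bit word, the lane index seen by `add_lanes` matches the PPC lane index regardless of host endianness. A lane holding +∞ added to a lane holding −∞ produces NaN in that lane only, and nothing else is recorded.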

IBM Reference