# `vaddfp` — Vector Add Floating Point

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000000a`

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vaddfp` | `vaddfp` | — | Vector Add Floating Point |
| `vaddfp128` | `vaddfp128` | — | Vector128 Add Floating Point |

## Syntax

```asm
vaddfp [VD], [VA], [VB]
vaddfp128 [VD], [VA], [VB]
```

## Encoding

### `vaddfp` — form `VX`

- **Opcode word:** `0x1000000a`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `10`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |

### `vaddfp128` — form `VX128`

- **Opcode word:** `0x14000010`
- **Primary opcode (bits 0–5):** `5`
- **Extended opcode:** `16`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4 or 5) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | reserved |
| 23–25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `—` | reserved |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vaddfp: read; vaddfp128: read | Source A vector register. |
| `VB` | vaddfp: read; vaddfp128: read | Source B vector register. |
| `VD` | vaddfp: write; vaddfp128: write | Destination vector register. |

## Register Effects

### `vaddfp`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

### `vaddfp128`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
for each 32-bit float lane i in 0..3:
    VD[i] <- VA[i] + VB[i]
```

## C Translation Example

```c
/* vaddfp VD, VA, VB — lane-wise float add */
for (int i = 0; i < 4; ++i)
    v[insn.VD].f[i] = v[insn.VA].f[i] + v[insn.VB].f[i];
```

## Implementation References

**`vaddfp`**

- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddfp"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:341`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L341)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:438`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L438)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1984-1998`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1984-L1998)

xenia-rs interpreter body (frozen snapshot)

```rust
PpcOpcode::vaddfp => {
    // PPCBUG-435: VSCR.NJ=1 (Xbox 360 always boots with this set) requires
    // flush-to-zero on subnormal inputs and outputs. Canary VMX float
    // arithmetic flushes denormals unconditionally.
    let a = ctx.vr[instr.ra()].as_f32x4();
    let b = ctx.vr[instr.rb()].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = vmx::flush_denorm(a[i]);
        let bi = vmx::flush_denorm(b[i]);
        r[i] = vmx::flush_denorm(ai + bi);
    }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
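
The snapshot above calls `vmx::flush_denorm` without showing its body. As a rough illustration of the flush-to-zero behaviour its comment describes, the sketch below assumes a sign-preserving flush, where a subnormal becomes a zero of the same sign. Both `flush_denorm` and `vaddfp_lanes` are hypothetical stand-ins written for this page, not the xenia-rs implementations.

```rust
/// Hypothetical stand-in for `vmx::flush_denorm` (the real body is not shown
/// in the snapshot). ASSUMPTION: a subnormal value is replaced by a zero that
/// keeps the original sign bit; every other value passes through unchanged.
pub fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() {
        // Keep only the sign bit, which yields +0.0 or -0.0.
        f32::from_bits(x.to_bits() & 0x8000_0000)
    } else {
        x
    }
}

/// Lane-wise add with inputs and outputs flushed, mirroring the loop in the
/// snapshot but operating on plain arrays instead of a CPU context.
pub fn vaddfp_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = flush_denorm(a[i]);
        let bi = flush_denorm(b[i]);
        r[i] = flush_denorm(ai + bi);
    }
    r
}
```

With `tiny = f32::from_bits(1)` (the smallest positive subnormal), a lane that adds `tiny + tiny` comes out as zero under this sketch, because both inputs are flushed before the add takes place.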

**`vaddfp128`**

- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:344`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L344)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:610`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L610)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1999-2011`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1999-L2011)

xenia-rs interpreter body (frozen snapshot)

```rust
PpcOpcode::vaddfp128 => {
    // PPCBUG-435: same as vaddfp.
    let a = ctx.vr[instr.va128()].as_f32x4();
    let b = ctx.vr[instr.vb128()].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = vmx::flush_denorm(a[i]);
        let bi = vmx::flush_denorm(b[i]);
        r[i] = vmx::flush_denorm(ai + bi);
    }
    ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
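
The snapshot above relies on `instr.va128()`, `instr.vb128()` and `instr.vd128()`, whose bodies are not shown. The sketch below illustrates how such accessors could reassemble the 7-bit register numbers from the split fields listed in the `VX128` encoding table. It assumes IBM bit numbering (bit 0 is the most significant bit of the 32-bit word) and that the low field supplies bits 0..4 of the register number, with the middle and high fields stacked above it. The free functions `field`, `vd128`, `va128` and `vb128` are hypothetical names for illustration, not the xenia-rs decoder API.

```rust
/// Extract the IBM-numbered bit range `first..=last` from a 32-bit
/// instruction word, where bit 0 is the most significant bit.
fn field(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

/// Destination register: VD128l (bits 6-10) plus VD128h (bits 28-29).
/// ASSUMPTION: the high field forms bits 5..6 of the register number.
pub fn vd128(word: u32) -> u32 {
    field(word, 6, 10) | (field(word, 28, 29) << 5)
}

/// Source A: VA128l (bits 11-15), VA128h (bit 26), VA128H (bit 21).
/// ASSUMPTION: VA128h is bit 5 and VA128H is bit 6 of the register number.
pub fn va128(word: u32) -> u32 {
    field(word, 11, 15) | (field(word, 26, 26) << 5) | (field(word, 21, 21) << 6)
}

/// Source B: VB128l (bits 16-20) plus VB128h (bits 30-31).
/// ASSUMPTION: the high field forms bits 5..6 of the register number.
pub fn vb128(word: u32) -> u32 {
    field(word, 16, 20) | (field(word, 30, 31) << 5)
}
```

Under these assumptions, the bare opcode word `0x14000010` decodes to register 0 for all three operands, and setting every field bit of an operand yields register 127.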

## Extended Pseudocode

```
; Four independent lane-wise IEEE-754 single-precision adds
for i in 0..3:
    VD[i] <- VA[i] + VB[i]        ; binary32, rounded to nearest
; No FPSCR update (VMX uses VSCR, which only has NJ / SAT, and vaddfp doesn't saturate)
```

## Special Cases & Edge Conditions

- **Lane indexing is big-endian.** Lane 0 is the **most significant** 4 bytes of the 128-bit register (the one that appears at the lowest byte offset after a `stvx`). Xenia's `Vec128::as_f32x4()` already reads lanes in PPC order on x86-64. When writing C that manipulates individual lanes, treat `v.f[0]` as bytes 0..3 of the big-endian layout.
- **Flush-denormals ("NJ") mode.** Altivec does not use FPSCR; it has its own VSCR, in which two bits are defined (`NJ` for non-Java mode and `SAT` for sticky saturation). VMX float operations honour `VSCR[NJ]`: when it is set (the Xenon boot default), denormal inputs and outputs are flushed to zero. This differs from the scalar FPU, which is governed by its own non-IEEE bit (`FPSCR[NI]`) rather than by VSCR. Xenia sets `NJ = 1` at context creation ([`context.rs`](../../xenia-rs/crates/xenia-cpu/src/context.rs)).
- **No exception, no trap.** Altivec floats never raise exceptions. NaN inputs produce NaN outputs; `(+∞) + (−∞)` yields a NaN; there is no VXISI-style status bit. `VSCR[SAT]` is **not** touched by `vaddfp` (it records saturation in integer ops, not floats).
- **Four independent lanes.** Each lane's operation is unaffected by the others. Aliasing between `VA`, `VB`, and `VD` is legal and common (`vaddfp v3, v3, v4`).
- **VMX128 sibling (`vaddfp128`).** Semantics are identical; only the register encoding differs. VMX128 uses a 7-bit register number for each source and for the destination, built from two or three non-contiguous bit fields; see [`categories/vmx128.md`](../categories/vmx128.md). Both forms are 32-bit instructions, and any `v0`–`v31` operand that the VX form can name can also be named by the VMX128 form, so compilers could pick whichever form reached the needed register range.
- **On x86-64 hosts.** A natural compilation uses `_mm_add_ps` or AVX `vaddps`. Because the add is lane-wise, the result is correct under any lane ordering as long as both sources and the destination use the same one. PPC lane 0 corresponds to x86 lane 3 only if the 128-bit value is treated as "big-endian in memory", i.e. byte-swapped on load/store. With xenia's `_be` memory helpers, `_mm_add_ps` gives the right per-lane result.

## Related Instructions

- [`vsubfp`](vsubfp.md) — lane-wise float subtract.
- [`vmaddfp`](vmaddfp.md) — lane-wise `(VA × VC) + VB` (fused multiply-add with single rounding).
- [`vnmsubfp`](vnmsubfp.md) — `−((VA × VC) − VB)`.
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — IEEE-754-aware max/min (NaN propagation).
- [`vcmpeqfp`](vcmpeqfp.md), [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md), [`vcmpbfp`](vcmpbfp.md) — compares producing per-lane all-ones / all-zero masks.
- [`vrfin`](vrfin.md), [`vrfim`](vrfim.md), [`vrfip`](vrfip.md), [`vrfiz`](vrfiz.md) — round to integer (to-nearest / down / up / toward-zero).
- [`vmulfp`](vmulfp.md) — xenia's helper; not a native Altivec op, included for convenience. Hardware games use `vmaddfp v, va, vc, v0_zero` instead.

## IBM Reference

- [AIX 7.3 — `vaddfp` (Vector Add Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddfp-vector-add-floating-point-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)