# `vmaddfp` — Vector Multiply-Add Floating Point

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002e`

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmaddfp` | `vmaddfp` | — | Vector Multiply-Add Floating Point |
| `vmaddfp128` | `vmaddfp128` | — | Vector128 Multiply Add Floating Point |

## Syntax

```asm
vmaddfp [VD], [VA], [VC], [VB]
vmaddfp128 [VD], [VA], [VB], [VD]
```

## Encoding

### `vmaddfp` — form `VA`

- **Opcode word:** `0x1000002e`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `46`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT` | destination vector register |
| 11–15 | `VRA` | source A |
| 16–20 | `VRB` | source B |
| 21–25 | `VRC` | source C / shift |
| 26–31 | `XO` | extended opcode (6 bits) |

### `vmaddfp128` — form `VX128`

- **Opcode word:** `0x140000d0`
- **Primary opcode (bits 0–5):** `5`
- **Extended opcode:** `208`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4 or 5) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | reserved |
| 23–25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `—` | reserved |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmaddfp: read; vmaddfp128: read | Source A vector register. |
| `VC` | vmaddfp: read; vmaddfp128: not read (`VD` takes this role) | Source C vector register / 3-bit selector. |
| `VB` | vmaddfp: read; vmaddfp128: read | Source B vector register. |
| `VD` | vmaddfp: write; vmaddfp128: read and write | Destination vector register. For `vmaddfp128` it is also the second multiplicand. |

## Register Effects

### `vmaddfp`

- **Reads (always):** `VA`, `VC`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

### `vmaddfp128`

- **Reads (always):** `VA`, `VB`, `VD`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
for each 32-bit float lane i in 0..3:
    VD[i] <- (VA[i] * VC[i]) + VB[i]
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in          */
/* Implementation References is the authoritative semantic       */
/* snapshot. Translate it line-by-line:                          */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs)              */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)  */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO  */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```

## Implementation References

**`vmaddfp`**

- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddfp"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:801`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L801)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:588`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L588)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2038-2054`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2038-L2054)

xenia-rs interpreter body (frozen snapshot)

```rust
PpcOpcode::vmaddfp => {
    // vD = (vA * vC) + vB. AltiVec unconditionally flushes denormal
    // *inputs* to 0 regardless of VSCR[NJ] (confirmed on POWER8 hw).
    let a = ctx.vr[instr.ra()].as_f32x4();
    let b = ctx.vr[instr.rb()].as_f32x4();
    let c = ctx.vr[instr.rc()].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = vmx::flush_denorm(a[i]);
        let bi = vmx::flush_denorm(b[i]);
        let ci = vmx::flush_denorm(c[i]);
        // PPCBUG-437: flush subnormal output too.
        r[i] = vmx::flush_denorm(ai.mul_add(ci, bi));
    }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
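
The snapshot above calls `vmx::flush_denorm` without showing its body. As a rough illustration only, the sketch below assumes the helper replaces a subnormal value with a zero of the same sign and passes every other value through unchanged. The names `flush_denorm_sketch` and `madd_lane_sketch` are hypothetical and are not taken from xenia-rs.

```rust
/// Hypothetical stand-in for `vmx::flush_denorm` (assumption: subnormals
/// become a zero of the same sign; all other values pass through).
fn flush_denorm_sketch(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0_f32.copysign(x)
    } else {
        x
    }
}

/// One lane of the sequence shown in the snapshot:
/// flush the three inputs, fuse multiply and add, flush the result.
fn madd_lane_sketch(a: f32, c: f32, b: f32) -> f32 {
    let a = flush_denorm_sketch(a);
    let c = flush_denorm_sketch(c);
    let b = flush_denorm_sketch(b);
    flush_denorm_sketch(a.mul_add(c, b))
}
```

Under this assumption a subnormal multiplicand contributes nothing to the product, and a product that lands in the subnormal range is written back as zero.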
**`vmaddfp128`**

- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:805`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L805)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:613`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L613)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2055-2073`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2055-L2073)

xenia-rs interpreter body (frozen snapshot)

```rust
PpcOpcode::vmaddfp128 => {
    // ISA: (VD) <- (VA × VD) + VB. VD is both the second multiplicand and destination.
    // Canary InstrEmit_vmaddfp128 (ppc_emit_altivec.cc:806-809): MulAdd(VA, VD, VB).
    // Previous code computed ai.mul_add(bi, di) = VA×VB+VD — VB and VD roles swapped
    // (PPCBUG-424). Fix: ai.mul_add(di, bi) = VA×VD+VB.
    let a = ctx.vr[instr.va128()].as_f32x4();
    let b = ctx.vr[instr.vb128()].as_f32x4();
    let d = ctx.vr[instr.vd128()].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = vmx::flush_denorm(a[i]);
        let bi = vmx::flush_denorm(b[i]);
        let di = vmx::flush_denorm(d[i]);
        // PPCBUG-437.
        r[i] = vmx::flush_denorm(ai.mul_add(di, bi));
    }
    ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
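
The two snapshots differ only in which register supplies the second multiplicand. The standalone sketch below restates that difference on plain `[f32; 4]` arrays. It deliberately omits denormal flushing, and the function names are hypothetical rather than part of xenia-rs.

```rust
/// Lane-wise fused multiply-add: out[i] = x[i] * y[i] + z[i].
fn fma_lanes(x: [f32; 4], y: [f32; 4], z: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0_f32; 4];
    for i in 0..4 {
        out[i] = x[i].mul_add(y[i], z[i]);
    }
    out
}

/// `vmaddfp VD, VA, VC, VB`: VD = VA * VC + VB.
fn vmaddfp_sketch(va: [f32; 4], vc: [f32; 4], vb: [f32; 4]) -> [f32; 4] {
    fma_lanes(va, vc, vb)
}

/// `vmaddfp128 VD, VA, VB`: VD = VA * VD_prev + VB.
/// The old destination value is the second multiplicand, not the addend.
fn vmaddfp128_sketch(vd_prev: [f32; 4], va: [f32; 4], vb: [f32; 4]) -> [f32; 4] {
    fma_lanes(va, vd_prev, vb)
}
```

With `va = [1, 2, 3, 4]`, `vb = [10, 10, 10, 10]` and a previous `VD` of `[2, 2, 2, 2]`, `vmaddfp128_sketch` yields `[12, 14, 16, 18]`, whereas the swapped order described under PPCBUG-424 (`VA * VB + VD`) would yield `[12, 22, 32, 42]`.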
## Special Cases & Edge Conditions

- **Fused multiply-add: `VD = (VA * VC) + VB`** per word lane (single rounding). No intermediate rounding occurs between the multiply and the add, which matters for numerical accuracy in DSP filters and reduces accumulated error in dot products.
- **Big-endian word lanes.** Lane 0 is the most-significant word.
- **NaN propagation, ±∞ arithmetic.** Standard IEEE-754: any NaN input yields NaN; `(+∞ * 0)` yields NaN; the sum of `+∞` and `-∞` (e.g. `(+∞ * 1) + -∞`) yields NaN. No trap, no sticky bit.
- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs and outputs are flushed to `±0`. The xenia-rs snapshot flushes both inputs and outputs unconditionally, without consulting `NJ`.
- **No `VSCR[SAT]` change, no XER change, no exceptions.**
- **VMX128 sibling has surprising operand layout: `VD` is also a source.** Xenia's `vmaddfp128` reads `VA`, `VB`, *and `VD` itself* (as the second multiplicand), computing `VD = (VA * VD_prev) + VB` ([`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)). An earlier xenia-rs revision swapped the `VB` and `VD` roles (PPCBUG-424). The standard `vmaddfp` keeps the canonical 4-operand `VA, VC, VB → VD` shape. **This is a real difference in operand encoding** (VX128_3 form vs. VA-form) that compilers must respect: VMX128 sacrifices the third source register slot for the extra register-file bits.
- **Aliasing legal.** `vmaddfp v3, v3, v3, v3` works and computes `v3 * v3 + v3` per lane.
- **Common usage.** Per-lane polynomial evaluation, dot-product accumulation, any matrix multiply inner loop. Chain four `vmaddfp` instructions to multiply a 4×4 matrix by a 4-element vector.

## Related Instructions

- [`vnmsubfp`](vnmsubfp.md) — `−((VA * VC) − VB)`; fused negative-multiply-subtract.
- [`vaddfp`](vaddfp.md), [`vsubfp`](vsubfp.md) — plain float add / subtract.
- [`vmulfp`](vmulfp.md) — xenia helper for `VA * VC`; on hardware, games use `vmaddfp v, va, vc, v0_zero`.
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — min / max for clamping.
- [`vrefp`](vrefp.md), [`vrsqrtefp`](vrsqrtefp.md) — reciprocal / inverse-sqrt estimates that often appear in the same FMA chain.
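
The single-rounding property listed under Special Cases can be observed with ordinary `f32` arithmetic. The sketch below is an illustration under the assumption of IEEE-754 single precision with round-to-nearest-even. It compares Rust's `f32::mul_add` (fused) against a separate multiply and add, using inputs chosen so that the exact product `1 + 2^-11 + 2^-24` is not representable. The function names are hypothetical.

```rust
/// Fused: the exact product a * b is added to c, then rounded once.
fn fused_lane(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// Unfused: the product is rounded to f32 before the add (two roundings).
fn unfused_lane(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

/// Returns (fused, unfused) for a = 1 + 2^-12 and c = -(1 + 2^-11).
fn rounding_demo() -> (f32, f32) {
    let a = 1.0_f32 + 1.0 / 4096.0; // 1 + 2^-12, exactly representable
    let c = -(1.0_f32 + 1.0 / 2048.0); // -(1 + 2^-11), exactly representable
    (fused_lane(a, a, c), unfused_lane(a, a, c))
}
```

Here the fused path keeps the `2^-24` term and returns `2^-24`, while the unfused path rounds the product to `1 + 2^-11` first and returns `0.0`. A translation that splits `vmaddfp` into a multiply followed by an add would therefore lose that term.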
## IBM Reference

- [AIX 7.3 — `vmaddfp` (Vector Multiply-Add Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaddfp-vector-multiply-add-floating-point-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)