# `vnmsubfp` — Vector Negative Multiply-Subtract Floating Point

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002f`

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vnmsubfp` | `vnmsubfp` | — | Vector Negative Multiply-Subtract Floating Point |
| `vnmsubfp128` | `vnmsubfp128` | — | Vector128 Negative Multiply-Subtract Floating Point |

## Syntax

```asm
vnmsubfp [VD], [VA], [VC], [VB]
vnmsubfp128 [VD], [VA], [VD], [VB]
```

## Encoding

### `vnmsubfp` — form `VA`

- **Opcode word:** `0x1000002f`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `47`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT` | destination vector register |
| 11–15 | `VRA` | source A |
| 16–20 | `VRB` | source B |
| 21–25 | `VRC` | source C / shift |
| 26–31 | `XO` | extended opcode (6 bits) |

### `vnmsubfp128` — form `VX128`

- **Opcode word:** `0x14000150`
- **Primary opcode (bits 0–5):** `5`
- **Extended opcode:** `336`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4 or 5) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | extended-opcode bit |
| 23–25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `—` | extended-opcode bit (set in `0x14000150`) |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vnmsubfp: read; vnmsubfp128: read | Source A vector register. |
| `VC` | vnmsubfp: read | Source C vector register (second multiplicand; not present in `vnmsubfp128`). |
| `VB` | vnmsubfp: read; vnmsubfp128: read | Source B vector register. |
| `VD` | vnmsubfp: write; vnmsubfp128: read, write | Destination vector register (also the addend in `vnmsubfp128`). |

## Register Effects

### `vnmsubfp`

- **Reads (always):** `VA`, `VC`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

### `vnmsubfp128`

- **Reads (always):** `VA`, `VD`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
for each 32-bit float lane i in 0..3:
    VD[i] <- −((VA[i] * VC[i]) − VB[i])   # vnmsubfp: one rounding, then negate
    # vnmsubfp128 instead: VD[i] <- −((VA[i] * VB[i]) − VD[i])
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in               */
/* Implementation References is the authoritative semantic            */
/* snapshot. Translate it line-by-line:                               */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs)                   */
/*   - mem.read_uXX / mem.write_uXX -> mem_read_uXX_be / mem_write_uXX_be */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)       */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO       */
/* The Register Effects and Status-Register Effects sections above    */
/* enumerate every side effect a faithful translation must emit.      */
```

## Implementation References

**`vnmsubfp`**

- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnmsubfp"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1154`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1154)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:589`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L589)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2074-2089`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2074-L2089)

xenia-rs interpreter body (frozen snapshot)

```rust
PpcOpcode::vnmsubfp => {
    // vD = -(vA * vC - vB) = vB - vA * vC. Same denorm-flush rule as vmaddfp.
    let a = ctx.vr[instr.ra()].as_f32x4();
    let b = ctx.vr[instr.rb()].as_f32x4();
    let c = ctx.vr[instr.rc()].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = vmx::flush_denorm(a[i]);
        let bi = vmx::flush_denorm(b[i]);
        let ci = vmx::flush_denorm(c[i]);
        // PPCBUG-426: single FMA rounding instead of two-step (b - a*c).
        r[i] = vmx::flush_denorm(-ai.mul_add(ci, -bi));
    }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
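To make the snapshot's behaviour concrete, here is a self-contained Rust sketch of `vnmsubfp` under stated assumptions. `flush_denorm`, `decode_va`, `nmsub_lane`, `two_step_lane` and `exec_vnmsubfp` are hypothetical names, not xenia-rs APIs; in particular, the sign-preserving flush is an assumption about what `vmx::flush_denorm` does. The decoder follows the VA-form bit table in Encoding. `two_step_lane` exists only to show why the PPCBUG-426 comment matters: for a = c = 1 + 2^-12 and b = 1 + 2^-11, the exact value of a·c − b is 2^-24, but rounding the product first collapses it to zero.

```rust
// Sketch of vnmsubfp (VA form) for illustration; all helper names are hypothetical.

/// Hypothetical stand-in for xenia-rs `vmx::flush_denorm`: a subnormal value
/// becomes a zero of the same sign (assumed behaviour).
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() { 0.0f32.copysign(x) } else { x }
}

/// Fields of a VA-form word, using the Encoding table's bit numbering
/// (bit 0 is the most significant bit).
struct VaFields {
    opcd: u32,
    vd: usize,
    va: usize,
    vb: usize,
    vc: usize,
    xo: u32,
}

fn decode_va(word: u32) -> VaFields {
    VaFields {
        opcd: word >> 26,                   // bits 0-5
        vd: ((word >> 21) & 0x1f) as usize, // bits 6-10  (VRT)
        va: ((word >> 16) & 0x1f) as usize, // bits 11-15 (VRA)
        vb: ((word >> 11) & 0x1f) as usize, // bits 16-20 (VRB)
        vc: ((word >> 6) & 0x1f) as usize,  // bits 21-25 (VRC)
        xo: word & 0x3f,                    // bits 26-31
    }
}

/// One lane: -(a*c - b) with a single rounding, as in the snapshot.
fn nmsub_lane(a: f32, c: f32, b: f32) -> f32 {
    let (a, b, c) = (flush_denorm(a), flush_denorm(b), flush_denorm(c));
    flush_denorm(-a.mul_add(c, -b))
}

/// For contrast only: rounds the product, then subtracts (two roundings).
fn two_step_lane(a: f32, c: f32, b: f32) -> f32 {
    -((a * c) - b)
}

/// Executes `vnmsubfp vD, vA, vC, vB` against a 32-entry register file.
fn exec_vnmsubfp(vr: &mut [[f32; 4]; 32], word: u32) {
    let f = decode_va(word);
    assert!(f.opcd == 4 && f.xo == 47, "not vnmsubfp");
    let (a, b, c) = (vr[f.va], vr[f.vb], vr[f.vc]);
    for i in 0..4 {
        vr[f.vd][i] = nmsub_lane(a[i], c[i], b[i]);
    }
}
```

For example, the word for `vnmsubfp v1, v2, v3, v4` is `0x1000002f | 1 << 21 | 2 << 16 | 4 << 11 | 3 << 6`. Note that `VB` sits in bits 16–20 and `VC` in bits 21–25, the reverse of the assembler operand order. Running that word with `v2 = [1, 2, 3, 4]`, `v3 = [2; 4]` and `v4 = [10; 4]` leaves `v1 = [8, 6, 4, 2]`.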
**`vnmsubfp128`**

- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnmsubfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1157`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1157)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:615`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L615)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2090-2107`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2090-L2107)

xenia-rs interpreter body (frozen snapshot)

```rust
PpcOpcode::vnmsubfp128 => {
    // VMX128 form: vD <- -((vA * vB) - vD) = vD - (vA * vB). Canary
    // routes through `InstrEmit_vnmsubfp_` with the same arg-swap,
    // which flushes all inputs unconditionally.
    let a = ctx.vr[instr.va128()].as_f32x4();
    let b = ctx.vr[instr.vb128()].as_f32x4();
    let d = ctx.vr[instr.vd128()].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = vmx::flush_denorm(a[i]);
        let bi = vmx::flush_denorm(b[i]);
        let di = vmx::flush_denorm(d[i]);
        // PPCBUG-427: single FMA rounding.
        r[i] = vmx::flush_denorm(-ai.mul_add(bi, -di));
    }
    ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
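The VX128 register fields are split across the word, so the 7-bit register numbers must be reassembled before the lane operation runs. The following Rust sketch follows the VX128 bit table in Encoding. The names `decode_vx128`, `encode_vnmsubfp128` and `exec_vnmsubfp128` are hypothetical, and the snapshot's denormal flushing is omitted for brevity. The sketch also illustrates the point in the snapshot comment: `VD` is read as the addend before it is overwritten.

```rust
// Sketch of vnmsubfp128 (VX128 form); all helper names are hypothetical.

/// Rebuilds the 7-bit register numbers from the split VX128 fields
/// (bit 0 of the Encoding table is the most significant bit).
fn decode_vx128(word: u32) -> (usize, usize, usize) {
    // Extract a `width`-bit field starting `shift` bits above the least significant bit.
    let bits = |shift: u32, width: u32| ((word >> shift) & ((1u32 << width) - 1)) as usize;
    let vd = bits(21, 5) | (bits(2, 2) << 5); // VD128l | VD128h << 5
    let va = bits(16, 5) | (bits(5, 1) << 5) | (bits(10, 1) << 6); // VA128l | VA128h << 5 | VA128H << 6
    let vb = bits(11, 5) | (bits(0, 2) << 5); // VB128l | VB128h << 5
    (vd, va, vb)
}

/// Hypothetical encoder for building test words: ORs register numbers
/// (each < 128) into the base opcode word 0x14000150.
fn encode_vnmsubfp128(vd: u32, va: u32, vb: u32) -> u32 {
    assert!(vd < 128 && va < 128 && vb < 128);
    0x1400_0150
        | (vd & 0x1f) << 21 | (vd >> 5) << 2
        | (va & 0x1f) << 16 | ((va >> 5) & 1) << 5 | (va >> 6) << 10
        | (vb & 0x1f) << 11 | (vb >> 5)
}

/// Executes vnmsubfp128: VD is read as the addend and overwritten with
/// VD - VA*VB (one rounding per lane; denormal flushing omitted here).
fn exec_vnmsubfp128(vr: &mut [[f32; 4]; 128], word: u32) {
    let (vd, va, vb) = decode_vx128(word);
    let (a, b, d) = (vr[va], vr[vb], vr[vd]);
    for i in 0..4 {
        vr[vd][i] = -a[i].mul_add(b[i], -d[i]);
    }
}
```

Suppose `v65 = [1, 2, 3, 4]`, `v127 = [2; 4]` and `v100 = [10; 4]`. Executing `encode_vnmsubfp128(100, 65, 127)` then leaves `v100 = [8, 6, 4, 2]`. Register 65 needs the `VA128H` bit (bit 21), which has no counterpart in the 32-register VA form.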
## Special Cases & Edge Conditions

- **Lane-wise negative multiply-subtract.** Each of the four lanes computes `VD[i] = −((VA[i] × VC[i]) − VB[i])`, i.e. `VB[i] − VA[i] × VC[i]`. The product is not rounded before the subtraction. The PowerPC ISA specifies a single IEEE-754 rounding of `VA[i] × VC[i] − VB[i]`, followed by the exact negation, and Xenon hardware rounds only once. The xenia-rs interpreter models this with one `f32::mul_add` per lane (PPCBUG-426/427) rather than a separate multiply, subtract and negate, which would round twice.
- **IEEE-754 binary32 lanes.** Denormal handling follows `VSCR[NJ]`: when `NJ = 1`, denormal inputs and results flush to zero. Both xenia-rs snapshots flush inputs and results unconditionally, so they behave as if `NJ = 1`.
- **No VSCR[SAT] update.** VMX floating-point arithmetic never sets saturation.
- **No FPSCR effect.** Unlike scalar `fnmsub[s]`, `vnmsubfp` does not touch FPSCR.
- **NaN propagation.** A NaN in any of `VA`, `VB`, or `VC` yields a NaN in the corresponding lane. The sign of the NaN is unspecified but stable in xenia (it matches the output of the x86 host's `vfnmadd` family).
- **Big-endian lane indexing.** Lane 0 occupies the most-significant (lowest-addressed) 4 bytes of the register.
- **VMX128 sibling: [`vnmsubfp128`](vnmsubfp128.md).** It performs the same lane operation with access to `v0..v127`. The VX128 form has no `VC` field, however: `VD` supplies the addend and also receives the result, so each lane computes `VD[i] − VA[i] × VB[i]`.
- **No `Rc` bit** on this opcode; it never touches CR.

## Related Instructions

- [`vmaddfp`](vmaddfp.md) — the non-negated fused multiply-add `(VA × VC) + VB`.
- [`vaddfp`](vaddfp.md), [`vsubfp`](vsubfp.md) — the underlying adds/subs.
- [`vmulfp`](vmulfp.md) — xenia-convenience lane-wise float multiply (no native Altivec form; usually encoded as `vmaddfp VD, VA, VC, v0_zero`).
- [`vrefp`](vrefp.md), [`vrsqrtefp`](vrsqrtefp.md) — reciprocal and reciprocal-square-root estimates, typically refined by Newton-Raphson steps built from `vnmsubfp` and `vmaddfp`.
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — the other float-arithmetic primitives.

## IBM Reference

- [AIX 7.3 — `vnmsubfp` (Vector Negative Multiply-Subtract Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vnmsubfp-vector-negative-multiply-subtract-floating-point-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
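The Related Instructions entry that pairs `vrefp`/`vrsqrtefp` with `vnmsubfp` refers to Newton-Raphson refinement of an estimate. The following Rust sketch covers the reciprocal case. It assumes the common two-instruction step: `vnmsubfp` forms the residual and `vmaddfp` applies it. The scalar helpers are hypothetical single-lane stand-ins, and the hard-coded starting value in the check stands in for a real `vrefp` result, whose precision is not modelled here.

```rust
// Newton-Raphson reciprocal refinement built from the two fused VMX ops.
// Single-lane helpers with hypothetical names; denormal flushing omitted.

/// vnmsubfp on one lane: -(a*c - b), rounded once.
fn vnmsub(a: f32, c: f32, b: f32) -> f32 {
    -a.mul_add(c, -b)
}

/// vmaddfp on one lane: a*c + b, rounded once.
fn vmadd(a: f32, c: f32, b: f32) -> f32 {
    a.mul_add(c, b)
}

/// One Newton-Raphson step toward 1/b from an estimate y:
///   e  = 1 - b*y   -> vnmsubfp e, b, y, one
///   y' = y*e + y   -> vmaddfp  y', y, e, y
/// If y = (1 - r)/b, then y' = (1 - r*r)/b, so the relative error is squared.
fn refine_recip(b: f32, y: f32) -> f32 {
    let e = vnmsub(b, y, 1.0);
    vmadd(y, e, y)
}
```

Each step squares the relative error, so a rough estimate of 0.3 for 1/3 (10% error) is within f32 rounding of 1/3 after three steps. Forming the residual with `vnmsubfp` is a natural fit because `1 − b·y` is a small difference of nearly equal quantities, and computing it with a single rounding keeps it accurate.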