xenia-rs/migration/project-root/ppc-manual/vmx/vmaddfp.md
MechaCat02 e6d43a23ac chore: add migration/ bundle for cross-machine setup
Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:

  - claude-memory/             ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                               (103 files, 1.1 MB - MEMORY.md + every
                                project_xenia_rs_*.md from audits
                                addis_signext through audit-058)
  - project-root/dot-claude/   <project-root>/.claude/settings.json
                               (Stop hook + permissions)
  - project-root/ppc-manual/   <project-root>/ppc-manual/
                               (PowerPC reference docs, 397 files, 3.7 MB)
  - project-root/run-canary.sh <project-root>/run-canary.sh
  - README.md                  Human-readable setup checklist
  - setup.sh                   Idempotent installer (also reclones
                               xenia-canary at pinned HEAD 6de80dffe)
  - MANIFEST.md                Per-file mapping + per-file-not-bundled
                               restoration recipe

Excluded from bundle (not shippable via git):
  - Sylpheed ISO (7.8 GB; copyright; manual copy required)
  - sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
  - target/ build artifacts (rebuild on target)
  - audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
  - audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
  - xenia-canary checkout (setup.sh reclones from
    git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:38:38 +02:00

# `vmaddfp` — Vector Multiply-Add Floating Point
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002e`
<!-- GENERATED: BEGIN -->
## Assembler Mnemonics
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmaddfp` | `vmaddfp` | — | Vector Multiply-Add Floating Point |
| `vmaddfp128` | `vmaddfp128` | — | Vector128 Multiply Add Floating Point |
## Syntax
```asm
vmaddfp [VD], [VA], [VC], [VB]
vmaddfp128 [VD], [VA], [VB], [VD]
```
## Encoding
### `vmaddfp` — form `VA`
- **Opcode word:** `0x1000002e`
- **Primary opcode (bits 0-5):** `4`
- **Extended opcode:** `46`
- **Synchronising:** no
| Bits | Field | Meaning |
| --- | --- | --- |
| 0-5 | `OPCD` | primary opcode (4) |
| 6-10 | `VRT` | destination vector register |
| 11-15 | `VRA` | source A |
| 16-20 | `VRB` | source B |
| 21-25 | `VRC` | source C / shift |
| 26-31 | `XO` | extended opcode (6 bits) |
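
The field table above can be turned into a small decoder and encoder. The following is a minimal sketch, assuming PowerPC bit numbering (bit 0 is the most-significant bit of the 32-bit word); `VaFields`, `ppc_bits`, `decode_va`, `encode_vmaddfp` and `is_vmaddfp` are hypothetical names used for illustration and are not the xenia-rs decoder API.

```rust
/// Decoded fields of a VA-form instruction word.
/// Hypothetical helper type for illustration; not the xenia-rs decoder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VaFields {
    pub opcd: u32, // bits 0-5
    pub vrt: u32,  // bits 6-10
    pub vra: u32,  // bits 11-15
    pub vrb: u32,  // bits 16-20
    pub vrc: u32,  // bits 21-25
    pub xo: u32,   // bits 26-31
}

/// Extract PowerPC bits `first..=last`, where bit 0 is the most-significant bit.
fn ppc_bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

/// Split a 32-bit instruction word into the VA-form fields from the table.
pub fn decode_va(word: u32) -> VaFields {
    VaFields {
        opcd: ppc_bits(word, 0, 5),
        vrt: ppc_bits(word, 6, 10),
        vra: ppc_bits(word, 11, 15),
        vrb: ppc_bits(word, 16, 20),
        vrc: ppc_bits(word, 21, 25),
        xo: ppc_bits(word, 26, 31),
    }
}

/// Assemble `vmaddfp vd, va, vc, vb`; register numbers are masked to 5 bits.
pub fn encode_vmaddfp(vd: u32, va: u32, vc: u32, vb: u32) -> u32 {
    0x1000_002e | ((vd & 31) << 21) | ((va & 31) << 16) | ((vb & 31) << 11) | ((vc & 31) << 6)
}

/// True when the word carries primary opcode 4 and extended opcode 46.
pub fn is_vmaddfp(word: u32) -> bool {
    let f = decode_va(word);
    f.opcd == 4 && f.xo == 46
}
```

Note that the assembler operand order is `VD, VA, VC, VB`, while the encoding places `VRB` (bits 16-20) before `VRC` (bits 21-25), so the encoder swaps the last two operands.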
### `vmaddfp128` — form `VX128`
- **Opcode word:** `0x140000d0`
- **Primary opcode (bits 0-5):** `5`
- **Extended opcode:** `208`
- **Synchronising:** no
| Bits | Field | Meaning |
| --- | --- | --- |
| 0-5 | `OPCD` | primary opcode (4 or 5) |
| 6-10 | `VD128l` | destination low 5 bits |
| 11-15 | `VA128l` | source A low 5 bits |
| 16-20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | reserved |
| 23-25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `—` | reserved |
| 28-29 | `VD128h` | destination high 2 bits |
| 30-31 | `VB128h` | source B high 2 bits |
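
The split register fields above have to be reassembled into 7-bit register numbers (0-127). The sketch below shows one way to do that. It assumes the high bits are concatenated above the low 5 bits, with `VA128h` (bit 26) as bit 5 and `VA128H` (bit 21) as bit 6 of the `VA` number; that composition is an assumption read off the "low", "middle" and "high" labels in the table, and `decode_vx128_regs` is a hypothetical name, not the xenia-rs accessor.

```rust
/// Extract PowerPC bits `first..=last`, where bit 0 is the most-significant bit.
fn ppc_bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

/// Reassemble the 7-bit (VD, VA, VB) register numbers of a VX128-form word.
/// ASSUMPTION: high bits sit directly above the low 5 bits, as described in
/// the lead-in; check against the real decoder before relying on this.
pub fn decode_vx128_regs(word: u32) -> (u32, u32, u32) {
    let vd = ppc_bits(word, 6, 10) | (ppc_bits(word, 28, 29) << 5);
    let va = ppc_bits(word, 11, 15)
        | (ppc_bits(word, 26, 26) << 5)
        | (ppc_bits(word, 21, 21) << 6);
    let vb = ppc_bits(word, 16, 20) | (ppc_bits(word, 30, 31) << 5);
    (vd, va, vb)
}
```

Under this assumption the bare opcode word `0x140000d0` decodes to registers `(0, 0, 0)`, since none of its set bits fall inside a register field.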
## Operands
| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmaddfp: read; vmaddfp128: read | Source A vector register. |
| `VC` | vmaddfp: read; vmaddfp128: not used | Source C vector register (second multiplicand). `vmaddfp128` has no `VC` operand and multiplies by `VD` instead. |
| `VB` | vmaddfp: read; vmaddfp128: read | Source B vector register. |
| `VD` | vmaddfp: write; vmaddfp128: read and write | Destination vector register. For `vmaddfp128` it is also the second multiplicand. |
## Register Effects
### `vmaddfp`
- **Reads (always):** `VA`, `VC`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_
### `vmaddfp128`
- **Reads (always):** `VA`, `VB`, `VD`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_
## Status-Register Effects
_No condition-register or status-register effects._
## Operation (pseudocode)
```
for each 32-bit float lane i in 0..3:
    vmaddfp:    VD[i] <- (VA[i] * VC[i]) + VB[i]
    vmaddfp128: VD[i] <- (VA[i] * VD[i]) + VB[i]
```
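
The pseudocode maps directly onto Rust's `f32::mul_add`, which computes the product and sum with a single rounding. The following is a minimal sketch of both variants on plain `[f32; 4]` arrays; the function names are hypothetical, and denormal flushing is deliberately left out here so that only the lane arithmetic is shown.

```rust
/// vmaddfp: VD[i] = (VA[i] * VC[i]) + VB[i], one fused operation per lane.
pub fn vmaddfp_lanes(va: [f32; 4], vc: [f32; 4], vb: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| va[i].mul_add(vc[i], vb[i]))
}

/// vmaddfp128: VD[i] = (VA[i] * VD[i]) + VB[i]; the old VD is the second
/// multiplicand and is overwritten by the result.
pub fn vmaddfp128_lanes(va: [f32; 4], vb: [f32; 4], vd: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| va[i].mul_add(vd[i], vb[i]))
}
```

For example, `vmaddfp_lanes([1.0, 2.0, 3.0, 4.0], [2.0; 4], [0.5; 4])` multiplies each lane by 2 and adds 0.5.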
## C Translation Example
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
## Implementation References
**`vmaddfp`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddfp"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:801`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L801)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:588`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L588)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2038-2054`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2038-L2054)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
```rust
PpcOpcode::vmaddfp => {
    // vD = (vA * vC) + vB. AltiVec unconditionally flushes denormal
    // *inputs* to 0 regardless of VSCR[NJ] (confirmed on POWER8 hw).
    let a = ctx.vr[instr.ra()].as_f32x4();
    let b = ctx.vr[instr.rb()].as_f32x4();
    let c = ctx.vr[instr.rc()].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = vmx::flush_denorm(a[i]);
        let bi = vmx::flush_denorm(b[i]);
        let ci = vmx::flush_denorm(c[i]);
        // PPCBUG-437: flush subnormal output too.
        r[i] = vmx::flush_denorm(ai.mul_add(ci, bi));
    }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
</details>
**`vmaddfp128`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:805`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L805)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:613`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L613)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2055-2073`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2055-L2073)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
```rust
PpcOpcode::vmaddfp128 => {
    // ISA: (VD) <- (VA × VD) + VB. VD is both the second multiplicand and destination.
    // Canary InstrEmit_vmaddfp128 (ppc_emit_altivec.cc:806-809): MulAdd(VA, VD, VB).
    // Previous code computed ai.mul_add(bi, di) = VA×VB+VD — VB and VD roles swapped
    // (PPCBUG-424). Fix: ai.mul_add(di, bi) = VA×VD+VB.
    let a = ctx.vr[instr.va128()].as_f32x4();
    let b = ctx.vr[instr.vb128()].as_f32x4();
    let d = ctx.vr[instr.vd128()].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = vmx::flush_denorm(a[i]);
        let bi = vmx::flush_denorm(b[i]);
        let di = vmx::flush_denorm(d[i]);
        // PPCBUG-437.
        r[i] = vmx::flush_denorm(ai.mul_add(di, bi));
    }
    ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
</details>
<!-- GENERATED: END -->
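
Both snapshots call `vmx::flush_denorm`, whose body is not reproduced on this page. The following is a minimal sketch of what such a helper might look like, assuming it replaces a subnormal value with a zero of the same sign and passes everything else (normals, zeros, infinities, NaNs) through unchanged. The sign-preserving behaviour and the names below are assumptions, not the actual xenia-rs implementation.

```rust
/// Hypothetical stand-in for `vmx::flush_denorm`.
/// ASSUMPTION: subnormals become a zero that keeps the original sign.
pub fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// One lane of the snapshot's arithmetic: flush the three inputs, do the
/// fused multiply-add, then flush the result as well.
pub fn vmaddfp_lane_flushed(a: f32, c: f32, b: f32) -> f32 {
    let (a, c, b) = (flush_denorm(a), flush_denorm(c), flush_denorm(b));
    flush_denorm(a.mul_add(c, b))
}
```

The output flush matters even when every input is normal: `f32::MIN_POSITIVE * 0.5` is an exact subnormal, so `vmaddfp_lane_flushed(f32::MIN_POSITIVE, 0.5, 0.0)` yields zero under this sketch.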
## Special Cases & Edge Conditions
- **Fused multiply-add: `VD = (VA * VC) + VB`** per word lane (single rounding). No intermediate rounding between the multiply and the add — this is critical for numerical accuracy in DSP filters and reduces error in dot products.
- **Big-endian word lanes.** Lane 0 is the most-significant word.
- **NaN propagation, ±∞ arithmetic.** Standard IEEE-754: any NaN input yields NaN; `(+∞ * 0)` yields NaN; the sum of `+∞` and `-∞` (e.g. `(+∞ * 1) + -∞`) yields NaN. No trap, no sticky bit.
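- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs and outputs are flushed to `±0`. The xenia-rs interpreter snapshot does not consult `VSCR[NJ]` at all: it flushes denormal inputs and the denormal result unconditionally (see its `PPCBUG-437` comment).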
- **No `VSCR[SAT]` change, no XER change, no exceptions.**
- **VMX128 sibling has a surprising operand layout: `VD` is also a source.** Xenia's `vmaddfp128` reads `VA`, `VB`, *and `VD` itself*, with `VD` as the second multiplicand rather than the addend: `VD = (VA * VD_prev) + VB` ([`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)). The accumulator-style reading `(VA * VB) + VD_prev` is the swapped-operand bug that the interpreter snapshot records as `PPCBUG-424`. The standard `vmaddfp` keeps the canonical 4-operand `VA, VC, VB → VD` shape. **This is a real difference in operand encoding** (VX128 form vs. VA form) that compilers must respect: VMX128 gives up the third source register slot in exchange for the extra bits needed to address 128 registers.
- **Aliasing legal.** `vmaddfp v3, v3, v3, v3` works: all sources are read before the destination is written, so each lane becomes `v3 * v3 + v3`.
- **Common usage.** Per-lane polynomial evaluation, dot-product accumulation, any matrix multiply inner loop. Chain four `vmaddfp` instructions, one per matrix column, to compute a 4×4 matrix times 4-vector product.
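
Two of the points above, single rounding and the four-instruction matrix product, can be illustrated with a short sketch. It models the vector registers as `[f32; 4]` arrays and ignores denormal flushing; the function names and the column-major matrix layout are assumptions made for the example, not something the source prescribes.

```rust
/// One `vmaddfp`: per-lane fused (va * vc) + vb, rounded once.
pub fn vmaddfp(va: [f32; 4], vc: [f32; 4], vb: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| va[i].mul_add(vc[i], vb[i]))
}

/// The same arithmetic with the product rounded before the add (two
/// roundings), for comparison with the fused form.
pub fn mul_then_add(va: [f32; 4], vc: [f32; 4], vb: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| va[i] * vc[i] + vb[i])
}

/// 4x4 matrix times 4-vector as four chained `vmaddfp` steps.
/// ASSUMPTION: the matrix is stored as four columns, and each vector
/// component is splatted across all lanes before the multiply.
pub fn mat4_mul_vec4(cols: [[f32; 4]; 4], v: [f32; 4]) -> [f32; 4] {
    let mut acc = [0.0f32; 4];
    for k in 0..4 {
        acc = vmaddfp(cols[k], [v[k]; 4], acc);
    }
    acc
}
```

A case where the single rounding is visible: with `a = 1 + 2^-12` and `b = -(1 + 2^-11)`, the exact value of `a * a + b` is `2^-24`. The fused form returns that value, while `mul_then_add` first rounds `a * a` to `1 + 2^-11` and therefore returns `0`.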
## Related Instructions
- [`vnmsubfp`](vnmsubfp.md) — `-((VA * VC) - VB)`; fused negative-multiply-subtract.
- [`vaddfp`](vaddfp.md), [`vsubfp`](vsubfp.md) — plain float add / subtract.
- [`vmulfp`](vmulfp.md) — xenia helper for `VA * VC`; on hardware, games use `vmaddfp v, va, vc, v0_zero`.
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — min / max for clamping.
- [`vrefp`](vrefp.md), [`vrsqrtefp`](vrsqrtefp.md) — reciprocal / inverse-sqrt estimates that often appear in the same FMA chain.
## IBM Reference
- [AIX 7.3 — `vmaddfp` (Vector Multiply-Add Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaddfp-vector-multiply-add-floating-point-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)