chore: add migration/ bundle for cross-machine setup

Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on another machine can be brought up to identical configuration via migration/setup.sh:

- claude-memory/  ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/ (103 files, 1.1 MB - MEMORY.md + every project_xenia_rs_*.md from audits addis_signext through audit-058)
- project-root/dot-claude/  <project-root>/.claude/settings.json (Stop hook + permissions)
- project-root/ppc-manual/  <project-root>/ppc-manual/ (PowerPC reference docs, 397 files, 3.7 MB)
- project-root/run-canary.sh  <project-root>/run-canary.sh
- README.md  Human-readable setup checklist
- setup.sh  Idempotent installer (also reclones xenia-canary at pinned HEAD 6de80dffe)
- MANIFEST.md  Per-file mapping + per-file-not-bundled restoration recipe

Excluded from bundle (not shippable via git):

- Sylpheed ISO (7.8 GB; copyright; manual copy required)
- sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
- target/ build artifacts (rebuild on target)
- audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
- audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
- xenia-canary checkout (setup.sh reclones from git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:38:38 +02:00
parent 8e709b0a24
commit e6d43a23ac
505 changed files with 86028 additions and 0 deletions
--- a/migration/project-root/ppc-manual/vmx128/vcfpsxws128.md
+++ b/migration/project-root/ppc-manual/vmx128/vcfpsxws128.md
@@ -0,0 +1,137 @@
+# `vcfpsxws128` — Vector128 Convert From Floating-Point to Signed Fixed-Point Word Saturate
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x18000230`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vcfpsxws128` | `vcfpsxws128` | — | Vector128 Convert From Floating-Point to Signed Fixed-Point Word Saturate |
+
+## Syntax
+
+```asm
+vcfpsxws128 [VD], [VB], [UIMM]
+```
+
+## Encoding
+
+### `vcfpsxws128` — form `VX128_3`
+
+- **Opcode word:** `0x18000230`
+- **Primary opcode (bits 0–5):** `6`
+- **Extended opcode:** `560`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (6) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `IMM` | 5-bit immediate |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21–27 | `XO` | extended opcode |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VB` | vcfpsxws128: read | Source B vector register. |
+| `UIMM` | vcfpsxws128: read | 5-bit unsigned immediate (bits 11–15, scale exponent 0..31). Zero-extended. |
+| `VD` | vcfpsxws128: write | Destination vector register. |
+| `VSCR` | vcfpsxws128: write | Vector Status and Control Register (NJ/SAT bits). |
+
+## Register Effects
+
+### `vcfpsxws128`
+
+- **Reads (always):** `VB`, `UIMM`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`, `VSCR`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+- `vcfpsxws128`: **VSCR[SAT]** is set when any lane saturates and stays set (sticky); it is never cleared by this instruction.
+
+## Operation (pseudocode)
+
+```
+; Derived from the xenia-rs interpreter arm (see Implementation References).
+uimm <- instr[11:15]                  ; 5-bit scale exponent, 0..31
+sat  <- 0
+for i in 0..3:
+    ; trunc(VB.f[i] * 2^uimm), clamped to [-2^31, 2^31-1]
+    (VD.w[i], s) <- cvt_f32_to_i32_sat(VB.f[i], uimm)
+    sat <- sat | s
+if sat: VSCR[SAT] <- 1                ; sticky, never cleared here
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vcfpsxws128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcfpsxws128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:539`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L539)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:93`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L93)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:656`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L656)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4323-4334`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4323-L4334)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vcfpsxws128 => {
+            let uimm = (instr.raw >> 16) & 0x1F;
+            let b = ctx.vr[instr.vb128()].as_f32x4();
+            let mut r = [0i32; 4]; let mut sat = false;
+            for i in 0..4 {
+                let (v, s) = crate::vmx::cvt_f32_to_i32_sat(b[i], uimm);
+                r[i] = v; sat |= s;
+            }
+            if sat { ctx.set_vscr_sat(true); }
+            ctx.vr[instr.vd128()] = crate::vmx::from_i32x4(r);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **Float → signed fixed-point (int32) with explicit scale.** Each lane computes `VD.w[i] = sat_int32(VB[i] * 2^UIMM)`, truncating toward zero and clamping to `[−2^31, 2^31−1]`. `UIMM` is a 5-bit unsigned scale exponent (range 0..31), not an additive bias: it specifies a power-of-two pre-scale applied to the float value before conversion.
+- **Use case: fixed-point pipelines.** The `UIMM` pre-scale lets game code convert a `[0.0, 1.0]` float channel into a fixed-point value with 15 fractional bits in one instruction (e.g. `UIMM = 15` → scale by 32768, so `1.0` maps to `32768`).
+- **Sticky VSCR[SAT]** set whenever a lane clamps (including NaN inputs, which xenia's `cvt_f32_to_i32_sat` treats as 0 and flags saturation).
+- **`VSCR[NJ]` honoured** on the float input side.
+- **VMX128 register-fusion** applies to `VD` and `VB`: 7-bit register IDs via `VD128l ‖ VD128h` and `VB128l ‖ VB128h`.
+- **No IBM AIX entry** — this is Xenon-only. The closest standard Altivec op is [`vctsxs`](../vmx/vctsxs.md).
+- **No `Rc`, no XER / FPSCR.**
+
+## Related Instructions
+
+- [`vctsxs`](../vmx/vctsxs.md) — the standard Altivec equivalent (same semantics, 32-register file).
+- [`vcfpuxws128`](vcfpuxws128.md) — unsigned variant (clamps to `uint32`).
+- [`vcsxwfp128`](vcsxwfp128.md), [`vcuxwfp128`](vcuxwfp128.md) — the inverse (int → float with scale).
+- [`vrfiz`](../vmx/vrfiz.md) — plain truncate-to-float-integer without scale.
+
+## IBM Reference
+
+- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension.
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions (Microsoft internal documentation); semantics cross-referenced with [IBM AltiVec Technology Programmer's Interface Manual §`vctsxs`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf).
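The per-lane rule stated under Special Cases for `vcfpsxws128` (scale by `2^UIMM`, truncate toward zero, clamp to int32, flag saturation) can be sketched in C as follows. This is a minimal sketch, not xenia's code: the helper name `cvt_f32_to_i32_sat_lane` and the `bool *sat` out-parameter are assumptions made for illustration, and the NaN handling (result 0, saturation flagged) follows what the text above attributes to xenia's `cvt_f32_to_i32_sat`.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: one lane of vcfpsxws128.
 * Sets *sat to true when the lane clamps; never clears it (sticky). */
static int32_t cvt_f32_to_i32_sat_lane(float x, unsigned uimm, bool *sat)
{
    if (isnan(x)) {              /* per the text above: NaN -> 0, SAT flagged */
        *sat = true;
        return 0;
    }
    /* Power-of-two scaling is exact in double for every finite float. */
    double scaled = (double)x * (double)(UINT32_C(1) << (uimm & 0x1Fu));
    if (scaled >= 2147483648.0) {
        *sat = true;
        return INT32_MAX;
    }
    if (scaled < -2147483648.0) {
        *sat = true;
        return INT32_MIN;
    }
    return (int32_t)scaled;      /* C float-to-int casts truncate toward zero */
}
```

With `UIMM = 15`, an input of `1.0f` yields `32768` without touching the flag, while `3.0e9f` with `UIMM = 0` clamps to `INT32_MAX` and sets it.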
--- a/migration/project-root/ppc-manual/vmx128/vcfpuxws128.md
+++ b/migration/project-root/ppc-manual/vmx128/vcfpuxws128.md
@@ -0,0 +1,137 @@
+# `vcfpuxws128` — Vector128 Convert From Floating-Point to Unsigned Fixed-Point Word Saturate
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x18000270`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vcfpuxws128` | `vcfpuxws128` | — | Vector128 Convert From Floating-Point to Unsigned Fixed-Point Word Saturate |
+
+## Syntax
+
+```asm
+vcfpuxws128 [VD], [VB], [UIMM]
+```
+
+## Encoding
+
+### `vcfpuxws128` — form `VX128_3`
+
+- **Opcode word:** `0x18000270`
+- **Primary opcode (bits 0–5):** `6`
+- **Extended opcode:** `624`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (6) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `IMM` | 5-bit immediate |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21–27 | `XO` | extended opcode |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VB` | vcfpuxws128: read | Source B vector register. |
+| `UIMM` | vcfpuxws128: read | 5-bit unsigned immediate (bits 11–15, scale exponent 0..31). Zero-extended. |
+| `VD` | vcfpuxws128: write | Destination vector register. |
+| `VSCR` | vcfpuxws128: write | Vector Status and Control Register (NJ/SAT bits). |
+
+## Register Effects
+
+### `vcfpuxws128`
+
+- **Reads (always):** `VB`, `UIMM`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`, `VSCR`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+- `vcfpuxws128`: **VSCR[SAT]** is set when any lane saturates and stays set (sticky); it is never cleared by this instruction.
+
+## Operation (pseudocode)
+
+```
+; Derived from the xenia-rs interpreter arm (see Implementation References).
+uimm <- instr[11:15]                  ; 5-bit scale exponent, 0..31
+sat  <- 0
+for i in 0..3:
+    ; trunc(VB.f[i] * 2^uimm), clamped to [0, 2^32-1]
+    (VD.w[i], s) <- cvt_f32_to_u32_sat(VB.f[i], uimm)
+    sat <- sat | s
+if sat: VSCR[SAT] <- 1                ; sticky, never cleared here
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vcfpuxws128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcfpuxws128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:557`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L557)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:93`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L93)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:657`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L657)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4335-4346`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4335-L4346)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vcfpuxws128 => {
+            let uimm = (instr.raw >> 16) & 0x1F;
+            let b = ctx.vr[instr.vb128()].as_f32x4();
+            let mut r = [0u32; 4]; let mut sat = false;
+            for i in 0..4 {
+                let (v, s) = crate::vmx::cvt_f32_to_u32_sat(b[i], uimm);
+                r[i] = v; sat |= s;
+            }
+            if sat { ctx.set_vscr_sat(true); }
+            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_u32x4_array(r);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **Float → unsigned fixed-point (uint32) with explicit scale.** Each lane computes `VD.w[i] = sat_uint32(VB[i] * 2^UIMM)`, truncating toward zero and clamping to `[0, 2^32−1]`. `UIMM` is a 5-bit unsigned scale exponent (range 0..31), not an additive bias.
+- **Negative floats clamp to 0** and sticky-set `VSCR[SAT]`.
+- **NaN inputs** → 0 with `VSCR[SAT]` set (xenia's `cvt_f32_to_u32_sat`).
+- **`VSCR[NJ]` honoured** for denormal inputs.
+- **VMX128 register-fusion** applies to `VD` and `VB` (7-bit IDs).
+- **No IBM AIX entry** — Xenon-only.
+- **No `Rc`, no XER / FPSCR.**
+
+## Related Instructions
+
+- [`vctuxs`](../vmx/vctuxs.md) — the standard Altivec equivalent (uint32 clamp with scale).
+- [`vcfpsxws128`](vcfpsxws128.md) — signed variant.
+- [`vcuxwfp128`](vcuxwfp128.md) — the inverse (uint → float with scale).
+- [`vrfiz`](../vmx/vrfiz.md) — plain truncate-to-integer-float without scale.
+
+## IBM Reference
+
+- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension.
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions (Microsoft internal documentation); cross-referenced with [IBM AltiVec Technology Programmer's Interface Manual §`vctuxs`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf).
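The four-lane shape of the `vcfpuxws128` interpreter arm (convert each lane, OR the per-lane saturation flags together, then set `VSCR[SAT]` once) can be sketched in C as below. The names `cvt_lane_u32` and `vcfpuxws128_sketch` are hypothetical, and the clamping rules (negative inputs and NaN give 0 with saturation flagged) are taken from the Special Cases text rather than from xenia's `cvt_f32_to_u32_sat` source, which is not shown here.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one lane; returns true if the lane clamped. */
static bool cvt_lane_u32(float x, unsigned uimm, uint32_t *out)
{
    if (isnan(x)) { *out = 0; return true; }
    /* Power-of-two scaling is exact in double for every finite float. */
    double scaled = (double)x * (double)(UINT32_C(1) << (uimm & 0x1Fu));
    if (scaled < 0.0)           { *out = 0;          return true; }
    if (scaled >= 4294967296.0) { *out = UINT32_MAX; return true; }
    *out = (uint32_t)scaled;    /* truncates toward zero */
    return false;
}

/* Four-lane wrapper; the return value is what a caller would OR into
 * VSCR[SAT] (the flag is sticky, so it is only ever set, never cleared). */
static bool vcfpuxws128_sketch(uint32_t vd[4], const float vb[4], unsigned uimm)
{
    bool sat = false;
    for (int i = 0; i < 4; i++)
        sat |= cvt_lane_u32(vb[i], uimm, &vd[i]);
    return sat;
}
```

A single clamped lane is enough to make the wrapper return `true`, which mirrors the `sat |= s` accumulation in the Rust snapshot.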
--- a/migration/project-root/ppc-manual/vmx128/vcsxwfp128.md
+++ b/migration/project-root/ppc-manual/vmx128/vcsxwfp128.md
@@ -0,0 +1,131 @@
+# `vcsxwfp128` — Vector128 Convert From Signed Fixed-Point Word to Floating-Point
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x180002b0`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vcsxwfp128` | `vcsxwfp128` | — | Vector128 Convert From Signed Fixed-Point Word to Floating-Point |
+
+## Syntax
+
+```asm
+vcsxwfp128 [VD], [VB], [UIMM]
+```
+
+## Encoding
+
+### `vcsxwfp128` — form `VX128_3`
+
+- **Opcode word:** `0x180002b0`
+- **Primary opcode (bits 0–5):** `6`
+- **Extended opcode:** `688`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (6) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `IMM` | 5-bit immediate |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21–27 | `XO` | extended opcode |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VB` | vcsxwfp128: read | Source B vector register. |
+| `UIMM` | vcsxwfp128: read | 5-bit unsigned immediate (bits 11–15, scale exponent 0..31). Zero-extended. |
+| `VD` | vcsxwfp128: write | Destination vector register. |
+
+## Register Effects
+
+### `vcsxwfp128`
+
+- **Reads (always):** `VB`, `UIMM`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Derived from the xenia-rs interpreter arm (see Implementation References).
+uimm <- instr[11:15]                  ; 5-bit scale exponent, 0..31
+for i in 0..3:
+    VD.f[i] <- float(signed(VB.w[i])) * 2^-uimm
+; VSCR is not modified.
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vcsxwfp128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcsxwfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:503`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L503)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:98`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L98)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:658`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L658)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4347-4354`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4347-L4354)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vcsxwfp128 => {
+            let uimm = (instr.raw >> 16) & 0x1F;
+            let b = crate::vmx::as_i32x4(ctx.vr[instr.vb128()]);
+            let mut r = [0f32; 4];
+            for i in 0..4 { r[i] = crate::vmx::cvt_i32_to_f32(b[i], uimm); }
+            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **Signed fixed-point (int32) → float with explicit scale.** Each lane computes `VD[i] = (float)VB.w[i] * 2^-UIMM`, i.e. the signed 32-bit value divided by `2^UIMM` in floating point, not as an integer division. `UIMM` is a 5-bit unsigned scale exponent (range 0..31) applied as a post-scale. This is the inverse direction of [`vcfpsxws128`](vcfpsxws128.md), so a round-trip must use the same `UIMM` on both sides.
+- **IEEE-754 binary32 output, round-to-nearest.** Values outside the exactly-representable range (`|x| > 2^24`) lose low-order bits; no saturation on the float side.
+- **No `VSCR[SAT]` effect** — conversion in this direction never saturates.
+- **`VSCR[NJ]` does not affect the int → float path.**
+- **VMX128 register-fusion** applies (7-bit register IDs).
+- **No IBM AIX entry** — Xenon-only. Closest standard Altivec op is [`vcfsx`](../vmx/vcfsx.md).
+- **No `Rc`, no XER / FPSCR.**
+
+## Related Instructions
+
+- [`vcfsx`](../vmx/vcfsx.md) — the standard Altivec `int32 → float` with scale.
+- [`vcuxwfp128`](vcuxwfp128.md) — unsigned-int variant.
+- [`vcfpsxws128`](vcfpsxws128.md), [`vcfpuxws128`](vcfpuxws128.md) — the inverse (float → int with scale).
+
+## IBM Reference
+
+- No IBM AIX entry — Xbox 360 VMX128 extension only.
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions; cross-referenced with [IBM AltiVec Technology Programmer's Interface Manual §`vcfsx`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf).
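The int32 → float direction described for `vcsxwfp128` reduces to one expression per lane. The sketch below is an illustration under assumptions: the helper name `cvt_i32_to_f32_lane` is hypothetical, and the host's default round-to-nearest mode is assumed. It widens to `double` first so that the power-of-two divide is exact and only the final cast to `float` rounds.

```c
#include <stdint.h>

/* Hypothetical helper: one lane of vcsxwfp128.
 * int32 -> double is exact, dividing by a power of two is exact,
 * so the only rounding happens in the final cast to float. */
static float cvt_i32_to_f32_lane(int32_t w, unsigned uimm)
{
    return (float)((double)w / (double)(UINT32_C(1) << (uimm & 0x1Fu)));
}
```

For example, `cvt_i32_to_f32_lane(32768, 15)` gives `1.0f`, undoing a `UIMM = 15` pre-scale applied in the float → int direction, while an input of `16777217` (`2^24 + 1`) with `UIMM = 0` comes back as `16777216.0f` because binary32 cannot represent the low-order bit.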
--- a/migration/project-root/ppc-manual/vmx128/vcuxwfp128.md
+++ b/migration/project-root/ppc-manual/vmx128/vcuxwfp128.md
@@ -0,0 +1,131 @@
+# `vcuxwfp128` — Vector128 Convert From Unsigned Fixed-Point Word to Floating-Point
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x180002f0`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vcuxwfp128` | `vcuxwfp128` | — | Vector128 Convert From Unsigned Fixed-Point Word to Floating-Point |
+
+## Syntax
+
+```asm
+vcuxwfp128 [VD], [VB], [UIMM]
+```
+
+## Encoding
+
+### `vcuxwfp128` — form `VX128_3`
+
+- **Opcode word:** `0x180002f0`
+- **Primary opcode (bits 0–5):** `6`
+- **Extended opcode:** `752`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (6) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `IMM` | 5-bit immediate |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21–27 | `XO` | extended opcode |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VB` | vcuxwfp128: read | Source B vector register. |
+| `UIMM` | vcuxwfp128: read | 5-bit unsigned immediate (bits 11–15, scale exponent 0..31). Zero-extended. |
+| `VD` | vcuxwfp128: write | Destination vector register. |
+
+## Register Effects
+
+### `vcuxwfp128`
+
+- **Reads (always):** `VB`, `UIMM`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Derived from the xenia-rs interpreter arm (see Implementation References).
+uimm <- instr[11:15]                  ; 5-bit scale exponent, 0..31
+for i in 0..3:
+    VD.f[i] <- float(unsigned(VB.w[i])) * 2^-uimm
+; VSCR is not modified.
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vcuxwfp128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcuxwfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:521`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L521)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:98`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L98)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:659`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L659)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4355-4362`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4355-L4362)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vcuxwfp128 => {
+            let uimm = (instr.raw >> 16) & 0x1F;
+            let b = ctx.vr[instr.vb128()].as_u32x4();
+            let mut r = [0f32; 4];
+            for i in 0..4 { r[i] = crate::vmx::cvt_u32_to_f32(b[i], uimm); }
+            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **Unsigned fixed-point (uint32) → float with explicit scale.** Each lane computes `VD[i] = (float)VB.w[i] * 2^-UIMM`. Treats the 32-bit input as unsigned, so values ≥ `0x80000000` produce positive floats (unlike `vcsxwfp128` which would produce negatives).
+- **IEEE-754 binary32 output, round-to-nearest.** Precision loss above `2^24`.
+- **No `VSCR[SAT]` effect.**
+- **`VSCR[NJ]` does not affect the uint → float path.**
+- **VMX128 register-fusion** applies.
+- **No IBM AIX entry** — Xenon-only. Closest standard is [`vcfux`](../vmx/vcfux.md).
+- **No `Rc`, no XER / FPSCR.**
+
+## Related Instructions
+
+- [`vcfux`](../vmx/vcfux.md) — the standard Altivec `uint32 → float` with scale.
+- [`vcsxwfp128`](vcsxwfp128.md) — signed-int variant.
+- [`vcfpuxws128`](vcfpuxws128.md), [`vcfpsxws128`](vcfpsxws128.md) — the inverse (float → int with scale).
+
+## IBM Reference
+
+- No IBM AIX entry — Xbox 360 VMX128 extension only.
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions; cross-referenced with [IBM AltiVec Technology Programmer's Interface Manual §`vcfux`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf).
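The difference between the unsigned and signed int → float paths shows up only for words with the top bit set. The sketch below puts the two side by side; both helper names are hypothetical, and the default round-to-nearest mode is assumed.

```c
#include <stdint.h>

/* Hypothetical helper: one lane of vcuxwfp128 (word treated as unsigned). */
static float cvt_u32_to_f32_lane(uint32_t w, unsigned uimm)
{
    return (float)((double)w / (double)(UINT32_C(1) << (uimm & 0x1Fu)));
}

/* Hypothetical helper: one lane of vcsxwfp128 (word treated as signed),
 * included only for contrast. */
static float cvt_i32_to_f32_lane(int32_t w, unsigned uimm)
{
    return (float)((double)w / (double)(UINT32_C(1) << (uimm & 0x1Fu)));
}
```

With `UIMM = 31`, the bit pattern `0x80000000` converts to `1.0f` on the unsigned path, whereas the same pattern read as `INT32_MIN` converts to `-1.0f` on the signed path.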
--- a/migration/project-root/ppc-manual/vmx128/vmaddcfp128.md
+++ b/migration/project-root/ppc-manual/vmx128/vmaddcfp128.md
@@ -0,0 +1,148 @@
+# `vmaddcfp128` — Vector128 Multiply Add Floating Point
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128](../forms/VX128.md) · **Opcode:** `0x14000110`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vmaddcfp128` | `vmaddcfp128` | — | Vector128 Multiply Add Floating Point |
+
+## Syntax
+
+```asm
+vmaddcfp128 [VD], [VA], [VD], [VB]
+```
+
+## Encoding
+
+### `vmaddcfp128` — form `VX128`
+
+- **Opcode word:** `0x14000110`
+- **Primary opcode (bits 0–5):** `5`
+- **Extended opcode:** `272`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (4 or 5) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `VA128l` | source A low 5 bits |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21 | `VA128H` | source A high bit |
+| 22 | `—` | reserved |
+| 23–25 | `VC` | optional VC / XO sub-field |
+| 26 | `VA128h` | source A middle bit |
+| 27 | `—` | reserved |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VA` | vmaddcfp128: read | Source A vector register. |
+| `VD` | vmaddcfp128: read; vmaddcfp128: write | Destination vector register. |
+| `VB` | vmaddcfp128: read | Source B vector register. |
+
+## Register Effects
+
+### `vmaddcfp128`
+
+- **Reads (always):** `VA`, `VD`, `VB`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Derived from the xenia-rs interpreter arm (see Implementation References).
+for i in 0..3:
+    a <- flush_denorm(VA.f[i])
+    d <- flush_denorm(VD.f[i])        ; VD is read before it is written
+    b <- flush_denorm(VB.f[i])
+    ; (VA * VD) + VB, rounded once, subnormal result flushed to zero
+    VD.f[i] <- flush_denorm(fma(a, d, b))
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vmaddcfp128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddcfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:812`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L812)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:614`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L614)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4492-4509`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4492-L4509)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vmaddcfp128 => {
+            // ISA: (VD) <- (VA × VD) + VB. Canary InstrEmit_vmaddcfp128 (cc:819): MulAdd(VA, VD, VB).
+            // Previous code computed di.mul_add(bi, ai) = VD×VB+VA — both operands wrong
+            // (PPCBUG-425). Fix: ai.mul_add(di, bi) = VA×VD+VB.
+            let a = ctx.vr[instr.va128()].as_f32x4();
+            let b = ctx.vr[instr.vb128()].as_f32x4();
+            let d = ctx.vr[instr.vd128()].as_f32x4();
+            let mut r = [0f32; 4];
+            for i in 0..4 {
+                let ai = vmx::flush_denorm(a[i]);
+                let bi = vmx::flush_denorm(b[i]);
+                let di = vmx::flush_denorm(d[i]);
+                // PPCBUG-437: flush subnormal output too.
+                r[i] = vmx::flush_denorm(ai.mul_add(di, bi));
+            }
+            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **Xbox-specific fused multiply-add variant.** Each lane computes `VD[i] = VA[i] * VD[i] + VB[i]`; `VD` is both a source and the destination (xenia reads `VD` first, then writes). This is *not* the standard [`vmaddfp`](../vmx/vmaddfp.md) operand order: `VD` takes the place of the second multiplicand (`VC` in `vmaddfp`), `VA` is the other factor, and `VB` is the addend. The interpreter snapshot above notes that an earlier xenia-rs version computed `VD × VB + VA` and was corrected (PPCBUG-425). The mnemonic's trailing `c` denotes that `VD` stands in for a separate `VC` operand.
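+- **Fused, single-rounding.** xenia-rs uses `f32::mul_add`, which rounds once on every host; it compiles to a hardware FMA instruction where one is available and otherwise falls back to a slower software routine. Per the snapshot comment, xenia-canary's emitter (`InstrEmit_vmaddcfp128`) issues `MulAdd(VA, VD, VB)`.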
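+- **IEEE-754 binary32 lanes.** The interpreter snapshot flushes denormal inputs and a denormal result to zero unconditionally via `vmx::flush_denorm` (PPCBUG-437); it does not consult `VSCR[NJ]` in this arm.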
+- **No VSCR[SAT], no FPSCR update.**
+- **NaN propagation** per IEEE-754.
+- **VMX128 register-fusion** (7-bit IDs on `VA`, `VB`, `VD`).
+- **No IBM AIX entry** — Xenon-only.
+- **No `Rc`, no XER.**
+
+## Related Instructions
+
+- [`vmaddfp`](../vmx/vmaddfp.md), [`vmaddfp128`](../vmx/vmaddfp.md) — standard fused `(VA × VC) + VB`.
+- [`vmulfp128`](vmulfp128.md) — plain lane-wise float multiply.
+- [`vnmsubfp`](../vmx/vnmsubfp.md) — negative-multiply-subtract.
+- [`vmsum3fp128`](vmsum3fp128.md), [`vmsum4fp128`](vmsum4fp128.md) — dot-product reductions.
+
+## IBM Reference
+
+- No IBM AIX entry: this is an Xbox 360 VMX128 extension. Its semantics differ from the base Altivec [`vmaddfp`](../vmx/vmaddfp.md) in the operand order (`VD` supplies the second multiplicand in place of `VC`, giving `VA × VD + VB`).
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions.
+- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the base FMA semantics.
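The operand order recorded in the `vmaddcfp128` interpreter snapshot (`VA × VD + VB`, with `VD` read before it is overwritten) can be sketched in C as below. Several things here are assumptions for illustration: the names `flush_denorm` and `vmaddcfp128_sketch` are hypothetical stand-ins, and the arithmetic forms the product in `double` (where it is exact for binary32 inputs) instead of calling a true fused multiply-add. It therefore shows the operand order and denormal flushing but does not guarantee bit-identical single-rounding results in every case.

```c
#include <float.h>

/* Hypothetical stand-in for xenia's vmx::flush_denorm:
 * replaces subnormal values with a zero of the same sign. */
static float flush_denorm(float x)
{
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;
    return x;
}

/* Sketch of vmaddcfp128: vd[i] = va[i] * vd[i] + vb[i].
 * Not a true FMA: the sum is rounded to double, then to float. */
static void vmaddcfp128_sketch(float vd[4], const float va[4], const float vb[4])
{
    for (int i = 0; i < 4; i++) {
        double a = flush_denorm(va[i]);
        double d = flush_denorm(vd[i]);   /* old VD, read before the write */
        double b = flush_denorm(vb[i]);
        vd[i] = flush_denorm((float)(a * d + b));
    }
}
```

With `va = {2, 3, 4, 5}`, `vd = {10, 10, 10, 10}` and `vb = {1, 1, 1, 1}`, the result is `{21, 31, 41, 51}`; the pre-fix order `VD × VB + VA` would instead have produced `{12, 13, 14, 15}`.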
--- a/migration/project-root/ppc-manual/vmx128/vmsum3fp128.md
+++ b/migration/project-root/ppc-manual/vmx128/vmsum3fp128.md
@@ -0,0 +1,141 @@
+# `vmsum3fp128` — Vector128 Multiply Sum 3-way Floating Point
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128](../forms/VX128.md) · **Opcode:** `0x14000190`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vmsum3fp128` | `vmsum3fp128` | — | Vector128 Multiply Sum 3-way Floating Point |
+
+## Syntax
+
+```asm
+vmsum3fp128 [VD], [VA], [VB]
+```
+
+## Encoding
+
+### `vmsum3fp128` — form `VX128`
+
+- **Opcode word:** `0x14000190`
+- **Primary opcode (bits 0–5):** `5`
+- **Extended opcode:** `400`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (4 or 5) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `VA128l` | source A low 5 bits |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21 | `VA128H` | source A high bit |
+| 22 | `—` | reserved |
+| 23–25 | `VC` | optional VC / XO sub-field |
+| 26 | `VA128h` | source A middle bit |
+| 27 | `—` | reserved |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VA` | vmsum3fp128: read | Source A vector register. |
+| `VB` | vmsum3fp128: read | Source B vector register. |
+| `VD` | vmsum3fp128: write | Destination vector register. |
+
+## Register Effects
+
+### `vmsum3fp128`
+
+- **Reads (always):** `VA`, `VB`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Pseudocode derives directly from the xenia-rs interpreter
+; arm (see Implementation References). Operation semantics:
+;   - Read source operands from the fields listed under Operands.
+;   - Apply the arithmetic / logical / memory action described
+;     in the Description field above.
+;   - Write results to the destination register(s); update any
+;     status bits enumerated under Status-Register Effects.
+; Consult the IBM AIX reference link under IBM Reference for
+; canonical PPC-style pseudocode where xenia's expression is
+; terse.
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vmsum3fp128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsum3fp128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1067`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1067)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:106`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L106)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:616`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L616)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4513-4523`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4513-L4523)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vmsum3fp128 => {
+            // PPCBUG-436: flush per-product intermediates (not just the final sum).
+            let a = ctx.vr[instr.va128()].as_f32x4();
+            let b = ctx.vr[instr.vb128()].as_f32x4();
+            let p0 = vmx::flush_denorm(a[0] * b[0]);
+            let p1 = vmx::flush_denorm(a[1] * b[1]);
+            let p2 = vmx::flush_denorm(a[2] * b[2]);
+            let s = vmx::flush_denorm(p0 + p1 + p2);
+            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4(s, s, s, s);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **3-way float dot product.** Computes `s = VA[0]*VB[0] + VA[1]*VB[1] + VA[2]*VB[2]` (ignoring lane 3 — the "w" component of a homogeneous vector) and **broadcasts `s` to every lane of `VD`**. Typical call site: 3D vector dot products where the w-component is padding.
+- **Scalar-result-splatted-across-lanes.** Consuming code can then use any lane of `VD` as the dot-product result.
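+- **Rounding.** Xenia evaluates `p0 + p1 + p2` as two sequential adds, `(p0 + p1) + p2`, each rounded to binary32; there is no fused multiply-add and no fused triple-add. The left-to-right order matches the spec. Summation order still matters: a different association usually changes the result by about 1 ulp, and by more when large terms cancel. Games that need deterministic cross-host behaviour typically pre-scale their inputs.
+- **IEEE-754 binary32; `VSCR[NJ]` honoured.** The snapshot above applies `vmx::flush_denorm` to each product and to the final sum (PPCBUG-436); the raw inputs are not flushed before the multiply.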
+- **No VSCR[SAT], no FPSCR update.**
+- **VMX128 register-fusion** (7-bit IDs on `VA`, `VB`, `VD`).
+- **No IBM AIX entry** — Xenon-only.
+- **No `Rc`, no XER.**
+
+## Related Instructions
+
+- [`vmsum4fp128`](vmsum4fp128.md) — 4-way dot-product (includes the w-lane).
+- [`vmulfp128`](vmulfp128.md), [`vaddfp`](../vmx/vaddfp.md) — the building blocks.
+- [`vmaddcfp128`](vmaddcfp128.md), [`vmaddfp`](../vmx/vmaddfp.md) — fused MAC variants.
+- [`vsumsws`](../vmx/vsumsws.md) — integer sum-reduction analogue.
+
+## IBM Reference
+
+- No IBM AIX entry — Xbox 360 VMX128 extension only.
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions. A 3-way dot product is a direct mirror of D3D9's `float3 dot`.
+- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the base float arithmetic semantics.
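
A minimal Rust sketch of the `vmsum3fp128` behaviour documented on the page above, written as a free function on plain arrays rather than against the xenia-rs types. The names `flush_denorm` and `vmsum3fp` are illustrative only. This `flush_denorm` assumes that subnormals become a zero of the same sign; the body of the real `vmx::flush_denorm` is not part of the snapshot.

```rust
/// Stand-in for xenia-rs `vmx::flush_denorm` (its body is not in the snapshot).
/// Assumption: subnormal values become a zero with the same sign.
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// 3-way dot product: lanes 0..=2 are multiplied and summed, lane 3 is
/// ignored, and the scalar result is broadcast to all four output lanes.
fn vmsum3fp(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    // Each product is flushed individually, as in the interpreter arm.
    let p0 = flush_denorm(a[0] * b[0]);
    let p1 = flush_denorm(a[1] * b[1]);
    let p2 = flush_denorm(a[2] * b[2]);
    // Two sequential adds, left to right, then a final flush.
    let s = flush_denorm((p0 + p1) + p2);
    [s; 4]
}
```

Calling `vmsum3fp([1.0, 2.0, 3.0, 100.0], [4.0, 5.0, 6.0, 100.0])` yields 32.0 in every lane, which shows that the w components do not contribute.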
--- a/migration/project-root/ppc-manual/vmx128/vmsum4fp128.md
+++ b/migration/project-root/ppc-manual/vmx128/vmsum4fp128.md
@@ -0,0 +1,142 @@
+# `vmsum4fp128` — Vector128 Multiply Sum 4-way Floating-Point
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128](../forms/VX128.md) · **Opcode:** `0x140001d0`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vmsum4fp128` | `vmsum4fp128` | — | Vector128 Multiply Sum 4-way Floating-Point |
+
+## Syntax
+
+```asm
+vmsum4fp128 [VD], [VA], [VB]
+```
+
+## Encoding
+
+### `vmsum4fp128` — form `VX128`
+
+- **Opcode word:** `0x140001d0`
+- **Primary opcode (bits 0–5):** `5`
+- **Extended opcode:** `464`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (4 or 5) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `VA128l` | source A low 5 bits |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21 | `VA128H` | source A high bit |
+| 22 | `—` | reserved |
+| 23–25 | `VC` | optional VC / XO sub-field |
+| 26 | `VA128h` | source A middle bit |
+| 27 | `—` | reserved |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VA` | vmsum4fp128: read | Source A vector register. |
+| `VB` | vmsum4fp128: read | Source B vector register. |
+| `VD` | vmsum4fp128: write | Destination vector register. |
+
+## Register Effects
+
+### `vmsum4fp128`
+
+- **Reads (always):** `VA`, `VB`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Pseudocode derives directly from the xenia-rs interpreter
+; arm (see Implementation References). Operation semantics:
+;   - Read source operands from the fields listed under Operands.
+;   - Apply the arithmetic / logical / memory action described
+;     in the Description field above.
+;   - Write results to the destination register(s); update any
+;     status bits enumerated under Status-Register Effects.
+; Consult the IBM AIX reference link under IBM Reference for
+; canonical PPC-style pseudocode where xenia's expression is
+; terse.
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vmsum4fp128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsum4fp128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1077`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1077)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:106`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L106)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:617`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L617)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4524-4535`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4524-L4535)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vmsum4fp128 => {
+            // PPCBUG-436.
+            let a = ctx.vr[instr.va128()].as_f32x4();
+            let b = ctx.vr[instr.vb128()].as_f32x4();
+            let p0 = vmx::flush_denorm(a[0] * b[0]);
+            let p1 = vmx::flush_denorm(a[1] * b[1]);
+            let p2 = vmx::flush_denorm(a[2] * b[2]);
+            let p3 = vmx::flush_denorm(a[3] * b[3]);
+            let s = vmx::flush_denorm(p0 + p1 + p2 + p3);
+            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4(s, s, s, s);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **4-way float dot product.** Computes `s = VA[0]*VB[0] + VA[1]*VB[1] + VA[2]*VB[2] + VA[3]*VB[3]` (the full xyzw dot) and **broadcasts `s` to every lane of `VD`**.
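+- **Scalar-result-splatted-across-lanes.** Mirrors HLSL's `dot` on `float4` (GLSL: `dot` on `vec4`).
+- **Rounding.** Three sequential adds, `((p0 + p1) + p2) + p3`, each rounded to binary32. A different association usually changes the result by about 1 ulp, and by more when large terms cancel. Not an FMA in xenia.
+- **IEEE-754 binary32; `VSCR[NJ]` honoured.** The snapshot above applies `vmx::flush_denorm` to each product and to the final sum (PPCBUG-436); the raw inputs are not flushed before the multiply.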
+- **No VSCR[SAT], no FPSCR update.**
+- **VMX128 register-fusion** (7-bit IDs on `VA`, `VB`, `VD`).
+- **No IBM AIX entry** — Xenon-only.
+- **No `Rc`, no XER.**
+
+## Related Instructions
+
+- [`vmsum3fp128`](vmsum3fp128.md) — 3-way dot-product (ignores the w-lane).
+- [`vmulfp128`](vmulfp128.md), [`vaddfp`](../vmx/vaddfp.md) — the building blocks.
+- [`vmaddcfp128`](vmaddcfp128.md), [`vmaddfp`](../vmx/vmaddfp.md) — fused MAC variants.
+- [`vsumsws`](../vmx/vsumsws.md) — integer sum-reduction analogue.
+
+## IBM Reference
+
+- No IBM AIX entry — Xbox 360 VMX128 extension only.
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions. Directly mirrors D3D9's `float4 dot`.
+- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for base float semantics.
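
The rounding note on the `vmsum4fp128` page above can be made concrete with a small Rust sketch. `vmsum4fp` follows the left-to-right order of the interpreter arm; `vmsum4fp_pairwise` is a hypothetical alternative association included only for contrast, not something the source attributes to hardware or to xenia. `flush_denorm` is again an assumed stand-in for `vmx::flush_denorm`.

```rust
/// Stand-in for xenia-rs `vmx::flush_denorm` (assumed: subnormal -> signed zero).
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// Lane-wise products, each flushed, shared by both summation variants.
fn products(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [
        flush_denorm(a[0] * b[0]),
        flush_denorm(a[1] * b[1]),
        flush_denorm(a[2] * b[2]),
        flush_denorm(a[3] * b[3]),
    ]
}

/// 4-way dot product with left-to-right summation, as in the interpreter arm.
fn vmsum4fp(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let p = products(a, b);
    let s = flush_denorm(((p[0] + p[1]) + p[2]) + p[3]);
    [s; 4]
}

/// Hypothetical pairwise association, shown only to illustrate order sensitivity.
fn vmsum4fp_pairwise(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let p = products(a, b);
    let s = flush_denorm((p[0] + p[1]) + (p[2] + p[3]));
    [s; 4]
}
```

With `a = [1e8, 1.0, -1e8, 1.0]` and `b` set to all ones, the left-to-right form returns 1.0 and the pairwise form returns 0.0, because each `1.0` is lost whenever it is added to a term of magnitude `1e8` in binary32.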
--- a/migration/project-root/ppc-manual/vmx128/vmulfp128.md
+++ b/migration/project-root/ppc-manual/vmx128/vmulfp128.md
@@ -0,0 +1,142 @@
+# `vmulfp128` — Vector128 Multiply Floating-Point
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128](../forms/VX128.md) · **Opcode:** `0x14000090`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vmulfp128` | `vmulfp128` | — | Vector128 Multiply Floating-Point |
+
+## Syntax
+
+```asm
+vmulfp128 [VD], [VA], [VB]
+```
+
+## Encoding
+
+### `vmulfp128` — form `VX128`
+
+- **Opcode word:** `0x14000090`
+- **Primary opcode (bits 0–5):** `5`
+- **Extended opcode:** `144`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (4 or 5) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `VA128l` | source A low 5 bits |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21 | `VA128H` | source A high bit |
+| 22 | `—` | reserved |
+| 23–25 | `VC` | optional VC / XO sub-field |
+| 26 | `VA128h` | source A middle bit |
+| 27 | `—` | reserved |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VA` | vmulfp128: read | Source A vector register. |
+| `VB` | vmulfp128: read | Source B vector register. |
+| `VD` | vmulfp128: write | Destination vector register. |
+
+## Register Effects
+
+### `vmulfp128`
+
+- **Reads (always):** `VA`, `VB`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Pseudocode derives directly from the xenia-rs interpreter
+; arm (see Implementation References). Operation semantics:
+;   - Read source operands from the fields listed under Operands.
+;   - Apply the arithmetic / logical / memory action described
+;     in the Description field above.
+;   - Write results to the destination register(s); update any
+;     status bits enumerated under Status-Register Effects.
+; Consult the IBM AIX reference link under IBM Reference for
+; canonical PPC-style pseudocode where xenia's expression is
+; terse.
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vmulfp128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1126`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1126)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:612`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L612)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2108-2120`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2108-L2120)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vmulfp128 => {
+            // PPCBUG-435 + PPCBUG-437.
+            let a = ctx.vr[instr.va128()].as_f32x4();
+            let b = ctx.vr[instr.vb128()].as_f32x4();
+            let mut r = [0f32; 4];
+            for i in 0..4 {
+                let ai = vmx::flush_denorm(a[i]);
+                let bi = vmx::flush_denorm(b[i]);
+                r[i] = vmx::flush_denorm(ai * bi);
+            }
+            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **Lane-wise float multiply — Xenon-only.** Base Altivec has no dedicated `vmulfp`; the pattern on traditional PowerPC is `vmaddfp vD, vA, vC, v_zero`. Xenon adds this direct instruction, saving the zero-register setup.
+- **IEEE-754 binary32, round-to-nearest.** Each of the four lanes computes `VD[i] = VA[i] * VB[i]`.
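+- **`VSCR[NJ]` honoured** (denormals flush-to-zero). The snapshot above applies `vmx::flush_denorm` to both inputs of each lane and again to the product (PPCBUG-435, PPCBUG-437).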
+- **NaN propagation** per IEEE-754.
+- **No VSCR[SAT], no FPSCR update, no exceptions.**
+- **VMX128 register-fusion** (7-bit IDs).
+- **No IBM AIX entry** — Xbox-specific; contrast with the `vmaddfp`-with-zero workaround used on non-Xenon Altivec.
+- **No `Rc`, no XER.**
+
+## Related Instructions
+
+- [`vmaddfp`](../vmx/vmaddfp.md), [`vmaddcfp128`](vmaddcfp128.md) — fused MAC forms.
+- [`vaddfp`](../vmx/vaddfp.md), [`vsubfp`](../vmx/vsubfp.md) — lane-wise float add/sub.
+- [`vmsum3fp128`](vmsum3fp128.md), [`vmsum4fp128`](vmsum4fp128.md) — dot-product reductions.
+
+## IBM Reference
+
+- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension.
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions. Non-Xenon Altivec code emits `vmaddfp vD, vA, vC, v_zero` to achieve the same effect.
+- [IBM AltiVec Technology Programmer's Interface Manual §`vmaddfp`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the underlying float semantics.
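
A short Rust sketch of the `vmulfp128` lane loop from the page above, on plain arrays. Unlike the dot-product instructions, the snapshot flushes the inputs as well as the result, so a subnormal operand contributes zero even when the unflushed product would be a normal number. `flush_denorm` and `vmulfp` are illustrative names, and the sign-preserving flush is an assumption.

```rust
/// Stand-in for xenia-rs `vmx::flush_denorm` (assumed: subnormal -> signed zero).
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// Lane-wise multiply: flush both inputs, multiply, then flush the product.
fn vmulfp(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut r = [0f32; 4];
    for i in 0..4 {
        let ai = flush_denorm(a[i]);
        let bi = flush_denorm(b[i]);
        r[i] = flush_denorm(ai * bi);
    }
    r
}
```

For example, a lane holding the subnormal `1e-40` multiplied by `1e30` produces 0.0 here, whereas unflushed binary32 arithmetic would give roughly `1e-10`.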
--- a/migration/project-root/ppc-manual/vmx128/vpermwi128.md
+++ b/migration/project-root/ppc-manual/vmx128/vpermwi128.md
@@ -0,0 +1,139 @@
+# `vpermwi128` — Vector128 Permutate Word Immediate
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_P](../forms/VX128_P.md) · **Opcode:** `0x18000210`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vpermwi128` | `vpermwi128` | — | Vector128 Permutate Word Immediate |
+
+## Syntax
+
+```asm
+vpermwi128 [VD], [VB], [UIMM]
+```
+
+## Encoding
+
+### `vpermwi128` — form `VX128_P`
+
+- **Opcode word:** `0x18000210`
+- **Primary opcode (bits 0–5):** `6`
+- **Extended opcode:** `528`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (6) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `PERMl` | permute selector low 5 bits |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21–22 | `—` | reserved |
+| 23–25 | `PERMh` | permute selector high 3 bits |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VB` | vpermwi128: read | Source B vector register. |
+| `UIMM` | vpermwi128: read | 8-bit unsigned permute immediate (`PERMh ‖ PERMl`): four 2-bit word selectors. |
+| `VD` | vpermwi128: write | Destination vector register. |
+
+## Register Effects
+
+### `vpermwi128`
+
+- **Reads (always):** `VB`, `UIMM`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Pseudocode derives directly from the xenia-rs interpreter
+; arm (see Implementation References). Operation semantics:
+;   - Read source operands from the fields listed under Operands.
+;   - Apply the arithmetic / logical / memory action described
+;     in the Description field above.
+;   - Write results to the destination register(s); update any
+;     status bits enumerated under Status-Register Effects.
+; Consult the IBM AIX reference link under IBM Reference for
+; canonical PPC-style pseudocode where xenia's expression is
+; terse.
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vpermwi128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpermwi128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1207`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1207)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:642`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L642)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4537-4548`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4537-L4548)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vpermwi128 => {
+            let imm = instr.vx128_p_perm();
+            let b = ctx.vr[instr.vb128()].as_u32x4();
+            let mut r = [0u32; 4];
+            // Output lane i ← b[(imm >> (2 * (3-i))) & 3]
+            for i in 0..4 {
+                let sel = ((imm >> (2 * (3 - i))) & 3) as usize;
+                r[i] = b[sel];
+            }
+            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_u32x4_array(r);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **Word-level 4-way permute via an 8-bit immediate.** The 8-bit `PERM` immediate (carried in fields `PERMh ‖ PERMl` of the encoding) is treated as **four 2-bit selectors**, one per output word lane. Each 2-bit field selects which of `VB`'s 4 word lanes is copied to the corresponding output lane.
+- **Bit layout of the immediate.** Numbering the 8-bit `PERM` value LSB-0 (bit 7 is the most significant), output lane 0 (the big-endian MSB word) is selected by bits 6–7; lane 1 by bits 4–5; lane 2 by bits 2–3; lane 3 by bits 0–1. This is the opposite convention from the MSB-0 bit numbers used in the Encoding table. (In xenia: `sel = (imm >> (2 * (3-i))) & 3`.)
+- **Super-set of [`vspltw`](../vmx/vspltw.md).** A splat is `vpermwi128 vD, vB, 0x00` (all lanes = word 0), `0x55` (all = word 1), `0xAA` (all = word 2), `0xFF` (all = word 3). The identity permute is `0x1B`, and an arbitrary shuffle such as "xyzw → wzyx" (`0xE4`) is a single instruction.
+- **Immediate-only.** No dynamic selector vector; contrast with [`vperm`](../vmx/vperm.md).
+- **Single-source.** Unlike `vperm`/`vperm128`, `vpermwi128` only reshuffles one register (`VB`); it cannot interleave two operands.
+- **VMX128 register-fusion** on `VD` and `VB` (7-bit IDs).
+- **No IBM AIX entry** — Xenon-only.
+- **No `Rc`, no XER, no VSCR.**
+
+## Related Instructions
+
+- [`vperm`](../vmx/vperm.md), [`vperm128`](../vmx/vperm.md) — general byte-granularity permute (two-source).
+- [`vspltw`](../vmx/vspltw.md), [`vspltw128`](../vmx/vspltw.md) — single-word splat (special case of `vpermwi128`).
+- [`vsldoi`](../vmx/vsldoi.md) — static-immediate byte rotate of two registers.
+- [`vrlimi128`](vrlimi128.md) — rotate + mask-insert (per-word rotate with an insert mask).
+
+## IBM Reference
+
+- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension.
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions. Functionally equivalent to HLSL's `.xyzw`-suffix swizzle on `float4`.
+- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the base permute semantics.
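
The selector layout described on the `vpermwi128` page above can be checked with a small Rust sketch. `vpermwi` restates the interpreter loop on a plain `[u32; 4]`; `perm_imm` is a hypothetical helper (not part of xenia-rs) that builds the 8-bit immediate from four lane selectors.

```rust
/// Word permute: output lane i takes b[sel_i], where sel_i is the 2-bit
/// field of `imm` at bit offset 2 * (3 - i), counted from the least significant bit.
fn vpermwi(b: [u32; 4], imm: u8) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        let sel = ((imm >> (2 * (3 - i))) & 3) as usize;
        r[i] = b[sel];
    }
    r
}

/// Hypothetical helper: pack four lane selectors (lane 0 first) into the immediate.
fn perm_imm(sel: [u8; 4]) -> u8 {
    ((sel[0] & 3) << 6) | ((sel[1] & 3) << 4) | ((sel[2] & 3) << 2) | (sel[3] & 3)
}
```

Note that the identity permute is `perm_imm([0, 1, 2, 3]) == 0x1B`, because lane 0's selector sits in the most significant bit pair.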
--- a/migration/project-root/ppc-manual/vmx128/vpkd3d128.md
+++ b/migration/project-root/ppc-manual/vmx128/vpkd3d128.md
@@ -0,0 +1,185 @@
+# `vpkd3d128` — Vector128 Pack D3Dtype, Rotate Left Immediate and Mask Insert
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_4](../forms/VX128_4.md) · **Opcode:** `0x18000610`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vpkd3d128` | `vpkd3d128` | — | Vector128 Pack D3Dtype, Rotate Left Immediate and Mask Insert |
+
+## Syntax
+
+```asm
+(no disassembly template)
+```
+
+## Encoding
+
+### `vpkd3d128` — form `VX128_4`
+
+- **Opcode word:** `0x18000610`
+- **Primary opcode (bits 0–5):** `6`
+- **Extended opcode:** `1552`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (6) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `IMM` | 5-bit immediate |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21–23 | `XO` | extended opcode |
+| 24–25 | `z` | sub-operation selector |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VB` | vpkd3d128: read | Source B vector register. |
+| `VD` | vpkd3d128: write | Destination vector register. |
+
+## Register Effects
+
+### `vpkd3d128`
+
+- **Reads (always):** `VB`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Pseudocode derives directly from the xenia-rs interpreter
+; arm (see Implementation References). Operation semantics:
+;   - Read source operands from the fields listed under Operands.
+;   - Apply the arithmetic / logical / memory action described
+;     in the Description field above.
+;   - Write results to the destination register(s); update any
+;     status bits enumerated under Status-Register Effects.
+; Consult the IBM AIX reference link under IBM Reference for
+; canonical PPC-style pseudocode where xenia's expression is
+; terse.
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vpkd3d128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkd3d128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2088`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2088)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:648`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L648)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4191-4248`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4191-L4248)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vpkd3d128 => {
+            use crate::vmx::D3dPackType;
+            let uimm = crate::decoder::extract_vx128_uimm5(instr.raw);
+            let pack = (uimm & 3) as usize;
+            let shift = instr.vx128_4_z() as usize;
+            let ty = D3dPackType::from_immediate(uimm >> 2);
+            let src = ctx.vr[instr.vb128()];
+            let out = match ty {
+                D3dPackType::D3dColor     => crate::vmx::pack_d3dcolor(src),
+                D3dPackType::NormShort2   => crate::vmx::pack_normshort2(src),
+                D3dPackType::NormPacked32 => crate::vmx::pack_normpacked32(src),
+                D3dPackType::Float16_2    => crate::vmx::pack_float16_2(src),
+                D3dPackType::NormShort4   => crate::vmx::pack_normshort4(src),
+                D3dPackType::Float16_4    => crate::vmx::pack_float16_4(src),
+                D3dPackType::NormPacked64 => crate::vmx::pack_normpacked64(src),
+                D3dPackType::Other(t)     => {
+                    tracing::warn!(
+                        raw = format_args!("{:#010x}", instr.raw),
+                        uimm,
+                        ty = t,
+                        "vpkd3d128: unhandled pack type at {:#010x}",
+                        ctx.pc,
+                    );
+                    src
+                }
+            };
+            // Post-pack permutation: merge packed `out` into previous `vd`
+            // per canary ppc_emit_altivec.cc:2126-2188 MakePermuteMask tables.
+            // MakePermuteMask(r0,l0, r1,l1, r2,l2, r3,l3): result[i] = if ri==0 { prev[li] } else { out[li] }
+            let result = if pack == 0 {
+                out
+            } else {
+                // (source_reg, lane): 0=prev vd, 1=packed out
+                const PERM: [[[(u8, u8); 4]; 4]; 3] = [
+                    // pack=1 (VPACK_32): places out[3] at lane (3-shift)
+                    [[(0,0),(0,1),(0,2),(1,3)], [(0,0),(0,1),(1,3),(0,3)],
+                     [(0,0),(1,3),(0,2),(0,3)], [(1,3),(0,1),(0,2),(0,3)]],
+                    // pack=2 (64-bit): places out[2..3] at lanes (2-shift)..(3-shift)
+                    [[(0,0),(0,1),(1,2),(1,3)], [(0,0),(1,2),(1,3),(0,3)],
+                     [(1,2),(1,3),(0,2),(0,3)], [(1,3),(0,1),(0,2),(0,3)]],
+                    // pack=3 (64-bit): same as pack=2 except shift=3 selects out[2] at lane 3
+                    [[(0,0),(0,1),(1,2),(1,3)], [(0,0),(1,2),(1,3),(0,3)],
+                     [(1,2),(1,3),(0,2),(0,3)], [(0,0),(0,1),(0,2),(1,2)]],
+                ];
+                let prev = ctx.vr[instr.vd128()];
+                let pw = prev.as_u32x4();
+                let ow = out.as_u32x4();
+                let sel = PERM[pack - 1][shift];
+                xenia_types::Vec128::from_u32x4_array([
+                    if sel[0].0 == 0 { pw[sel[0].1 as usize] } else { ow[sel[0].1 as usize] },
+                    if sel[1].0 == 0 { pw[sel[1].1 as usize] } else { ow[sel[1].1 as usize] },
+                    if sel[2].0 == 0 { pw[sel[2].1 as usize] } else { ow[sel[2].1 as usize] },
+                    if sel[3].0 == 0 { pw[sel[3].1 as usize] } else { ow[sel[3].1 as usize] },
+                ])
+            };
+            ctx.vr[instr.vd128()] = result;
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **Pack float lanes into a D3D-format 32-bit or 64-bit value.** The 5-bit `IMM` field (bits 11–15 of the encoding) selects both the format and the merge mode. In the snapshot above, `IMM >> 2` picks the D3D pack type and `IMM & 3` picks the pack (merge) mode; the 2-bit `z` field (bits 24–25) is the lane shift used by the merge. The pack types handled are:
+  - `D3dColor`: pack 4×float `[0.0, 1.0]` lanes into a 32-bit RGBA8 (A in high byte, B in low byte), the canonical Direct3D 9 `D3DCOLOR` format. Xenia's helper is `vmx::pack_d3dcolor`.
+  - `NormShort2`, `NormPacked32`, `Float16_2`, `NormShort4`, `Float16_4`, `NormPacked64`: each is dispatched to its own `vmx::pack_*` helper.
+  - Any other type value falls into `D3dPackType::Other`; the interpreter logs a warning and passes the source register through unpacked.
+- **Also merges the packed result into the existing `VD`.** The mnemonic is "Pack D3Dtype, Rotate Left Immediate and Mask Insert". When the pack mode is 0, the snapshot overwrites `VD` wholesale. For pack modes 1–3 it reads the previous `VD` and splices the packed word(s) in at a lane chosen by `z`, following the `MakePermuteMask` tables in xenia-canary. `VD` is therefore also a source in those modes, which the generated Register Effects list does not show.
+- **Field budget.** `z` (2 bits) + `IMM` (5 bits) gives 7 control bits: 3 for the pack type, 2 for the pack mode, and 2 for the shift. The practical set used by Xenon games is small (D3DCOLOR is the dominant one).
+- **No saturation signal.** The packer saturates floats beyond `[0.0, 1.0]` silently; `VSCR[SAT]` is not touched.
+- **VMX128 register-fusion** on `VD` and `VB`.
+- **No IBM AIX entry** — Xenon-only.
+- **No `Rc`, no XER.**
+
+## Related Instructions
+
+- [`vupkd3d128`](vupkd3d128.md) — the inverse (unpack a D3D-format word back into 4 floats).
+- [`vpkpx`](../vmx/vpkpx.md) — the standard Altivec 1-5-5-5 pixel pack.
+- [`vpkshus`](../vmx/vpkshus.md), [`vpkuhus`](../vmx/vpkuhus.md) — byte-range saturating packs (an alternative colour-packing path).
+- [`vcfpsxws128`](vcfpsxws128.md), [`vcfpuxws128`](vcfpuxws128.md) — conversion with explicit scale; software sometimes pre-scales floats to `[0, 255]` before using these in place of `vpkd3d128`.
+
+## IBM Reference
+
+- No IBM AIX entry — Xbox 360 VMX128 extension only. The "D3D" in the mnemonic refers directly to Direct3D 9 vertex/pixel formats (the `D3DDECLTYPE_*` enumeration).
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions.
+- Microsoft D3D9 documentation: `D3DDECLTYPE_D3DCOLOR`, `D3DDECLTYPE_UBYTE4N`, etc.
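
A hedged Rust sketch of how the `vpkd3d128` control fields on the page above split up, plus the simplest merge case. The bit positions in `decode_fields` are read off the Encoding table (`IMM` in bits 11–15, `z` in bits 24–25, MSB-0) and are an assumption about what `extract_vx128_uimm5` and `vx128_4_z` return. `PackFields`, `decode_fields` and `merge_pack32` are illustrative names, and the pack type is left as a raw number because the `D3dPackType::from_immediate` mapping is not shown in the snapshot.

```rust
/// Control fields of a `vpkd3d128` instruction word (illustrative type).
#[derive(Debug, PartialEq)]
struct PackFields {
    ty: u32,    // raw pack-type number: IMM >> 2
    pack: u32,  // pack (merge) mode: IMM & 3
    shift: u32, // lane shift: the z field
}

/// Split the 5-bit IMM and 2-bit z out of a raw 32-bit instruction word.
fn decode_fields(raw: u32) -> PackFields {
    let uimm = (raw >> 16) & 0x1F; // MSB-0 bits 11..=15
    let z = (raw >> 6) & 0x3; // MSB-0 bits 24..=25
    PackFields { ty: uimm >> 2, pack: uimm & 3, shift: z }
}

/// Merge for pack mode 1 only: the packed 32-bit word `out[3]` replaces
/// lane `3 - shift` of the previous VD; the other lanes are kept.
/// `shift` must be in 0..=3.
fn merge_pack32(prev: [u32; 4], out: [u32; 4], shift: usize) -> [u32; 4] {
    let mut r = prev;
    r[3 - shift] = out[3];
    r
}
```

The pack-mode 1 rule matches the first row group of the `PERM` table in the snapshot; modes 2 and 3 move two words and have an irregular `shift = 3` case, so they are better taken from the table directly.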
--- a/migration/project-root/ppc-manual/vmx128/vrlimi128.md
+++ b/migration/project-root/ppc-manual/vmx128/vrlimi128.md
@@ -0,0 +1,141 @@
+# `vrlimi128` — Vector128 Rotate Left Immediate and Mask Insert
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_4](../forms/VX128_4.md) · **Opcode:** `0x18000710`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vrlimi128` | `vrlimi128` | — | Vector128 Rotate Left Immediate and Mask Insert |
+
+## Syntax
+
+```asm
+vrlimi128 [VD], [VB], [IMM], [z]
+```
+
+## Encoding
+
+### `vrlimi128` — form `VX128_4`
+
+- **Opcode word:** `0x18000710`
+- **Primary opcode (bits 0–5):** `6`
+- **Extended opcode:** `1808`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (6) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `IMM` | 5-bit immediate |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21–23 | `XO` | extended opcode |
+| 24–25 | `z` | sub-operation selector |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
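+
+As a rough illustration of the table above, the fields can be pulled out of a 32-bit instruction word as follows. The type and function names are hypothetical and are not the xenia-rs decoder API; the sketch assumes IBM bit numbering (bit 0 is the most significant bit) and that each 7-bit register number is the low 5 bits combined with the high 2 bits.
+
+```rust
+/// Decoded operand fields of a `vrlimi128` word (hypothetical helper type).
+#[derive(Debug, PartialEq, Eq)]
+pub struct Vrlimi128Fields {
+    pub vd: u32,  // fused 7-bit destination register number
+    pub vb: u32,  // fused 7-bit source register number
+    pub imm: u32, // 5-bit immediate
+    pub z: u32,   // 2-bit sub-operation selector
+}
+
+/// Extract bits `first..=last` using IBM numbering (bit 0 = MSB).
+fn ibm_bits(word: u32, first: u32, last: u32) -> u32 {
+    let width = last - first + 1;
+    (word >> (31 - last)) & ((1u32 << width) - 1)
+}
+
+pub fn decode_vrlimi128(word: u32) -> Vrlimi128Fields {
+    Vrlimi128Fields {
+        vd: ibm_bits(word, 6, 10) | (ibm_bits(word, 28, 29) << 5),
+        vb: ibm_bits(word, 16, 20) | (ibm_bits(word, 30, 31) << 5),
+        imm: ibm_bits(word, 11, 15),
+        z: ibm_bits(word, 24, 25),
+    }
+}
+```
+
+A word built from the base opcode `0x18000710` with `VD128l = 5`, `VD128h = 1`, `VB128l = 9`, `VB128h = 3`, `IMM = 10` and `z = 2` decodes to `vd = 37`, `vb = 105`, `imm = 10`, `z = 2`.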
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VB` | vrlimi128: read | Source B vector register. |
+| `VD` | vrlimi128: write | Destination vector register. |
+
+## Register Effects
+
+### `vrlimi128`
+
+- **Reads (always):** `VB`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Derived from the xenia-rs interpreter arm (see Implementation
+; References). Words are numbered big-endian: word 0 comes first.
+shift ← z                       ; 0..3, word-granular rotate count
+mask  ← IMM                     ; only the low 4 bits are examined
+for i in 0..3:
+    rot[i] ← VB.word[(i + shift) mod 4]
+for i in 0..3:
+    if mask bit (3 - i) = 1 then VD.word[i] ← rot[i]
+    else VD.word[i] keeps its old value
+; No CR, XER or VSCR bits are updated.
+```
+
+## C Translation Example
+
+```c
+/* C translation of the xenia-rs interpreter arm shown under      */
+/* Implementation References. w[0] is the first (big-endian) word. */
+#include <stdint.h>
+typedef struct { uint32_t w[4]; } vec128;
+
+static void vrlimi128(vec128 *vd, const vec128 *vb, unsigned imm, unsigned z)
+{
+    uint32_t rot[4];
+    for (int i = 0; i < 4; i++)
+        rot[i] = vb->w[(i + z) & 3];   /* copy first: vd may alias vb */
+    for (int i = 0; i < 4; i++)
+        if ((imm >> (3 - i)) & 1)      /* mask bit 3 controls word 0  */
+            vd->w[i] = rot[i];         /* a clear bit keeps old VD    */
+}
+```
+
+## Implementation References
+
+**`vrlimi128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlimi128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1315`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1315)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:649`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L649)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3962-3977`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3962-L3977)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vrlimi128 => {
+            let shift = instr.vx128_4_z() as usize;
+            let mask = instr.vx128_4_imm();
+            let b = ctx.vr[instr.vb128()].as_u32x4();
+            let d = ctx.vr[instr.vd128()].as_u32x4();
+            let rot = [b[shift % 4], b[(shift + 1) % 4], b[(shift + 2) % 4], b[(shift + 3) % 4]];
+            let mut r = [0u32; 4];
+            for i in 0..4 {
+                // mask bit 3 corresponds to word 0 (BE-first). Use rot when
+                // the corresponding mask bit is set.
+                let use_rot = (mask >> (3 - i)) & 1 == 1;
+                r[i] = if use_rot { rot[i] } else { d[i] };
+            }
+            ctx.vr[instr.vd128()] = xenia_types::Vec128::from_u32x4_array(r);
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
+- **Rotate-left-word + mask-insert in one step.** `VB` is rotated left by `z` word positions (word-granular, 0..3, not bits), so rotated lane `i` holds `VB` word `(i + z) mod 4`. The rotated vector is merged into the pre-existing `VD` under a 4-bit insert mask taken from the low 4 bits of `IMM`: a mask bit of 1 takes the lane from the rotated `VB`; a mask bit of 0 keeps the lane from the old `VD`. This matches the interpreter snapshot, where `shift` comes from `vx128_4_z()` and `mask` from `vx128_4_imm()`.
+- **Destructive destination.** `VD` is both source and destination: lanes whose mask bit is 0 keep their old value, so software must pre-initialise `VD` unless the mask is `0b1111`. (The generated Register Effects list above shows `VB` as the only read.)
+- **Typical use: selective-lane overwrite.** Games use this to "rewrite lane `n` of a vector with a shuffled component" without a full permute. A common pattern is "insert a scalar into lane `i` of a vector" where the scalar has been pre-loaded to a known word of `VB`.
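+- **Mask bit ↔ lane mapping.** Big-endian: mask bit 3 (MSB of the 4-bit mask) controls lane 0; bit 0 controls lane 3. (In xenia: `use_rot = (mask >> (3 - i)) & 1 == 1`.)
+- **VMX128 register-fusion** on `VD` and `VB`: each 7-bit register number (0..127) combines the 5-bit `…128l` field with the 2-bit `…128h` field from the encoding table.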
+- **No IBM AIX entry** — Xenon-only.
+- **No `Rc`, no XER, no VSCR.**
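+
+The rules above can be restated as a small pure function, which makes the selective-lane overwrite pattern easy to check by hand. This is a sketch that follows the frozen interpreter snapshot; the function name and the plain `[u32; 4]` representation are illustrative choices, not xenia-rs API.
+
+```rust
+/// Model of `vrlimi128` on four words in big-endian lane order.
+/// `d` is the old VD, `b` is VB, `imm` is the insert mask (low 4 bits),
+/// `z` is the word rotate count.
+pub fn vrlimi128_model(d: [u32; 4], b: [u32; 4], imm: u32, z: u32) -> [u32; 4] {
+    let mut out = d;
+    for i in 0..4 {
+        // Mask bit 3 controls lane 0; mask bit 0 controls lane 3.
+        if (imm >> (3 - i)) & 1 == 1 {
+            // Rotating left by z words puts VB word (i + z) mod 4 in lane i.
+            out[i] = b[(i + z as usize) % 4];
+        }
+    }
+    out
+}
+```
+
+With `d = [10, 11, 12, 13]`, `b = [20, 21, 22, 23]`, `z = 1` and `imm = 0b1000`, only lane 0 is replaced, by `VB` word 1, giving `[21, 11, 12, 13]`. A full mask with `z = 0` simply copies `VB`.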
+
+## Related Instructions
+
+- [`vrlw`](../vmx/vrlw.md), [`vrlw128`](../vmx/vrlw.md) — per-lane bit-level rotate: bits rotate within each 32-bit word and no data moves between lanes.
+- [`vpermwi128`](vpermwi128.md) — immediate 4-way word permute (no merge).
+- [`vsel`](../vmx/vsel.md), [`vsel128`](../vmx/vsel.md) — general bit-select; `vrlimi128` is the specialised "rotate + insert" equivalent.
+- [`vsldoi`](../vmx/vsldoi.md) — byte-level immediate shift.
+
+## IBM Reference
+
+- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension. The mnemonic adapts the scalar `rlwimi` (Rotate Left Word Immediate then Mask Insert) pattern to vectors.
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions.
+- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the base rotate semantics.
--- a/migration/project-root/ppc-manual/vmx128/vupkd3d128.md
+++ b/migration/project-root/ppc-manual/vmx128/vupkd3d128.md
@@ -0,0 +1,154 @@
+# `vupkd3d128` — Vector128 Unpack D3Dtype
+
+> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x180007f0`
+
+<!-- GENERATED: BEGIN -->
+
+## Assembler Mnemonics
+
+| Mnemonic | XML entry | Flags | Description |
+| --- | --- | --- | --- |
+| `vupkd3d128` | `vupkd3d128` | — | Vector128 Unpack D3Dtype |
+
+## Syntax
+
+```asm
+(no disassembly template)
+```
+
+## Encoding
+
+### `vupkd3d128` — form `VX128_3`
+
+- **Opcode word:** `0x180007f0`
+- **Primary opcode (bits 0–5):** `6`
+- **Extended opcode:** `2032`
+- **Synchronising:** no
+
+| Bits | Field | Meaning |
+| --- | --- | --- |
+| 0–5 | `OPCD` | primary opcode (6) |
+| 6–10 | `VD128l` | destination low 5 bits |
+| 11–15 | `IMM` | 5-bit immediate |
+| 16–20 | `VB128l` | source B low 5 bits |
+| 21–27 | `XO` | extended opcode |
+| 28–29 | `VD128h` | destination high 2 bits |
+| 30–31 | `VB128h` | source B high 2 bits |
+
+## Operands
+
+| Field | Role | Description |
+| --- | --- | --- |
+| `VB` | vupkd3d128: read | Source B vector register. |
+| `VD` | vupkd3d128: write | Destination vector register. |
+
+## Register Effects
+
+### `vupkd3d128`
+
+- **Reads (always):** `VB`
+- **Reads (conditional):** _none_
+- **Writes (always):** `VD`
+- **Writes (conditional):** _none_
+
+## Status-Register Effects
+
+_No condition-register or status-register effects._
+
+## Operation (pseudocode)
+
+```
+; Derived from the xenia-rs interpreter arm (see Implementation
+; References).
+type ← IMM >> 2                 ; upper 3 bits of the 5-bit immediate
+case type of
+    D3dColor, NormShort2, NormPacked32, Float16_2,
+    NormShort4, Float16_4, NormPacked64:
+        VD ← unpack_<type>(VB)  ; per-format helper in crate::vmx
+    otherwise:
+        log a warning; VD ← VB  ; unhandled type passes VB through
+; No CR, XER or VSCR bits are updated.
+```
+
+## C Translation Example
+
+```c
+/* C translation: the xenia-rs interpreter arm below in           */
+/* Implementation References is the authoritative semantic        */
+/* snapshot. Translate it line-by-line:                            */
+/*   - ctx.gpr[N]  -> r[N]       (or f[]/v[] for FPRs/VRs)        */
+/*   - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be   */
+/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)   */
+/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO   */
+/* The Register Effects and Status-Register Effects tables above  */
+/* enumerate every side effect a faithful translation must emit.  */
+```
+
+## Implementation References
+
+**`vupkd3d128`**
+- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupkd3d128"`](../../xenia-canary/tools/ppc-instructions.xml)
+- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2194`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2194)
+- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:128`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L128)
+- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:670`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L670)
+- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4249-4275`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4249-L4275)
+<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
+
+```rust
+        PpcOpcode::vupkd3d128 => {
+            use crate::vmx::D3dPackType;
+            let uimm = crate::decoder::extract_vx128_uimm5(instr.raw);
+            let ty = D3dPackType::from_immediate(uimm >> 2);
+            let src = ctx.vr[instr.vb128()];
+            let out = match ty {
+                D3dPackType::D3dColor     => crate::vmx::unpack_d3dcolor(src),
+                D3dPackType::NormShort2   => crate::vmx::unpack_normshort2(src),
+                D3dPackType::NormPacked32 => crate::vmx::unpack_normpacked32(src),
+                D3dPackType::Float16_2    => crate::vmx::unpack_float16_2(src),
+                D3dPackType::NormShort4   => crate::vmx::unpack_normshort4(src),
+                D3dPackType::Float16_4    => crate::vmx::unpack_float16_4(src),
+                D3dPackType::NormPacked64 => crate::vmx::unpack_normpacked64(src),
+                D3dPackType::Other(t)     => {
+                    tracing::warn!(
+                        raw = format_args!("{:#010x}", instr.raw),
+                        uimm,
+                        ty = t,
+                        "vupkd3d128: unhandled pack type at {:#010x}",
+                        ctx.pc,
+                    );
+                    src
+                }
+            };
+            ctx.vr[instr.vd128()] = out;
+            ctx.pc += 4;
+        }
+```
+</details>
+
+<!-- GENERATED: END -->
+
+## Special Cases & Edge Conditions
+
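+- **Unpack a D3D-format value into 4 float lanes.** The upper three bits of the 5-bit `IMM` field (`uimm >> 2` in the interpreter) select the source format:
+  - `D3dColor` — decode a 32-bit ARGB8 word (`D3DCOLOR`) into 4 float lanes in `[0.0, 1.0]`. Xenia's helper is `vmx::unpack_d3dcolor`.
+  - `NormShort2`, `NormPacked32`, `Float16_2`, `NormShort4`, `Float16_4`, `NormPacked64` — each dispatches to its own `vmx::unpack_*` helper in the interpreter snapshot above.
+  - Any other type code is not handled by xenia-rs; the interpreter logs a warning and passes `VB` through unchanged.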
+- **Inverse of [`vpkd3d128`](vpkd3d128.md).** The same format code used to pack must be used to unpack.
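+- **Source width depends on the format.** For `D3DCOLOR` the source is a single 32-bit word of `VB` (typically lane 0; the helpers read the appropriate component) and the other three input word lanes are ignored. The format names suggest that the wider variants (`NormShort4`, `Float16_4`, `NormPacked64`) consume 64 bits.
+- **IEEE-754 binary32 outputs.** For `D3DCOLOR`, each 8-bit channel is converted to float and divided by 255, so results are already normalised to `[0.0, 1.0]`.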
+- **No `VSCR[SAT]` effect**, no FPSCR, no exceptions.
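+- **VMX128 register-fusion** on `VD` and `VB`: each 7-bit register number (0..127) combines the 5-bit `…128l` field with the 2-bit `…128h` field from the encoding table.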
+- **No IBM AIX entry** — Xenon-only.
+- **No `Rc`, no XER.**
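+
+The D3DCOLOR case described above can be modelled as follows. This is a hedged sketch rather than the body of `vmx::unpack_d3dcolor`: the function name is hypothetical, the input is passed as a bare `u32` instead of a `VB` lane, and the R, G, B, A output lane order is an assumption, since the text above does not state which channel lands in which lane.
+
+```rust
+/// Hypothetical model: decode a D3DCOLOR word (0xAARRGGBB) into four
+/// floats in [0.0, 1.0]. Output order R, G, B, A is an assumption.
+pub fn unpack_d3dcolor_sketch(word: u32) -> [f32; 4] {
+    // Convert each 8-bit channel to float first, then divide by 255.
+    let chan = |shift: u32| ((word >> shift) & 0xFF) as f32 / 255.0;
+    [chan(16), chan(8), chan(0), chan(24)]
+}
+```
+
+For example, `0xFF00FF00` (opaque pure green) yields `[0.0, 1.0, 0.0, 1.0]` under these assumptions. No saturation is needed on this path because every 8-bit input already maps into `[0.0, 1.0]`.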
+
+## Related Instructions
+
+- [`vpkd3d128`](vpkd3d128.md) — the inverse pack.
+- [`vupkhpx`](../vmx/vupkhpx.md), [`vupklpx`](../vmx/vupklpx.md) — standard Altivec 1-5-5-5 pixel unpacks.
+- [`vupkhsb`](../vmx/vupkhsb.md), [`vupklsb`](../vmx/vupklsb.md) — sign-extending byte→half-word unpacks (the integer analogue).
+- [`vcsxwfp128`](vcsxwfp128.md), [`vcuxwfp128`](vcuxwfp128.md) — int → float with scale; sometimes used as an alternate decode path.
+
+## IBM Reference
+
+- No IBM AIX entry — Xbox 360 VMX128 extension only. "D3D" denotes the Direct3D 9 vertex/pixel format catalogue (`D3DDECLTYPE_*`).
+- Xbox 360 XDK, Altivec-128 (VMX128) extensions.
+- Microsoft D3D9 documentation: `D3DDECLTYPE_D3DCOLOR`, `D3DDECLTYPE_UBYTE4N`, etc.