chore: add migration/ bundle for cross-machine setup
Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:
- claude-memory/ -> ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
  (103 files, 1.1 MB - MEMORY.md + every project_xenia_rs_*.md from audits
  addis_signext through audit-058)
- project-root/dot-claude/ -> <project-root>/.claude/settings.json
  (Stop hook + permissions)
- project-root/ppc-manual/ -> <project-root>/ppc-manual/
  (PowerPC reference docs, 397 files, 3.7 MB)
- project-root/run-canary.sh -> <project-root>/run-canary.sh
- README.md: Human-readable setup checklist
- setup.sh: Idempotent installer (also reclones xenia-canary at pinned HEAD 6de80dffe)
- MANIFEST.md: Per-file mapping + per-file-not-bundled restoration recipe
Excluded from bundle (not shippable via git):
- Sylpheed ISO (7.8 GB; copyright; manual copy required)
- sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
- target/ build artifacts (rebuild on target)
- audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
- audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
- xenia-canary checkout (setup.sh reclones from
git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
184
migration/project-root/ppc-manual/vmx/lvsl.md
Normal file
@@ -0,0 +1,184 @@
|
||||
# `lvsl` — Load Vector for Shift Left Indexed
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00000c`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `lvsl` | `lvsl` | — | Load Vector for Shift Left Indexed |
| `lvsl128` | `lvsl128` | — | Load Vector for Shift Left Indexed 128 |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
lvsl [VD], [RA0], [RB]
|
||||
lvsl128 [VD], [RA0], [RB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `lvsl` — form `X`
|
||||
|
||||
- **Opcode word:** `0x7c00000c`
|
||||
- **Primary opcode (bits 0–5):** `31`
|
||||
- **Extended opcode:** `6`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode |
| 6–10 | `RT/FRT/VRT` | destination |
| 11–15 | `RA/FRA/VRA` | source A |
| 16–20 | `RB/FRB/VRB` | source B |
| 21–30 | `XO` | extended opcode (10 bits) |
| 31 | `Rc` | record-form flag |
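The bit ranges in this table use IBM numbering, where bit 0 is the most-significant bit of the 32-bit instruction word. As a minimal sketch of how the fields could be pulled out of a word, assuming nothing about the real xenia-rs decoder (`bits`, `XForm`, and `decode_x` are hypothetical names used only for illustration):

```rust
/// Extract IBM-numbered bits `first..=last` (bit 0 = MSB) from a 32-bit word.
/// Hypothetical helper, not part of xenia-rs.
fn bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

/// The six X-form fields listed in the table (illustrative struct).
#[derive(Debug, PartialEq)]
struct XForm {
    opcd: u32,
    rt: u32,
    ra: u32,
    rb: u32,
    xo: u32,
    rc: u32,
}

/// Split an instruction word into its X-form fields.
fn decode_x(word: u32) -> XForm {
    XForm {
        opcd: bits(word, 0, 5),
        rt: bits(word, 6, 10),
        ra: bits(word, 11, 15),
        rb: bits(word, 16, 20),
        xo: bits(word, 21, 30),
        rc: bits(word, 31, 31),
    }
}
```

Applied to the bare opcode word `0x7c00000c`, `decode_x` yields primary opcode 31 and extended opcode 6, matching the values listed above.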
|
||||
|
||||
### `lvsl128` — form `VX128_1`
|
||||
|
||||
- **Opcode word:** `0x10000003`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `3`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `RA` | address register |
| 16–20 | `RB` | offset register |
| 21–27 | `XO` | extended opcode (upper bits) |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `XO` | extended opcode (low 2 bits; both set in `0x10000003`) |
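The destination register of `lvsl128` is split across two fields, so a decoder has to reassemble it. The sketch below shows one way to do that from the bit positions in this table; `bits`, `vd128`, and `encode_lvsl128` are hypothetical helpers for illustration and are not claimed to match the xenia-rs `instr.vd128()` implementation.

```rust
/// Extract IBM-numbered bits `first..=last` (bit 0 = MSB) from a 32-bit word.
/// Hypothetical helper, not part of xenia-rs.
fn bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

/// Reassemble the 7-bit destination register number:
/// VD128h supplies the high 2 bits, VD128l the low 5 bits.
fn vd128(word: u32) -> u32 {
    let lo = bits(word, 6, 10); // VD128l
    let hi = bits(word, 28, 29); // VD128h
    (hi << 5) | lo
}

/// Build an `lvsl128` word from register numbers (illustrative inverse of the decode).
fn encode_lvsl128(vd: u32, ra: u32, rb: u32) -> u32 {
    assert!(vd < 128 && ra < 32 && rb < 32);
    0x1000_0003            // opcode word from this page
        | ((vd & 0x1F) << 21) // VD128l, IBM bits 6-10
        | (ra << 16)          // RA, IBM bits 11-15
        | (rb << 11)          // RB, IBM bits 16-20
        | ((vd >> 5) << 2)    // VD128h, IBM bits 28-29
}
```

Round-tripping a register above `v31`, for example `vd128(encode_lvsl128(100, 3, 4))`, returns `100`, which a plain 5-bit field could not represent.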
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
| --- | --- | --- |
| `RA0` | lvsl: read; lvsl128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. |
| `RB` | lvsl: read; lvsl128: read | Source GPR. |
| `VD` | lvsl: write; lvsl128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `lvsl`
|
||||
|
||||
- **Reads (always):** `RA0`, `RB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `lvsl128`
|
||||
|
||||
- **Reads (always):** `RA0`, `RB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
addr_lo <- ((RA|0) + (RB))[60:63]
|
||||
for i in 0..15: VD[i] <- addr_lo + i
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* lvsl VD, RA, RB — load-shift-left permute control */
|
||||
uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA];
|
||||
uint8_t sh = (uint8_t)((base + r[insn.RB]) & 0xF);
|
||||
for (int i = 0; i < 16; ++i) v[insn.VD].b[i] = sh + i;
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`lvsl`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvsl"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:111`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L111)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:46`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L46)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:751`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L751)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2520-2529`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2520-L2529)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::lvsl | PpcOpcode::lvsl128 => {
|
||||
let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] };
|
||||
let ea = ea.wrapping_add(ctx.gpr[instr.rb()]);
|
||||
let sh = (ea & 0xF) as u8;
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = sh + i as u8; }
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::lvsl128) { instr.vd128() } else { instr.rd() };
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`lvsl128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvsl128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:114`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L114)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:46`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L46)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:412`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L412)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2520-2529`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2520-L2529)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::lvsl | PpcOpcode::lvsl128 => {
|
||||
let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] };
|
||||
let ea = ea.wrapping_add(ctx.gpr[instr.rb()]);
|
||||
let sh = (ea & 0xF) as u8;
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = sh + i as u8; }
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::lvsl128) { instr.vd128() } else { instr.rd() };
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Extended Pseudocode
|
||||
|
||||
```
|
||||
; lvsl VD, RA, RB — load vector for shift left (generates a permute mask)
|
||||
EA <- (RA|0) + (RB) ; full 64-bit EA; only the low 4 bits matter
|
||||
sh <- EA[60:63] ; bits 60..63 of EA (the misalignment)
|
||||
for i in 0..15:
|
||||
VD[i] <- sh + i ; bytes 0..15 of VD = {sh, sh+1, …, sh+15}
|
||||
```
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **No memory is actually read.** Despite the name, `lvsl` / `lvsr` do **not** touch memory. They consume the effective address only to extract the low four bits (the alignment offset) and materialise a 16-byte permute control vector in `VD`. They are pure "address → permute-mask" converters.
|
||||
- **Big-endian byte indexing.** `VD[0]` is the most-significant byte of the 128-bit register (lane 0). When `(EA & 0xF) == 0` the output is `{0, 1, 2, …, 15}`, i.e. the identity permute. When `(EA & 0xF) == 3` the output is `{3, 4, …, 18}`: no modulo is applied, so the values *do* exceed 15. That is intentional: fed into [`vperm`](vperm.md) (`vperm VD, VA, VB, VC`), byte selectors 0..15 index into `VA` and 16..31 index into `VB`. One `lvsl`, two aligned `lvx` loads of consecutive 16-byte blocks, and one `vperm` together reconstruct the unaligned 16-byte vector at `EA`.
|
||||
- **Pair with [`lvsr`](lvsr.md) for the opposite direction.** `lvsl` shifts "left" (bytes move toward index 0, the most-significant byte); `lvsr` shifts "right" (toward index 15). Which one to pick depends on which aligned block you're starting from; see the idiom below.
|
||||
- **Standard unaligned-load idiom.**
|
||||
```
lvx   vAL, 0, rA          ; aligned block at EA & ~0xF (an RA field of 0 is the literal zero)
addi  rT, rA, 16          ; rT = EA + 16 (rT is any scratch GPR)
lvx   vAH, 0, rT          ; next aligned block
lvsl  vC, 0, rA           ; permute mask from misalignment
vperm vD, vAL, vAH, vC    ; the unaligned 16 bytes starting at EA
```
|
||||
- **`RA0` semantics.** When `RA = 0` the base is the literal zero, so `lvsl vD, 0, rB` derives the mask from `rB & 0xF`.
|
||||
- **VMX128 sibling (`lvsl128`).** Same semantics; only the destination encoding differs. `VD` is a 7-bit register number split across the `VD128l` (low 5 bits) and `VD128h` (high 2 bits) fields, so `vD` may be `v0..v127`.
|
||||
- **No flags, no side effects** beyond writing `VD`. Trivial to move and schedule.
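The unaligned-load idiom described in this list can be modelled in a few lines of Rust. This is a sketch under stated assumptions, not the xenia-rs implementation: memory is a plain byte slice, `lvx` and `vperm` are simplified stand-ins written from the descriptions on this page, and all function names are hypothetical.

```rust
/// Permute mask produced by `lvsl` for effective address `ea`.
fn lvsl_mask(ea: u64) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    let mut m = [0u8; 16];
    for i in 0..16 {
        m[i] = sh + i as u8; // may exceed 15 on purpose
    }
    m
}

/// Simplified `vperm`: selectors 0..15 pick from `a`, 16..31 pick from `b`.
fn vperm(a: [u8; 16], b: [u8; 16], c: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let sel = (c[i] & 0x1F) as usize;
        out[i] = if sel < 16 { a[sel] } else { b[sel - 16] };
    }
    out
}

/// Simplified `lvx`: loads the aligned 16-byte block that contains `ea`.
fn lvx(mem: &[u8], ea: usize) -> [u8; 16] {
    let base = ea & !0xF;
    let mut v = [0u8; 16];
    v.copy_from_slice(&mem[base..base + 16]);
    v
}

/// Two aligned loads plus one permute reconstruct the 16 bytes at `ea`.
fn load_unaligned(mem: &[u8], ea: usize) -> [u8; 16] {
    let lo = lvx(mem, ea);
    let hi = lvx(mem, ea + 16);
    vperm(lo, hi, lvsl_mask(ea as u64))
}
```

Calling `load_unaligned(&mem, 19)` on a buffer of at least 48 bytes returns `mem[19..35]`: thirteen bytes from the first block and three from the second. Note that the model, like the idiom, always reads the following block, even when `ea` is already aligned.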
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`lvsr`](lvsr.md) — the mirror: `VD[i] = 16 − sh + i`.
|
||||
- [`vperm`](vperm.md) — consumes the mask to perform arbitrary byte-level permutation across two vectors.
|
||||
- [`lvx`](lvx.md), [`lvlx`](lvlx.md), [`lvrx`](lvrx.md) — the actual memory loads used alongside the mask.
|
||||
- [`vsldoi`](vsldoi.md) — static-offset shift-double; when the shift is compile-time known, this is cheaper than the `lvsl`/`vperm` pair.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `lvsl` (Load Vector for Shift Left Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvsl-load-vector-shift-left-indexed)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual — unaligned-load idiom](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
181
migration/project-root/ppc-manual/vmx/lvsr.md
Normal file
@@ -0,0 +1,181 @@
|
||||
# `lvsr` — Load Vector for Shift Right Indexed
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00004c`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `lvsr` | `lvsr` | — | Load Vector for Shift Right Indexed |
| `lvsr128` | `lvsr128` | — | Load Vector for Shift Right Indexed 128 |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
lvsr [VD], [RA0], [RB]
|
||||
lvsr128 [VD], [RA0], [RB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `lvsr` — form `X`
|
||||
|
||||
- **Opcode word:** `0x7c00004c`
|
||||
- **Primary opcode (bits 0–5):** `31`
|
||||
- **Extended opcode:** `38`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode |
| 6–10 | `RT/FRT/VRT` | destination |
| 11–15 | `RA/FRA/VRA` | source A |
| 16–20 | `RB/FRB/VRB` | source B |
| 21–30 | `XO` | extended opcode (10 bits) |
| 31 | `Rc` | record-form flag |
|
||||
|
||||
### `lvsr128` — form `VX128_1`
|
||||
|
||||
- **Opcode word:** `0x10000043`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `67`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `RA` | address register |
| 16–20 | `RB` | offset register |
| 21–27 | `XO` | extended opcode (upper bits) |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `XO` | extended opcode (low 2 bits; both set in `0x10000043`) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
| --- | --- | --- |
| `RA0` | lvsr: read; lvsr128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. |
| `RB` | lvsr: read; lvsr128: read | Source GPR. |
| `VD` | lvsr: write; lvsr128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `lvsr`
|
||||
|
||||
- **Reads (always):** `RA0`, `RB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `lvsr128`
|
||||
|
||||
- **Reads (always):** `RA0`, `RB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
addr_lo <- ((RA|0) + (RB))[60:63]
|
||||
for i in 0..15: VD[i] <- 16 − addr_lo + i
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* lvsr VD, RA, RB: load-shift-right permute control.          */
/* Mirrors the xenia-rs interpreter arm in Implementation       */
/* References, using the same r[] / v[] names as the lvsl page. */
uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA];
uint8_t sh = (uint8_t)((base + r[insn.RB]) & 0xF);
for (int i = 0; i < 16; ++i) v[insn.VD].b[i] = (uint8_t)(16 - sh + i);
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`lvsr`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvsr"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:126`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L126)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:46`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L46)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:762`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L762)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2530-2539`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2530-L2539)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::lvsr | PpcOpcode::lvsr128 => {
|
||||
let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] };
|
||||
let ea = ea.wrapping_add(ctx.gpr[instr.rb()]);
|
||||
let sh = (ea & 0xF) as u8;
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = (16 - sh) + i as u8; }
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::lvsr128) { instr.vd128() } else { instr.rd() };
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`lvsr128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvsr128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:129`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L129)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:46`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L46)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:413`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L413)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2530-2539`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2530-L2539)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::lvsr | PpcOpcode::lvsr128 => {
|
||||
let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] };
|
||||
let ea = ea.wrapping_add(ctx.gpr[instr.rb()]);
|
||||
let sh = (ea & 0xF) as u8;
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = (16 - sh) + i as u8; }
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::lvsr128) { instr.vd128() } else { instr.rd() };
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **No memory access.** Like [`lvsl`](lvsl.md), `lvsr` does not touch memory: the effective address is consumed solely to extract the low four bits, which then drive the synthesised permute mask in `VD`.
|
||||
- **Mirror of `lvsl`.** Where `lvsl` produces `{sh, sh+1, …, sh+15}`, `lvsr` produces `{16−sh, 17−sh, …, 31−sh}`. When `(EA & 0xF) == 0` the output is `{16, 17, …, 31}`, the identity permute that selects all of `VB` (in the `vperm VD, VA, VB, VC` orientation). When `(EA & 0xF) == 3` the output is `{13, 14, …, 28}`, splitting the `vperm` between the last three bytes of `VA` (indices 13..15) and the first thirteen bytes of `VB` (indices 0..12).
|
||||
- **Big-endian byte indexing.** `VD[0]` is the most-significant byte (the byte at the lowest address after a `stvx`).
|
||||
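- **Right-shift idiom.** With the two aligned blocks passed to `vperm` in memory order, the `lvsr` mask selects the last `sh` bytes of the first block followed by the first `16 − sh` bytes of the second, i.e. the 16 bytes that start `sh` bytes before the second block:

```
lvx   vAL, 0, rA          ; aligned block at EA & ~0xF
addi  rT, rA, 16          ; rT = EA + 16 (rT is any scratch GPR)
lvx   vAH, 0, rT          ; next aligned block
lvsr  vC, 0, rA           ; right-shift permute mask
vperm vD, vAL, vAH, vC    ; bytes 16−sh..15 of vAL, then bytes 0..15−sh of vAH
```

Swapping the two sources (`vperm vD, vAH, vAL, vC`) instead returns the tail of `vAH` followed by the head of `vAL`, which is not a contiguous run of memory. To reconstruct the unaligned vector at `EA` itself, use the [`lvsl`](lvsl.md) mask; `lvsr` is the mask to reach for when data has to move right, as when realigning a vector for an unaligned store.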
|
||||
- **`RA0` semantics.** When `RA = 0` the base is the literal zero, so `lvsr vD, 0, rB` derives the mask from `rB & 0xF`.
|
||||
- **Selectors >15 are intentional.** Inside `vperm`, byte selectors with bit 4 set (i.e. `>= 16`) index into the second source vector. `lvsr` deliberately produces values up to `31`, since only the low five bits are honoured by `vperm`.
|
||||
- **VMX128 sibling (`lvsr128`).** Identical semantics; the extended `VD128l ‖ VD128h` encoding lets `vD` reach `v0..v127`.
|
||||
- **No flags, no exceptions, trivially reorderable.**
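The mask values quoted in this list can be checked with a small model. The Rust sketch below builds the `lvsr` mask and feeds it to a simplified `vperm` written from the selector rules above; passing the same vector as both sources turns the mask into a byte rotate right by `sh`. All function names are hypothetical and the code is not taken from xenia-rs.

```rust
/// Permute mask produced by `lvsr` for effective address `ea`.
fn lvsr_mask(ea: u64) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    let mut m = [0u8; 16];
    for i in 0..16 {
        m[i] = (16 - sh) + i as u8; // ranges over 1..=31
    }
    m
}

/// Simplified `vperm`: only the low five selector bits are used;
/// 0..15 pick from `a`, 16..31 pick from `b`.
fn vperm(a: [u8; 16], b: [u8; 16], c: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let sel = (c[i] & 0x1F) as usize;
        out[i] = if sel < 16 { a[sel] } else { b[sel - 16] };
    }
    out
}

/// With both sources equal, the `lvsr` mask rotates the vector
/// right by `ea & 0xF` bytes.
fn rotate_right_by_mask(v: [u8; 16], ea: u64) -> [u8; 16] {
    vperm(v, v, lvsr_mask(ea))
}
```

For a vector holding the bytes `0..=15`, `rotate_right_by_mask(v, 3)` starts with `13, 14, 15, 0, 1, …`, which is the same result as the standard-library slice method `rotate_right(3)`.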
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`lvsl`](lvsl.md) — the mirror: `VD[i] = sh + i`.
|
||||
- [`vperm`](vperm.md) — consumes the mask to perform arbitrary byte-level permutation across two vectors.
|
||||
- [`lvx`](lvx.md), [`lvlx`](lvlx.md), [`lvrx`](lvrx.md) — the actual memory loads that supply the two aligned halves.
|
||||
- [`vsldoi`](vsldoi.md) — when the misalignment is a compile-time constant, the static-offset shift is cheaper than the `lvsr`/`vperm` pair.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `lvsr` (Load Vector for Shift Right Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvsr-load-vector-shift-right-indexed-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual — unaligned-load idiom](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
133
migration/project-root/ppc-manual/vmx/vaddcuw.md
Normal file
@@ -0,0 +1,133 @@
|
||||
# `vaddcuw` — Vector Add Carryout Unsigned Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000180`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vaddcuw` | `vaddcuw` | — | Vector Add Carryout Unsigned Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vaddcuw [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vaddcuw` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000180`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `384`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
| --- | --- | --- |
| `VA` | vaddcuw: read | Source A vector register. |
| `VB` | vaddcuw: read | Source B vector register. |
| `VD` | vaddcuw: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vaddcuw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Derived from the xenia-rs interpreter arm (see Implementation References).
for each 32-bit unsigned lane i in 0..3:
    VD[i] <- 1 if VA[i] + VB[i] > 0xFFFFFFFF else 0   ; carry-out only; the sum is discarded
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* vaddcuw VD, VA, VB: per-lane unsigned carry-out.                     */
/* Mirrors the xenia-rs interpreter arm in Implementation References.   */
/* `.w[i]` is an assumed name for 32-bit lane i, by analogy with the    */
/* `.b[]` and `.f[]` members used on other pages of this manual.        */
for (int i = 0; i < 4; ++i) {
    uint32_t a = v[insn.VA].w[i], b = v[insn.VB].w[i];
    v[insn.VD].w[i] = ((uint32_t)(a + b) < a) ? 1u : 0u;
}
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vaddcuw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddcuw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:325`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L325)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:466`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L466)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3380-3390`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3380-L3390)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vaddcuw => {
|
||||
let a = ctx.vr[instr.ra()].as_u32x4();
|
||||
let b = ctx.vr[instr.rb()].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 {
|
||||
let (_, c) = a[i].overflowing_add(b[i]);
|
||||
r[i] = if c { 1 } else { 0 };
|
||||
}
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Carry-out only; the sum is discarded.** Each of the four 32-bit lanes computes `1` if `VA[i] + VB[i]` overflows in unsigned arithmetic, else `0`. The modulo sum itself has to come from a separate [`vadduwm`](vadduwm.md) on the same operands.
|
||||
- **Big-endian word lanes.** Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. Each lane is 32-bit unsigned; output values are exactly `0` or `1`, padded to 32 bits.
|
||||
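- **Builds wide-integer adds.** Pair `vaddcuw` with [`vadduwm`](vadduwm.md) and a left-byte shift to chain four 32-bit adds into a single 128-bit add, the canonical Altivec implementation of `__uint128_t` arithmetic. To carry into the *next* lane you typically apply [`vsldoi`](vsldoi.md) by 4 bytes and a [`vadduwm`](vadduwm.md). Because a propagated carry can itself overflow the lane it lands in, that shift-and-add step has to be repeated (up to three times for four lanes) before the 128-bit sum is final.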
|
||||
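- **Unsigned only.** There is no `vaddcsw` (signed-carry). The carry-out bit is defined on the unsigned interpretation of each lane; the signed counterpart would be an overflow indication, which is a different condition and has no instruction of this shape.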
|
||||
- **No `VSCR[SAT]` update.** Modulo carry is always representable; nothing saturates. XER is also untouched (Altivec never updates `XER[CA]`).
|
||||
- **No VMX128 sibling.** Only the 32-register VX form exists.
|
||||
- **Aliasing legal.** `vaddcuw v3, v3, v4` works as expected.
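The wide-integer chain described in this list can be sketched as a scalar model. The Rust below is an illustration under stated assumptions: lanes are a `[u32; 4]` with lane 0 most significant, `shift_left_one_lane` stands in for a `vsldoi` by 4 bytes against a zero vector, and none of the function names come from xenia-rs.

```rust
/// Four 32-bit lanes; lane 0 is the most-significant word (big-endian order).
type Lanes = [u32; 4];

/// Model of `vaddcuw`: per-lane carry-out of an unsigned add.
fn vaddcuw(a: Lanes, b: Lanes) -> Lanes {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = a[i].overflowing_add(b[i]).1 as u32;
    }
    r
}

/// Model of `vadduwm`: per-lane modulo add.
fn vadduwm(a: Lanes, b: Lanes) -> Lanes {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = a[i].wrapping_add(b[i]);
    }
    r
}

/// Stand-in for `vsldoi` by 4 bytes with a zero vector:
/// each carry moves one lane toward lane 0; lane 0's carry is dropped.
fn shift_left_one_lane(v: Lanes) -> Lanes {
    [v[1], v[2], v[3], 0]
}

/// 128-bit wrapping add built from the lane operations above.
fn add_u128(a: Lanes, b: Lanes) -> Lanes {
    let mut sum = vadduwm(a, b);
    let mut carry = vaddcuw(a, b);
    // A carry can ripple across at most three lane boundaries.
    for _ in 0..3 {
        let c = shift_left_one_lane(carry);
        carry = vaddcuw(sum, c);
        sum = vadduwm(sum, c);
    }
    sum
}

/// Conversions between `u128` and big-endian lanes, for checking results.
fn to_lanes(x: u128) -> Lanes {
    [(x >> 96) as u32, (x >> 64) as u32, (x >> 32) as u32, x as u32]
}

fn from_lanes(v: Lanes) -> u128 {
    ((v[0] as u128) << 96) | ((v[1] as u128) << 64) | ((v[2] as u128) << 32) | v[3] as u128
}
```

`from_lanes(add_u128(to_lanes(x), to_lanes(y)))` should equal `x.wrapping_add(y)`. The worst case is `u128::MAX + 1`, where a single carry out of lane 3 ripples through all three rounds.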
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vadduwm`](vadduwm.md) — the modulo sum that `vaddcuw` complements; together they form a full 32-bit-with-carry add.
|
||||
- [`vsubcuw`](vsubcuw.md) — the matching borrow-out (returns `1` when *no* borrow occurred — i.e. when `VA[i] >= VB[i]`).
|
||||
- [`vsldoi`](vsldoi.md) — used to align the carry vector for the next lane during multi-precision chains.
|
||||
- [`vaddubm`](vaddubm.md), [`vadduhm`](vadduhm.md) — modulo siblings at narrower lane widths (no carrying-instruction variant exists for 8- or 16-bit lanes).
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vaddcuw` (Vector Add Carry-Out Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddcuw-vector-add-carryout-unsigned-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — multi-precision arithmetic idiom](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
189
migration/project-root/ppc-manual/vmx/vaddfp.md
Normal file
@@ -0,0 +1,189 @@
|
||||
# `vaddfp` — Vector Add Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000000a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vaddfp` | `vaddfp` | — | Vector Add Floating Point |
| `vaddfp128` | `vaddfp128` | — | Vector128 Add Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vaddfp [VD], [VA], [VB]
|
||||
vaddfp128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vaddfp` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000000a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `10`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vaddfp128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000010`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `16`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4 or 5) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | reserved |
| 23–25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `XO` | extended-opcode bit (the bit set in `0x14000010`) |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |
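Each VMX128 register number is scattered over two or three fields, so decoding means gathering them back into 7-bit values. The sketch below follows the bit positions and the low/middle/high roles given in this table; `bits`, `Vx128Regs`, and `decode_vx128_regs` are hypothetical names for illustration and are not claimed to match the xenia-rs `va128()` / `vb128()` / `vd128()` accessors.

```rust
/// Extract IBM-numbered bits `first..=last` (bit 0 = MSB) from a 32-bit word.
/// Hypothetical helper, not part of xenia-rs.
fn bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

/// The three 7-bit register numbers of a VX128-form instruction.
#[derive(Debug, PartialEq)]
struct Vx128Regs {
    vd: u32,
    va: u32,
    vb: u32,
}

/// Gather the split register fields into full register numbers.
fn decode_vx128_regs(word: u32) -> Vx128Regs {
    Vx128Regs {
        // VD128l (low 5 bits) + VD128h (high 2 bits)
        vd: bits(word, 6, 10) | (bits(word, 28, 29) << 5),
        // VA128l (low 5 bits) + VA128h (middle bit) + VA128H (high bit)
        va: bits(word, 11, 15) | (bits(word, 26, 26) << 5) | (bits(word, 21, 21) << 6),
        // VB128l (low 5 bits) + VB128h (high 2 bits)
        vb: bits(word, 16, 20) | (bits(word, 30, 31) << 5),
    }
}
```

As a worked case, the word `0x14860C1D` is the `vaddfp128` opcode word with `VD = 100`, `VA = 70`, and `VB = 33` filled in, and `decode_vx128_regs` recovers exactly those three numbers.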
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
| --- | --- | --- |
| `VA` | vaddfp: read; vaddfp128: read | Source A vector register. |
| `VB` | vaddfp: read; vaddfp128: read | Source B vector register. |
| `VD` | vaddfp: write; vaddfp128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vaddfp`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vaddfp128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
for each 32-bit float lane i in 0..3:
|
||||
VD[i] <- VA[i] + VB[i]
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* vaddfp VD, VA, VB — lane-wise float add */
|
||||
for (int i = 0; i < 4; ++i) v[insn.VD].f[i] = v[insn.VA].f[i] + v[insn.VB].f[i];
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vaddfp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddfp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:341`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L341)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:438`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L438)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1984-1998`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1984-L1998)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vaddfp => {
|
||||
// PPCBUG-435: VSCR.NJ=1 (Xbox 360 always boots with this set) requires
|
||||
// flush-to-zero on subnormal inputs and outputs. Canary VMX float
|
||||
// arithmetic flushes denormals unconditionally.
|
||||
let a = ctx.vr[instr.ra()].as_f32x4();
|
||||
let b = ctx.vr[instr.rb()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 {
|
||||
let ai = vmx::flush_denorm(a[i]);
|
||||
let bi = vmx::flush_denorm(b[i]);
|
||||
r[i] = vmx::flush_denorm(ai + bi);
|
||||
}
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vaddfp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:344`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L344)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:610`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L610)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1999-2011`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1999-L2011)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vaddfp128 => {
|
||||
// PPCBUG-435: same as vaddfp.
|
||||
let a = ctx.vr[instr.va128()].as_f32x4();
|
||||
let b = ctx.vr[instr.vb128()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 {
|
||||
let ai = vmx::flush_denorm(a[i]);
|
||||
let bi = vmx::flush_denorm(b[i]);
|
||||
r[i] = vmx::flush_denorm(ai + bi);
|
||||
}
|
||||
ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Extended Pseudocode
|
||||
|
||||
```
|
||||
; Four independent lane-wise IEEE-754 single-precision adds
|
||||
for i in 0..3:
|
||||
VD[i] <- VA[i] + VB[i] ; binary32, rounded to nearest
|
||||
|
||||
; No FPSCR update (VMX uses VSCR, which only has NJ / SAT — and vaddfp doesn't saturate)
|
||||
```
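
The pseudocode above omits the denormal flush that the interpreter snapshot applies under `VSCR[NJ] = 1`. The following is a minimal, self-contained sketch of the combined behaviour. It assumes that `flush_denorm` replaces a subnormal with a zero of the same sign; the real `vmx::flush_denorm` body is not shown in this page, so both the helper and `vaddfp_lanes` are hypothetical stand-ins.

```rust
/// Hypothetical stand-in for `vmx::flush_denorm` (assumption: a subnormal
/// becomes a zero of the same sign; every other value passes through).
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// Lane-wise binary32 add with flush-to-zero on inputs and outputs,
/// following the shape of the interpreter arm for `vaddfp`.
fn vaddfp_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut r = [0.0f32; 4];
    for i in 0..4 {
        let ai = flush_denorm(a[i]);
        let bi = flush_denorm(b[i]);
        // Each lane is computed independently; no status bit is produced.
        r[i] = flush_denorm(ai + bi);
    }
    r
}
```

Under these assumptions, two subnormal inputs sum to zero rather than to a larger subnormal, and `+∞` added to `−∞` yields a NaN without any trap.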
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Lane indexing is big-endian.** Lane 0 is the **most significant** 4 bytes of the 128-bit register (the bytes that land at the lowest address after a `stvx`). Xenia's `Vec128::as_f32x4()` already reads lanes in PPC order on x86-64. When writing C that manipulates individual lanes, treat `v.f[0]` as bytes 0..3 of the big-endian layout.
|
||||
- **Flush-denormals ("NJ") mode.** Altivec does not use FPSCR; it has its own VSCR, in which only two bits are defined (`NJ`, non-Java mode, and `SAT`, sticky saturation). VMX float operations honour `VSCR[NJ]`: when set (the Xenon boot default), denormal inputs and outputs are flushed to zero. This is the **opposite** of the scalar FPU's behaviour, which is governed by a separate non-IEEE bit (`FPSCR[NI]`). Xenia sets `NJ = 1` at context creation ([`context.rs`](../../xenia-rs/crates/xenia-cpu/src/context.rs)).
|
||||
- **No exception, no trap.** Altivec floats never raise exceptions. NaN inputs produce NaN outputs; `(+∞) + (−∞)` yields a NaN; there is no VXISI-style status bit. `VSCR[SAT]` is **not** touched by `vaddfp` (it records saturation in integer ops, not float arithmetic).
|
||||
- **Four independent lanes.** Each lane's operation is unaffected by the others. Aliasing between `VA`, `VB`, and `VD` is legal and common (`vaddfp v3, v3, v4`).
|
||||
- **VMX128 sibling (`vaddfp128`).** Semantics identical; only the register encoding differs. VMX128 uses a 7-bit operand ID per source (and destination) built from two or three non-contiguous bit fields (see [`categories/vmx128.md`](../categories/vmx128.md)). Any operand combination encodable in the 32-register VX form is also encodable in the VMX128 form, so compilers picked whichever form reached the needed register range.
|
||||
- **On x86-64 hosts.** A natural compilation uses `_mm_add_ps` or AVX `vaddps`. Because the add is purely lane-wise, the result is correct under any lane mapping that loads and stores apply consistently. PPC lane 0 lands in x86 lane 3 only when the whole 128-bit value is byte-reversed on load/store, i.e. treated as one "big-endian in memory" quantity. With xenia's `_be` memory helpers, `_mm_add_ps` gives the right per-lane result.
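
The lane-ordering rule in the first bullet can be sketched as follows. This is an illustration only: `lanes_from_guest_bytes` and `guest_bytes_from_lanes` are hypothetical names, not xenia-rs APIs, and the sketch assumes the 16 bytes are given in guest memory order (the order `stvx` would write them).

```rust
/// Interpret 16 bytes in guest memory order as four binary32 lanes.
/// Lane 0 comes from bytes 0..4, each read as a big-endian word.
fn lanes_from_guest_bytes(bytes: [u8; 16]) -> [f32; 4] {
    let mut lanes = [0.0f32; 4];
    for (i, chunk) in bytes.chunks_exact(4).enumerate() {
        lanes[i] = f32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    lanes
}

/// Inverse of the above: write each lane back as a big-endian word.
fn guest_bytes_from_lanes(lanes: [f32; 4]) -> [u8; 16] {
    let mut bytes = [0u8; 16];
    for (i, lane) in lanes.iter().enumerate() {
        bytes[i * 4..i * 4 + 4].copy_from_slice(&lane.to_be_bytes());
    }
    bytes
}
```

Because the conversion is done per 4-byte word, lane indices stay the same on any host; only the bytes inside each lane are reordered.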
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vsubfp`](vsubfp.md) — lane-wise float subtract.
|
||||
- [`vmaddfp`](vmaddfp.md) — lane-wise `(VA × VC) + VB` (fused multiply-add with single rounding).
|
||||
- [`vnmsubfp`](vnmsubfp.md) — `−((VA × VC) − VB)`.
|
||||
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — IEEE-754-aware max/min (NaN propagation).
|
||||
- [`vcmpeqfp`](vcmpeqfp.md), [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md), [`vcmpbfp`](vcmpbfp.md) — compares producing per-lane all-ones / all-zero masks.
|
||||
- [`vrfin`](vrfin.md), [`vrfim`](vrfim.md), [`vrfip`](vrfip.md), [`vrfiz`](vrfiz.md) — round to integer (to-nearest / down / up / toward-zero).
|
||||
- [`vmulfp`](vmulfp.md) — xenia's helper; not a native Altivec op, included for convenience. Code written for real hardware multiplies with `vmaddfp v, va, vc, v0_zero` instead, i.e. a multiply-add whose addend register holds zero.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vaddfp` (Vector Add Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddfp-vector-add-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
136
migration/project-root/ppc-manual/vmx/vaddsbs.md
Normal file
@@ -0,0 +1,136 @@
|
||||
# `vaddsbs` — Vector Add Signed Byte Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000300`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vaddsbs` | `vaddsbs` | — | Vector Add Signed Byte Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vaddsbs [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vaddsbs` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000300`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `768`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vaddsbs: read | Source A vector register. |
|
||||
| `VB` | vaddsbs: read | Source B vector register. |
|
||||
| `VD` | vaddsbs: write | Destination vector register. |
|
||||
| `VSCR` | vaddsbs: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vaddsbs`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vaddsbs`: **VSCR[SAT]** is set, and stays set, if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Sixteen signed-byte lanes; each sum is clamped to the int8 range.
; Derived from the xenia-rs interpreter arm (see Implementation References).
for i in 0..15:
  sum   <- VA[i] + VB[i]                 ; exact signed sum, wider than 8 bits
  VD[i] <- clamp(sum, -128, +127)
  if VD[i] != sum: VSCR[SAT] <- 1        ; sticky; never cleared by this op
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / mem.write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vaddsbs`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddsbs"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:348`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L348)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:498`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L498)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3258-3269`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3258-L3269)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vaddsbs => {
|
||||
let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i8; 16]; let mut sat = false;
|
||||
for i in 0..16 {
|
||||
let (v, s) = crate::vmx::sat_add_i8(a[i], b[i]);
|
||||
r[i] = v; sat |= s;
|
||||
}
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Sixteen signed-byte lanes, saturating.** Each `VD[i] = clamp(VA[i] + VB[i], -128, +127)` for `i = 0..15`, with both inputs interpreted as signed `int8`. Lane 0 is the most-significant byte (the byte at the lowest address after `stvx`).
|
||||
- **`VSCR[SAT]` is sticky-set** when *any* lane saturates — either positively (overflow above `+127`) or negatively (underflow below `-128`). The SAT bit is never cleared by this op; software must use [`mtvscr`](mtvscr.md) to clear it. Xenia routes the OR of per-lane saturation flags into `ctx.set_vscr_sat(true)` exactly when at least one lane clamped (see `crate::vmx::sat_add_i8` in [`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)).
|
||||
- **Compare with the modulo sibling.** [`vaddubm`](vaddubm.md) is bit-pattern-identical to a hypothetical `vaddsbm` and silently wraps without touching `VSCR[SAT]`. Use `vaddsbs` when the result must clip rather than wrap, or when the sticky overflow flag is needed.
|
||||
- **Asymmetric clamp.** `+127 + 1` clamps to `+127`; `-128 + (-1)` clamps to `-128`. `VSCR[SAT]` does not record which direction saturated, so tests that look for "any saturation" should exercise both directions.
|
||||
- **No XER side effects.** Altivec never updates `XER[CA]` / `XER[OV]`. The only status bit affected is `VSCR[SAT]`.
|
||||
- **Aliasing legal.** `vaddsbs v3, v3, v4` is the standard accumulate idiom for a clamping sum.
|
||||
- **No VMX128 sibling.**
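
The clamping and sticky-flag behaviour described above can be sketched in a few lines. The snapshot calls `crate::vmx::sat_add_i8` but does not show its body, so the helper below is an assumed implementation that only mirrors its `(value, saturated)` return shape; `vaddsbs_lanes` and the `vscr_sat` parameter are likewise hypothetical.

```rust
/// Assumed shape of a saturating int8 add: returns the clamped sum and
/// whether clamping happened. Not the actual xenia-rs helper body.
fn sat_add_i8(a: i8, b: i8) -> (i8, bool) {
    let wide = a as i16 + b as i16; // exact: cannot overflow i16
    let clamped = wide.clamp(i8::MIN as i16, i8::MAX as i16);
    (clamped as i8, clamped != wide)
}

/// Sixteen-lane saturating add. `vscr_sat` models the sticky SAT bit:
/// it is only ever OR-ed into, never cleared here.
fn vaddsbs_lanes(a: [i8; 16], b: [i8; 16], vscr_sat: &mut bool) -> [i8; 16] {
    let mut r = [0i8; 16];
    for i in 0..16 {
        let (v, s) = sat_add_i8(a[i], b[i]);
        r[i] = v;
        *vscr_sat |= s;
    }
    r
}
```

Widening to `i16` keeps the exact sum available for the comparison, which is one simple way to detect saturation in either direction.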
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vaddubs`](vaddubs.md) — same width, **unsigned** saturating add (clamps to `0..255`).
|
||||
- [`vaddubm`](vaddubm.md) — same width, modulo (non-saturating) add; sign-agnostic.
|
||||
- [`vaddshs`](vaddshs.md), [`vaddsws`](vaddsws.md) — signed saturating add at half / word width.
|
||||
- [`vsubsbs`](vsubsbs.md) — the matching signed saturating subtract.
|
||||
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the sticky `VSCR[SAT]` bit observed here.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vaddsbs` (Vector Add Signed Byte Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddsbs-vector-add-signed-byte-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
137
migration/project-root/ppc-manual/vmx/vaddshs.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# `vaddshs` — Vector Add Signed Half Word Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000340`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vaddshs` | `vaddshs` | — | Vector Add Signed Half Word Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vaddshs [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vaddshs` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000340`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `832`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vaddshs: read | Source A vector register. |
|
||||
| `VB` | vaddshs: read | Source B vector register. |
|
||||
| `VD` | vaddshs: write | Destination vector register. |
|
||||
| `VSCR` | vaddshs: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vaddshs`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vaddshs`: **VSCR[SAT]** is set, and stays set, if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Eight signed-halfword lanes; each sum is clamped to the int16 range.
; Derived from the xenia-rs interpreter arm (see Implementation References).
for i in 0..7:
  sum   <- VA[i] + VB[i]                 ; exact signed sum, wider than 16 bits
  VD[i] <- clamp(sum, -32768, +32767)
  if VD[i] != sum: VSCR[SAT] <- 1        ; sticky; never cleared by this op
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / mem.write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vaddshs`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddshs"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:356`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L356)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:505`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L505)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3306-3317`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3306-L3317)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vaddshs => {
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i16; 8]; let mut sat = false;
|
||||
for i in 0..8 {
|
||||
let (v, s) = crate::vmx::sat_add_i16(a[i], b[i]);
|
||||
r[i] = v; sat |= s;
|
||||
}
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Eight signed-half lanes, saturating.** Each `VD[i] = clamp(VA[i] + VB[i], -32768, +32767)` for `i = 0..7`, with both inputs interpreted as signed `int16`. Lane 0 (bytes 0..1 of the vector after `stvx`) is the most-significant half.
|
||||
- **`VSCR[SAT]` is sticky-set** if *any* lane clamps. Once set, it stays set until explicit clear via [`mtvscr`](mtvscr.md). Xenia uses `crate::vmx::sat_add_i16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) which returns the per-lane saturation flag; the OR is written back via `ctx.set_vscr_sat(true)`.
|
||||
- **The modulo counterpart is `vadduhm`.** Modulo add for signed and unsigned halves is bit-identical, so [`vadduhm`](vadduhm.md) covers both when wraparound is wanted; switch to `vaddshs` only when clipping with sign awareness is desired.
|
||||
- **Asymmetric clamp.** `+32767 + 1 = +32767`; `-32768 + (-1) = -32768`.
|
||||
- **Common 16-bit DSP idiom.** Audio mixing and fixed-point colour blending lean heavily on `vaddshs` to combine signed Q15 / Q1.15 quantities without wraparound artefacts.
|
||||
- **No XER side effects, no NJ involvement** (this is an integer op).
|
||||
- **No VMX128 sibling.**
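
As an illustration of the Q15 mixing idiom mentioned above, here is a small sketch using Rust's built-in saturating and checked integer adds. The names `sat_add_i16` and `mix_q15` are hypothetical; the first only mirrors the `(value, saturated)` shape of the helper named in the snapshot, whose real body is not shown.

```rust
/// Assumed shape of a saturating int16 add: clamped sum plus a flag that
/// reports whether the exact sum fell outside the int16 range.
fn sat_add_i16(a: i16, b: i16) -> (i16, bool) {
    (a.saturating_add(b), a.checked_add(b).is_none())
}

/// Mix two blocks of eight Q15 samples without wraparound artefacts.
/// Returns the mixed samples and whether any lane clipped.
fn mix_q15(a: [i16; 8], b: [i16; 8]) -> ([i16; 8], bool) {
    let mut out = [0i16; 8];
    let mut clipped = false;
    for i in 0..8 {
        let (v, s) = sat_add_i16(a[i], b[i]);
        out[i] = v;
        clipped |= s;
    }
    (out, clipped)
}
```

In Q15, `16384` represents 0.5, so mixing two such samples would reach 1.0, which is not representable; the lane clips to `32767` instead of wrapping to a large negative value.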
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vadduhs`](vadduhs.md) — same width, unsigned saturating add (clamps to `0..0xFFFF`).
|
||||
- [`vadduhm`](vadduhm.md) — same width, modulo add; sign-agnostic.
|
||||
- [`vaddsbs`](vaddsbs.md), [`vaddsws`](vaddsws.md) — signed saturating add at byte / word width.
|
||||
- [`vsubshs`](vsubshs.md) — the matching signed saturating subtract.
|
||||
- [`vmhaddshs`](vmhaddshs.md), [`vmhraddshs`](vmhraddshs.md) — signed-half multiply-add with saturation, common for fixed-point DSP.
|
||||
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the `VSCR[SAT]` bit affected here.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vaddshs` (Vector Add Signed Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddshs-vector-add-signed-half-word-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
138
migration/project-root/ppc-manual/vmx/vaddsws.md
Normal file
@@ -0,0 +1,138 @@
|
||||
# `vaddsws` — Vector Add Signed Word Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000380`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vaddsws` | `vaddsws` | — | Vector Add Signed Word Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vaddsws [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vaddsws` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000380`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `896`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vaddsws: read | Source A vector register. |
|
||||
| `VB` | vaddsws: read | Source B vector register. |
|
||||
| `VD` | vaddsws: write | Destination vector register. |
|
||||
| `VSCR` | vaddsws: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vaddsws`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vaddsws`: **VSCR[SAT]** is set, and stays set, if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Four signed-word lanes; each sum is clamped to the int32 range.
; Derived from the xenia-rs interpreter arm (see Implementation References).
for i in 0..3:
  sum   <- VA[i] + VB[i]                 ; exact signed sum, wider than 32 bits
  VD[i] <- clamp(sum, INT32_MIN, INT32_MAX)
  if VD[i] != sum: VSCR[SAT] <- 1        ; sticky; never cleared by this op
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / mem.write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vaddsws`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddsws"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:364`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L364)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:512`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L512)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3354-3365`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3354-L3365)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vaddsws => {
|
||||
let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i32; 4]; let mut sat = false;
|
||||
for i in 0..4 {
|
||||
let (v, s) = crate::vmx::sat_add_i32(a[i], b[i]);
|
||||
r[i] = v; sat |= s;
|
||||
}
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Four signed-word lanes, saturating.** Each `VD[i] = clamp(VA[i] + VB[i], INT32_MIN, INT32_MAX)` for `i = 0..3`. Lane 0 (bytes 0..3 of the vector after `stvx`) is the most-significant word.
|
||||
- **`VSCR[SAT]` is sticky-set** if any lane clamps. Xenia tracks this through `crate::vmx::sat_add_i32` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) and ORs the flag into the architectural `VSCR[SAT]`.
|
||||
- **No multi-precision carry.** Unlike [`vaddcuw`](vaddcuw.md), `vaddsws` does not expose a per-lane carry/borrow — a saturated lane simply clips; it does not overflow into the adjacent lane.
|
||||
- **Asymmetric clamp.** `INT32_MAX + 1 = INT32_MAX`; `INT32_MIN + (-1) = INT32_MIN`.
|
||||
- **The modulo sibling is `vadduwm`.** Modulo add for signed and unsigned words is bit-identical; switch to `vaddsws` only when clipping with sign awareness is desired.
|
||||
- **No XER side effects.**
|
||||
- **No VMX128 sibling.**
|
||||
- **Common usage.** Accumulating four 32-bit signed sums per instruction across loop iterations, e.g. summing the int16 dot products produced by [`vmsumshs`](vmsumshs.md), which itself already saturates internally.
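
The accumulation pattern in the last bullet can be sketched as a chain of saturating adds. Everything here is illustrative: `sat_add_i32` is an assumed implementation with the same `(value, saturated)` shape as the helper named in the snapshot, and `accumulate_sat` is a hypothetical driver, not xenia-rs code.

```rust
/// Assumed shape of a saturating int32 add. Signed overflow can only occur
/// when both operands share a sign, so the sign of `a` selects the clamp.
fn sat_add_i32(a: i32, b: i32) -> (i32, bool) {
    let (wrapped, overflowed) = a.overflowing_add(b);
    if !overflowed {
        (wrapped, false)
    } else if a < 0 {
        (i32::MIN, true)
    } else {
        (i32::MAX, true)
    }
}

/// Fold a sequence of four-lane partial sums into one accumulator,
/// tracking a sticky flag the way `VSCR[SAT]` would.
fn accumulate_sat(partials: &[[i32; 4]]) -> ([i32; 4], bool) {
    let mut acc = [0i32; 4];
    let mut sat = false;
    for p in partials {
        for i in 0..4 {
            let (v, s) = sat_add_i32(acc[i], p[i]);
            acc[i] = v;
            sat |= s;
        }
    }
    (acc, sat)
}
```

Note that a lane which clipped keeps accumulating from the clamped value, so later additions can move it back inside the range while the sticky flag still reports that clipping happened.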
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vadduws`](vadduws.md) — same width, unsigned saturating add.
|
||||
- [`vadduwm`](vadduwm.md) — same width, modulo (non-saturating) add; sign-agnostic.
|
||||
- [`vaddsbs`](vaddsbs.md), [`vaddshs`](vaddshs.md) — signed saturating add at byte / half width.
|
||||
- [`vsubsws`](vsubsws.md) — the matching signed saturating subtract.
|
||||
- [`vmsumshs`](vmsumshs.md), [`vmsumuhs`](vmsumuhs.md) — saturating multiply-sum that often feeds a `vaddsws` chain.
|
||||
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the `VSCR[SAT]` bit.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vaddsws` (Vector Add Signed Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddsws-vector-add-signed-word-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
132
migration/project-root/ppc-manual/vmx/vaddubm.md
Normal file
@@ -0,0 +1,132 @@
|
||||
# `vaddubm` — Vector Add Unsigned Byte Modulo
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000000`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vaddubm` | `vaddubm` | — | Vector Add Unsigned Byte Modulo |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vaddubm [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vaddubm` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000000`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `0`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vaddubm: read | Source A vector register. |
|
||||
| `VB` | vaddubm: read | Source B vector register. |
|
||||
| `VD` | vaddubm: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vaddubm`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Sixteen byte lanes, added modulo 2^8.
; Derived from the xenia-rs interpreter arm (see Implementation References).
for i in 0..15:
  VD[i] <- (VA[i] + VB[i]) mod 256       ; carry-out is discarded

; VSCR is neither read nor written
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / mem.write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vaddubm`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddubm"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:372`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L372)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:434`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L434)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3198-3205`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3198-L3205)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vaddubm => {
|
||||
let a = ctx.vr[instr.ra()].as_bytes();
|
||||
let b = ctx.vr[instr.rb()].as_bytes();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = a[i].wrapping_add(b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Sixteen independent byte lanes.** `VD[i] = (VA[i] + VB[i]) mod 256` for `i = 0..15`. Lane 0 is the most-significant byte (the byte at the lowest address after `stvx`).
|
||||
- **Modulo wrap, not saturating.** Overflow silently wraps in 8-bit unsigned arithmetic — there is no carry-out and **`VSCR[SAT]` is not touched**. This is the same bit pattern as a signed-byte modulo add, so `vaddubm` is also the de-facto `vaddsbm` (which doesn't exist in the ISA — modulo arithmetic is sign-agnostic).
|
||||
- **No carry, no flags.** XER is untouched (Altivec never updates `XER[CA]`/`XER[OV]`). Because no carry is recorded anywhere, recovering it takes an explicit instruction; [`vaddcuw`](vaddcuw.md) provides this for 32-bit word lanes only.
|
||||
- **Aliasing is legal.** `vaddubm v3, v3, v4` (in-place accumulate) is a single-cycle issue on Xenon's VMX pipe.
|
||||
- **VSCR untouched.** Neither `SAT` nor `NJ` is read or written, so the instruction can be scheduled next to floats, compares and saturating ops without a VSCR dependency.
|
||||
- **Pairs with a saturating sibling.** When you need 8-bit add with clamping, switch to [`vaddubs`](vaddubs.md) (unsigned saturate, range `0..0xFF`) or [`vaddsbs`](vaddsbs.md) (signed saturate, range `-128..+127`) — both of which *do* sticky-set `VSCR[SAT]`.
|
||||
- **No VMX128 sibling.** The `vaddubm` opcode is not exposed as a `*128` form; the 32-register encoding is the only one available.
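
The claim that modulo addition is sign-agnostic can be checked directly. The sketch below is illustrative only; `vaddubm_lanes` follows the shape of the interpreter arm, and `add_signed_via_unsigned` is a hypothetical helper that reuses the unsigned add for signed operands.

```rust
/// Sixteen-lane modulo add: each lane wraps in 8 bits, no flag is produced.
fn vaddubm_lanes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..16 {
        r[i] = a[i].wrapping_add(b[i]);
    }
    r
}

/// Signed modulo add expressed through the unsigned one. The bit pattern
/// of the result is the same either way, which is why no separate
/// signed-modulo instruction is needed.
fn add_signed_via_unsigned(a: i8, b: i8) -> i8 {
    (a as u8).wrapping_add(b as u8) as i8
}
```

For example, `0xFF + 0x01` wraps to `0x00` whether the operands are read as `255 + 1` or as `-1 + 1`.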
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vaddubs`](vaddubs.md) — same lane width, unsigned saturating add (`SAT` sticky-set on overflow).
|
||||
- [`vaddsbs`](vaddsbs.md) — same lane width, signed saturating add.
|
||||
- [`vadduhm`](vadduhm.md), [`vadduwm`](vadduwm.md) — modulo add with 8-lane half / 4-lane word width.
|
||||
- [`vaddcuw`](vaddcuw.md) — produces the per-lane carry bits a 32-bit modulo add discards.
|
||||
- [`vsububm`](vsububm.md) — the matching modulo subtract.
|
||||
- [`vavgub`](vavgub.md) — unsigned byte average (carry-aware: `(a + b + 1) >> 1`), useful when byte addition needs rounding without overflow.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vaddubm` (Vector Add Unsigned Byte Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddubm-vector-add-unsigned-byte-modulo-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
138
migration/project-root/ppc-manual/vmx/vaddubs.md
Normal file
@@ -0,0 +1,138 @@
|
||||
# `vaddubs` — Vector Add Unsigned Byte Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000200`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vaddubs` | `vaddubs` | — | Vector Add Unsigned Byte Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vaddubs [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vaddubs` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000200`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `512`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
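As a rough illustration of the field table above, the sketch below splits a VX-form word into its fields using the table's big-endian bit numbering (bit 0 is the most-significant bit). `VxFields` and `decode_vx` are names made up for this example; they are not the xenia-rs decoder.

```rust
/// Hypothetical holder for the VX-form fields; not a xenia-rs type.
#[derive(Debug, PartialEq)]
struct VxFields {
    opcd: u32, // bits 0-5
    vd: u32,   // bits 6-10
    va: u32,   // bits 11-15
    vb: u32,   // bits 16-20
    xo: u32,   // bits 21-31
}

/// Split a 32-bit instruction word into VX-form fields.
/// IBM bit n corresponds to host bit (31 - n).
fn decode_vx(word: u32) -> VxFields {
    VxFields {
        opcd: word >> 26,
        vd: (word >> 21) & 0x1F,
        va: (word >> 16) & 0x1F,
        vb: (word >> 11) & 0x1F,
        xo: word & 0x7FF,
    }
}
```

Applied to the opcode word `0x10000200` with all register fields zero, these shifts yield primary opcode 4 and extended opcode 512, matching the values listed above.
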
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vaddubs: read | Source A vector register. |
|
||||
| `VB` | vaddubs: read | Source B vector register. |
|
||||
| `VD` | vaddubs: write | Destination vector register. |
|
||||
| `VSCR` | vaddubs: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vaddubs`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vaddubs`: **VSCR[SAT]** is set, and stays set, if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vaddubs`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddubs"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:379`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L379)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:475`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L475)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3233-3245`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3233-L3245)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vaddubs => {
    let a = ctx.vr[instr.ra()].as_bytes();
    let b = ctx.vr[instr.rb()].as_bytes();
    let mut r = [0u8; 16];
    let mut sat = false;
    for i in 0..16 {
        let (v, s) = crate::vmx::sat_add_u8(a[i], b[i]);
        r[i] = v; sat |= s;
    }
    if sat { ctx.set_vscr_sat(true); }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Sixteen unsigned-byte lanes, saturating.** Each `VD[i] = min(VA[i] + VB[i], 0xFF)` for `i = 0..15`. Lane 0 is the most-significant byte after `stvx`.
|
||||
- **`VSCR[SAT]` is sticky-set** if any lane saturates. Once set, it stays set until [`mtvscr`](mtvscr.md) clears it. Xenia computes this with `crate::vmx::sat_add_u8` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)).
|
||||
- **One-sided clamp.** Only the upper bound applies (unsigned add cannot underflow). Distinct from [`vaddsbs`](vaddsbs.md), which clips at both `+127` and `-128`.
|
||||
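- **Pixel-blend workhorse.** Common usage is to add two unsigned-byte colour vectors with clamp-to-white at `0xFF`. The per-lane result matches `_mm_adds_epu8` on x86 SSE2, so the lane arithmetic is a one-to-one host translation candidate; SSE2 has no equivalent of `VSCR[SAT]`, so a host translation that needs the sticky flag must derive it separately.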
|
||||
- **Versus modulo.** [`vaddubm`](vaddubm.md) wraps silently and never touches `VSCR[SAT]`. Use `vaddubs` when overflow indicates "too bright" / "out of range" and you want to flag it sticky.
|
||||
- **No XER side effects.**
|
||||
- **No VMX128 sibling.**
|
||||
|
||||
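The saturating lanes and the sticky flag described above can be sketched in a few lines. This is a minimal model, not the xenia-rs code: `sat_add_u8_sketch` is a hypothetical stand-in for `crate::vmx::sat_add_u8`, whose body is not shown in this document (only its `(value, saturated)` return shape is inferred from the interpreter snapshot), and `vscr_sat` is a plain `bool` standing in for `VSCR[SAT]`.

```rust
/// Hypothetical stand-in for `crate::vmx::sat_add_u8`:
/// returns the clamped sum and whether the clamp was hit.
fn sat_add_u8_sketch(a: u8, b: u8) -> (u8, bool) {
    match a.checked_add(b) {
        Some(v) => (v, false),
        None => (u8::MAX, true),
    }
}

/// Model of the 16-lane add. `vscr_sat` is only ever OR-ed into,
/// so a previously set flag survives a non-saturating call.
fn vaddubs_lanes(a: [u8; 16], b: [u8; 16], vscr_sat: &mut bool) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..16 {
        let (v, s) = sat_add_u8_sketch(a[i], b[i]);
        r[i] = v;
        *vscr_sat |= s;
    }
    r
}
```

Calling `vaddubs_lanes` first with overflowing inputs and then with small ones leaves the flag set after the second call, which is the "sticky" behaviour the list describes.
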
## Related Instructions
|
||||
|
||||
- [`vaddubm`](vaddubm.md) — same width, modulo (non-saturating) add.
|
||||
- [`vaddsbs`](vaddsbs.md) — same width, signed saturating add (range `-128..+127`).
|
||||
- [`vadduhs`](vadduhs.md), [`vadduws`](vadduws.md) — unsigned saturating add at half / word width.
|
||||
- [`vsububs`](vsububs.md) — the matching unsigned saturating subtract (clamps to `0`).
|
||||
- [`vavgub`](vavgub.md) — rounding average; alternative when you want `(a + b + 1) >> 1` without overflow worry.
|
||||
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the sticky `VSCR[SAT]` bit.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vaddubs` (Vector Add Unsigned Byte Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddubs-vector-add-unsigned-byte-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
131
migration/project-root/ppc-manual/vmx/vadduhm.md
Normal file
@@ -0,0 +1,131 @@
|
||||
# `vadduhm` — Vector Add Unsigned Half Word Modulo
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000040`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vadduhm` | `vadduhm` | — | Vector Add Unsigned Half Word Modulo |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vadduhm [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vadduhm` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000040`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `64`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vadduhm: read | Source A vector register. |
|
||||
| `VB` | vadduhm: read | Source B vector register. |
|
||||
| `VD` | vadduhm: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vadduhm`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vadduhm`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vadduhm"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:387`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L387)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:441`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L441)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3214-3221`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3214-L3221)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vadduhm => {
    let a = ctx.vr[instr.ra()].as_u16x8();
    let b = ctx.vr[instr.rb()].as_u16x8();
    let mut r = [0u16; 8];
    for i in 0..8 { r[i] = a[i].wrapping_add(b[i]); }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Eight half-word lanes.** `VD[i] = (VA[i] + VB[i]) mod 65536` for `i = 0..7`. Lane 0 (`VD[0..1]` after `stvx`) is the most-significant half.
|
||||
- **Modulo wrap, not saturating.** Overflow silently wraps in 16-bit arithmetic; **`VSCR[SAT]` is not touched** and there is no carry-out. Sign-agnostic — modulo add for signed `int16` and unsigned `u16` is bit-pattern-identical, so this is also the de-facto `vaddshm`.
|
||||
- **No XER, no NJ involvement.**
|
||||
- **Aliasing legal.** `vadduhm v3, v3, v4` is a single-issue accumulate.
|
||||
- **Pairs with saturating siblings.** Switch to [`vadduhs`](vadduhs.md) for unsigned clamp at `0xFFFF` or [`vaddshs`](vaddshs.md) for signed clamp to `-32768..+32767` when overflow needs to be detected via sticky `VSCR[SAT]`.
|
||||
- **Common usage.** Multi-precision adds composed from 16-bit lanes; UV-coordinate accumulation; per-pixel 16-bit counters.
|
||||
- **No VMX128 sibling.**
|
||||
|
||||
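The sign-agnostic claim above can be checked with a small sketch: adding the lanes as `u16` and adding them reinterpreted as `i16` give the same bit pattern. The function names here are hypothetical and exist only for this illustration.

```rust
/// Model of the eight-lane modulo add (same arithmetic as the snapshot above).
fn vadduhm_lanes(a: [u16; 8], b: [u16; 8]) -> [u16; 8] {
    let mut r = [0u16; 8];
    for i in 0..8 {
        r[i] = a[i].wrapping_add(b[i]);
    }
    r
}

/// Add one lane as signed 16-bit values, then view the result as raw bits.
fn add_lane_as_signed(a: u16, b: u16) -> u16 {
    (a as i16).wrapping_add(b as i16) as u16
}
```

For any pair of lane values, `add_lane_as_signed(a, b)` equals `a.wrapping_add(b)`, which is why a separate signed modulo instruction would be redundant.
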
## Related Instructions
|
||||
|
||||
- [`vadduhs`](vadduhs.md) — same width, unsigned saturating add.
|
||||
- [`vaddshs`](vaddshs.md) — same width, signed saturating add.
|
||||
- [`vaddubm`](vaddubm.md), [`vadduwm`](vadduwm.md) — modulo add at byte / word width.
|
||||
- [`vsubuhm`](vsubuhm.md) — the matching modulo subtract.
|
||||
- [`vavguh`](vavguh.md) — unsigned half-word rounding average; useful when addition needs to stay representable.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vadduhm` (Vector Add Unsigned Half Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vadduhm-vector-add-unsigned-half-word-modulo-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
136
migration/project-root/ppc-manual/vmx/vadduhs.md
Normal file
@@ -0,0 +1,136 @@
|
||||
# `vadduhs` — Vector Add Unsigned Half Word Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000240`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vadduhs` | `vadduhs` | — | Vector Add Unsigned Half Word Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vadduhs [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vadduhs` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000240`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `576`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vadduhs: read | Source A vector register. |
|
||||
| `VB` | vadduhs: read | Source B vector register. |
|
||||
| `VD` | vadduhs: write | Destination vector register. |
|
||||
| `VSCR` | vadduhs: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vadduhs`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vadduhs`: **VSCR[SAT]** is set, and stays set, if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vadduhs`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vadduhs"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:394`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L394)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:482`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L482)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3282-3293`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3282-L3293)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vadduhs => {
    let a = ctx.vr[instr.ra()].as_u16x8();
    let b = ctx.vr[instr.rb()].as_u16x8();
    let mut r = [0u16; 8]; let mut sat = false;
    for i in 0..8 {
        let (v, s) = crate::vmx::sat_add_u16(a[i], b[i]);
        r[i] = v; sat |= s;
    }
    if sat { ctx.set_vscr_sat(true); }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Eight unsigned-half lanes, saturating.** Each `VD[i] = min(VA[i] + VB[i], 0xFFFF)` for `i = 0..7`. Lane 0 (`VD[0..1]` after `stvx`) is the most-significant half.
|
||||
- **`VSCR[SAT]` is sticky-set** if any lane clamps. Cleared only by [`mtvscr`](mtvscr.md). Xenia uses `crate::vmx::sat_add_u16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) and ORs the per-lane flag.
|
||||
- **One-sided clamp.** Unsigned add cannot underflow, so only the upper bound `0xFFFF` ever clips.
|
||||
- **The modulo counterpart is `vadduhm`.** Use `vadduhs` when "too large to fit" must be flagged or clipped — typical for accumulating Q16 unsigned counters.
|
||||
- **No XER side effects.**
|
||||
- **Lane results match `_mm_adds_epu16`** on SSE2 hosts. SSE2 has no sticky saturation flag, so `VSCR[SAT]` has to be derived separately (xenia recovers the SAT flag from the per-lane comparison).
|
||||
- **No VMX128 sibling.**
|
||||
|
||||
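One way a host without a saturation flag could recover `VSCR[SAT]` is to compare the clamped sum with the wrapped sum: the two differ exactly when the lane overflowed. The sketch below shows that idea in portable scalar Rust. It is an assumption about how such a comparison might look, not a copy of what xenia does, and both function names are hypothetical.

```rust
/// Clamped add plus a flag recovered by comparison against the modulo sum.
/// On overflow the wrapped sum is at most 0xFFFE, so it can never equal
/// the clamp value 0xFFFF.
fn sat_add_u16_with_flag(a: u16, b: u16) -> (u16, bool) {
    let clamped = a.saturating_add(b);
    let wrapped = a.wrapping_add(b);
    (clamped, clamped != wrapped)
}

/// Eight-lane model; `vscr_sat` stands in for the sticky VSCR[SAT] bit.
fn vadduhs_lanes(a: [u16; 8], b: [u16; 8], vscr_sat: &mut bool) -> [u16; 8] {
    let mut r = [0u16; 8];
    for i in 0..8 {
        let (v, s) = sat_add_u16_with_flag(a[i], b[i]);
        r[i] = v;
        *vscr_sat |= s;
    }
    r
}
```

A SIMD host could do the same with one saturating add, one modulo add, and one compare per vector; the scalar form is used here only to keep the example portable.
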
## Related Instructions
|
||||
|
||||
- [`vadduhm`](vadduhm.md) — same width, modulo (non-saturating) add.
|
||||
- [`vaddshs`](vaddshs.md) — same width, signed saturating add (range `-32768..+32767`).
|
||||
- [`vaddubs`](vaddubs.md), [`vadduws`](vadduws.md) — unsigned saturating add at byte / word width.
|
||||
- [`vsubuhs`](vsubuhs.md) — the matching unsigned saturating subtract.
|
||||
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the sticky `VSCR[SAT]` bit.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vadduhs` (Vector Add Unsigned Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vadduhs-vector-add-unsigned-half-word-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
131
migration/project-root/ppc-manual/vmx/vadduwm.md
Normal file
@@ -0,0 +1,131 @@
|
||||
# `vadduwm` — Vector Add Unsigned Word Modulo
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000080`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vadduwm` | `vadduwm` | — | Vector Add Unsigned Word Modulo |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vadduwm [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vadduwm` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000080`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `128`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vadduwm: read | Source A vector register. |
|
||||
| `VB` | vadduwm: read | Source B vector register. |
|
||||
| `VD` | vadduwm: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vadduwm`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vadduwm`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vadduwm"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:402`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L402)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:448`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L448)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2396-2403`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2396-L2403)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vadduwm => {
    let a = ctx.vr[instr.ra()].as_u32x4();
    let b = ctx.vr[instr.rb()].as_u32x4();
    let mut r = [0u32; 4];
    for i in 0..4 { r[i] = a[i].wrapping_add(b[i]); }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Four 32-bit word lanes.** `VD[i] = (VA[i] + VB[i]) mod 2^32` for `i = 0..3`. Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word.
|
||||
- **Modulo wrap, not saturating.** Carry is dropped; **`VSCR[SAT]` is not touched**. Sign-agnostic — bit-pattern-identical for signed `int32` and unsigned `u32` modulo addition.
|
||||
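- **Multi-precision idiom.** Pair with [`vaddcuw`](vaddcuw.md) to recover the per-lane carry, then [`vsldoi`](vsldoi.md) the carry one word left and feed it back into another `vadduwm` to chain a 128-bit add. A propagated carry can itself overflow the next lane, so a fully general 128-bit add repeats the carry step up to three times.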
|
||||
- **No XER, no NJ involvement.**
|
||||
- **Aliasing legal.** `vadduwm v3, v3, v4`.
|
||||
- **No VMX128 sibling.** `vadduwm` has no `*128` form; `vaddfp128` covers the float case, but integer modulo word add stays VMX-only.
|
||||
- **Common usage.** RGBA8 packed-pixel sums; per-tile counters; BigInt limbs.
|
||||
|
||||
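The multi-precision idiom above can be modelled on plain arrays. In this sketch lane 0 is the most-significant word, `vadduwm` and `vaddcuw` are scalar models of the instructions, and `shift_left_one_word` models a `vsldoi` by four bytes against a zero vector. All four functions are illustrative stand-ins written for this example, not xenia-rs code.

```rust
type Lanes = [u32; 4];

/// Per-lane modulo add.
fn vadduwm(a: Lanes, b: Lanes) -> Lanes {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = a[i].wrapping_add(b[i]);
    }
    r
}

/// Per-lane carry-out (0 or 1) of the same add.
fn vaddcuw(a: Lanes, b: Lanes) -> Lanes {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = a[i].overflowing_add(b[i]).1 as u32;
    }
    r
}

/// Move each carry into the next more-significant lane; lane 0's carry is dropped.
fn shift_left_one_word(v: Lanes) -> Lanes {
    [v[1], v[2], v[3], 0]
}

/// 128-bit add built from word lanes: one initial add, then up to three
/// carry-propagation rounds because a carry can ripple across every lane.
fn add_u128(a: Lanes, b: Lanes) -> Lanes {
    let mut sum = vadduwm(a, b);
    let mut carry = shift_left_one_word(vaddcuw(a, b));
    for _ in 0..3 {
        let next = shift_left_one_word(vaddcuw(sum, carry));
        sum = vadduwm(sum, carry);
        carry = next;
    }
    sum
}
```

The fixed three rounds keep the sketch branch-free; code that knows its operands cannot ripple that far could stop earlier.
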
## Related Instructions
|
||||
|
||||
- [`vaddcuw`](vaddcuw.md) — produces the per-lane carry that `vadduwm` discards.
|
||||
- [`vadduws`](vadduws.md), [`vaddsws`](vaddsws.md) — unsigned / signed saturating add at the same width.
|
||||
- [`vaddubm`](vaddubm.md), [`vadduhm`](vadduhm.md) — modulo add at byte / half width.
|
||||
- [`vsubuwm`](vsubuwm.md) — the matching modulo subtract.
|
||||
- [`vsldoi`](vsldoi.md) — used to align carries during multi-precision chains.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vadduwm` (Vector Add Unsigned Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vadduwm-vector-add-unsigned-word-modulo-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Integer Arithmetic & multi-precision idiom](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
137
migration/project-root/ppc-manual/vmx/vadduws.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# `vadduws` — Vector Add Unsigned Word Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000280`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vadduws` | `vadduws` | — | Vector Add Unsigned Word Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vadduws [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vadduws` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000280`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `640`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vadduws: read | Source A vector register. |
|
||||
| `VB` | vadduws: read | Source B vector register. |
|
||||
| `VD` | vadduws: write | Destination vector register. |
|
||||
| `VSCR` | vadduws: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vadduws`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vadduws`: **VSCR[SAT]** is set, and stays set, if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vadduws`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vadduws"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:409`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L409)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:489`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L489)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3330-3341`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3330-L3341)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vadduws => {
    let a = ctx.vr[instr.ra()].as_u32x4();
    let b = ctx.vr[instr.rb()].as_u32x4();
    let mut r = [0u32; 4]; let mut sat = false;
    for i in 0..4 {
        let (v, s) = crate::vmx::sat_add_u32(a[i], b[i]);
        r[i] = v; sat |= s;
    }
    if sat { ctx.set_vscr_sat(true); }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Four unsigned-word lanes, saturating.** Each `VD[i] = min(VA[i] + VB[i], 0xFFFF_FFFF)` for `i = 0..3`. Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word.
|
||||
- **`VSCR[SAT]` is sticky-set** if any lane clamps. Cleared only via [`mtvscr`](mtvscr.md). Xenia uses `crate::vmx::sat_add_u32` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)).
|
||||
- **One-sided clamp** at `UINT32_MAX`. There is no underflow path for unsigned add.
|
||||
- **The modulo counterpart is `vadduwm`.** Use `vadduws` only when overflow needs to be visible or clamped; otherwise prefer the modulo form, which never touches the sticky bit.
|
||||
- **No XER side effects, no carry exposure.** Unlike `vadduwm + vaddcuw`, the saturating form does **not** make the carry available — it is fused into the clamp.
|
||||
- **No VMX128 sibling.**
|
||||
- **Common usage.** Pixel sums where four packed unsigned 32-bit accumulators must clip at white; counter overflow detection.
|
||||
|
||||
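To make the "carry is fused into the clamp" point concrete, the sketch below derives the saturating result from the modulo sum and its carry: the carry becomes an all-ones mask that is OR-ed over the sum. `modulo_and_carry` models what `vadduwm` plus `vaddcuw` expose for one lane, and `sat_add_u32_sketch` is a hypothetical stand-in for `crate::vmx::sat_add_u32`, whose real body is not shown in this document.

```rust
/// One lane of the modulo add together with its carry-out (0 or 1).
fn modulo_and_carry(a: u32, b: u32) -> (u32, u32) {
    let (sum, overflowed) = a.overflowing_add(b);
    (sum, overflowed as u32)
}

/// Saturating add expressed through the carry: a carry of 1 turns into the
/// mask 0xFFFF_FFFF, which forces the result to the clamp value.
fn sat_add_u32_sketch(a: u32, b: u32) -> (u32, bool) {
    let (sum, carry) = modulo_and_carry(a, b);
    let mask = 0u32.wrapping_sub(carry);
    (sum | mask, carry == 1)
}
```

The caller of the saturating form sees only the clamped value and the flag; the wrapped sum that the modulo pair would have returned is no longer recoverable from the result.
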
## Related Instructions
|
||||
|
||||
- [`vadduwm`](vadduwm.md) — same width, modulo add (no saturation, no SAT flag).
|
||||
- [`vaddsws`](vaddsws.md) — same width, signed saturating add.
|
||||
- [`vaddubs`](vaddubs.md), [`vadduhs`](vadduhs.md) — unsigned saturating add at byte / half width.
|
||||
- [`vsubuws`](vsubuws.md) — the matching unsigned saturating subtract.
|
||||
- [`vaddcuw`](vaddcuw.md) — explicit carry-out (paired with the modulo form).
|
||||
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the sticky `VSCR[SAT]` bit.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vadduws` (Vector Add Unsigned Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vadduws-vector-add-unsigned-word-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
181
migration/project-root/ppc-manual/vmx/vand.md
Normal file
@@ -0,0 +1,181 @@
|
||||
# `vand` — Vector Logical AND
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000404`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vand` | `vand` | — | Vector Logical AND |
|
||||
| `vand128` | `vand128` | — | Vector128 Logical AND |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vand [VD], [VA], [VB]
|
||||
vand128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vand` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000404`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1028`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
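
The field table above maps directly onto shifts and masks. The following is a minimal sketch, assuming the usual PowerPC convention that bit 0 is the most-significant bit of the 32-bit word; `VxFields` and `decode_vx` are hypothetical names and not part of xenia-rs.

```rust
/// Fields of a VX-form instruction word (PowerPC bit 0 is the MSB).
#[derive(Debug, PartialEq, Eq)]
pub struct VxFields {
    pub opcd: u32, // bits 0-5: primary opcode
    pub vd: u32,   // bits 6-10: destination vector register
    pub va: u32,   // bits 11-15: source A vector register
    pub vb: u32,   // bits 16-20: source B vector register
    pub xo: u32,   // bits 21-31: extended opcode (11 bits)
}

/// Split a 32-bit instruction word into its VX-form fields.
pub fn decode_vx(word: u32) -> VxFields {
    VxFields {
        opcd: word >> 26,
        vd: (word >> 21) & 0x1f,
        va: (word >> 16) & 0x1f,
        vb: (word >> 11) & 0x1f,
        xo: word & 0x7ff,
    }
}
```

Decoding the bare opcode word `0x10000404` with this sketch yields primary opcode 4 and extended opcode 1028, matching the values listed for `vand`.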
|
||||
|
||||
### `vand128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000210`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `528`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
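
The split register fields in the VX128 table have to be recombined into 7-bit register numbers. The sketch below assumes the recombination `VD = VD128l | VD128h << 5`, `VA = VA128l | VA128h << 5 | VA128H << 6` and `VB = VB128l | VB128h << 5`; that ordering is an assumption inferred from the "low", "middle" and "high" labels in the table, so check it against the decoder before relying on it. All names here are hypothetical.

```rust
/// Register numbers recovered from a VX128-form word. The way the split
/// fields are recombined is an assumption, not taken from the xenia source.
#[derive(Debug, PartialEq, Eq)]
pub struct Vx128Regs {
    pub vd: u32,
    pub va: u32,
    pub vb: u32,
}

/// Extract `len` bits starting at PowerPC bit `start` (bit 0 = MSB).
fn ppc_bits(word: u32, start: u32, len: u32) -> u32 {
    (word >> (32 - start - len)) & ((1u32 << len) - 1)
}

/// Recombine the split VD/VA/VB fields of a VX128-form instruction word.
pub fn decode_vx128_regs(word: u32) -> Vx128Regs {
    let vd_low = ppc_bits(word, 6, 5); // VD128l
    let va_low = ppc_bits(word, 11, 5); // VA128l
    let vb_low = ppc_bits(word, 16, 5); // VB128l
    let va_high = ppc_bits(word, 21, 1); // VA128H
    let va_mid = ppc_bits(word, 26, 1); // VA128h
    let vd_high = ppc_bits(word, 28, 2); // VD128h
    let vb_high = ppc_bits(word, 30, 2); // VB128h
    Vx128Regs {
        vd: vd_low | (vd_high << 5),
        va: va_low | (va_mid << 5) | (va_high << 6),
        vb: vb_low | (vb_high << 5),
    }
}
```

Under these assumptions each operand can address registers 0 through 127, which is the point of the extended encoding.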
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vand: read; vand128: read | Source A vector register. |
|
||||
| `VB` | vand: read; vand128: read | Source B vector register. |
|
||||
| `VD` | vand: write; vand128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vand`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vand128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vand`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vand"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:423`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L423)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:91`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L91)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:521`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L521)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2208-2216`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2208-L2216)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vand | PpcOpcode::vand128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = a[i] & b[i]; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vand128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vand128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:426`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L426)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:91`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L91)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:619`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L619)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2208-2216`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2208-L2216)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vand | PpcOpcode::vand128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = a[i] & b[i]; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions

- **Bitwise across the full 128 bits.** `VD = VA & VB`. Lane width is irrelevant — the AND is bit-for-bit and there is no lane boundary. Xenia chooses to express this as four `u32` ANDs, but any other lane width (`u8`, `u16`, `u64`, `u128`) is observationally identical.
- **No flags, no exceptions, no `VSCR` interaction.** Pure combinational op; one of the cheapest VMX instructions.
- **Common usage with compares.** Compare ops produce per-lane all-ones / all-zero masks; `vand` with the mask keeps the matching lanes and clears the rest. For "select-by-mask" with a non-zero alternative use [`vsel`](vsel.md) instead.
- **Idiom: clear lanes.** `vand VD, VD, vZero` zeroes a register; in practice [`vxor VD, VD, VD`](vxor.md) is preferred since it doesn't need a zero-vector source.
- **Aliasing legal.** All three operands may overlap.
- **VMX128 sibling (`vand128`).** Identical semantics with the extended 128-register encoding; xenia reuses one match arm via the `vmx_reg_triple` helper (see [`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)).
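
Two of the points above are easy to check in code: the result does not depend on the lane width chosen, and an all-ones / all-zero compare mask selects lanes. The following is a small sketch with hypothetical helper names; `cmpeq_mask` only imitates what a word-wise compare would produce and is not a model of any specific compare instruction.

```rust
/// AND computed as four 32-bit lanes (the form xenia's match arm uses).
pub fn and_u32x4(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    [a[0] & b[0], a[1] & b[1], a[2] & b[2], a[3] & b[3]]
}

/// Pack four lanes into one u128, lane 0 most significant (big-endian order).
pub fn pack(v: [u32; 4]) -> u128 {
    v.iter().fold(0u128, |acc, &w| (acc << 32) | w as u128)
}

/// Per-lane equality mask: all-ones where the lanes match, zero elsewhere.
pub fn cmpeq_mask(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut m = [0u32; 4];
    for i in 0..4 {
        m[i] = if a[i] == b[i] { u32::MAX } else { 0 };
    }
    m
}
```

ANDing the packed `u128` values gives the same bits as packing the four lane results, and ANDing data with the mask from `cmpeq_mask` zeroes every lane where the compare was false.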
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vandc`](vandc.md) — `VA & ~VB`; useful for clearing bits selected by a mask.
|
||||
- [`vor`](vor.md), [`vxor`](vxor.md), [`vnor`](vnor.md) — the rest of the bitwise family.
|
||||
- [`vsel`](vsel.md) — bit-wise select using a mask: `(VC & VB) | (~VC & VA)`. The recommended idiom whenever the "false" path is non-zero.
|
||||
- [`vcmpequb`](vcmpequb.md) and other compares — natural mask producers.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vand` (Vector Logical AND)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vand-vector-logical-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
181
migration/project-root/ppc-manual/vmx/vandc.md
Normal file
@@ -0,0 +1,181 @@
|
||||
# `vandc` — Vector Logical AND with Complement
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000444`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vandc` | `vandc` | — | Vector Logical AND with Complement |
|
||||
| `vandc128` | `vandc128` | — | Vector128 Logical AND with Complement |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vandc [VD], [VA], [VB]
|
||||
vandc128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vandc` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000444`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1092`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vandc128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000250`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `592`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vandc: read; vandc128: read | Source A vector register. |
|
||||
| `VB` | vandc: read; vandc128: read | Source B vector register. |
|
||||
| `VD` | vandc: write; vandc128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vandc`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vandc128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vandc`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vandc"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:436`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L436)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:91`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L91)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:526`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L526)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2217-2225`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2217-L2225)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vandc | PpcOpcode::vandc128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = a[i] & !b[i]; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vandc128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vandc128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:439`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L439)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:91`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L91)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:621`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L621)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2217-2225`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2217-L2225)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vandc | PpcOpcode::vandc128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = a[i] & !b[i]; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions

- **Bitwise AND-with-complement of the full 128 bits.** `VD = VA & ~VB`. Lane width is irrelevant — the operation is bit-for-bit. Operand order matters: `vandc VD, VA, VB` is *not* the same as `vandc VD, VB, VA`, because only the second source is complemented.
- **Standard "clear bits in mask" idiom.** Drop bits selected by the mask in `VB`: `vandc VD, VD, vMask`. Equivalent to `VD &= ~vMask`. Cheaper than synthesising the complement first with [`vnor`](vnor.md) and then ANDing.
- **Compare → mask → mask-out idiom.** A compare produces per-lane all-ones where it is true; pair it with `vandc` to keep only the lanes where the compare was *false*. The built-in complement avoids an extra [`vnor`](vnor.md) or `vxor` with all-ones.
- **No flags, no exceptions, no `VSCR` interaction.**
- **Aliasing legal.** `vandc VD, VD, VD` clears `VD` (`x & ~x = 0`).
- **VMX128 sibling (`vandc128`).** Identical semantics with the extended 128-register encoding; xenia reuses one match arm.
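
The idioms in the list above can be checked with a short sketch. The function names are hypothetical; `nor_then_and` exists only to show that the two-instruction sequence (a NOR of the mask with itself, then an AND) gives the same bits as the single AND-with-complement.

```rust
/// AND with complement over four 32-bit lanes: a & !b.
pub fn andc_u32x4(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = a[i] & !b[i];
    }
    r
}

/// Scalar NOR, used to build the complement the long way round.
fn nor(x: u32, y: u32) -> u32 {
    !(x | y)
}

/// Two-step equivalent: complement the mask with a NOR against itself,
/// then apply a plain AND.
pub fn nor_then_and(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        let not_b = nor(b[i], b[i]);
        r[i] = a[i] & not_b;
    }
    r
}
```

Swapping the two sources changes the result, passing the same value twice yields zero, and an all-ones lane in the mask clears the corresponding data lane.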
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vand`](vand.md) — the un-complemented sibling.
|
||||
- [`vor`](vor.md), [`vxor`](vxor.md), [`vnor`](vnor.md) — the rest of the bitwise family.
|
||||
- [`vsel`](vsel.md) — bitwise select using a third register; useful when the "false" branch is non-zero.
|
||||
- [`vcmpequb`](vcmpequb.md) and other compares — natural mask producers.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vandc` (Vector Logical AND with Complement)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vandc-vector-logical-complement-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vavgsb.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vavgsb` — Vector Average Signed Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000502`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vavgsb` | `vavgsb` | — | Vector Average Signed Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vavgsb [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vavgsb` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000502`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1282`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vavgsb: read | Source A vector register. |
|
||||
| `VB` | vavgsb: read | Source B vector register. |
|
||||
| `VD` | vavgsb: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vavgsb`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vavgsb`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavgsb"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:443`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L443)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:533`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L533)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3410-3417`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3410-L3417)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vavgsb => {
|
||||
let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i8; 16];
|
||||
for i in 0..16 { r[i] = crate::vmx::avg_i8(a[i], b[i]); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions

- **Sixteen signed-byte rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, performed in arithmetic *wider* than 8 bits (so the `+1` cannot overflow) and with an arithmetic (sign-preserving) shift. The result is then truncated back to `int8` — saturation never triggers because the average of two `int8` values fits in `int8`. Rounding is "round half up toward +∞".
- **Big-endian byte lanes.** Lane 0 is the most-significant byte of the register and lands at the lowest address after `stvx`.
- **No `VSCR[SAT]` impact.** Mathematical impossibility — `(a + b + 1) >> 1` for `a, b ∈ [-128, 127]` always lies in `[-128, 127]`. Xenia's `crate::vmx::avg_i8` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) widens to `i16` before the add.
- **No XER side effects.**
- **Common usage.** Filtering / decimation passes, motion-compensation half-pel interpolation in older video codecs (the rounding-up bias matches MPEG/H.263 averaging conventions).
- **Aliasing legal.** `vavgsb v3, v3, v4` is a typical lowpass-step idiom.
- **No VMX128 sibling.**
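
The widen, add, shift sequence described above is short enough to write out. This is a sketch of the technique under the stated formula, not a copy of xenia's `avg_i8`; the name `avg_i8_sketch` is hypothetical.

```rust
/// Rounding signed-byte average: widen to i16, add both inputs plus 1,
/// then shift right arithmetically by one.
pub fn avg_i8_sketch(a: i8, b: i8) -> i8 {
    // The widened sum lies in [-255, 255], so it cannot overflow i16.
    let wide = a as i16 + b as i16 + 1;
    // `>>` on a signed integer is an arithmetic shift (it floors), which
    // combined with the +1 rounds exact halves toward +infinity.
    (wide >> 1) as i8
}
```

The shift matters for negative inputs: for `a = b = -1` the widened sum is `-1`, the shift gives `-1` (the exact average), whereas a truncating division by 2 would give `0`.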
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vavgub`](vavgub.md) — same width, unsigned rounding average.
|
||||
- [`vavgsh`](vavgsh.md), [`vavgsw`](vavgsw.md) — signed rounding average at half / word width.
|
||||
- [`vaddubm`](vaddubm.md), [`vaddsbs`](vaddsbs.md) — addition variants without the rounding-divide step.
|
||||
- [`vsububm`](vsububm.md) — modulo subtract; useful for computing differences before averaging.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vavgsb` (Vector Average Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavgsb-vector-average-signed-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vavgsh.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vavgsh` — Vector Average Signed Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000542`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vavgsh` | `vavgsh` | — | Vector Average Signed Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vavgsh [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vavgsh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000542`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1346`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vavgsh: read | Source A vector register. |
|
||||
| `VB` | vavgsh: read | Source B vector register. |
|
||||
| `VD` | vavgsh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vavgsh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vavgsh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavgsh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:450`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L450)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:535`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L535)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3426-3433`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3426-L3433)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vavgsh => {
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i16; 8];
|
||||
for i in 0..8 { r[i] = crate::vmx::avg_i16(a[i], b[i]); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions

- **Eight signed-half rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 32-bit arithmetic to avoid overflow on the intermediate sum, then truncated back to `int16`. Rounding is half-up toward +∞.
- **Big-endian half lanes.** Lane 0 (bytes `0..1` after `stvx`) is the most-significant half.
- **No `VSCR[SAT]` impact.** The result is always representable in `int16`. Xenia's `crate::vmx::avg_i16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) widens to `i32` before adding.
- **No XER side effects.**
- **Common usage.** Audio sample interpolation, fixed-point Q15 midpoint filters, video upscaling at 16-bit precision.
- **Aliasing legal.** `vavgsh v3, v3, v4` averages two halfword streams in place, overwriting the first.
- **No VMX128 sibling.**
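
To make the lane layout concrete, the sketch below averages two vectors held as 16 bytes in memory order, assuming lane `i` occupies bytes `2*i` and `2*i + 1` with the most-significant byte first. The function name is hypothetical and the byte-array representation is an illustration, not how xenia stores vector registers.

```rust
/// Rounding signed-halfword average of two 16-byte vectors in memory order
/// (lane i = bytes 2*i and 2*i + 1, most-significant byte first).
pub fn vavgsh_bytes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        // Read each lane as a big-endian i16 and widen to i32.
        let x = i16::from_be_bytes([a[2 * i], a[2 * i + 1]]) as i32;
        let y = i16::from_be_bytes([b[2 * i], b[2 * i + 1]]) as i32;
        // The widened sum cannot overflow; the arithmetic shift floors.
        let avg = ((x + y + 1) >> 1) as i16;
        out[2 * i..2 * i + 2].copy_from_slice(&avg.to_be_bytes());
    }
    out
}
```

Averaging `0x7FFF` with itself in lane 0 returns `0x7FFF` rather than wrapping, which is the case a 16-bit intermediate sum would get wrong.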
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vavguh`](vavguh.md) — same width, unsigned rounding average.
|
||||
- [`vavgsb`](vavgsb.md), [`vavgsw`](vavgsw.md) — signed rounding average at byte / word width.
|
||||
- [`vadduhm`](vadduhm.md), [`vaddshs`](vaddshs.md) — addition variants without the divide step.
|
||||
- [`vsubuhm`](vsubuhm.md) — modulo subtract; difference computation before averaging.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vavgsh` (Vector Average Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavgsh-vector-average-signed-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
129
migration/project-root/ppc-manual/vmx/vavgsw.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# `vavgsw` — Vector Average Signed Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000582`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vavgsw` | `vavgsw` | — | Vector Average Signed Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vavgsw [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vavgsw` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000582`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1410`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vavgsw: read | Source A vector register. |
|
||||
| `VB` | vavgsw: read | Source B vector register. |
|
||||
| `VD` | vavgsw: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vavgsw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vavgsw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavgsw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:457`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L457)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:537`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L537)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3442-3449`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3442-L3449)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vavgsw => {
|
||||
let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i32; 4];
|
||||
for i in 0..4 { r[i] = crate::vmx::avg_i32(a[i], b[i]); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Four signed-word rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 64-bit arithmetic to avoid intermediate overflow, then truncated back to `int32`. Ties round half-up toward +∞, so averaging `-3` and `-4` yields `-3`.
- **Big-endian word lanes.** Lane 0 is the most-significant word; after `stvx` it occupies bytes 0..3 of the stored quadword.
- **No `VSCR[SAT]` impact.** The mathematical result always fits in `int32`. Xenia's `crate::vmx::avg_i32` widens to `i64` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)).
- **No XER side effects.**
- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`, e.g. `vavgsw v3, v3, v4`.
- **No VMX128 sibling.**
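
The signed rounding average described above can be sketched as follows. This is a minimal illustration, not the contents of `crates/xenia-cpu/src/vmx.rs`: the body of `avg_i32` is an assumption based on the "widens to `i64`" note, and `vavgsw` here is a hypothetical free function operating on plain arrays rather than on `ctx.vr`.

```rust
/// Rounding average of two signed words (assumed shape of `avg_i32`).
/// Widening to i64 means `a + b + 1` cannot overflow; the arithmetic
/// right shift then floors, which rounds ties toward +infinity.
fn avg_i32(a: i32, b: i32) -> i32 {
    ((a as i64 + b as i64 + 1) >> 1) as i32
}

/// Hypothetical array-based model of the four-lane operation.
fn vavgsw(va: [i32; 4], vb: [i32; 4]) -> [i32; 4] {
    let mut vd = [0i32; 4];
    for i in 0..4 {
        vd[i] = avg_i32(va[i], vb[i]);
    }
    vd
}
```

Because the widened sum is halved before narrowing, the final cast never discards significant bits, which is why no saturation flag is involved.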
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vavguw`](vavguw.md) — same width, unsigned rounding average.
|
||||
- [`vavgsb`](vavgsb.md), [`vavgsh`](vavgsh.md) — signed rounding average at byte / half width.
|
||||
- [`vadduwm`](vadduwm.md), [`vaddsws`](vaddsws.md) — addition variants without the divide step.
|
||||
- [`vsubuwm`](vsubuwm.md) — modulo subtract; difference computation before averaging.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vavgsw` (Vector Average Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavgsw-vector-average-signed-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vavgub.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vavgub` — Vector Average Unsigned Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000402`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vavgub` | `vavgub` | — | Vector Average Unsigned Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vavgub [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vavgub` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000402`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1026`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vavgub: read | Source A vector register. |
|
||||
| `VB` | vavgub: read | Source B vector register. |
|
||||
| `VD` | vavgub: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vavgub`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vavgub`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavgub"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:468`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L468)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:520`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L520)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3402-3409`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3402-L3409)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vavgub => {
|
||||
let a = ctx.vr[instr.ra()].as_bytes();
|
||||
let b = ctx.vr[instr.rb()].as_bytes();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = crate::vmx::avg_u8(a[i], b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Sixteen unsigned-byte rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 16-bit arithmetic so the intermediate sum cannot overflow, then truncated back to `u8`. Rounding is half-up.
- **Big-endian byte lanes.** Lane 0 is the most-significant byte; after `stvx` it is byte 0 of the stored quadword.
- **No `VSCR[SAT]` impact.** The result always fits in `u8` (the average of two `u8` values is at most `255`). Xenia uses `crate::vmx::avg_u8` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)).
- **Equivalent to `_mm_avg_epu8`** on x86 SSE2: the rounding mode and lane width match.
- **Common usage.** Pixel-blend `(A + B + 1) / 2`, MPEG/H.264 half-pel motion-compensation averaging, downscale filters, alpha midpoint.
- **Aliasing legal.** `VD` may name the same register as a source, e.g. `vavgub v3, v3, v4`.
- **No VMX128 sibling.**
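
A short sketch of the byte-lane average, assuming `avg_u8` widens to `u16` as the first bullet describes. The second function, `avg_u8_no_widen`, is a hypothetical alternative that is not taken from xenia; it is included only to show that the same rounding can be reached without a wider type. `vavgub` is likewise a hypothetical array-based model.

```rust
/// Widening form (assumed shape of `avg_u8`): promote to u16 so that
/// `a + b + 1` cannot overflow, then shift and narrow.
fn avg_u8(a: u8, b: u8) -> u8 {
    ((a as u16 + b as u16 + 1) >> 1) as u8
}

/// Hypothetical 8-bit-only form. It relies on
///   a + b = (a ^ b) + 2 * (a & b)   and   a | b = (a & b) + (a ^ b),
/// so ceil((a + b) / 2) = (a | b) - ((a ^ b) >> 1).
fn avg_u8_no_widen(a: u8, b: u8) -> u8 {
    (a | b) - ((a ^ b) >> 1)
}

/// Hypothetical model of the sixteen-lane operation on plain arrays.
fn vavgub(va: [u8; 16], vb: [u8; 16]) -> [u8; 16] {
    let mut vd = [0u8; 16];
    for i in 0..16 {
        vd[i] = avg_u8(va[i], vb[i]);
    }
    vd
}
```

The two scalar forms agree for every pair of byte inputs, which can be checked exhaustively since there are only 65,536 combinations.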
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vavgsb`](vavgsb.md) — same width, signed rounding average.
|
||||
- [`vavguh`](vavguh.md), [`vavguw`](vavguw.md) — unsigned rounding average at half / word width.
|
||||
- [`vaddubm`](vaddubm.md), [`vaddubs`](vaddubs.md) — addition variants without the divide step.
|
||||
- [`vsububm`](vsububm.md) — modulo subtract; difference before averaging.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vavgub` (Vector Average Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavgub-vector-average-unsigned-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vavguh.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vavguh` — Vector Average Unsigned Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000442`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vavguh` | `vavguh` | — | Vector Average Unsigned Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vavguh [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vavguh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000442`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1090`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vavguh: read | Source A vector register. |
|
||||
| `VB` | vavguh: read | Source B vector register. |
|
||||
| `VD` | vavguh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vavguh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vavguh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavguh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:475`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L475)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:525`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L525)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3418-3425`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3418-L3425)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vavguh => {
|
||||
let a = ctx.vr[instr.ra()].as_u16x8();
|
||||
let b = ctx.vr[instr.rb()].as_u16x8();
|
||||
let mut r = [0u16; 8];
|
||||
for i in 0..8 { r[i] = crate::vmx::avg_u16(a[i], b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Eight unsigned-half rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 32-bit arithmetic so the intermediate sum cannot overflow, then truncated to `u16`. Rounding is half-up.
- **Big-endian half lanes.** Lane 0 is the most-significant half word; after `stvx` it occupies bytes 0..1 of the stored quadword.
- **No `VSCR[SAT]` impact.** The result always fits in `u16`. Xenia uses `crate::vmx::avg_u16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)).
- **Equivalent to `_mm_avg_epu16`** on x86 SSE2: same rounding, same lane width.
- **Common usage.** Higher-precision pixel blending (e.g. RGB565 sums after widening), Q16 unsigned filters.
- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`.
- **No VMX128 sibling.**
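
To make the lane numbering concrete, the sketch below splits a 16-byte memory image into eight half-word lanes and averages them. It assumes the usual big-endian layout described above; `halves_from_be_bytes` and `vavguh` are hypothetical helper names, and the `avg_u16` body is an assumption about what `crate::vmx::avg_u16` computes.

```rust
/// Splits a 16-byte vector image (as `stvx` would store it) into eight
/// half-word lanes. Lane 0 comes from bytes 0..1, high byte first.
fn halves_from_be_bytes(bytes: [u8; 16]) -> [u16; 8] {
    let mut lanes = [0u16; 8];
    for i in 0..8 {
        lanes[i] = u16::from_be_bytes([bytes[2 * i], bytes[2 * i + 1]]);
    }
    lanes
}

/// Rounding average widened to u32 (assumed shape of `avg_u16`).
fn avg_u16(a: u16, b: u16) -> u16 {
    ((a as u32 + b as u32 + 1) >> 1) as u16
}

/// Hypothetical model of the eight-lane operation on plain arrays.
fn vavguh(va: [u16; 8], vb: [u16; 8]) -> [u16; 8] {
    let mut vd = [0u16; 8];
    for i in 0..8 {
        vd[i] = avg_u16(va[i], vb[i]);
    }
    vd
}
```

Keeping the byte-to-lane split separate from the arithmetic makes it easier to test lane order independently of rounding behaviour.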
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vavgsh`](vavgsh.md) — same width, signed rounding average.
|
||||
- [`vavgub`](vavgub.md), [`vavguw`](vavguw.md) — unsigned rounding average at byte / word width.
|
||||
- [`vadduhm`](vadduhm.md), [`vadduhs`](vadduhs.md) — addition variants without the divide step.
|
||||
- [`vsubuhm`](vsubuhm.md) — modulo subtract; difference before averaging.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vavguh` (Vector Average Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavguh-vector-average-unsigned-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vavguw.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vavguw` — Vector Average Unsigned Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000482`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vavguw` | `vavguw` | — | Vector Average Unsigned Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vavguw [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vavguw` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000482`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1154`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vavguw: read | Source A vector register. |
|
||||
| `VB` | vavguw: read | Source B vector register. |
|
||||
| `VD` | vavguw: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vavguw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vavguw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavguw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:482`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L482)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:530`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L530)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3434-3441`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3434-L3441)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vavguw => {
|
||||
let a = ctx.vr[instr.ra()].as_u32x4();
|
||||
let b = ctx.vr[instr.rb()].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = crate::vmx::avg_u32(a[i], b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Four unsigned-word rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 64-bit arithmetic to avoid intermediate overflow, then truncated to `u32`. Rounding is half-up.
- **Big-endian word lanes.** Lane 0 is the most-significant word; after `stvx` it occupies bytes 0..3 of the stored quadword.
- **No `VSCR[SAT]` impact.** The result always fits in `u32`. Xenia uses `crate::vmx::avg_u32` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)).
- **No SSE2 direct equivalent.** SSE2 only provides `_mm_avg_epu8` and `_mm_avg_epu16`, so an x86 host has to widen to 64-bit and compute the average manually.
- **Common usage.** Per-tile counters; midpoint of two 32-bit packed values.
- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`.
- **No VMX128 sibling.**
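
The need for 64-bit intermediates is easiest to see by comparing against a naive 32-bit version. In this sketch the `avg_u32` body is an assumption about `crate::vmx::avg_u32`, and `avg_u32_wrapping` is a deliberately incorrect, hypothetical variant shown only for contrast.

```rust
/// Correct form (assumed shape of `avg_u32`): widen to u64 so the
/// carry out of bit 31 survives until after the shift.
fn avg_u32(a: u32, b: u32) -> u32 {
    ((a as u64 + b as u64 + 1) >> 1) as u32
}

/// Deliberately wrong: stays in 32 bits, so the carry is lost whenever
/// `a + b + 1` reaches 2^32. Shown only to illustrate the failure.
fn avg_u32_wrapping(a: u32, b: u32) -> u32 {
    a.wrapping_add(b).wrapping_add(1) >> 1
}
```

The two functions agree on small inputs but diverge as soon as the sum carries, for example when both operands are `u32::MAX`.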
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vavgsw`](vavgsw.md) — same width, signed rounding average.
|
||||
- [`vavgub`](vavgub.md), [`vavguh`](vavguh.md) — unsigned rounding average at byte / half width.
|
||||
- [`vadduwm`](vadduwm.md), [`vadduws`](vadduws.md) — addition variants without the divide step.
|
||||
- [`vsubuwm`](vsubuwm.md) — modulo subtract; difference before averaging.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vavguw` (Vector Average Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavguw-vector-average-unsigned-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
131
migration/project-root/ppc-manual/vmx/vcfsx.md
Normal file
@@ -0,0 +1,131 @@
|
||||
# `vcfsx` — Vector Convert from Signed Fixed-Point Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000034a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcfsx` | `vcfsx` | — | Vector Convert from Signed Fixed-Point Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcfsx [VD], [VB], [UIMM]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcfsx` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000034a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `842`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `UIMM` | 5-bit unsigned immediate (fractional-bit count) |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vcfsx: read | Source B vector register. |
|
||||
| `UIMM` | vcfsx: read | 5-bit unsigned immediate (instruction bits 11–15). Zero-extended. |
|
||||
| `VD` | vcfsx: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcfsx`
|
||||
|
||||
- **Reads (always):** `VB`, `UIMM`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcfsx`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcfsx"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:500`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L500)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:93`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L93)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:509`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L509)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4306-4313`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4306-L4313)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcfsx => {
|
||||
let uimm = (instr.raw >> 16) & 0x1F;
|
||||
let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]);
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = crate::vmx::cvt_i32_to_f32(b[i], uimm); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Convert signed-Q `int32` lane to `binary32`.** For each of the four word lanes, `VD[i] = (float)VB[i] / 2^UIMM`, where `UIMM` is the 5-bit immediate at instruction bits 11..15 (`(instr.raw >> 16) & 0x1F` in the interpreter snapshot). `UIMM` ranges over 0..31; `UIMM = 0` is a plain integer-to-float conversion.
- **Big-endian word lanes.** Lane 0 is the most-significant word; after `stvx` it occupies bytes 0..3 of the stored quadword.
- **Use case.** Q-format fixed-point (`Qm.n`) → IEEE float in one instruction. `UIMM` gives the fractional bit count, so `vcfsx vD, vB, 16` interprets each lane as Q15.16 (sign bit, 15 integer bits, 16 fractional bits).
- **Inexact rounding.** Source values whose magnitude exceeds `2^24` may lose low-order bits, because the `binary32` significand holds only 24 bits. Rounding is round-to-nearest-even; VMX has no per-instruction rounding control.
- **`VSCR[NJ]` (flush-denormals) has no practical effect.** The smallest non-zero result magnitude is `1 / 2^31 = 2^-31`, far above the smallest normal `binary32` value `2^-126`, so the output is never sub-normal. The interpreter snapshot passes only the lane value and `UIMM` to `crate::vmx::cvt_i32_to_f32`.
- **No `VSCR[SAT]` or XER changes**, no exceptions raised.
- **No same-name VMX128 sibling.** The VMX128 instruction `vcsxwfp128` performs the analogous conversion under a different mnemonic.
- **Round-trip caveat.** `vctsxs` (the inverse) saturates instead of wrapping, so a `vcfsx`/`vctsxs` round-trip with the same `UIMM` is *not* always the identity. Inputs above `2^24` in magnitude may already have been rounded, and an input such as `0x7FFFFFFF` rounds up to the equivalent of `2^31`, which `vctsxs` clamps back to `0x7FFFFFFF` while setting `VSCR[SAT]`. This matters for fixed-point interpolation kernels.
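
The conversion and the `UIMM` extraction can be sketched together. This is an illustration under assumptions: `vx_uimm5` is a hypothetical helper name that mirrors the mask used in the interpreter snapshot, and the body of `cvt_i32_to_f32` is a guess at the simplest implementation matching `(float)VB[i] / 2^UIMM`, not the actual contents of `vmx.rs`.

```rust
/// Extracts the 5-bit immediate at instruction bits 11..15
/// (big-endian bit numbering), i.e. bits 16..20 counted from the LSB.
fn vx_uimm5(raw: u32) -> u32 {
    (raw >> 16) & 0x1F
}

/// Assumed shape of the per-lane conversion. The `as f32` cast rounds
/// to nearest, ties to even; dividing by a power of two is then exact
/// because the quotient is never smaller than 2^-31 in magnitude.
fn cvt_i32_to_f32(x: i32, uimm: u32) -> f32 {
    debug_assert!(uimm < 32);
    (x as f32) / ((1u64 << uimm) as f32)
}
```

For example, with `UIMM = 16` the lane value `0x0001_8000` converts to `1.5`, and with `UIMM = 0` the value `2^24 + 1` rounds to `2^24`, which is the precision loss the rounding bullet refers to.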
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcfux`](vcfux.md) — same shape, unsigned source.
|
||||
- [`vctsxs`](vctsxs.md) — inverse: float → signed-Q `int32` with saturation.
|
||||
- [`vctuxs`](vctuxs.md) — inverse: float → unsigned-Q `uint32` with saturation.
|
||||
- [`vrfin`](vrfin.md), [`vrfiz`](vrfiz.md) — float-to-integer rounding modes when no Q-format scale is needed.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcfsx` (Vector Convert from Signed Fixed-Point Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcfsx-vector-convert-from-signed-fixed-point-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Conversion Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
131
migration/project-root/ppc-manual/vmx/vcfux.md
Normal file
@@ -0,0 +1,131 @@
|
||||
# `vcfux` — Vector Convert from Unsigned Fixed-Point Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000030a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcfux` | `vcfux` | — | Vector Convert from Unsigned Fixed-Point Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcfux [VD], [VB], [UIMM]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcfux` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000030a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `778`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `UIMM` | 5-bit unsigned immediate (fractional-bit count) |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vcfux: read | Source B vector register. |
|
||||
| `UIMM` | vcfux: read | 5-bit unsigned immediate (instruction bits 11–15). Zero-extended. |
|
||||
| `VD` | vcfux: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcfux`
|
||||
|
||||
- **Reads (always):** `VB`, `UIMM`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcfux`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcfux"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:518`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L518)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:93`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L93)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:502`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L502)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4314-4321`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4314-L4321)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcfux => {
|
||||
let uimm = (instr.raw >> 16) & 0x1F;
|
||||
let b = ctx.vr[instr.rb()].as_u32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = crate::vmx::cvt_u32_to_f32(b[i], uimm); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Convert unsigned-Q `uint32` lane to `binary32`.** For each of the four word lanes, `VD[i] = (float)VB[i] / 2^UIMM`. The 5-bit `UIMM` (bits 11..15) gives the Q-format fractional shift, in `0..31`.
|
||||
- **Big-endian word lanes.** Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word.
|
||||
- **Use case.** Unsigned Q-format fixed-point → IEEE float; common for normalised colour channels (`vcfux vD, vColor, 8` rescales `0..255` to `0..0.996`).
|
||||
- **Inexact rounding.** Magnitudes above `2^24` lose precision. Default rounding is round-to-nearest-even; VMX has no per-instruction rounding control.
|
||||
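- **`VSCR[NJ]`** governs sub-normal results, but none can arise here: the smallest non-zero result is `2^-31` (input `1`, `UIMM = 31`), which is a normal `binary32`. Xenia's `crate::vmx::cvt_u32_to_f32` honours the architectural snapshot.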
|
||||
- **No `VSCR[SAT]`, no XER changes, no exceptions.**
|
||||
- **No VMX128 sibling.**
|
||||
- **Round-trip caveat.** Pair with [`vctuxs`](vctuxs.md) for the inverse, but note that the inverse saturates rather than wraps: floats above `2^32 − 1` clamp to `0xFFFFFFFF` and set the sticky `VSCR[SAT]` bit.
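
The per-lane arithmetic in the first bullet can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not xenia's `crate::vmx::cvt_u32_to_f32` (whose body is not shown in this file): the names `vcfux_lane` and `vcfux` are hypothetical, and `VSCR[NJ]` handling is left out.

```rust
/// Hypothetical helper: converts one unsigned Q-format word lane to `f32`,
/// i.e. `x / 2^uimm`. Not xenia's actual implementation.
fn vcfux_lane(x: u32, uimm: u32) -> f32 {
    debug_assert!(uimm < 32, "UIMM is a 5-bit field");
    // `x as f32` rounds to nearest-even when x needs more than 24 bits.
    // Dividing by a power of two only adjusts the exponent, so it is exact here.
    (x as f32) / (1u64 << uimm) as f32
}

/// Applies the conversion to all four word lanes.
fn vcfux(b: [u32; 4], uimm: u32) -> [f32; 4] {
    b.map(|x| vcfux_lane(x, uimm))
}
```

With these definitions, `vcfux_lane(255, 8)` evaluates to `0.99609375` (255/256), and `vcfux_lane(u32::MAX, 0)` rounds up to `4294967296.0`, which illustrates the precision loss above `2^24`.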
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcfsx`](vcfsx.md) — same shape, signed source.
|
||||
- [`vctuxs`](vctuxs.md) — inverse: float → unsigned-Q `uint32` with saturation.
|
||||
- [`vctsxs`](vctsxs.md) — inverse: float → signed-Q `int32` with saturation.
|
||||
- [`vrfin`](vrfin.md), [`vrfiz`](vrfiz.md) — float-to-integer rounding modes for the un-scaled case.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcfux` (Vector Convert from Unsigned Fixed-Point Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcfux-vector-convert-from-unsigned-fixed-point-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Conversion Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
220
migration/project-root/ppc-manual/vmx/vcmpbfp.md
Normal file
@@ -0,0 +1,220 @@
|
||||
# `vcmpbfp` — Vector Compare Bounds Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x100003c6`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpbfp` | `vcmpbfp` | — | Vector Compare Bounds Floating Point |
|
||||
| `vcmpbfp.` | `vcmpbfp` | Rc=1 | Vector Compare Bounds Floating Point |
|
||||
| `vcmpbfp128` | `vcmpbfp128` | — | Vector128 Compare Bounds Floating Point |
|
||||
| `vcmpbfp128.` | `vcmpbfp128` | Rc=1 | Vector128 Compare Bounds Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpbfp[Rc] [VD], [VA], [VB]
|
||||
vcmpbfp128[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpbfp` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x100003c6`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `966`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
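
As a hedged illustration of the table above, the following Rust sketch pulls the VC-form fields out of a 32-bit instruction word. PowerPC numbers bits from the most-significant end, so a field whose last bit is `b` is reached with a right shift of `31 - b`. The struct and function names are hypothetical and are not taken from the xenia-rs decoder.

```rust
/// Hypothetical container for the VC-form fields listed in the table.
#[derive(Debug, PartialEq)]
struct VcFields {
    opcd: u32, // bits 0-5
    vrt: u32,  // bits 6-10
    vra: u32,  // bits 11-15
    vrb: u32,  // bits 16-20
    rc: bool,  // bit 21
    xo: u32,   // bits 22-31
}

/// Extracts a `width`-bit field whose least-significant bit is big-endian bit `end`.
fn field(word: u32, end: u32, width: u32) -> u32 {
    (word >> (31 - end)) & ((1u32 << width) - 1)
}

/// Splits a VC-form instruction word into its fields.
fn decode_vc(word: u32) -> VcFields {
    VcFields {
        opcd: field(word, 5, 6),
        vrt: field(word, 10, 5),
        vra: field(word, 15, 5),
        vrb: field(word, 20, 5),
        rc: field(word, 21, 1) != 0,
        xo: field(word, 31, 10),
    }
}
```

For example, the word `0x106117C6` (an assumed encoding of `vcmpbfp. v3, v1, v2`, built by hand from the table) decodes to `opcd = 4`, `vrt = 3`, `vra = 1`, `vrb = 2`, `rc = true`, `xo = 966`.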
|
||||
|
||||
### `vcmpbfp128` — form `VX128_R`
|
||||
|
||||
- **Opcode word:** `0x18000180`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `384`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22–25 | `XO` | extended opcode (compare) |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `Rc` | record-form flag (updates CR6) |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpbfp: read; vcmpbfp128: read | Source A vector register. |
|
||||
| `VB` | vcmpbfp: read; vcmpbfp128: read | Source B vector register. |
|
||||
| `VD` | vcmpbfp: write; vcmpbfp128: write | Destination vector register. |
|
||||
| `CR` | vcmpbfp: write (conditional); vcmpbfp128: write (conditional) | Condition-register update. When `Rc=1`, CR6 (the field used by vector compares) is updated from the result. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpbfp`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
### `vcmpbfp128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpbfp`: **CR6** ← `[0, 0, all-in-bounds, 0]` when `Rc=1`.
|
||||
- `vcmpbfp128`: **CR6** ← `[0, 0, all-in-bounds, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpbfp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpbfp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:583`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L583)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:94`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L94)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:569`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L569)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3822-3847`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3822-L3847)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpbfp | PpcOpcode::vcmpbfp128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vcmpbfp128);
|
||||
let (ra, rb, rd) = if is_128 {
|
||||
(instr.va128(), instr.vb128(), instr.vd128())
|
||||
} else {
|
||||
(instr.ra(), instr.rb(), instr.rd())
|
||||
};
|
||||
let a = ctx.vr[ra].as_f32x4();
|
||||
let b = ctx.vr[rb].as_f32x4();
|
||||
let mut r = [0u32; 4];
|
||||
let mut any_out = false;
|
||||
for i in 0..4 {
|
||||
let mut lane: u32 = 0;
|
||||
if a[i].is_nan() || b[i].is_nan() || a[i] > b[i] { lane |= 0x8000_0000; any_out = true; }
|
||||
if a[i].is_nan() || b[i].is_nan() || a[i] < -b[i] { lane |= 0x4000_0000; any_out = true; }
|
||||
r[i] = lane;
|
||||
}
|
||||
let rc = if is_128 { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc {
|
||||
ctx.cr[6] = crate::context::CrField {
|
||||
lt: false, gt: false, eq: !any_out, so: false,
|
||||
};
|
||||
}
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vcmpbfp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpbfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:586`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L586)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:94`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L94)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:684`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L684)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3822-3847`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3822-L3847)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpbfp | PpcOpcode::vcmpbfp128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vcmpbfp128);
|
||||
let (ra, rb, rd) = if is_128 {
|
||||
(instr.va128(), instr.vb128(), instr.vd128())
|
||||
} else {
|
||||
(instr.ra(), instr.rb(), instr.rd())
|
||||
};
|
||||
let a = ctx.vr[ra].as_f32x4();
|
||||
let b = ctx.vr[rb].as_f32x4();
|
||||
let mut r = [0u32; 4];
|
||||
let mut any_out = false;
|
||||
for i in 0..4 {
|
||||
let mut lane: u32 = 0;
|
||||
if a[i].is_nan() || b[i].is_nan() || a[i] > b[i] { lane |= 0x8000_0000; any_out = true; }
|
||||
if a[i].is_nan() || b[i].is_nan() || a[i] < -b[i] { lane |= 0x4000_0000; any_out = true; }
|
||||
r[i] = lane;
|
||||
}
|
||||
let rc = if is_128 { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc {
|
||||
ctx.cr[6] = crate::context::CrField {
|
||||
lt: false, gt: false, eq: !any_out, so: false,
|
||||
};
|
||||
}
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **"Bounds" compare, not equality.** Per word lane, sets two output bits: bit 0 (mask `0x80000000`) if `VA[i] > VB[i]` (out-of-range high) and bit 1 (mask `0x40000000`) if `VA[i] < -VB[i]` (out-of-range low). Bits 2..31 of each lane are zero.
|
||||
- **NaN inputs are out-of-range in *both* directions.** Xenia sets both `0x80000000` and `0x40000000` if either input is NaN, matching the IBM manual: NaN is treated as "violates both bounds".
|
||||
- **CR6 update when `Rc=1`.** CR6 is set as `[lt=0, gt=0, eq=(no-lane-out-of-range), so=0]` — i.e. only the `eq` bit signifies "all four lanes were within `±VB`". Useful as `bc 12,26` (branch if all in-range) for SIMD clamping loops.
|
||||
- **No `VSCR[SAT]`, no XER changes, no exceptions.**
|
||||
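- **The convention is "is point inside box?"** Unlike the other `vcmp*` ops, each lane yields a two-bit flag pair rather than an all-ones / all-zero mask, so the result does **not** plug directly into [`vsel`](vsel.md). To get a single boolean bit per lane, shift one flag onto the other and combine them with [`vor`](vor.md); for a full-width mask, compare the result against zero with [`vcmpequw`](vcmpequw.md).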
|
||||
- **VMX128 sibling (`vcmpbfp128`).** Identical semantics; the `Rc` bit lives at bit 27 of the VX128_R encoding.
|
||||
- **Lane width is fixed at word.** Bounds check is single-precision float only; there is no `vcmpb*` for half / byte / int.
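
The bullets above can be condensed into a small standalone Rust sketch. It mirrors the logic of the frozen interpreter snapshot but is not that code: the function names are hypothetical, and `in_bounds_mask` is an illustrative helper (an assumption, not part of xenia) showing one way to turn the flag pairs into a `vsel`-style mask.

```rust
/// Per-lane bounds test: returns the two-bit flag pair described above.
fn vcmpbfp_lane(a: f32, b: f32) -> u32 {
    // A NaN in either operand violates both bounds.
    let unordered = a.is_nan() || b.is_nan();
    let mut lane = 0u32;
    if unordered || a > b {
        lane |= 0x8000_0000; // out of range high (above +b)
    }
    if unordered || a < -b {
        lane |= 0x4000_0000; // out of range low (below -b)
    }
    lane
}

/// Four-lane compare. The `bool` is the value CR6.eq would take when Rc=1.
fn vcmpbfp(a: [f32; 4], b: [f32; 4]) -> ([u32; 4], bool) {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = vcmpbfp_lane(a[i], b[i]);
    }
    let all_in_bounds = r.iter().all(|&lane| lane == 0);
    (r, all_in_bounds)
}

/// Hypothetical helper: all-ones for in-bounds lanes, zero otherwise.
fn in_bounds_mask(flags: [u32; 4]) -> [u32; 4] {
    flags.map(|lane| if lane == 0 { 0xFFFF_FFFF } else { 0 })
}
```

Calling `vcmpbfp([0.0, 1.0, -1.0, 3.0], [1.0; 4])` flags only the last lane (`0x80000000`), because values exactly on the bound count as in range.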
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpeqfp`](vcmpeqfp.md) — element-wise `==` for floats.
|
||||
- [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md) — element-wise `>` and `>=` for floats.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vor`](vor.md) — combine the two bits per lane into a boolean mask if needed.
|
||||
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — clamp values to a range without testing.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpbfp` (Vector Compare Bounds Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpbfp-vector-compare-bounds-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
192
migration/project-root/ppc-manual/vmx/vcmpeqfp.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# `vcmpeqfp` — Vector Compare Equal-to Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x100000c6`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpeqfp` | `vcmpeqfp` | — | Vector Compare Equal-to Floating Point |
|
||||
| `vcmpeqfp.` | `vcmpeqfp` | Rc=1 | Vector Compare Equal-to Floating Point |
|
||||
| `vcmpeqfp128` | `vcmpeqfp128` | — | Vector128 Compare Equal-to Floating Point |
|
||||
| `vcmpeqfp128.` | `vcmpeqfp128` | Rc=1 | Vector128 Compare Equal-to Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpeqfp[Rc] [VD], [VA], [VB]
|
||||
vcmpeqfp128[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpeqfp` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x100000c6`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `198`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
|
||||
|
||||
### `vcmpeqfp128` — form `VX128_R`
|
||||
|
||||
- **Opcode word:** `0x18000000`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `0`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22–25 | `XO` | extended opcode (compare) |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `Rc` | record-form flag (updates CR6) |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpeqfp: read; vcmpeqfp128: read | Source A vector register. |
|
||||
| `VB` | vcmpeqfp: read; vcmpeqfp128: read | Source B vector register. |
|
||||
| `VD` | vcmpeqfp: write; vcmpeqfp128: write | Destination vector register. |
|
||||
| `CR` | vcmpeqfp: write (conditional); vcmpeqfp128: write (conditional) | Condition-register update. When `Rc=1`, CR6 (the field used by vector compares) is updated from the result. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpeqfp`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
### `vcmpeqfp128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpeqfp`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
- `vcmpeqfp128`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpeqfp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpeqfp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:623`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L623)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:94`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L94)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:560`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L560)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2173-2183`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2173-L2183)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpeqfp | PpcOpcode::vcmpeqfp128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_f32x4();
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] == b[i] { 0xFFFF_FFFF } else { 0 }; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
let rc = if matches!(instr.opcode, PpcOpcode::vcmpeqfp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vcmpeqfp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpeqfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:627`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L627)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:94`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L94)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:681`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L681)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2173-2183`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2173-L2183)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpeqfp | PpcOpcode::vcmpeqfp128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_f32x4();
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] == b[i] { 0xFFFF_FFFF } else { 0 }; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
let rc = if matches!(instr.opcode, PpcOpcode::vcmpeqfp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-lane mask: all-ones / all-zero.** For each of the four word lanes, `VD[i] = (VA[i] == VB[i]) ? 0xFFFFFFFF : 0`. Lane 0 is the most-significant word.
|
||||
- **NaN handling is IEEE-754: never equal.** `NaN == anything` is false (including `NaN == NaN`), so the lane stays zero. This is the standard quiet-compare behaviour — no exception, no sticky flag.
|
||||
- **Sign of zero ignored.** `+0 == -0` per IEEE-754, so the lane is set to all-ones.
|
||||
- **`VSCR[NJ]` and denormals.** With `NJ = 1` (Xenon default), denormal inputs are flushed to `±0` *before* the comparison; `±denormal == ±0` then compares as true. This is one of the few VMX float ops where the NJ flag changes program-visible mask values. Note that the frozen interpreter snapshot above compares lanes with a plain `==` and shows no explicit flush step.
|
||||
- **CR6 update when `Rc=1`** (`vcmpeqfp.`). CR6 is `[lt = all-true, gt = 0, eq = all-false, so = 0]`; xenia's `update_cr6_from_vmask` ([`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)) handles the bit packing. Use `bc 12,24` to branch when all lanes are equal and `bc 4,26` to branch when at least one lane is equal.
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) to pick between two source vectors per lane. Or combine masks with [`vand`](vand.md) / [`vor`](vor.md) / [`vandc`](vandc.md) to express conjunctions.
|
||||
- **No `VSCR[SAT]`, no XER changes, no traps** — even on signaling NaNs (Altivec's quiet-compare semantics).
|
||||
- **VMX128 sibling (`vcmpeqfp128`).** Identical semantics with the extended 128-register encoding; xenia routes both opcodes to one match arm via `vmx_reg_triple`.
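
The NaN, signed-zero, and denormal cases listed above can be exercised with a short Rust sketch. It is a simplified model under stated assumptions, not xenia's code: the `nj` parameter is a hypothetical stand-in for `VSCR[NJ]`, and `flush_denormal` and `cr6_flags` are illustrative helpers.

```rust
/// Flushes a subnormal to a zero of the same sign, modelling `VSCR[NJ] = 1`.
fn flush_denormal(x: f32) -> f32 {
    if x.is_subnormal() { 0.0f32.copysign(x) } else { x }
}

/// Per-lane equality mask. `nj` is a hypothetical stand-in for `VSCR[NJ]`.
fn vcmpeqfp(a: [f32; 4], b: [f32; 4], nj: bool) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        let (x, y) = if nj {
            (flush_denormal(a[i]), flush_denormal(b[i]))
        } else {
            (a[i], b[i])
        };
        // IEEE-754 `==`: false for any NaN operand, true for +0 == -0.
        r[i] = if x == y { 0xFFFF_FFFF } else { 0 };
    }
    r
}

/// Returns `(all_true, all_false)`, the values placed in CR6.lt and CR6.eq.
fn cr6_flags(mask: &[u32; 4]) -> (bool, bool) {
    (
        mask.iter().all(|&m| m == 0xFFFF_FFFF),
        mask.iter().all(|&m| m == 0),
    )
}
```

Comparing the smallest positive subnormal (`f32::from_bits(1)`) against `0.0` yields a zero lane with `nj = false` and an all-ones lane with `nj = true`, which is the program-visible difference the `VSCR[NJ]` bullet describes.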
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md) — element-wise `>` and `>=` for floats.
|
||||
- [`vcmpbfp`](vcmpbfp.md) — IEEE bounds check (`±VB`).
|
||||
- [`vcmpequw`](vcmpequw.md) — same shape, integer compare.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vxor`](vxor.md) — mask consumers.
|
||||
- [`vminfp`](vminfp.md), [`vmaxfp`](vmaxfp.md) — direct min / max without comparing.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpeqfp` (Vector Compare Equal-to Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpeqfp-vector-compare-equal-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
140
migration/project-root/ppc-manual/vmx/vcmpequb.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# `vcmpequb` — Vector Compare Equal-to Unsigned Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000006`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpequb` | `vcmpequb` | — | Vector Compare Equal-to Unsigned Byte |
|
||||
| `vcmpequb.` | `vcmpequb` | Rc=1 | Vector Compare Equal-to Unsigned Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpequb[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpequb` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x10000006`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `6`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpequb: read | Source A vector register. |
|
||||
| `VB` | vcmpequb: read | Source B vector register. |
|
||||
| `VD` | vcmpequb: write | Destination vector register. |
|
||||
| `CR` | vcmpequb: write (conditional) | Condition-register update. When `Rc=1`, CR6 (the field used by vector compares) is updated from the result. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpequb`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpequb`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpequb`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpequb"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:719`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L719)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:95`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L95)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:557`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L557)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3723-3735`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3723-L3735)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpequb => {
|
||||
let a = ctx.vr[instr.ra()].as_bytes();
|
||||
let b = ctx.vr[instr.rb()].as_bytes();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = if a[i] == b[i] { 0xFF } else { 0 }; }
|
||||
let v = xenia_types::Vec128::from_bytes(r);
|
||||
if instr.vc_rc_bit() {
|
||||
let (t, f) = crate::vmx::cr6_flags_from_mask(v);
|
||||
ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false };
|
||||
}
|
||||
ctx.vr[instr.rd()] = v;
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-byte mask: all-ones / all-zero.** Sixteen byte lanes; `VD[i] = (VA[i] == VB[i]) ? 0xFF : 0x00`. Lane 0 is the most-significant byte after `stvx`.
|
||||
- **Sign-agnostic.** Equality compare is identical for signed and unsigned bytes; there is no separate `vcmpeqsb`.
|
||||
- **CR6 update when `Rc=1`** (`vcmpequb.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]` — built by xenia's `crate::vmx::cr6_flags_from_mask` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). Standard SIMD-search idiom: `vcmpequb. vMask, vData, vNeedle` then `bc 12,26` to branch when *no* lane matched.
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) to pick per-byte between two source vectors.
|
||||
- **Common usage.** `memchr` / `strlen` / character classification — compare against a broadcast byte (often built with [`vspltisb`](vspltisb.md)) and inspect CR6 for early-out.
|
||||
- **No `VSCR` interaction, no XER, no traps.**
|
||||
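- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`; the interpreter reads both sources before it writes the destination.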
|
||||
- **No VMX128 sibling.**
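
The `memchr`-style usage mentioned above can be sketched in scalar Rust. This is an illustrative model only: `find_in_chunk` is a hypothetical helper that imitates the broadcast, compare, and CR6 early-out sequence on one 16-byte chunk, and it is not code from xenia or from any particular title.

```rust
/// Per-byte equality mask over sixteen lanes.
fn vcmpequb(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..16 {
        r[i] = if a[i] == b[i] { 0xFF } else { 0x00 };
    }
    r
}

/// Hypothetical memchr-style step: index of the first byte equal to `needle`.
fn find_in_chunk(chunk: [u8; 16], needle: u8) -> Option<usize> {
    // Broadcast the needle to every lane, as a splat instruction would.
    let mask = vcmpequb(chunk, [needle; 16]);
    // CR6.eq ("all-false") is the early-out: no lane matched.
    let all_false = mask.iter().all(|&m| m == 0);
    if all_false {
        return None;
    }
    // Lane 0 is the first byte in memory order.
    mask.iter().position(|&m| m == 0xFF)
}
```

For the chunk `*b"hello, xenia-rs!"`, searching for `b'x'` returns `Some(7)`, while a byte that does not occur returns `None` through the early-out path.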
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpequh`](vcmpequh.md), [`vcmpequw`](vcmpequw.md) — equality compare at half / word width.
|
||||
- [`vcmpgtub`](vcmpgtub.md), [`vcmpgtsb`](vcmpgtsb.md) — `>` at byte width, unsigned / signed.
|
||||
- [`vsel`](vsel.md) — primary mask consumer.
|
||||
- [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask combinators.
|
||||
- [`vspltisb`](vspltisb.md), [`vspltb`](vspltb.md) — broadcast sources for needle patterns.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpequb` (Vector Compare Equal-to Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpequb-vector-compare-equal-unsigned-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
139
migration/project-root/ppc-manual/vmx/vcmpequh.md
Normal file
@@ -0,0 +1,139 @@
|
||||
# `vcmpequh` — Vector Compare Equal-to Unsigned Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000046`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpequh` | `vcmpequh` | — | Vector Compare Equal-to Unsigned Half Word |
|
||||
| `vcmpequh.` | `vcmpequh` | Rc=1 | Vector Compare Equal-to Unsigned Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpequh[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpequh` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x10000046`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `70`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpequh: read | Source A vector register. |
|
||||
| `VB` | vcmpequh: read | Source B vector register. |
|
||||
| `VD` | vcmpequh: write | Destination vector register. |
|
||||
| `CR` | vcmpequh: write (conditional) | Condition-register update. When `Rc=1`, CR6 (the field used by vector compares) is updated from the result. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpequh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpequh`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpequh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpequh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:723`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L723)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:95`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L95)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:558`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L558)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3736-3748`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3736-L3748)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpequh => {
|
||||
let a = ctx.vr[instr.ra()].as_u16x8();
|
||||
let b = ctx.vr[instr.rb()].as_u16x8();
|
||||
let mut r = [0u16; 8];
|
||||
for i in 0..8 { r[i] = if a[i] == b[i] { 0xFFFF } else { 0 }; }
|
||||
let v = xenia_types::Vec128::from_u16x8_array(r);
|
||||
if instr.vc_rc_bit() {
|
||||
let (t, f) = crate::vmx::cr6_flags_from_mask(v);
|
||||
ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false };
|
||||
}
|
||||
ctx.vr[instr.rd()] = v;
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-half mask: all-ones / all-zero.** Eight half-word lanes; `VD[i] = (VA[i] == VB[i]) ? 0xFFFF : 0x0000`. Lane 0 (`VD[0..1]` after `stvx`) is the most-significant half.
|
||||
- **Sign-agnostic.** Equality is bit-identical for signed and unsigned halves; there is no `vcmpeqsh`.
|
||||
- **CR6 update when `Rc=1`** (`vcmpequh.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. Use `bc 12,24` for "all-equal" branches and `bc 12,26` for "no-equal".
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per half-word.
|
||||
- **Common usage.** UTF-16 character classification, audio-sample needle search, indexed-mesh deduplication.
|
||||
- **No `VSCR` interaction, no XER, no traps.**
|
||||
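- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`; both sources are read before the destination is written.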
|
||||
- **No VMX128 sibling.**
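
The CR6 rule in the list above can be sketched in a few lines. This is a minimal illustration, not the xenia-rs code: the snapshot calls `cr6_flags_from_mask`, whose body is not shown in this page, so the helper names below (`vcmpequh_mask`, `cr6_flags_from_u16_mask`, `cr6_nibble`) are hypothetical and the 4-bit packing is only an assumption made for readability.

```rust
/// Lane-wise equality mask, following the rule
/// VD[i] = (VA[i] == VB[i]) ? 0xFFFF : 0x0000.
fn vcmpequh_mask(a: [u16; 8], b: [u16; 8]) -> [u16; 8] {
    let mut r = [0u16; 8];
    for i in 0..8 {
        r[i] = if a[i] == b[i] { 0xFFFF } else { 0 };
    }
    r
}

/// Hypothetical helper: returns (all_true, all_false) for a compare mask.
/// Both flags are false when the lanes are mixed.
fn cr6_flags_from_u16_mask(mask: [u16; 8]) -> (bool, bool) {
    let all_true = mask.iter().all(|&m| m == 0xFFFF);
    let all_false = mask.iter().all(|&m| m == 0);
    (all_true, all_false)
}

/// Pack CR6 as a nibble in [lt, gt, eq, so] order, lt being the
/// most-significant bit. gt and so are always 0 for vector compares.
fn cr6_nibble(all_true: bool, all_false: bool) -> u8 {
    ((all_true as u8) << 3) | ((all_false as u8) << 1)
}
```

With this packing, the `lt` bit corresponds to CR bit 24 (the "all-equal" branch) and the `eq` bit to CR bit 26 (the "no-equal" branch); a mixed result sets neither.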
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpequb`](vcmpequb.md), [`vcmpequw`](vcmpequw.md) — equality compare at byte / word width.
|
||||
- [`vcmpgtuh`](vcmpgtuh.md), [`vcmpgtsh`](vcmpgtsh.md) — `>` at half width, unsigned / signed.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vspltish`](vspltish.md), [`vsplth`](vsplth.md) — broadcast sources for needle patterns.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpequh` (Vector Compare Equal-to Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpequh-vector-compare-equal-unsigned-half-word-instruction)
|
||||
- [AltiVec Technology Programming Interface Manual (Motorola/Freescale, hosted by NXP), Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
192
migration/project-root/ppc-manual/vmx/vcmpequw.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# `vcmpequw` — Vector Compare Equal-to Unsigned Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000086`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpequw` | `vcmpequw` | — | Vector Compare Equal-to Unsigned Word |
|
||||
| `vcmpequw.` | `vcmpequw` | Rc=1 | Vector Compare Equal-to Unsigned Word |
|
||||
| `vcmpequw128` | `vcmpequw128` | — | Vector128 Compare Equal-to Unsigned Word |
|
||||
| `vcmpequw128.` | `vcmpequw128` | Rc=1 | Vector128 Compare Equal-to Unsigned Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpequw[Rc] [VD], [VA], [VB]
|
||||
vcmpequw128[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpequw` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x10000086`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `134`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
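
As a rough illustration of the VC-form layout in the table above, the fields can be pulled out of a 32-bit instruction word with shifts and masks. PowerPC numbers bits from the most-significant end, so bit 0 is the top bit of the word. The struct and function names here are hypothetical and are not the xenia-rs decoder API.

```rust
/// Fields of a VC-form instruction word (illustrative type only).
#[derive(Debug, PartialEq)]
struct VcFields {
    opcd: u32,
    vrt: u32,
    vra: u32,
    vrb: u32,
    rc: bool,
    xo: u32,
}

/// Split a VC-form word using big-endian bit numbering.
fn decode_vc(word: u32) -> VcFields {
    VcFields {
        opcd: word >> 26,             // bits 0-5
        vrt: (word >> 21) & 0x1F,     // bits 6-10
        vra: (word >> 16) & 0x1F,     // bits 11-15
        vrb: (word >> 11) & 0x1F,     // bits 16-20
        rc: ((word >> 10) & 1) == 1,  // bit 21
        xo: word & 0x3FF,             // bits 22-31
    }
}
```

Decoding the base opcode word `0x10000086` this way gives primary opcode 4 and extended opcode 134, matching the values listed for `vcmpequw`.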
|
||||
|
||||
### `vcmpequw128` — form `VX128_R`
|
||||
|
||||
- **Opcode word:** `0x18000200`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `512`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22–25 | `XO` | extended opcode (compare) |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `Rc` | record-form flag (updates CR6) |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
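
The split register fields in the table above combine into 7-bit register numbers, which is how VMX128 addresses 128 vector registers. The sketch below assumes the weights implied by the "low / middle / high" labels (low 5 bits, then bit 5, then bit 6); the function names are hypothetical and take already-extracted field values rather than a raw instruction word.

```rust
/// Destination register: 5 low bits plus 2 high bits.
fn vd128(vd128l: u32, vd128h: u32) -> u32 {
    (vd128l & 0x1F) | ((vd128h & 0x3) << 5)
}

/// Source A register: 5 low bits, a middle bit (weight 32, field VA128h)
/// and a high bit (weight 64, field VA128H).
fn va128(va128l: u32, va128_mid: u32, va128_high: u32) -> u32 {
    (va128l & 0x1F) | ((va128_mid & 0x1) << 5) | ((va128_high & 0x1) << 6)
}

/// Source B register: 5 low bits plus 2 high bits.
fn vb128(vb128l: u32, vb128h: u32) -> u32 {
    (vb128l & 0x1F) | ((vb128h & 0x3) << 5)
}
```

Source A needs two separate one-bit fields because its upper bits are not contiguous in the encoding, whereas the destination and source B each carry their upper two bits in a single field.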
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpequw: read; vcmpequw128: read | Source A vector register. |
|
||||
| `VB` | vcmpequw: read; vcmpequw128: read | Source B vector register. |
|
||||
| `VD` | vcmpequw: write; vcmpequw128: write | Destination vector register. |
|
||||
| `CR` | vcmpequw: write (conditional); vcmpequw128: write (conditional) | Condition-register update. When `Rc=1`, CR6 is updated from the compare result (vector compares record to CR6 rather than CR0). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpequw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
### `vcmpequw128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpequw`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
- `vcmpequw128`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpequw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpequw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:727`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L727)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:95`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L95)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:559`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L559)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2542-2552`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2542-L2552)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpequw | PpcOpcode::vcmpequw128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] == b[i] { 0xFFFF_FFFF } else { 0 }; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
let rc = if matches!(instr.opcode, PpcOpcode::vcmpequw128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vcmpequw128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpequw128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:731`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L731)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:95`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L95)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:685`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L685)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2542-2552`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2542-L2552)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpequw | PpcOpcode::vcmpequw128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] == b[i] { 0xFFFF_FFFF } else { 0 }; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
let rc = if matches!(instr.opcode, PpcOpcode::vcmpequw128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-word mask: all-ones / all-zero.** Four word lanes; `VD[i] = (VA[i] == VB[i]) ? 0xFFFFFFFF : 0`. Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word.
|
||||
- **Sign-agnostic.** Equality is bit-identical for signed and unsigned words; there is no `vcmpeqsw`.
|
||||
- **CR6 update when `Rc=1`** (`vcmpequw.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. Classic "did all four 32-bit hash buckets match?" early-out pattern.
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per word.
|
||||
- **Common usage.** Hashtable probe matching, packed-RGBA pixel comparisons, packed-int handle equality.
|
||||
- **No `VSCR` interaction, no XER, no traps.**
|
||||
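- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`; both sources are read before the destination is written.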
|
||||
- **VMX128 sibling (`vcmpequw128`).** Identical semantics with the extended encoding; xenia routes both to one match arm via `vmx_reg_triple`.
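
The "compose with `vsel`" pattern from the list above can be sketched as a compare followed by a bitwise select. This is an illustration on plain arrays with hypothetical function names; `vsel` is modelled as taking bits from the second operand where the mask bit is 1 and from the first operand where it is 0.

```rust
/// Per-word equality mask: 0xFFFF_FFFF where lanes match, 0 otherwise.
fn vcmpequw_mask(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = if a[i] == b[i] { 0xFFFF_FFFF } else { 0 };
    }
    r
}

/// Bitwise select: result = (a & !mask) | (b & mask).
fn vsel(a: [u32; 4], b: [u32; 4], mask: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = (a[i] & !mask[i]) | (b[i] & mask[i]);
    }
    r
}

/// Replace every lane equal to `needle` with `replacement`, branch-free.
fn replace_matches(keys: [u32; 4], needle: u32, replacement: u32) -> [u32; 4] {
    let mask = vcmpequw_mask(keys, [needle; 4]);
    vsel(keys, [replacement; 4], mask)
}
```

Because the mask is all-ones or all-zeros per lane, the bitwise select behaves as a whole-lane select; no lane ever mixes bits from both inputs.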
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpequb`](vcmpequb.md), [`vcmpequh`](vcmpequh.md) — equality compare at byte / half width.
|
||||
- [`vcmpgtuw`](vcmpgtuw.md), [`vcmpgtsw`](vcmpgtsw.md) — `>` at word width, unsigned / signed.
|
||||
- [`vcmpeqfp`](vcmpeqfp.md) — same shape, IEEE-754 single-precision equality.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vspltisw`](vspltisw.md), [`vspltw`](vspltw.md) — broadcast sources for needle patterns.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpequw` (Vector Compare Equal-to Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpequw-vector-compare-equal-unsigned-word-instruction)
|
||||
- [AltiVec Technology Programming Interface Manual (Motorola/Freescale, hosted by NXP), Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
192
migration/project-root/ppc-manual/vmx/vcmpgefp.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# `vcmpgefp` — Vector Compare Greater-Than-or-Equal-to Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x100001c6`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpgefp` | `vcmpgefp` | — | Vector Compare Greater-Than-or-Equal-to Floating Point |
|
||||
| `vcmpgefp.` | `vcmpgefp` | Rc=1 | Vector Compare Greater-Than-or-Equal-to Floating Point |
|
||||
| `vcmpgefp128` | `vcmpgefp128` | — | Vector128 Compare Greater-Than-or-Equal-to Floating Point |
|
||||
| `vcmpgefp128.` | `vcmpgefp128` | Rc=1 | Vector128 Compare Greater-Than-or-Equal-to Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpgefp[Rc] [VD], [VA], [VB]
|
||||
vcmpgefp128[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpgefp` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x100001c6`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `454`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
|
||||
|
||||
### `vcmpgefp128` — form `VX128_R`
|
||||
|
||||
- **Opcode word:** `0x18000080`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `128`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22–25 | `XO` | extended opcode (compare) |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `Rc` | record-form flag (updates CR6) |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpgefp: read; vcmpgefp128: read | Source A vector register. |
|
||||
| `VB` | vcmpgefp: read; vcmpgefp128: read | Source B vector register. |
|
||||
| `VD` | vcmpgefp: write; vcmpgefp128: write | Destination vector register. |
|
||||
| `CR` | vcmpgefp: write (conditional); vcmpgefp128: write (conditional) | Condition-register update. When `Rc=1`, CR6 is updated from the compare result (vector compares record to CR6 rather than CR0). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpgefp`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
### `vcmpgefp128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpgefp`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
- `vcmpgefp128`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpgefp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgefp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:631`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L631)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:96`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L96)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:561`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L561)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2184-2194`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2184-L2194)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgefp | PpcOpcode::vcmpgefp128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_f32x4();
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] >= b[i] { 0xFFFF_FFFF } else { 0 }; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
let rc = if matches!(instr.opcode, PpcOpcode::vcmpgefp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vcmpgefp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgefp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:635`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L635)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:96`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L96)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:682`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L682)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2184-2194`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2184-L2194)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgefp | PpcOpcode::vcmpgefp128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_f32x4();
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] >= b[i] { 0xFFFF_FFFF } else { 0 }; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
let rc = if matches!(instr.opcode, PpcOpcode::vcmpgefp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-lane mask: all-ones / all-zero.** Four word lanes; `VD[i] = (VA[i] >= VB[i]) ? 0xFFFFFFFF : 0`.
|
||||
- **NaN handling: false.** Any NaN input makes the comparison false (lane stays zero) — matches IEEE-754 quiet-compare semantics. There is no exception, no sticky flag.
|
||||
- **`+0 >= -0` is true.** Zero signs are ignored.
|
||||
- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs are flushed to zero before the compare; this can flip a comparison's outcome relative to strict IEEE.
|
||||
- **CR6 update when `Rc=1`** (`vcmpgefp.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. `bc 12,24` branches on "all four lanes ≥".
|
||||
- **Compose with `vsel`.** The mask drives [`vsel`](vsel.md) for per-lane selection.
|
||||
- **No `VSCR[SAT]`, no XER changes, no traps.**
|
||||
- **VMX128 sibling (`vcmpgefp128`).** Identical semantics with the extended encoding.
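
The NaN and signed-zero rules above matter when rewriting the compare. A minimal sketch on plain `f32` arrays (function names are hypothetical) shows that `a >= b` and `!(a < b)` are not interchangeable: they agree on ordinary values but disagree whenever a lane holds a NaN.

```rust
/// Per-lane `>=` mask: any NaN operand yields 0 for that lane.
fn vcmpgefp_mask(a: [f32; 4], b: [f32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = if a[i] >= b[i] { 0xFFFF_FFFF } else { 0 };
    }
    r
}

/// Tempting but wrong rewrite: `!(a < b)` is true for NaN lanes,
/// so it does not reproduce the instruction's result.
fn not_less_mask(a: [f32; 4], b: [f32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = if !(a[i] < b[i]) { 0xFFFF_FFFF } else { 0 };
    }
    r
}
```

For inputs such as `[NaN, +0.0, 1.0, -1.0]` against `[1.0, -0.0, NaN, -1.0]`, the first function clears both NaN lanes and sets the zero and equal lanes, while the rewrite sets all four.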
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpgtfp`](vcmpgtfp.md) — strict `>` for floats.
|
||||
- [`vcmpeqfp`](vcmpeqfp.md) — equality for floats.
|
||||
- [`vcmpbfp`](vcmpbfp.md) — bounds check `|VA| <= |VB|`.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — direct max / min when the mask isn't needed elsewhere.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpgefp` (Vector Compare Greater-Than-or-Equal-to Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgefp-vector-compare-greater-than-equal-floating-point-instruction)
|
||||
- [AltiVec Technology Programming Interface Manual (Motorola/Freescale, hosted by NXP), Chapter 5 — Floating-Point Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
192
migration/project-root/ppc-manual/vmx/vcmpgtfp.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# `vcmpgtfp` — Vector Compare Greater-Than Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x100002c6`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpgtfp` | `vcmpgtfp` | — | Vector Compare Greater-Than Floating Point |
|
||||
| `vcmpgtfp.` | `vcmpgtfp` | Rc=1 | Vector Compare Greater-Than Floating Point |
|
||||
| `vcmpgtfp128` | `vcmpgtfp128` | — | Vector128 Compare Greater-Than Floating-Point |
|
||||
| `vcmpgtfp128.` | `vcmpgtfp128` | Rc=1 | Vector128 Compare Greater-Than Floating-Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpgtfp[Rc] [VD], [VA], [VB]
|
||||
vcmpgtfp128[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpgtfp` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x100002c6`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `710`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
|
||||
|
||||
### `vcmpgtfp128` — form `VX128_R`
|
||||
|
||||
- **Opcode word:** `0x18000100`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `256`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22–25 | `XO` | extended opcode (compare) |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `Rc` | record-form flag (updates CR6) |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpgtfp: read; vcmpgtfp128: read | Source A vector register. |
|
||||
| `VB` | vcmpgtfp: read; vcmpgtfp128: read | Source B vector register. |
|
||||
| `VD` | vcmpgtfp: write; vcmpgtfp128: write | Destination vector register. |
|
||||
| `CR` | vcmpgtfp: write (conditional); vcmpgtfp128: write (conditional) | Condition-register update. When `Rc=1`, CR6 is updated from the compare result (vector compares record to CR6 rather than CR0). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpgtfp`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
### `vcmpgtfp128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpgtfp`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
- `vcmpgtfp128`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpgtfp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtfp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:639`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L639)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:96`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L96)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:565`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L565)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2195-2205`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2195-L2205)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgtfp | PpcOpcode::vcmpgtfp128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_f32x4();
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] > b[i] { 0xFFFF_FFFF } else { 0 }; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
let rc = if matches!(instr.opcode, PpcOpcode::vcmpgtfp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vcmpgtfp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:643`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L643)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:96`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L96)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:683`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L683)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2195-2205`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2195-L2205)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgtfp | PpcOpcode::vcmpgtfp128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_f32x4();
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] > b[i] { 0xFFFF_FFFF } else { 0 }; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
let rc = if matches!(instr.opcode, PpcOpcode::vcmpgtfp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() };
|
||||
if rc { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-lane mask: all-ones / all-zero.** Four word lanes; `VD[i] = (VA[i] > VB[i]) ? 0xFFFFFFFF : 0`.
|
||||
- **NaN handling: false.** Any NaN input produces a false lane (no sticky flag, no exception) — matches IEEE-754 quiet-compare.
|
||||
- **`+0 > -0` is false.** Zero signs are ignored.
|
||||
- **`VSCR[NJ]` denormals.** With `NJ = 1`, denormal inputs flush to zero before the compare.
|
||||
- **CR6 update when `Rc=1`** (`vcmpgtfp.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. Use `bc 12,24` for "all-greater" branches and `bc 12,26` for "no-lane-greater".
|
||||
- **Compose with `vsel`.** Mask plus [`vsel`](vsel.md) implements per-lane `if (a > b) x else y`.
|
||||
- **No `VSCR[SAT]`, no XER changes, no traps.**
|
||||
- **VMX128 sibling (`vcmpgtfp128`).** Identical semantics with the extended encoding.
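
The `VSCR[NJ]` bullet above can be made concrete with a one-lane sketch. It assumes that flushing replaces a denormal with a zero of the same sign, and it models `NJ` as a plain boolean parameter; both the helper names and that modelling are illustrative assumptions. The frozen snapshot above compares the raw `f32` lanes, and whether a flush is applied elsewhere in xenia-rs is not shown in this page.

```rust
/// Replace a denormal (subnormal) value with a same-signed zero.
fn flush_denormal(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0_f32.copysign(x)
    } else {
        x
    }
}

/// One lane of a `>` compare, optionally flushing denormal inputs first.
fn vcmpgtfp_lane(a: f32, b: f32, nj: bool) -> u32 {
    let (a, b) = if nj {
        (flush_denormal(a), flush_denormal(b))
    } else {
        (a, b)
    };
    if a > b { 0xFFFF_FFFF } else { 0 }
}
```

The smallest positive denormal, `f32::from_bits(1)`, compares greater than zero under strict IEEE rules but not once it has been flushed, which is the kind of outcome flip the bullet describes.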
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpgefp`](vcmpgefp.md) — `>=` for floats.
|
||||
- [`vcmpeqfp`](vcmpeqfp.md) — equality for floats.
|
||||
- [`vcmpbfp`](vcmpbfp.md) — bounds check (`|VA| <= |VB|`).
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — direct max / min.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpgtfp` (Vector Compare Greater-Than Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtfp-vector-compare-greater-than-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
140
migration/project-root/ppc-manual/vmx/vcmpgtsb.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# `vcmpgtsb` — Vector Compare Greater-Than Signed Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000306`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpgtsb` | `vcmpgtsb` | — | Vector Compare Greater-Than Signed Byte |
|
||||
| `vcmpgtsb.` | `vcmpgtsb` | Rc=1 | Vector Compare Greater-Than Signed Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpgtsb[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpgtsb` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x10000306`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `774`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
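As a rough illustration of the bit table above, the sketch below splits a 32-bit instruction word into its VC-form fields. PowerPC numbers bits from the most-significant end, so "bits 0–5" are the top six bits of the word. `VcFields` and `decode_vc` are hypothetical names for this example and are not the xenia-rs decoder API.

```rust
/// Hypothetical container for the VC-form fields listed in the table.
#[derive(Debug, PartialEq)]
struct VcFields {
    opcd: u32,
    vrt: u32,
    vra: u32,
    vrb: u32,
    rc: bool,
    xo: u32,
}

/// Split an instruction word using big-endian bit numbering (bit 0 = MSB).
fn decode_vc(word: u32) -> VcFields {
    VcFields {
        opcd: word >> 26,              // bits 0-5
        vrt: (word >> 21) & 0x1F,      // bits 6-10
        vra: (word >> 16) & 0x1F,      // bits 11-15
        vrb: (word >> 11) & 0x1F,      // bits 16-20
        rc: ((word >> 10) & 1) == 1,   // bit 21
        xo: word & 0x3FF,              // bits 22-31
    }
}
```

Decoding the bare opcode word `0x10000306` gives primary opcode 4 and extended opcode 774 with all register fields zero and `Rc` clear.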
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpgtsb: read | Source A vector register. |
|
||||
| `VB` | vcmpgtsb: read | Source B vector register. |
|
||||
| `VD` | vcmpgtsb: write | Destination vector register. |
|
||||
| `CR` | vcmpgtsb: write (conditional) | Condition-register update. When `Rc=1`, CR6 is updated from the result (vector compares record into CR6, not CR0 or CR1). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpgtsb`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpgtsb`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpgtsb`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtsb"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:735`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L735)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:566`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L566)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3762-3774`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3762-L3774)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgtsb => {
|
||||
let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]);
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = if a[i] > b[i] { 0xFF } else { 0 }; }
|
||||
let v = xenia_types::Vec128::from_bytes(r);
|
||||
if instr.vc_rc_bit() {
|
||||
let (t, f) = crate::vmx::cr6_flags_from_mask(v);
|
||||
ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false };
|
||||
}
|
||||
ctx.vr[instr.rd()] = v;
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-byte mask: all-ones / all-zero.** Sixteen byte lanes; `VD[i] = (int8(VA[i]) > int8(VB[i])) ? 0xFF : 0x00`. Lane 0 is the most-significant byte, which `stvx` stores at the lowest address.
|
||||
- **Sign matters.** The same bit patterns can compare differently than under [`vcmpgtub`](vcmpgtub.md) because each lane is interpreted as signed: e.g. `0xFF > 0x01` is `true` unsigned but `false` signed (`-1 > 1`).
|
||||
- **CR6 update when `Rc=1`** (`vcmpgtsb.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]` — built by xenia's `crate::vmx::cr6_flags_from_mask`.
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) to select per byte.
|
||||
- **Common usage.** Signed-byte audio thresholding, signed-difference sign extraction (`vsubsbs` then `vcmpgtsb`).
|
||||
- **No `VSCR` interaction, no XER, no traps.**
|
||||
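- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`; both sources are read before the result is written.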
|
||||
- **No VMX128 sibling.**
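To make the signed-versus-unsigned point concrete, here is a minimal sketch that runs both interpretations over the same raw bytes. The two functions are hypothetical array-based stand-ins written for this example; they mirror the per-lane rule quoted above but are not the interpreter arms themselves.

```rust
/// Hypothetical model of the signed compare: bytes are reinterpreted as i8.
fn vcmpgtsb(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..16 {
        r[i] = if (a[i] as i8) > (b[i] as i8) { 0xFF } else { 0x00 };
    }
    r
}

/// Hypothetical model of the unsigned compare on the same bit patterns.
fn vcmpgtub(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..16 {
        r[i] = if a[i] > b[i] { 0xFF } else { 0x00 };
    }
    r
}
```

Feeding `0xFF` against `0x01` in every lane yields an all-zero mask from the signed version and an all-ones mask from the unsigned one, which is the whole difference between the two instructions.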
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpgtub`](vcmpgtub.md) — same width, unsigned `>`.
|
||||
- [`vcmpequb`](vcmpequb.md) — equality at byte width.
|
||||
- [`vcmpgtsh`](vcmpgtsh.md), [`vcmpgtsw`](vcmpgtsw.md) — signed `>` at half / word width.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vmaxsb`](vmaxsb.md), [`vminsb`](vminsb.md) — direct signed max / min.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpgtsb` (Vector Compare Greater-Than Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtsb-vector-compare-greater-than-signed-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
140
migration/project-root/ppc-manual/vmx/vcmpgtsh.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# `vcmpgtsh` — Vector Compare Greater-Than Signed Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000346`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpgtsh` | `vcmpgtsh` | — | Vector Compare Greater-Than Signed Half Word |
|
||||
| `vcmpgtsh.` | `vcmpgtsh` | Rc=1 | Vector Compare Greater-Than Signed Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpgtsh[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpgtsh` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x10000346`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `838`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpgtsh: read | Source A vector register. |
|
||||
| `VB` | vcmpgtsh: read | Source B vector register. |
|
||||
| `VD` | vcmpgtsh: write | Destination vector register. |
|
||||
| `CR` | vcmpgtsh: write (conditional) | Condition-register update. When `Rc=1`, CR6 is updated from the result (vector compares record into CR6, not CR0 or CR1). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpgtsh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpgtsh`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpgtsh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtsh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:739`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L739)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:567`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L567)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3788-3800`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3788-L3800)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgtsh => {
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
|
||||
let mut r = [0u16; 8];
|
||||
for i in 0..8 { r[i] = if a[i] > b[i] { 0xFFFF } else { 0 }; }
|
||||
let v = xenia_types::Vec128::from_u16x8_array(r);
|
||||
if instr.vc_rc_bit() {
|
||||
let (t, f) = crate::vmx::cr6_flags_from_mask(v);
|
||||
ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false };
|
||||
}
|
||||
ctx.vr[instr.rd()] = v;
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-half mask: all-ones / all-zero.** Eight half-word lanes; `VD[i] = (int16(VA[i]) > int16(VB[i])) ? 0xFFFF : 0x0000`. Lane 0 is the most-significant half.
|
||||
- **Sign matters.** `0x8000 > 0x0001` is `true` unsigned but `false` signed (`-32768 > 1`). Pick `vcmpgtsh` deliberately when the sign bit affects ordering.
|
||||
- **CR6 update when `Rc=1`** (`vcmpgtsh.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`.
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per half.
|
||||
- **Common usage.** Q15 audio threshold detection, signed image-processing kernels.
|
||||
- **No `VSCR` interaction, no XER, no traps.**
|
||||
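- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`; both sources are read before the result is written.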
|
||||
- **No VMX128 sibling.**
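The "compose with `vsel`" pattern can be sketched as a lane-wise signed maximum. The example assumes the usual `vsel` definition, `VD = (VA & ~VC) | (VB & VC)`, and models all three operations as hypothetical array functions; it is an illustration of the idiom, not code from xenia-rs.

```rust
/// Hypothetical model of the signed half-word compare.
fn vcmpgtsh(a: [i16; 8], b: [i16; 8]) -> [u16; 8] {
    let mut r = [0u16; 8];
    for i in 0..8 {
        r[i] = if a[i] > b[i] { 0xFFFF } else { 0x0000 };
    }
    r
}

/// Bitwise select: take bits from `b` where `c` is set, otherwise from `a`.
fn vsel(a: [u16; 8], b: [u16; 8], c: [u16; 8]) -> [u16; 8] {
    let mut r = [0u16; 8];
    for i in 0..8 {
        r[i] = (a[i] & !c[i]) | (b[i] & c[i]);
    }
    r
}

/// Per-lane `if a > b { a } else { b }`, built from compare plus select.
fn max_i16x8(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mask = vcmpgtsh(a, b);
    let a_bits = a.map(|x| x as u16);
    let b_bits = b.map(|x| x as u16);
    vsel(b_bits, a_bits, mask).map(|x| x as i16)
}
```

Because every mask lane is all-ones or all-zero, the bitwise select never mixes bits from the two inputs within a lane.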
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpgtuh`](vcmpgtuh.md) — same width, unsigned `>`.
|
||||
- [`vcmpequh`](vcmpequh.md) — equality at half width.
|
||||
- [`vcmpgtsb`](vcmpgtsb.md), [`vcmpgtsw`](vcmpgtsw.md) — signed `>` at byte / word width.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vmaxsh`](vmaxsh.md), [`vminsh`](vminsh.md) — direct signed max / min.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpgtsh` (Vector Compare Greater-Than Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtsh-vector-compare-greater-than-signed-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
137
migration/project-root/ppc-manual/vmx/vcmpgtsw.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# `vcmpgtsw` — Vector Compare Greater-Than Signed Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000386`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpgtsw` | `vcmpgtsw` | — | Vector Compare Greater-Than Signed Word |
|
||||
| `vcmpgtsw.` | `vcmpgtsw` | Rc=1 | Vector Compare Greater-Than Signed Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpgtsw[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpgtsw` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x10000386`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `902`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
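Going the other way from the table, an instruction word can be assembled from its fields. The sketch below is a hypothetical helper (`encode_vc` is not part of xenia-rs) that assumes primary opcode 4 for every VC-form compare and returns `None` when a field does not fit its bit range.

```rust
/// Assemble a VC-form word; bit 0 is the most-significant bit of the word.
fn encode_vc(xo: u32, vd: u32, va: u32, vb: u32, rc: bool) -> Option<u32> {
    // Reject values that would spill into a neighbouring field.
    if xo > 0x3FF || vd > 31 || va > 31 || vb > 31 {
        return None;
    }
    Some(
        (4 << 26)                 // bits 0-5: primary opcode 4
            | (vd << 21)          // bits 6-10: VRT
            | (va << 16)          // bits 11-15: VRA
            | (vb << 11)          // bits 16-20: VRB
            | ((rc as u32) << 10) // bit 21: Rc
            | xo,                 // bits 22-31: extended opcode
    )
}
```

With extended opcode 902 and all other fields zero, this reproduces the opcode word `0x10000386` quoted above.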
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpgtsw: read | Source A vector register. |
|
||||
| `VB` | vcmpgtsw: read | Source B vector register. |
|
||||
| `VD` | vcmpgtsw: write | Destination vector register. |
|
||||
| `CR` | vcmpgtsw: write (conditional) | Condition-register update. When `Rc=1`, CR6 is updated from the result (vector compares record into CR6, not CR0 or CR1). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpgtsw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpgtsw`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpgtsw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtsw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:743`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L743)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:568`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L568)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3811-3820`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3811-L3820)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgtsw => {
|
||||
let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]);
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] > b[i] { 0xFFFFFFFF } else { 0 }; }
|
||||
let v = xenia_types::Vec128::from_u32x4_array(r);
|
||||
if instr.vc_rc_bit() { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.vr[instr.rd()] = v;
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-word mask: all-ones / all-zero.** Four word lanes; `VD[i] = (int32(VA[i]) > int32(VB[i])) ? 0xFFFFFFFF : 0`. Lane 0 is the most-significant word.
|
||||
- **Sign matters.** `0x8000_0000 > 0x0000_0001` is `true` unsigned but `false` signed (`INT32_MIN > 1`).
|
||||
- **CR6 update when `Rc=1`** (`vcmpgtsw.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`.
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per word.
|
||||
- **Common usage.** Z-buffer ordering, signed counter thresholds, "argmax" of int32 arrays.
|
||||
- **No `VSCR` interaction, no XER, no traps.**
|
||||
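- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`; both sources are read before the result is written.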
|
||||
- **No VMX128 sibling.**
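The "argmax" usage mentioned above can be sketched as follows: each of the four lanes tracks a running maximum and its index, the compare mask decides which lanes to update, and a final scalar pass picks the winning lane. Everything here is a hypothetical scalar model (`vcmpgtsw`, `select`, and `argmax_i32` are names invented for the example), and for brevity the input length is assumed to be a non-zero multiple of four.

```rust
/// Hypothetical model of the signed word compare.
fn vcmpgtsw(a: [i32; 4], b: [i32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = if a[i] > b[i] { 0xFFFF_FFFF } else { 0 };
    }
    r
}

/// Per-lane select: `if_set` where the mask is all-ones, else `if_clear`.
fn select(mask: [u32; 4], if_set: [u32; 4], if_clear: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = (if_set[i] & mask[i]) | (if_clear[i] & !mask[i]);
    }
    r
}

/// Index of the first maximum; None unless the length is a non-zero multiple of 4.
fn argmax_i32(data: &[i32]) -> Option<usize> {
    if data.is_empty() || data.len() % 4 != 0 {
        return None;
    }
    let mut best = [data[0], data[1], data[2], data[3]];
    let mut best_idx = [0u32, 1, 2, 3];
    for (n, chunk) in data.chunks_exact(4).enumerate().skip(1) {
        let cur = [chunk[0], chunk[1], chunk[2], chunk[3]];
        let base = (n * 4) as u32;
        let cur_idx = [base, base + 1, base + 2, base + 3];
        // Strict `>` keeps the earlier element of a lane on ties.
        let mask = vcmpgtsw(cur, best);
        best = select(mask, cur.map(|x| x as u32), best.map(|x| x as u32)).map(|x| x as i32);
        best_idx = select(mask, cur_idx, best_idx);
    }
    // Horizontal reduction across the four lanes, preferring the lowest index.
    let mut lane = 0;
    for i in 1..4 {
        let better = best[i] > best[lane];
        let tie_earlier = best[i] == best[lane] && best_idx[i] < best_idx[lane];
        if better || tie_earlier {
            lane = i;
        }
    }
    Some(best_idx[lane] as usize)
}
```

The same mask drives both selects, so values and indices stay in step without any per-lane branching.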
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpgtuw`](vcmpgtuw.md) — same width, unsigned `>`.
|
||||
- [`vcmpequw`](vcmpequw.md) — equality at word width.
|
||||
- [`vcmpgtsb`](vcmpgtsb.md), [`vcmpgtsh`](vcmpgtsh.md) — signed `>` at byte / half width.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vmaxsw`](vmaxsw.md), [`vminsw`](vminsw.md) — direct signed max / min.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpgtsw` (Vector Compare Greater-Than Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtsw-vector-compare-greater-than-signed-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
140
migration/project-root/ppc-manual/vmx/vcmpgtub.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# `vcmpgtub` — Vector Compare Greater-Than Unsigned Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000206`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpgtub` | `vcmpgtub` | — | Vector Compare Greater-Than Unsigned Byte |
|
||||
| `vcmpgtub.` | `vcmpgtub` | Rc=1 | Vector Compare Greater-Than Unsigned Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpgtub[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpgtub` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x10000206`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `518`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpgtub: read | Source A vector register. |
|
||||
| `VB` | vcmpgtub: read | Source B vector register. |
|
||||
| `VD` | vcmpgtub: write | Destination vector register. |
|
||||
| `CR` | vcmpgtub: write (conditional) | Condition-register update. When `Rc=1`, CR6 is updated from the result (vector compares record into CR6, not CR0 or CR1). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpgtub`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpgtub`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpgtub`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtub"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:747`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L747)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:562`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L562)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3749-3761`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3749-L3761)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgtub => {
|
||||
let a = ctx.vr[instr.ra()].as_bytes();
|
||||
let b = ctx.vr[instr.rb()].as_bytes();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = if a[i] > b[i] { 0xFF } else { 0 }; }
|
||||
let v = xenia_types::Vec128::from_bytes(r);
|
||||
if instr.vc_rc_bit() {
|
||||
let (t, f) = crate::vmx::cr6_flags_from_mask(v);
|
||||
ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false };
|
||||
}
|
||||
ctx.vr[instr.rd()] = v;
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-byte mask: all-ones / all-zero.** Sixteen byte lanes; `VD[i] = (uint8(VA[i]) > uint8(VB[i])) ? 0xFF : 0x00`. Lane 0 is the most-significant byte, which `stvx` stores at the lowest address.
|
||||
- **Sign matters.** `0xFF > 0x01` is `true` unsigned but `false` signed (`-1 > 1`); pick `vcmpgtub` only when both sides should be treated as `0..255`.
|
||||
- **CR6 update when `Rc=1`** (`vcmpgtub.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]` — built by xenia's `crate::vmx::cr6_flags_from_mask`.
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per byte.
|
||||
- **Common usage.** Pixel "brighter than" tests, byte-level histogramming, threshold-based binarisation.
|
||||
- **No `VSCR` interaction, no XER, no traps.**
|
||||
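- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`; both sources are read before the result is written.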
|
||||
- **No VMX128 sibling.**
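Threshold binarisation, one of the uses listed above, falls out of the mask directly: compare every pixel against a splatted threshold and keep the mask as the output. The sketch below is a hypothetical scalar model (`vcmpgtub` and `binarise` are names made up for this example) that processes the input in 16-byte chunks and zero-pads the final partial chunk.

```rust
/// Hypothetical model of the unsigned byte compare.
fn vcmpgtub(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..16 {
        r[i] = if a[i] > b[i] { 0xFF } else { 0x00 };
    }
    r
}

/// Map each pixel to 0xFF if it is strictly brighter than `threshold`, else 0x00.
fn binarise(pixels: &[u8], threshold: u8) -> Vec<u8> {
    let splat = [threshold; 16];
    let mut out = Vec::with_capacity(pixels.len());
    for chunk in pixels.chunks(16) {
        // Pad a short tail with zeros; the padded lanes are dropped again below.
        let mut lanes = [0u8; 16];
        lanes[..chunk.len()].copy_from_slice(chunk);
        let mask = vcmpgtub(lanes, splat);
        out.extend_from_slice(&mask[..chunk.len()]);
    }
    out
}
```

Since the compare is strict, a pixel equal to the threshold maps to `0x00`.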
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpgtsb`](vcmpgtsb.md) — same width, signed `>`.
|
||||
- [`vcmpequb`](vcmpequb.md) — equality at byte width.
|
||||
- [`vcmpgtuh`](vcmpgtuh.md), [`vcmpgtuw`](vcmpgtuw.md) — unsigned `>` at half / word width.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vmaxub`](vmaxub.md), [`vminub`](vminub.md) — direct unsigned max / min.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpgtub` (Vector Compare Greater-Than Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtub-vector-compare-greater-than-unsigned-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
140
migration/project-root/ppc-manual/vmx/vcmpgtuh.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# `vcmpgtuh` — Vector Compare Greater-Than Unsigned Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000246`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpgtuh` | `vcmpgtuh` | — | Vector Compare Greater-Than Unsigned Half Word |
|
||||
| `vcmpgtuh.` | `vcmpgtuh` | Rc=1 | Vector Compare Greater-Than Unsigned Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpgtuh[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpgtuh` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x10000246`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `582`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpgtuh: read | Source A vector register. |
|
||||
| `VB` | vcmpgtuh: read | Source B vector register. |
|
||||
| `VD` | vcmpgtuh: write | Destination vector register. |
|
||||
| `CR` | vcmpgtuh: write (conditional) | Condition-register update. When `Rc=1`, CR6 is updated from the result (vector compares record into CR6, not CR0 or CR1). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpgtuh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpgtuh`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
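For readers mapping the CR6 tuple onto the 32-bit condition register, here is a small sketch of the packing. It assumes the standard layout in which CR field *n* occupies big-endian bits 4*n* to 4*n*+3, so CR6 covers bits 24 to 27 with `lt` at bit 24 and `eq` at bit 26. The helper names are hypothetical and do not correspond to xenia-rs functions.

```rust
/// Distance of CR6's low bit from the least-significant end of the register.
const CR6_SHIFT: u32 = 4;

/// Pack [lt, gt, eq, so] = [all_true, 0, all_false, 0] into a 4-bit field.
fn cr6_nibble(all_true: bool, all_false: bool) -> u8 {
    ((all_true as u8) << 3) | ((all_false as u8) << 1)
}

/// Replace the CR6 field of a 32-bit CR value, leaving other fields intact.
fn write_cr6(cr: u32, nibble: u8) -> u32 {
    (cr & !(0xF << CR6_SHIFT)) | (((nibble as u32) & 0xF) << CR6_SHIFT)
}

/// Read a CR bit by its big-endian index (bit 0 is the most-significant bit).
fn cr_bit(cr: u32, index: u32) -> bool {
    ((cr >> (31 - index)) & 1) == 1
}
```

A conditional branch that tests CR bit 24 therefore fires when every lane compared true, and one that tests bit 26 fires when no lane did.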
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpgtuh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtuh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:751`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L751)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:563`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L563)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3775-3787`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3775-L3787)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgtuh => {
|
||||
let a = ctx.vr[instr.ra()].as_u16x8();
|
||||
let b = ctx.vr[instr.rb()].as_u16x8();
|
||||
let mut r = [0u16; 8];
|
||||
for i in 0..8 { r[i] = if a[i] > b[i] { 0xFFFF } else { 0 }; }
|
||||
let v = xenia_types::Vec128::from_u16x8_array(r);
|
||||
if instr.vc_rc_bit() {
|
||||
let (t, f) = crate::vmx::cr6_flags_from_mask(v);
|
||||
ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false };
|
||||
}
|
||||
ctx.vr[instr.rd()] = v;
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-half mask: all-ones / all-zero.** Eight half-word lanes; `VD[i] = (uint16(VA[i]) > uint16(VB[i])) ? 0xFFFF : 0x0000`. Lane 0 is the most-significant half.
|
||||
- **Sign matters.** `0xFFFF > 0x0001` is `true` unsigned but `false` signed (`-1 > 1`).
|
||||
- **CR6 update when `Rc=1`** (`vcmpgtuh.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`.
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per half.
|
||||
- **Common usage.** UTF-16 codepoint range testing, unsigned-half threshold binarisation.
|
||||
- **No `VSCR` interaction, no XER, no traps.**
|
||||
- **Aliasing legal.**
|
||||
- **No VMX128 sibling.**
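
The unsigned/signed distinction and the CR6 summary described in the list above can be sketched in plain Rust. This is a minimal illustration on bare arrays, not the xenia-rs code path: the function names `vcmpgtuh_lanes`, `vcmpgtsh_lanes` and `cr6_summary` are hypothetical, and arrays stand in for `Vec128` with index 0 as the most-significant half.

```rust
/// Unsigned per-lane `>` over eight 16-bit lanes; lane 0 is the most-significant half.
fn vcmpgtuh_lanes(a: [u16; 8], b: [u16; 8]) -> [u16; 8] {
    let mut r = [0u16; 8];
    for i in 0..8 {
        r[i] = if a[i] > b[i] { 0xFFFF } else { 0x0000 };
    }
    r
}

/// Signed counterpart (what `vcmpgtsh` computes), shown only for contrast.
fn vcmpgtsh_lanes(a: [u16; 8], b: [u16; 8]) -> [u16; 8] {
    let mut r = [0u16; 8];
    for i in 0..8 {
        // Reinterpret the same 16 bits as two's-complement before comparing.
        r[i] = if (a[i] as i16) > (b[i] as i16) { 0xFFFF } else { 0x0000 };
    }
    r
}

/// CR6 summary for the record form: (all lanes true, all lanes false).
fn cr6_summary(mask: [u16; 8]) -> (bool, bool) {
    let all_true = mask.iter().all(|&m| m == 0xFFFF);
    let all_false = mask.iter().all(|&m| m == 0x0000);
    (all_true, all_false)
}
```

Feeding both compare functions the same inputs (for example `0xFFFF` against `0x0001`) shows the lanes where the two interpretations disagree; a mixed mask yields `(false, false)` from `cr6_summary`, which is why CR6 carries two bits.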
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpgtsh`](vcmpgtsh.md) — same width, signed `>`.
|
||||
- [`vcmpequh`](vcmpequh.md) — equality at half width.
|
||||
- [`vcmpgtub`](vcmpgtub.md), [`vcmpgtuw`](vcmpgtuw.md) — unsigned `>` at byte / word width.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vmaxuh`](vmaxuh.md), [`vminuh`](vminuh.md) — direct unsigned max / min.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpgtuh` (Vector Compare Greater-Than Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtuh-vector-compare-greater-than-unsigned-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
137
migration/project-root/ppc-manual/vmx/vcmpgtuw.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# `vcmpgtuw` — Vector Compare Greater-Than Unsigned Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000286`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vcmpgtuw` | `vcmpgtuw` | — | Vector Compare Greater-Than Unsigned Word |
|
||||
| `vcmpgtuw.` | `vcmpgtuw` | Rc=1 | Vector Compare Greater-Than Unsigned Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vcmpgtuw[Rc] [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vcmpgtuw` — form `VC`
|
||||
|
||||
- **Opcode word:** `0x10000286`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `646`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21 | `Rc` | record-form flag (updates CR6) |
|
||||
| 22–31 | `XO` | extended opcode (10 bits) |
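
As a rough illustration of the field table above, the sketch below packs and unpacks a VC-form word using IBM bit numbering (bit 0 is the most-significant bit). The struct `VcFields` and the helpers `decode_vc` and `encode_vcmpgtuw` are hypothetical names for this example and are not taken from the xenia-rs decoder.

```rust
/// Fields of a VC-form word, using IBM bit numbering (bit 0 = most-significant bit).
#[derive(Debug, PartialEq)]
struct VcFields {
    opcd: u32,
    vd: u32,
    va: u32,
    vb: u32,
    rc: bool,
    xo: u32,
}

/// Split a 32-bit instruction word into the fields listed in the table.
fn decode_vc(word: u32) -> VcFields {
    VcFields {
        opcd: word >> 26,              // bits 0-5
        vd: (word >> 21) & 0x1F,       // bits 6-10
        va: (word >> 16) & 0x1F,       // bits 11-15
        vb: (word >> 11) & 0x1F,       // bits 16-20
        rc: ((word >> 10) & 1) == 1,   // bit 21
        xo: word & 0x3FF,              // bits 22-31
    }
}

/// Build a `vcmpgtuw` / `vcmpgtuw.` word from register numbers (each masked to 0..31).
fn encode_vcmpgtuw(vd: u32, va: u32, vb: u32, rc: bool) -> u32 {
    0x1000_0286
        | ((vd & 0x1F) << 21)
        | ((va & 0x1F) << 16)
        | ((vb & 0x1F) << 11)
        | ((rc as u32) << 10)
}
```

Decoding the bare opcode word `0x10000286` gives primary opcode `4` and extended opcode `646`, matching the values quoted in this section.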
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vcmpgtuw: read | Source A vector register. |
|
||||
| `VB` | vcmpgtuw: read | Source B vector register. |
|
||||
| `VD` | vcmpgtuw: write | Destination vector register. |
|
||||
| `CR` | vcmpgtuw: write (conditional) | Condition-register update. When `Rc=1`, CR6 is updated with the all-true / all-false summary of the compare result. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vcmpgtuw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** `CR`
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vcmpgtuw`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vcmpgtuw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtuw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:755`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L755)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:564`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L564)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3801-3810`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3801-L3810)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vcmpgtuw => {
|
||||
let a = ctx.vr[instr.ra()].as_u32x4();
|
||||
let b = ctx.vr[instr.rb()].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = if a[i] > b[i] { 0xFFFFFFFF } else { 0 }; }
|
||||
let v = xenia_types::Vec128::from_u32x4_array(r);
|
||||
if instr.vc_rc_bit() { update_cr6_from_vmask(&r, ctx); }
|
||||
ctx.vr[instr.rd()] = v;
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-word mask: all-ones / all-zero.** Four word lanes; `VD[i] = (uint32(VA[i]) > uint32(VB[i])) ? 0xFFFFFFFF : 0`. Lane 0 is the most-significant word.
|
||||
- **Sign matters.** `0x8000_0000 > 0x0000_0001` is `true` unsigned but `false` signed.
|
||||
- **CR6 update when `Rc=1`** (`vcmpgtuw.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`.
|
||||
- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per word.
|
||||
- **Common usage.** Hashtable bucket selection, packed-RGBA bit-pattern ordering, ID range checks.
|
||||
- **No `VSCR` interaction, no XER, no traps.**
|
||||
- **Aliasing legal.**
|
||||
- **No VMX128 sibling.**
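
The "compose with `vsel`" point above can be sketched as follows: a per-word mask from the unsigned compare drives a bitwise select, which reproduces an unsigned per-lane maximum. This is an illustrative sketch on plain arrays; `vcmpgtuw_lanes`, `vsel_lanes` and `max_u32_via_mask` are hypothetical names, and `vsel_lanes` assumes the usual AltiVec select rule `(A & !C) | (B & C)`.

```rust
/// Unsigned per-lane `>` over four 32-bit lanes; lane 0 is the most-significant word.
fn vcmpgtuw_lanes(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = if a[i] > b[i] { 0xFFFF_FFFF } else { 0 };
    }
    r
}

/// Bitwise select: take bits of `b` where the mask bit is 1, bits of `a` where it is 0.
fn vsel_lanes(a: [u32; 4], b: [u32; 4], mask: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = (a[i] & !mask[i]) | (b[i] & mask[i]);
    }
    r
}

/// Unsigned per-lane maximum built from compare + select.
fn max_u32_via_mask(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mask = vcmpgtuw_lanes(a, b); // all-ones where a > b
    vsel_lanes(b, a, mask)           // pick a where the mask is set, else b
}
```

In real code the single `vmaxuw` instruction listed under Related Instructions would replace this pair; the compare-and-select form is mainly useful when the mask is reused for other lanes of data.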
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vcmpgtsw`](vcmpgtsw.md) — same width, signed `>`.
|
||||
- [`vcmpequw`](vcmpequw.md) — equality at word width.
|
||||
- [`vcmpgtub`](vcmpgtub.md), [`vcmpgtuh`](vcmpgtuh.md) — unsigned `>` at byte / half width.
|
||||
- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators.
|
||||
- [`vmaxuw`](vmaxuw.md), [`vminuw`](vminuw.md) — direct unsigned max / min.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vcmpgtuw` (Vector Compare Greater-Than Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtuw-vector-compare-greater-than-unsigned-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
138
migration/project-root/ppc-manual/vmx/vctsxs.md
Normal file
@@ -0,0 +1,138 @@
|
||||
# `vctsxs` — Vector Convert to Signed Fixed-Point Word Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100003ca`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vctsxs` | `vctsxs` | — | Vector Convert to Signed Fixed-Point Word Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vctsxs [VD], [VB], [UIMM]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vctsxs` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x100003ca`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `970`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `UIMM` | 5-bit unsigned immediate (scale exponent, `0..31`) |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vctsxs: read | Source B vector register. |
|
||||
| `UIMM` | vctsxs: read | 5-bit unsigned immediate (bits 11–15), range `0..31`; the power-of-two scale exponent. |
|
||||
| `VD` | vctsxs: write | Destination vector register. |
|
||||
| `VSCR` | vctsxs: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vctsxs`
|
||||
|
||||
- **Reads (always):** `VB`, `UIMM`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vctsxs`: **VSCR[SAT]** may be stickied on saturating vector operations.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vctsxs`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vctsxs"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:536`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L536)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:98`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L98)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:517`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L517)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4281-4292`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4281-L4292)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vctsxs => {
|
||||
let uimm = (instr.raw >> 16) & 0x1F;
|
||||
let b = ctx.vr[instr.rb()].as_f32x4();
|
||||
let mut r = [0i32; 4]; let mut sat = false;
|
||||
for i in 0..4 {
|
||||
let (v, s) = crate::vmx::cvt_f32_to_i32_sat(b[i], uimm);
|
||||
r[i] = v; sat |= s;
|
||||
}
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Convert IEEE float lane to signed-Q `int32`, saturating.** For each of the four word lanes, `VD[i] = clamp(round_toward_zero(VB[i] * 2^UIMM), INT32_MIN, INT32_MAX)`. The 5-bit `UIMM` (bits 11..15) gives the Q-format fractional shift, in `0..31`.
|
||||
- **Saturating, not wrapping.** Out-of-range floats clamp to `INT32_MIN` (negative overflow) or `INT32_MAX` (positive overflow). This differs from x86 `cvttps2dq`, which returns the "integer indefinite" value `0x80000000` for every out-of-range or NaN input regardless of sign. Xenia's `crate::vmx::cvt_f32_to_i32_sat` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) handles the difference.
|
||||
- **NaN → 0.** A NaN input becomes `0` in the output lane and stickies `VSCR[SAT]`. (Many references state "NaN → INT32_MIN"; verify against [`vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs) for the canonical xenia behaviour, which differs from POWER ISA wording.)
|
||||
- **`VSCR[SAT]` is sticky-set** if any lane saturates (overflow or NaN). Cleared only by [`mtvscr`](mtvscr.md).
|
||||
- **Rounding is truncate-toward-zero.** Always; no per-instruction rounding control.
|
||||
- **`VSCR[NJ]` flushes denormal *inputs* to zero before scaling** (Xenon default).
|
||||
- **Big-endian word lanes.** Lane 0 is the most-significant word.
|
||||
- **No XER changes, no traps.**
|
||||
- **No VMX128 sibling.**
|
||||
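- **Counterpart of [`vcfsx`](vcfsx.md), not an exact inverse.** `vcfsx` (int → float) cannot overflow, whereas `vctsxs` truncates and saturates, so a float → int → float round trip loses the fraction below `2^-UIMM` and the magnitude of out-of-range values.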
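
The scale, truncate and clamp steps listed above can be sketched for one lane as below. This is an assumption-laden illustration, not the body of `crate::vmx::cvt_f32_to_i32_sat`: the name `cvt_f32_to_i32_sat_sketch` is hypothetical, the NaN handling simply follows the wording of this page (result `0`, saturation flagged), and `VSCR[NJ]` denormal flushing is ignored.

```rust
/// One lane of a `vctsxs`-style conversion. Returns (result, saturated?).
/// Assumptions: NaN yields 0 and reports saturation, as this page describes;
/// denormal flushing under VSCR[NJ] is not modelled.
fn cvt_f32_to_i32_sat_sketch(x: f32, uimm: u32) -> (i32, bool) {
    if x.is_nan() {
        return (0, true);
    }
    // Scaling by a power of two is exact in f64 for every finite f32 input.
    let scaled = (x as f64) * ((1u64 << (uimm & 0x1F)) as f64);
    let t = scaled.trunc(); // round toward zero
    if t > i32::MAX as f64 {
        (i32::MAX, true)
    } else if t < i32::MIN as f64 {
        (i32::MIN, true)
    } else {
        (t as i32, false)
    }
}
```

Applying the function to each of the four lanes and OR-ing the returned flags mirrors the shape of the interpreter snapshot, where the combined flag decides whether `VSCR[SAT]` is set.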
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vctuxs`](vctuxs.md) — same shape, unsigned destination.
|
||||
- [`vcfsx`](vcfsx.md), [`vcfux`](vcfux.md) — inverse direction (int → float with Q-shift).
|
||||
- [`vrfin`](vrfin.md), [`vrfip`](vrfip.md), [`vrfim`](vrfim.md), [`vrfiz`](vrfiz.md) — float-to-float rounding modes (round-to-nearest, up, down, toward-zero) when staying in float.
|
||||
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear `VSCR[SAT]`.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vctsxs` (Vector Convert to Signed Fixed-Point Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vctsxs-vector-convert-signed-fixed-point-word-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Conversion Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
137
migration/project-root/ppc-manual/vmx/vctuxs.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# `vctuxs` — Vector Convert to Unsigned Fixed-Point Word Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000038a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vctuxs` | `vctuxs` | — | Vector Convert to Unsigned Fixed-Point Word Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vctuxs [VD], [VB], [UIMM]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vctuxs` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000038a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `906`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `UIMM` | 5-bit unsigned immediate (scale exponent, `0..31`) |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vctuxs: read | Source B vector register. |
|
||||
| `UIMM` | vctuxs: read | 5-bit unsigned immediate (bits 11–15), range `0..31`; the power-of-two scale exponent. |
|
||||
| `VD` | vctuxs: write | Destination vector register. |
|
||||
| `VSCR` | vctuxs: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vctuxs`
|
||||
|
||||
- **Reads (always):** `VB`, `UIMM`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vctuxs`: **VSCR[SAT]** may be stickied on saturating vector operations.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vctuxs`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vctuxs"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:554`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L554)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:98`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L98)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:515`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L515)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4293-4304`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4293-L4304)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vctuxs => {
|
||||
let uimm = (instr.raw >> 16) & 0x1F;
|
||||
let b = ctx.vr[instr.rb()].as_f32x4();
|
||||
let mut r = [0u32; 4]; let mut sat = false;
|
||||
for i in 0..4 {
|
||||
let (v, s) = crate::vmx::cvt_f32_to_u32_sat(b[i], uimm);
|
||||
r[i] = v; sat |= s;
|
||||
}
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Convert IEEE float lane to unsigned-Q `uint32`, saturating.** For each of the four word lanes, `VD[i] = clamp(round_toward_zero(VB[i] * 2^UIMM), 0, UINT32_MAX)`. The 5-bit `UIMM` (bits 11..15) gives the Q-format fractional shift, in `0..31`.
|
||||
- **Saturating, not wrapping.** Negative inputs clamp to `0`; values above `2^32 − 1` clamp to `0xFFFF_FFFF`. NaN → `0`. All clamping events sticky-set `VSCR[SAT]`. Xenia's `crate::vmx::cvt_f32_to_u32_sat` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) handles the boundaries.
|
||||
- **`VSCR[SAT]` sticky.** Cleared only by [`mtvscr`](mtvscr.md).
|
||||
- **Rounding is truncate-toward-zero.** Always.
|
||||
- **`VSCR[NJ]` flushes denormal inputs to zero before scaling** (Xenon default).
|
||||
- **Big-endian word lanes.** Lane 0 is the most-significant word.
|
||||
- **No XER changes, no traps.**
|
||||
- **No VMX128 sibling.**
|
||||
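- **Common usage.** Float colour in `[0.0, 1.0]` → integer channel: `vctuxs vD, vColor, 8` scales by `2^8` (a Q24.8 result in `0..256` per word), which lands in `0..255` once saturating packs (ending in, for example, [`vpkshus`](vpkshus.md)) narrow the words to bytes. `UIMM` is a 5-bit field, so the largest scale is `2^31` (`UIMM = 31`); a value of `32` is not encodable.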
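
The colour-channel use case above can be sketched for a single lane as follows. The helper names `cvt_f32_to_u32_sat_sketch` and `channel_to_u8` are hypothetical; the conversion is an illustration of the documented clamp rules rather than the body of `crate::vmx::cvt_f32_to_u32_sat`, and the final `min(255)` merely stands in for the saturating pack instructions.

```rust
/// One lane of a `vctuxs`-style conversion. Returns (result, saturated?).
/// Assumptions: NaN yields 0 and reports saturation, as this page describes;
/// denormal flushing under VSCR[NJ] is not modelled.
fn cvt_f32_to_u32_sat_sketch(x: f32, uimm: u32) -> (u32, bool) {
    if x.is_nan() {
        return (0, true);
    }
    // Power-of-two scaling is exact in f64 for every finite f32 input.
    let t = ((x as f64) * ((1u64 << (uimm & 0x1F)) as f64)).trunc();
    if t > u32::MAX as f64 {
        (u32::MAX, true)
    } else if t < 0.0 {
        (0, true)
    } else {
        (t as u32, false)
    }
}

/// Float colour channel in [0.0, 1.0] to an 8-bit value: scale by 2^8,
/// then clamp to 255 the way a saturating pack to bytes would.
fn channel_to_u8(c: f32) -> u8 {
    let (scaled, _sat) = cvt_f32_to_u32_sat_sketch(c, 8);
    scaled.min(255) as u8
}
```

Note that this maps by a factor of 256 rather than 255, so only the input `1.0` relies on the final clamp; code that needs an exact `x * 255` mapping would multiply in float first and convert with `UIMM = 0`.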
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vctsxs`](vctsxs.md) — same shape, signed destination.
|
||||
- [`vcfsx`](vcfsx.md), [`vcfux`](vcfux.md) — inverse direction.
|
||||
- [`vrfin`](vrfin.md), [`vrfip`](vrfip.md), [`vrfim`](vrfim.md), [`vrfiz`](vrfiz.md) — float-to-float rounding modes.
|
||||
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear `VSCR[SAT]`.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vctuxs` (Vector Convert to Unsigned Fixed-Point Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vctuxs-vector-convert-unsigned-fixed-point-word-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Conversion Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
180
migration/project-root/ppc-manual/vmx/vexptefp.md
Normal file
@@ -0,0 +1,180 @@
|
||||
# `vexptefp` — Vector 2 Raised to the Exponent Estimate Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000018a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vexptefp` | `vexptefp` | — | Vector 2 Raised to the Exponent Estimate Floating Point |
|
||||
| `vexptefp128` | `vexptefp128` | — | Vector128 2 Raised to the Exponent Estimate Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vexptefp [VD], [VB]
|
||||
vexptefp128 [VD], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vexptefp` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000018a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `394`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | _reserved_ | not used by `vexptefp` (the instruction has no `VA` operand) |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vexptefp128` — form `VX128_3`
|
||||
|
||||
- **Opcode word:** `0x180006b0`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `1712`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `IMM` | 5-bit immediate |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21–27 | `XO` | extended opcode |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vexptefp: read; vexptefp128: read | Source B vector register. |
|
||||
| `VD` | vexptefp: write; vexptefp128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vexptefp`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vexptefp128`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vexptefp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vexptefp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:766`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L766)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:99`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L99)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:469`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L469)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4367-4376`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4367-L4376)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vexptefp | PpcOpcode::vexptefp128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vexptefp128);
|
||||
let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) }
|
||||
else { (instr.rb(), instr.rd()) };
|
||||
let b = ctx.vr[rb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].exp2(); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vexptefp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vexptefp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:769`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L769)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:99`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L99)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:666`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L666)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4367-4376`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4367-L4376)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vexptefp | PpcOpcode::vexptefp128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vexptefp128);
|
||||
let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) }
|
||||
else { (instr.rb(), instr.rd()) };
|
||||
let b = ctx.vr[rb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].exp2(); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-lane base-2 exponent.** Each of the four word lanes computes `VD[i] = 2^VB[i]` in `binary32`. **Note:** the AltiVec specification only requires a low-precision estimate (relative error of at most 1/16, roughly four good bits). Xenia uses Rust's `f32::exp2`, which is far more accurate than that bound, so programs that depend on the exact bits of the hardware estimate may observe small numerical differences.
|
||||
- **Use `vlogefp` for the inverse.** The natural pair is `vexptefp(vlogefp(x)) = x` for positive finite `x`, modulo each estimate's error budget.
|
||||
- **Big-endian word lanes.** Lane 0 is the most-significant word.
|
||||
- **NaN, ±∞.** `2^NaN = NaN`; `2^(+∞) = +∞`; `2^(-∞) = +0`. Subnormal results may be flushed to `+0` (the result of `2^x` is never negative) if `VSCR[NJ] = 1` (Xenon default).
|
||||
- **No exception, no `VSCR[SAT]` change, no XER change.**
|
||||
- **VMX128 sibling (`vexptefp128`).** Identical semantics with the extended encoding.
|
||||
- **Build natural exp / log via change-of-base.** `e^x = 2^(x * log2(e))`, so combine `vmaddfp` (multiply-by-constant) with `vexptefp`.
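
The change-of-base identity in the last bullet can be sketched on plain arrays as below. This is only an illustration of the arithmetic: `exp2_lanes`, `exp_lanes` and `ln_from_log2` are hypothetical names, and `f32::exp2` stands in for the instruction the same way the interpreter snapshot uses it.

```rust
use std::f32::consts::{LN_2, LOG2_E};

/// Per-lane 2^x, the operation `vexptefp` estimates.
fn exp2_lanes(b: [f32; 4]) -> [f32; 4] {
    b.map(f32::exp2)
}

/// Natural exponential via change of base: e^x = 2^(x * log2(e)).
/// The multiply is the step a `vmaddfp` with a splatted constant would perform.
fn exp_lanes(x: [f32; 4]) -> [f32; 4] {
    exp2_lanes(x.map(|v| v * LOG2_E))
}

/// Natural log from a base-2 log (for example a `vlogefp` result): ln(x) = log2(x) * ln(2).
fn ln_from_log2(log2_x: f32) -> f32 {
    log2_x * LN_2
}
```

On real hardware each estimate carries its own error budget, so results built this way are only as accurate as the underlying `vexptefp` / `vlogefp` estimates.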
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vlogefp`](vlogefp.md) — base-2 logarithm (the inverse).
|
||||
- [`vrefp`](vrefp.md) — reciprocal estimate.
|
||||
- [`vrsqrtefp`](vrsqrtefp.md) — reciprocal-square-root estimate.
|
||||
- [`vmaddfp`](vmaddfp.md) — fused multiply-add for change-of-base scaling.
|
||||
- [`vmulfp`](vmulfp.md) — float multiply (xenia helper).
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vexptefp` (Vector 2 Raised to the Exponent Estimate Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vexptefp-vector-2-raised-exponent-estimate-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Estimate Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
179
migration/project-root/ppc-manual/vmx/vlogefp.md
Normal file
@@ -0,0 +1,179 @@
|
||||
# `vlogefp` — Vector Log2 Estimate Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100001ca`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vlogefp` | `vlogefp` | — | Vector Log2 Estimate Floating Point |
|
||||
| `vlogefp128` | `vlogefp128` | — | Vector128 Log2 Estimate Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vlogefp [VD], [VB]
|
||||
vlogefp128 [VD], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vlogefp` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x100001ca`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `458`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | _reserved_ | not used by `vlogefp` (the instruction has no `VA` operand) |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vlogefp128` — form `VX128_3`
|
||||
|
||||
- **Opcode word:** `0x180006f0`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `1776`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `IMM` | 5-bit immediate |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21–27 | `XO` | extended opcode |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vlogefp: read; vlogefp128: read | Source B vector register. |
|
||||
| `VD` | vlogefp: write; vlogefp128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vlogefp`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vlogefp128`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
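
Since the block above is only a placeholder, here is a minimal Rust sketch of the lane-wise operation. It mirrors the `b[i].log2()` loop in the frozen snapshot under Implementation References, so it is full-precision rather than a hardware-style estimate; `vlogefp_lanes` is a hypothetical name, and plain `[f32; 4]` arrays stand in for `Vec128`.

```rust
/// Per-lane base-2 logarithm over four binary32 lanes.
/// Edge cases follow Rust's `f32::log2`: negative or NaN input gives NaN,
/// +0 and -0 give -inf, +inf gives +inf.
fn vlogefp_lanes(vb: [f32; 4]) -> [f32; 4] {
    let mut vd = [0f32; 4];
    for i in 0..4 {
        vd[i] = vb[i].log2();
    }
    vd
}
```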
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vlogefp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vlogefp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:779`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L779)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:99`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L99)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:473`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L473)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4377-4386`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4377-L4386)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vlogefp | PpcOpcode::vlogefp128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vlogefp128);
|
||||
let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) }
|
||||
else { (instr.rb(), instr.rd()) };
|
||||
let b = ctx.vr[rb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].log2(); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vlogefp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vlogefp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:782`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L782)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:99`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L99)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:667`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L667)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4377-4386`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4377-L4386)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vlogefp | PpcOpcode::vlogefp128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vlogefp128);
|
||||
let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) }
|
||||
else { (instr.rb(), instr.rd()) };
|
||||
let b = ctx.vr[rb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].log2(); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-lane base-2 logarithm.** Each of the four word lanes computes `VD[i] = log2(VB[i])` in `binary32`. **Note:** the AltiVec specification defines this as a low-precision estimate, with an absolute error no greater than 1/32. Xenia uses Rust's `f32::log2`, which is full-precision; programs that depend on the hardware's exact estimate bits may observe small numerical differences.
- **Use `vexptefp` for the inverse.** The pair gives `2^(log2(x)) ≈ x` for positive finite `x`.
- **Big-endian word lanes.** Lane 0 is the most-significant word.
- **NaN, negatives, zero, ±∞.** `log2(negative)` and `log2(NaN)` produce NaN; `log2(+0) = -∞`; `log2(-0) = -∞` (per IEEE-754); `log2(+∞) = +∞`. None of these sets the sticky `VSCR[SAT]` bit; float ops never touch SAT.
- **No exception, no `VSCR[SAT]` change, no XER change.**
- **VMX128 sibling (`vlogefp128`).** Identical semantics with the extended encoding.
- **Natural log via change-of-base.** `ln(x) = log2(x) * (1 / log2(e))`, where `1 / log2(e) = ln(2)`; multiply by that constant with `vmaddfp`.
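
The change-of-base bullet above can be sketched in Rust as follows. This is an illustration under assumptions, not xenia-rs code: `ln_via_log2` is a hypothetical name, plain `[f32; 4]` arrays stand in for vector registers, and the zero addend plays the role of the `VB` operand of `vmaddfp`.

```rust
/// ln(x) per lane, computed as log2(x) * ln(2) with a fused multiply-add
/// and a zero addend (the `vlogefp` + `vmaddfp` shape).
fn ln_via_log2(x: [f32; 4]) -> [f32; 4] {
    let scale = std::f32::consts::LN_2; // equals 1 / log2(e)
    let mut out = [0f32; 4];
    for i in 0..4 {
        out[i] = x[i].log2().mul_add(scale, 0.0);
    }
    out
}
```

On hardware the accuracy of the result would be limited by the `vlogefp` estimate, so the sketch shows the dataflow rather than the exact bits a console would produce.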
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vexptefp`](vexptefp.md) — base-2 exponent (the inverse).
|
||||
- [`vrefp`](vrefp.md) — reciprocal estimate.
|
||||
- [`vrsqrtefp`](vrsqrtefp.md) — reciprocal-square-root estimate.
|
||||
- [`vmaddfp`](vmaddfp.md) — fused multiply-add for change-of-base scaling.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vlogefp` (Vector log2 Estimate Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vlogefp-vector-log2-estimate-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Estimate Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
196
migration/project-root/ppc-manual/vmx/vmaddfp.md
Normal file
@@ -0,0 +1,196 @@
|
||||
# `vmaddfp` — Vector Multiply-Add Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002e`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmaddfp` | `vmaddfp` | — | Vector Multiply-Add Floating Point |
|
||||
| `vmaddfp128` | `vmaddfp128` | — | Vector128 Multiply Add Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmaddfp [VD], [VA], [VC], [VB]
|
||||
vmaddfp128 [VD], [VA], [VB], [VD]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmaddfp` — form `VA`
|
||||
|
||||
- **Opcode word:** `0x1000002e`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `46`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21–25 | `VRC` | source C / shift |
|
||||
| 26–31 | `XO` | extended opcode (6 bits) |
|
||||
|
||||
### `vmaddfp128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x140000d0`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `208`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
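
The three-way split of the `VA` field in the table above can be reassembled as in this sketch. It assumes PowerPC bit numbering (bit 0 is the most-significant bit), that `VA128h` is bit 5 of the register number and `VA128H` is bit 6, as the "middle bit" and "high bit" labels suggest; `ppc_bits` and `decode_vx128` are hypothetical helper names.

```rust
/// Extract the bit field `first..=last` from a 32-bit instruction word,
/// using PowerPC numbering where bit 0 is the most-significant bit.
/// Only meant for fields narrower than 32 bits.
fn ppc_bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

/// Decode the 7-bit register numbers of a VX128-form word.
/// Returns `(vd, va, vb)`, each in 0..=127.
fn decode_vx128(word: u32) -> (u32, u32, u32) {
    let vd = ppc_bits(word, 6, 10) | (ppc_bits(word, 28, 29) << 5);
    let va = ppc_bits(word, 11, 15)
        | (ppc_bits(word, 26, 26) << 5) // VA128h: middle bit
        | (ppc_bits(word, 21, 21) << 6); // VA128H: high bit
    let vb = ppc_bits(word, 16, 20) | (ppc_bits(word, 30, 31) << 5);
    (vd, va, vb)
}
```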
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmaddfp: read; vmaddfp128: read | Source A vector register. |
|
||||
| `VC` | vmaddfp: read; vmaddfp128: read | Source C vector register / 3-bit selector. |
|
||||
| `VB` | vmaddfp: read; vmaddfp128: read | Source B vector register. |
|
||||
| `VD` | vmaddfp: write; vmaddfp128: read and write | Destination vector register (also the second multiplicand for `vmaddfp128`). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmaddfp`
|
||||
|
||||
- **Reads (always):** `VA`, `VC`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vmaddfp128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`, `VD`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
for each 32-bit float lane i in 0..3:
|
||||
VD[i] <- (VA[i] * VC[i]) + VB[i]
|
||||
```
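
The pseudocode above covers only `vmaddfp`. The sketch below writes out both variants side by side, following the operand roles stated in the frozen snapshots under Implementation References. It deliberately omits the `vmx::flush_denorm` calls those snapshots apply, and it uses plain `[f32; 4]` arrays instead of `Vec128`; the function names are illustrative only.

```rust
/// vmaddfp: VD[i] = (VA[i] * VC[i]) + VB[i], one rounding per lane.
fn vmaddfp(va: [f32; 4], vc: [f32; 4], vb: [f32; 4]) -> [f32; 4] {
    let mut vd = [0f32; 4];
    for i in 0..4 {
        vd[i] = va[i].mul_add(vc[i], vb[i]);
    }
    vd
}

/// vmaddfp128: VD[i] = (VA[i] * VD_prev[i]) + VB[i].
/// The old destination value is the second multiplicand, not the addend.
fn vmaddfp128(va: [f32; 4], vb: [f32; 4], vd_prev: [f32; 4]) -> [f32; 4] {
    let mut vd = [0f32; 4];
    for i in 0..4 {
        vd[i] = va[i].mul_add(vd_prev[i], vb[i]);
    }
    vd
}
```

`f32::mul_add` rounds once, so a product that needs more than 24 significant bits keeps its low-order bits until after the add; a separate multiply and add would round the product first.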
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmaddfp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddfp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:801`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L801)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:588`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L588)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2038-2054`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2038-L2054)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaddfp => {
|
||||
// vD = (vA * vC) + vB. AltiVec unconditionally flushes denormal
|
||||
// *inputs* to 0 regardless of VSCR[NJ] (confirmed on POWER8 hw).
|
||||
let a = ctx.vr[instr.ra()].as_f32x4();
|
||||
let b = ctx.vr[instr.rb()].as_f32x4();
|
||||
let c = ctx.vr[instr.rc()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 {
|
||||
let ai = vmx::flush_denorm(a[i]);
|
||||
let bi = vmx::flush_denorm(b[i]);
|
||||
let ci = vmx::flush_denorm(c[i]);
|
||||
// PPCBUG-437: flush subnormal output too.
|
||||
r[i] = vmx::flush_denorm(ai.mul_add(ci, bi));
|
||||
}
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vmaddfp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:805`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L805)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:613`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L613)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2055-2073`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2055-L2073)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaddfp128 => {
|
||||
// ISA: (VD) <- (VA × VD) + VB. VD is both the second multiplicand and destination.
|
||||
// Canary InstrEmit_vmaddfp128 (ppc_emit_altivec.cc:806-809): MulAdd(VA, VD, VB).
|
||||
// Previous code computed ai.mul_add(bi, di) = VA×VB+VD — VB and VD roles swapped
|
||||
// (PPCBUG-424). Fix: ai.mul_add(di, bi) = VA×VD+VB.
|
||||
let a = ctx.vr[instr.va128()].as_f32x4();
|
||||
let b = ctx.vr[instr.vb128()].as_f32x4();
|
||||
let d = ctx.vr[instr.vd128()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 {
|
||||
let ai = vmx::flush_denorm(a[i]);
|
||||
let bi = vmx::flush_denorm(b[i]);
|
||||
let di = vmx::flush_denorm(d[i]);
|
||||
// PPCBUG-437.
|
||||
r[i] = vmx::flush_denorm(ai.mul_add(di, bi));
|
||||
}
|
||||
ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Fused multiply-add: `VD = (VA * VC) + VB`** per word lane, with a single rounding. There is no intermediate rounding between the multiply and the add, which matters for numerical accuracy in DSP filters and reduces error in dot products.
- **Big-endian word lanes.** Lane 0 is the most-significant word.
- **NaN propagation, ±∞ arithmetic.** Standard IEEE-754: any NaN input yields NaN; `(+∞ * 0)` yields NaN; the sum of `+∞` and `-∞` (e.g. `(+∞ * 1) + -∞`) yields NaN. No trap, no sticky bit.
- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs and outputs are flushed to `±0`. The xenia-rs snapshots above apply `vmx::flush_denorm` to every input and to the result unconditionally, without consulting `VSCR[NJ]`.
- **No `VSCR[SAT]` change, no XER change, no exceptions.**
- **VMX128 sibling has a surprising operand layout: `VD` is also a source.** `vmaddfp128` reads `VA`, `VB`, *and `VD` itself*, with `VD` as the second multiplicand rather than the accumulator: `VD = (VA * VD_prev) + VB` ([`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)). An earlier xenia-rs revision computed `(VA * VB) + VD_prev`, with the `VB` and `VD` roles swapped; the frozen snapshot above records the fix (PPCBUG-424). The standard `vmaddfp` keeps the canonical 4-operand `VA, VC, VB → VD` shape. **This is a real difference in operand encoding** (VX128 form vs. VA form) that compilers must respect: VMX128 sacrifices the third source register slot for the extra register-file bits.
- **Aliasing legal.** `vmaddfp v3, v3, v3, v3` works (squares + adds itself).
- **Common usage.** Per-lane polynomial evaluation, dot-product accumulation, any matrix multiply inner loop. Chain four `vmaddfp` instructions to multiply a 4×4 matrix by a 4-vector.
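
The matrix-by-vector usage mentioned above can be sketched as four chained multiply-adds. This assumes the matrix is stored as four column vectors and that each vector component is splatted across a register first (on hardware that would be a separate splat instruction); `vmaddfp` and `mat4_mul_vec4` here are hypothetical Rust stand-ins, not xenia-rs functions.

```rust
/// Lane-wise fused multiply-add: (va * vc) + vb.
fn vmaddfp(va: [f32; 4], vc: [f32; 4], vb: [f32; 4]) -> [f32; 4] {
    let mut vd = [0f32; 4];
    for i in 0..4 {
        vd[i] = va[i].mul_add(vc[i], vb[i]);
    }
    vd
}

/// Multiply a column-major 4x4 matrix by a 4-vector using four FMAs:
/// result = col0 * v.x + col1 * v.y + col2 * v.z + col3 * v.w.
fn mat4_mul_vec4(cols: [[f32; 4]; 4], v: [f32; 4]) -> [f32; 4] {
    let mut acc = [0f32; 4];
    for k in 0..4 {
        let splat = [v[k]; 4]; // broadcast component k to all lanes
        acc = vmaddfp(cols[k], splat, acc);
    }
    acc
}
```

Each step feeds the previous accumulator back in as the addend, which is why the dependency chain is four instructions long.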
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vnmsubfp`](vnmsubfp.md) — `−((VA * VC) − VB)`; fused negative-multiply-subtract.
|
||||
- [`vaddfp`](vaddfp.md), [`vsubfp`](vsubfp.md) — plain float add / subtract.
|
||||
- [`vmulfp`](vmulfp.md) — xenia helper for `VA * VC`; on hardware games use `vmaddfp v, va, vc, v0_zero`.
|
||||
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — min / max for clamping.
|
||||
- [`vrefp`](vrefp.md), [`vrsqrtefp`](vrsqrtefp.md) — reciprocal / inverse-sqrt estimates that often appear in the same FMA chain.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmaddfp` (Vector Multiply-Add Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaddfp-vector-multiply-add-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
182
migration/project-root/ppc-manual/vmx/vmaxfp.md
Normal file
@@ -0,0 +1,182 @@
|
||||
# `vmaxfp` — Vector Maximum Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000040a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmaxfp` | `vmaxfp` | — | Vector Maximum Floating Point |
|
||||
| `vmaxfp128` | `vmaxfp128` | — | Vector128 Maximum Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmaxfp [VD], [VA], [VB]
|
||||
vmaxfp128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmaxfp` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000040a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1034`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vmaxfp128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x18000280`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `640`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmaxfp: read; vmaxfp128: read | Source A vector register. |
|
||||
| `VB` | vmaxfp: read; vmaxfp128: read | Source B vector register. |
|
||||
| `VD` | vmaxfp: write; vmaxfp128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmaxfp`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vmaxfp128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmaxfp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxfp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:831`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L831)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:522`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L522)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2121-2128`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2121-L2128)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaxfp => {
|
||||
let a = ctx.vr[instr.ra()].as_f32x4();
|
||||
let b = ctx.vr[instr.rb()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = vmx::max_nan(a[i], b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vmaxfp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:834`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L834)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:696`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L696)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2129-2136`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2129-L2136)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaxfp128 => {
|
||||
let a = ctx.vr[instr.va128()].as_f32x4();
|
||||
let b = ctx.vr[instr.vb128()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = vmx::max_nan(a[i], b[i]); }
|
||||
ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-lane IEEE max.** Four word lanes; for ordered inputs, `VD[i] = (VA[i] > VB[i]) ? VA[i] : VB[i]`.
- **NaN propagation surprise.** The IBM manual specifies that any NaN input yields a NaN result, so hardware's `vmaxfp(NaN, x) = NaN`. A plain `if a > b { a } else { b }` does not do this: every comparison with NaN evaluates false, so it returns `VB` (that is, `x`). The frozen snapshot above delegates to `vmx::max_nan`, whose body is not reproduced on this page. **Worth checking `vmx.rs` to confirm the helper propagates NaN.**
- **Sign of zero.** The AltiVec specification defines the maximum of `+0` and `-0` as `+0`. A plain `a > b` select returns `-0` for `vmaxfp(+0, -0)` (since `+0 > -0` is false, it returns `b = -0`), so this case is also worth verifying against `vmx::max_nan`.
- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs are flushed to `±0` before comparison.
- **No `VSCR[SAT]` change, no XER change, no exceptions.**
- **Big-endian word lanes.** Lane 0 is the most-significant word.
- **Aliasing legal.** `vmaxfp v3, v3, v4` is the standard "clamp from below by `v4`" idiom.
- **VMX128 sibling (`vmaxfp128`).** Identical comparator semantics with the extended encoding.
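
To make the NaN and signed-zero cases above concrete, here is a sketch of a per-lane maximum that follows the specified behaviour, next to the naive select. Both functions are hypothetical illustrations and are not the body of `vmx::max_nan`; which NaN payload real hardware returns is not modelled.

```rust
/// Naive select: any comparison involving NaN is false, so `b` is returned.
fn naive_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Per-lane maximum with NaN propagation and +0 preferred over -0.
fn vmaxfp_lane(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() {
        return f32::NAN; // any NaN input yields a NaN result
    }
    if a == 0.0 && b == 0.0 {
        // Both operands are zeros: the result is +0 unless both are -0.
        return if a.is_sign_positive() || b.is_sign_positive() { 0.0 } else { -0.0 };
    }
    if a > b { a } else { b }
}
```

A test harness could run both functions over NaN, `+0`, and `-0` inputs to see exactly where a compare-and-select implementation diverges.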
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vminfp`](vminfp.md) — the per-lane minimum.
|
||||
- [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — combine masks with arbitrary alternatives.
|
||||
- [`vmaddfp`](vmaddfp.md) — fused multiply-add when the max is part of a polynomial.
|
||||
- [`vmaxsw`](vmaxsw.md) — integer-word max if the lanes are signed integers.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmaxfp` (Vector Maximum Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxfp-vector-maximum-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmaxsb.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmaxsb` — Vector Maximum Signed Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000102`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmaxsb` | `vmaxsb` | — | Vector Maximum Signed Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmaxsb [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmaxsb` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000102`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `258`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmaxsb: read | Source A vector register. |
|
||||
| `VB` | vmaxsb: read | Source B vector register. |
|
||||
| `VD` | vmaxsb: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmaxsb`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmaxsb`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxsb"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:838`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L838)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:454`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L454)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4407-4414`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4407-L4414)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaxsb => {
|
||||
let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i8; 16];
|
||||
for i in 0..16 { r[i] = a[i].max(b[i]); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-byte signed max.** Sixteen byte lanes; `VD[i] = max(int8(VA[i]), int8(VB[i]))`. Lane 0 is the most-significant byte.
- **Sign-aware ordering.** `vmaxsb(0xFF, 0x01) = 0x01` (i.e. `max(-1, 1) = 1`), versus [`vmaxub`](vmaxub.md) which would return `0xFF`. Pick `vmaxsb` deliberately when both sides are signed.
- **No `VSCR` interaction, no XER, no exceptions.** Pure compare-select.
- **Common usage.** Per-lane clamping with [`vminsb`](vminsb.md) implements `clamp(x, lo, hi)` with no branch.
- **Aliasing legal.** `vmaxsb v3, v3, v4` is the "raise the lower bound to `v4`" idiom.
- **No VMX128 sibling.**
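
The signed-versus-unsigned distinction and the branch-free clamp described above can be sketched as follows. Plain `[i8; 16]` arrays stand in for vector registers; `vmaxsb`, `vminsb`, and `clamp_sb` are illustrative Rust functions, with `vminsb` assumed to be the lane-wise signed minimum.

```rust
/// Lane-wise signed byte maximum, the same loop as the frozen snapshot.
fn vmaxsb(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut r = [0i8; 16];
    for i in 0..16 {
        r[i] = a[i].max(b[i]);
    }
    r
}

/// Lane-wise signed byte minimum (assumed counterpart of `vmaxsb`).
fn vminsb(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut r = [0i8; 16];
    for i in 0..16 {
        r[i] = a[i].min(b[i]);
    }
    r
}

/// clamp(x, lo, hi) per lane: raise to `lo`, then cap at `hi`.
fn clamp_sb(x: [i8; 16], lo: [i8; 16], hi: [i8; 16]) -> [i8; 16] {
    vminsb(vmaxsb(x, lo), hi)
}
```

Interpreting the same bytes as `u8` instead would make `0xFF` the largest value, which is the `vmaxub` behaviour the bullet contrasts against.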
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vminsb`](vminsb.md) — the matching minimum.
|
||||
- [`vmaxub`](vmaxub.md) — same width, unsigned max.
|
||||
- [`vmaxsh`](vmaxsh.md), [`vmaxsw`](vmaxsw.md) — signed max at half / word width.
|
||||
- [`vcmpgtsb`](vcmpgtsb.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection style with arbitrary fallbacks.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmaxsb` (Vector Maximum Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxsb-vector-maximum-signed-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmaxsh.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmaxsh` — Vector Maximum Signed Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000142`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmaxsh` | `vmaxsh` | — | Vector Maximum Signed Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmaxsh [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmaxsh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000142`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `322`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmaxsh: read | Source A vector register. |
|
||||
| `VB` | vmaxsh: read | Source B vector register. |
|
||||
| `VD` | vmaxsh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmaxsh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description column of the Assembler Mnemonics table above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmaxsh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxsh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:845`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L845)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:460`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L460)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4439-4446`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4439-L4446)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaxsh => {
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i16; 8];
|
||||
for i in 0..8 { r[i] = a[i].max(b[i]); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions

- **Per-half signed max.** Eight half-word lanes; `VD[i] = max(int16(VA[i]), int16(VB[i]))`. Lane 0 is the most-significant half.
- **Sign-aware ordering.** `vmaxsh(0x8000, 0x0001) = 0x0001` (i.e. `max(-32768, 1) = 1`), whereas [`vmaxuh`](vmaxuh.md) would return `0x8000`.
- **No `VSCR` interaction, no XER, no exceptions.**
- **Common usage.** Q15 audio peak detection; signed image-processing kernels.
- **Aliasing legal.** `VD` may be the same register as `VA` or `VB`.
- **No VMX128 sibling.**
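
The lane semantics above can be sketched in C as follows. This is a minimal illustration rather than the xenia-rs code: the `vr_i16x8` struct is a hypothetical register model in which `lane[0]` is the most-significant half word, and `vmaxsh_sketch` is a made-up name.

```c
#include <stdint.h>

/* Hypothetical register model: eight signed half-word lanes, lane[0] most significant. */
typedef struct { int16_t lane[8]; } vr_i16x8;

/* Lane-wise signed maximum. No VSCR, CR, or XER side effects are modelled
 * because the instruction has none. */
static vr_i16x8 vmaxsh_sketch(vr_i16x8 va, vr_i16x8 vb)
{
    vr_i16x8 vd;
    for (int i = 0; i < 8; i++) {
        vd.lane[i] = (va.lane[i] > vb.lane[i]) ? va.lane[i] : vb.lane[i];
    }
    return vd;
}
```

Both sources are passed by value, so assigning the result back to one of the inputs behaves as the aliasing bullet describes.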
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vminsh`](vminsh.md) — the matching minimum.
|
||||
- [`vmaxuh`](vmaxuh.md) — same width, unsigned max.
|
||||
- [`vmaxsb`](vmaxsb.md), [`vmaxsw`](vmaxsw.md) — signed max at byte / word width.
|
||||
- [`vcmpgtsh`](vcmpgtsh.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection style with arbitrary fallbacks.
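
The compare-and-mask alternative listed above can be sketched for a single lane in C. The sketch assumes the usual AltiVec definitions: `vcmpgtsh` yields an all-ones lane where `VA > VB` (signed), and `vsel` computes `(VA & ~VC) | (VB & VC)`. The function names are hypothetical, and the cast from `uint16_t` to `int16_t` relies on the two's-complement wrap that mainstream compilers provide.

```c
#include <stdint.h>

/* One lane of vcmpgtsh: all ones if a > b as signed 16-bit values, else zero. */
static uint16_t cmpgt_s16_mask(uint16_t a, uint16_t b)
{
    return ((int16_t)a > (int16_t)b) ? 0xFFFFu : 0x0000u;
}

/* One lane of vsel: take bits from b where mask is 1, from a where mask is 0. */
static uint16_t sel16(uint16_t a, uint16_t b, uint16_t mask)
{
    return (uint16_t)((a & (uint16_t)~mask) | (b & mask));
}

/* max(a, b): start from b and select a wherever a > b. */
static uint16_t max_s16_via_select(uint16_t a, uint16_t b)
{
    uint16_t mask = cmpgt_s16_mask(a, b);
    return sel16(b, a, mask);
}
```

The two-instruction path gives the same result as `vmaxsh` for a plain maximum; its advantage is that the mask can be reused to select other data with the same lane pattern.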
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmaxsh` (Vector Maximum Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxsh-vector-maximum-signed-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmaxsw.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmaxsw` — Vector Maximum Signed Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000182`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmaxsw` | `vmaxsw` | — | Vector Maximum Signed Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmaxsw [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmaxsw` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000182`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `386`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmaxsw: read | Source A vector register. |
|
||||
| `VB` | vmaxsw: read | Source B vector register. |
|
||||
| `VD` | vmaxsw: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmaxsw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description column of the Assembler Mnemonics table above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmaxsw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxsw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:852`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L852)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:467`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L467)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4471-4478`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4471-L4478)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaxsw => {
|
||||
let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i32; 4];
|
||||
for i in 0..4 { r[i] = a[i].max(b[i]); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions

- **Per-word signed max.** Four word lanes; `VD[i] = max(int32(VA[i]), int32(VB[i]))`. Lane 0 is the most-significant word.
- **Sign-aware ordering.** `vmaxsw(0x8000_0000, 0x0000_0001) = 0x0000_0001` (i.e. `max(INT32_MIN, 1) = 1`), whereas [`vmaxuw`](vmaxuw.md) would return `0x8000_0000`.
- **No `VSCR` interaction, no XER, no exceptions.**
- **Common usage.** Running-maximum (high-water-mark) tracking for signed counters; Z-buffer "keep nearest" updates when larger depth values mean nearer.
- **Aliasing legal.** `VD` may be the same register as `VA` or `VB`.
- **No VMX128 sibling.**
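
A minimal C sketch of the lane semantics above, written so that the destination may alias a source. The function name and the plain-array register model (index 0 is the most-significant word) are assumptions for illustration only.

```c
#include <stdint.h>

/* Lane-wise signed 32-bit maximum over four word lanes.
 * Each lane's inputs are read before the result is stored,
 * so vd may point at the same storage as va or vb. */
static void vmaxsw_sketch(int32_t vd[4], const int32_t va[4], const int32_t vb[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t a = va[i];
        int32_t b = vb[i];
        vd[i] = (a > b) ? a : b;
    }
}
```

Calling `vmaxsw_sketch(hi, hi, x)` repeatedly keeps a per-lane running maximum in `hi`, which is the high-water-mark pattern mentioned under common usage.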
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vminsw`](vminsw.md) — the matching minimum.
|
||||
- [`vmaxuw`](vmaxuw.md) — same width, unsigned max.
|
||||
- [`vmaxsb`](vmaxsb.md), [`vmaxsh`](vmaxsh.md) — signed max at byte / half width.
|
||||
- [`vcmpgtsw`](vcmpgtsw.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmaxsw` (Vector Maximum Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxsw-vector-maximum-signed-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmaxub.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmaxub` — Vector Maximum Unsigned Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000002`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmaxub` | `vmaxub` | — | Vector Maximum Unsigned Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmaxub [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmaxub` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000002`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `2`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmaxub: read | Source A vector register. |
|
||||
| `VB` | vmaxub: read | Source B vector register. |
|
||||
| `VD` | vmaxub: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmaxub`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description column of the Assembler Mnemonics table above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmaxub`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxub"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:859`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L859)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:435`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L435)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4391-4398`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4391-L4398)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaxub => {
|
||||
let a = ctx.vr[instr.ra()].as_bytes();
|
||||
let b = ctx.vr[instr.rb()].as_bytes();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = a[i].max(b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions

- **Per-byte unsigned max.** Sixteen byte lanes; `VD[i] = max(uint8(VA[i]), uint8(VB[i]))`. Lane 0 is the most-significant byte.
- **Unsigned ordering.** `vmaxub(0xFF, 0x01) = 0xFF`, whereas [`vmaxsb`](vmaxsb.md) would return `0x01` because it reads `0xFF` as `-1`.
- **No `VSCR` interaction, no XER, no exceptions.**
- **Common usage.** Pixel "brighter of two" channel selection; alpha mask combining.
- **Aliasing legal.** `VD` may be the same register as `VA` or `VB`. For example, `vmaxub v3, v3, v4` clamps each byte of `v3` from below to the corresponding byte of `v4`.
- **No VMX128 sibling.**
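
The byte-lane behaviour above, including the aliased form `vmaxub v3, v3, v4`, can be sketched in C. The function name is hypothetical; the 16-byte arrays stand in for register images with byte 0 as the most-significant byte.

```c
#include <stdint.h>

/* Lane-wise unsigned byte maximum over a 16-byte register image.
 * Inputs for each lane are read before the store, so vd may alias va or vb. */
static void vmaxub_sketch(uint8_t vd[16], const uint8_t va[16], const uint8_t vb[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t a = va[i];
        uint8_t b = vb[i];
        vd[i] = (a > b) ? a : b;
    }
}
```

With `v3` passed as both destination and first source, every byte of `v3` ends up at least as large as the matching byte of `v4`.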
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vminub`](vminub.md) — the matching minimum.
|
||||
- [`vmaxsb`](vmaxsb.md) — same width, signed max.
|
||||
- [`vmaxuh`](vmaxuh.md), [`vmaxuw`](vmaxuw.md) — unsigned max at half / word width.
|
||||
- [`vcmpgtub`](vcmpgtub.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmaxub` (Vector Maximum Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxub-vector-maximum-unsigned-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmaxuh.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmaxuh` — Vector Maximum Unsigned Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000042`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmaxuh` | `vmaxuh` | — | Vector Maximum Unsigned Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmaxuh [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmaxuh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000042`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `66`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmaxuh: read | Source A vector register. |
|
||||
| `VB` | vmaxuh: read | Source B vector register. |
|
||||
| `VD` | vmaxuh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmaxuh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description column of the Assembler Mnemonics table above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmaxuh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxuh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:867`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L867)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:442`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L442)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4423-4430`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4423-L4430)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaxuh => {
|
||||
let a = ctx.vr[instr.ra()].as_u16x8();
|
||||
let b = ctx.vr[instr.rb()].as_u16x8();
|
||||
let mut r = [0u16; 8];
|
||||
for i in 0..8 { r[i] = a[i].max(b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions

- **Per-half unsigned max.** Eight half-word lanes; `VD[i] = max(uint16(VA[i]), uint16(VB[i]))`. Lane 0 is the most-significant half.
- **Unsigned ordering.** `vmaxuh(0xFFFF, 0x0001) = 0xFFFF`, whereas [`vmaxsh`](vmaxsh.md) would return `0x0001` because it reads `0xFFFF` as `-1`.
- **No `VSCR` interaction, no XER, no exceptions.**
- **Common usage.** Audio sample magnitude tracking; UTF-16 code-unit upper bound.
- **Aliasing legal.** `VD` may be the same register as `VA` or `VB`.
- **No VMX128 sibling.**
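
To show how the big-endian lane order interacts with the comparison, here is a C sketch that works directly on 16-byte register images. The helper and function names are made up for this example; they are not xenia APIs.

```c
#include <stdint.h>

/* Assemble a big-endian half word: the first byte is the high byte. */
static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static void store_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFFu);
}

/* Lane-wise unsigned half-word maximum; lane i occupies bytes 2*i and 2*i+1. */
static void vmaxuh_sketch(uint8_t vd[16], const uint8_t va[16], const uint8_t vb[16])
{
    for (int i = 0; i < 8; i++) {
        uint16_t a = load_be16(va + 2 * i);
        uint16_t b = load_be16(vb + 2 * i);
        store_be16(vd + 2 * i, (a > b) ? a : b);
    }
}
```

Comparing assembled half words rather than individual bytes matters: a lane holding `0x1235` beats `0x1234` as a whole, even though their high bytes are equal.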
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vminuh`](vminuh.md) — the matching minimum.
|
||||
- [`vmaxsh`](vmaxsh.md) — same width, signed max.
|
||||
- [`vmaxub`](vmaxub.md), [`vmaxuw`](vmaxuw.md) — unsigned max at byte / word width.
|
||||
- [`vcmpgtuh`](vcmpgtuh.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmaxuh` (Vector Maximum Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxuh-vector-maximum-unsigned-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmaxuw.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmaxuw` — Vector Maximum Unsigned Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000082`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmaxuw` | `vmaxuw` | — | Vector Maximum Unsigned Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmaxuw [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmaxuw` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000082`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `130`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmaxuw: read | Source A vector register. |
|
||||
| `VB` | vmaxuw: read | Source B vector register. |
|
||||
| `VD` | vmaxuw: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmaxuw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description column of the Assembler Mnemonics table above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmaxuw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxuw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:875`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L875)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:449`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L449)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4455-4462`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4455-L4462)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmaxuw => {
|
||||
let a = ctx.vr[instr.ra()].as_u32x4();
|
||||
let b = ctx.vr[instr.rb()].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = a[i].max(b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions

- **Per-word unsigned max.** Four word lanes; `VD[i] = max(uint32(VA[i]), uint32(VB[i]))`. Lane 0 is the most-significant word.
- **Unsigned ordering.** `vmaxuw(0x8000_0000, 0x0000_0001) = 0x8000_0000`, whereas [`vmaxsw`](vmaxsw.md) would return `0x0000_0001`.
- **No `VSCR` interaction, no XER, no exceptions.**
- **Common usage.** Hashtable bucket capacity tracking, packed 32-bit ID upper bounds.
- **Aliasing legal.** `VD` may be the same register as `VA` or `VB`.
- **No VMX128 sibling.**
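
The contrast between unsigned and signed ordering described above can be sketched in C. The `vr_u32x4` struct and both function names are hypothetical; `max_as_signed` exists only to show what a signed comparison of the same bit patterns would pick, and its cast relies on the two's-complement wrap that mainstream compilers provide.

```c
#include <stdint.h>

/* Hypothetical register model: four unsigned word lanes, lane[0] most significant. */
typedef struct { uint32_t lane[4]; } vr_u32x4;

/* Lane-wise unsigned 32-bit maximum. */
static vr_u32x4 vmaxuw_sketch(vr_u32x4 va, vr_u32x4 vb)
{
    vr_u32x4 vd;
    for (int i = 0; i < 4; i++) {
        vd.lane[i] = (va.lane[i] > vb.lane[i]) ? va.lane[i] : vb.lane[i];
    }
    return vd;
}

/* For contrast: the same bit patterns compared as signed values. */
static uint32_t max_as_signed(uint32_t a, uint32_t b)
{
    return ((int32_t)a > (int32_t)b) ? a : b;
}
```

For the inputs `0x80000000` and `0x00000001`, the unsigned sketch keeps `0x80000000` while the signed comparison keeps `0x00000001`.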
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vminuw`](vminuw.md) — the matching minimum.
|
||||
- [`vmaxsw`](vmaxsw.md) — same width, signed max.
|
||||
- [`vmaxub`](vmaxub.md), [`vmaxuh`](vmaxuh.md) — unsigned max at byte / half width.
|
||||
- [`vcmpgtuw`](vcmpgtuw.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmaxuw` (Vector Maximum Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxuw-vector-maximum-unsigned-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
146
migration/project-root/ppc-manual/vmx/vmhaddshs.md
Normal file
@@ -0,0 +1,146 @@
|
||||
# `vmhaddshs` — Vector Multiply-High and Add Signed Half Word Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000020`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmhaddshs` | `vmhaddshs` | — | Vector Multiply-High and Add Signed Half Word Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmhaddshs [VD], [VA], [VB], [VC]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmhaddshs` — form `VA`
|
||||
|
||||
- **Opcode word:** `0x10000020`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `32`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21–25 | `VRC` | source C (addend) |
|
||||
| 26–31 | `XO` | extended opcode (6 bits) |
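As a rough illustration of the VA-form layout in the table above, the following C sketch extracts the fields and rebuilds a `vmhaddshs` word. The table uses IBM bit numbering (bit 0 is the most-significant bit). The `va_form_fields` type and both function names are hypothetical and are not part of xenia.

```c
#include <stdint.h>

/* Hypothetical container for the VA-form fields listed in the table. */
typedef struct {
    uint32_t opcd; /* IBM bits 0-5   */
    uint32_t vd;   /* IBM bits 6-10  */
    uint32_t va;   /* IBM bits 11-15 */
    uint32_t vb;   /* IBM bits 16-20 */
    uint32_t vc;   /* IBM bits 21-25 */
    uint32_t xo;   /* IBM bits 26-31 */
} va_form_fields;

/* A field ending at IBM bit e is shifted right by (31 - e). */
static va_form_fields va_form_decode(uint32_t word)
{
    va_form_fields f;
    f.opcd = (word >> 26) & 0x3Fu;
    f.vd   = (word >> 21) & 0x1Fu;
    f.va   = (word >> 16) & 0x1Fu;
    f.vb   = (word >> 11) & 0x1Fu;
    f.vc   = (word >> 6)  & 0x1Fu;
    f.xo   =  word        & 0x3Fu;
    return f;
}

/* Builds `vmhaddshs vd, va, vb, vc` on top of the base opcode word 0x10000020. */
static uint32_t vmhaddshs_encode(uint32_t vd, uint32_t va, uint32_t vb, uint32_t vc)
{
    return 0x10000020u
         | ((vd & 0x1Fu) << 21)
         | ((va & 0x1Fu) << 16)
         | ((vb & 0x1Fu) << 11)
         | ((vc & 0x1Fu) << 6);
}
```

Under these assumptions, `vmhaddshs v4, v1, v2, v3` encodes as `0x108110E0`, and decoding that word yields `opcd = 4` and `xo = 32`.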
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmhaddshs: read | Source A vector register. |
|
||||
| `VB` | vmhaddshs: read | Source B vector register. |
|
||||
| `VC` | vmhaddshs: read | Source C vector register (per-lane addend). |
|
||||
| `VD` | vmhaddshs: write | Destination vector register. |
|
||||
| `VSCR` | vmhaddshs: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmhaddshs`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`, `VC`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vmhaddshs`: **VSCR[SAT]** is set, and stays set, if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description column of the Assembler Mnemonics table above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmhaddshs`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmhaddshs"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:883`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L883)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:102`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L102)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:576`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L576)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3519-3533`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3519-L3533)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmhaddshs => {
|
||||
// vD[i] = sat_i16((vA[i] * vB[i]) >> 15 + vC[i])
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
|
||||
let c = crate::vmx::as_i16x8(ctx.vr[instr.rc()]);
|
||||
let mut r = [0i16; 8]; let mut sat = false;
|
||||
for i in 0..8 {
|
||||
let prod = (a[i] as i32 * b[i] as i32) >> 15;
|
||||
let (v, s) = crate::vmx::sat_i32_to_i16(prod + c[i] as i32);
|
||||
r[i] = v; sat |= s;
|
||||
}
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Q15 fixed-point multiply-add, saturating.** Eight half-word lanes; per lane:
|
||||
```
|
||||
prod = (int16(VA[i]) * int16(VB[i])) >> 15 ; truncating, no rounding
|
||||
VD[i] = clamp(prod + int16(VC[i]), -32768, +32767)
|
||||
```
|
||||
The "h" in the mnemonic is "high half" — only the upper 17 bits of the 32-bit signed product survive (after >>15), then the accumulator is added.
|
||||
- **Truncating, not rounding.** Bit 14 of the product is discarded silently. Use [`vmhraddshs`](vmhraddshs.md) when half-up rounding is needed (it adds `0x4000` to the product before the shift). The two are otherwise identical.
|
||||
- **`VSCR[SAT]` is sticky-set** if `prod + VC[i]` overflows `int16`. Cleared only by [`mtvscr`](mtvscr.md). Xenia uses `crate::vmx::sat_i32_to_i16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)).
|
||||
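- **Pathological case `0x8000 * 0x8000 >> 15`.** The raw product `(-32768) * (-32768)` is `0x4000_0000`, which becomes `0x8000` = `+32768` after the shift and already exceeds `int16` before `VC` is added. Saturation is applied only to the final sum, so for `VC[i] >= 0` the clamp produces `+32767` and stickies SAT, while a negative `VC[i]` brings the sum back into range and leaves SAT untouched. This is the classic Q15 "minus-one-times-minus-one" gotcha.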
|
||||
- **Big-endian half lanes.** Lane 0 is the most-significant half.
|
||||
- **No XER changes, no exceptions.**
|
||||
- **No VMX128 sibling.**
|
||||
- **Common usage.** Q15 IIR / FIR filter taps, fixed-point matrix-vector multiplies for audio.
|
||||
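The edge cases above can be checked with a small per-lane model. The following is a minimal sketch, not the xenia-rs code: `vmhaddshs_lane` is a hypothetical helper name, and it performs the saturation inline instead of calling `crate::vmx::sat_i32_to_i16`.

```rust
/// Hypothetical per-lane model of `vmhaddshs` (name chosen for illustration).
/// Returns the lane result and whether the lane saturated.
fn vmhaddshs_lane(a: i16, b: i16, c: i16) -> (i16, bool) {
    // Arithmetic shift: drops the low 15 bits, flooring toward negative infinity.
    let prod = (a as i32 * b as i32) >> 15;
    let sum = prod + c as i32;
    // Saturation is decided on the final sum only, not on the product.
    let clamped = sum.clamp(i16::MIN as i32, i16::MAX as i32);
    (clamped as i16, clamped != sum)
}
```

With this model, `vmhaddshs_lane(i16::MIN, i16::MIN, 0)` saturates, whereas the same product with an addend of `-1` lands exactly on `+32767` without saturating.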
|
||||
## Related Instructions
|
||||
|
||||
- [`vmhraddshs`](vmhraddshs.md) — same operation with rounded multiply (`+0x4000` before `>> 15`).
|
||||
- [`vmladduhm`](vmladduhm.md) — same shape, modulo (no shift, no saturate), unsigned half lanes.
|
||||
- [`vmsumshs`](vmsumshs.md), [`vmsumshm`](vmsumshm.md) — multiply-sum across pairs of lanes.
|
||||
- [`vaddshs`](vaddshs.md), [`vmaxsh`](vmaxsh.md) — saturating add and max at the same lane width, useful in the same DSP kernels.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmhaddshs` (Vector Multiply-High and Add Signed Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmhaddshs-vector-multiply-high-add-signed-half-word-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
146
migration/project-root/ppc-manual/vmx/vmhraddshs.md
Normal file
@@ -0,0 +1,146 @@
|
||||
# `vmhraddshs` — Vector Multiply-High Round and Add Signed Half Word Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000021`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmhraddshs` | `vmhraddshs` | — | Vector Multiply-High Round and Add Signed Half Word Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmhraddshs [VD], [VA], [VB], [VC]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmhraddshs` — form `VA`
|
||||
|
||||
- **Opcode word:** `0x10000021`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `33`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21–25 | `VRC` | source C / shift |
|
||||
| 26–31 | `XO` | extended opcode (6 bits) |
|
||||
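To make the bit layout concrete, the sketch below pulls the VA-form fields out of a 32-bit instruction word. PowerPC numbers bits from the most-significant end, so bits 0 to 5 are the top six bits of the word. `VaFields` and `decode_va` are hypothetical names for illustration, not the xenia-rs decoder API.

```rust
/// Hypothetical container for the VA-form fields (not a xenia-rs type).
#[derive(Debug, PartialEq)]
struct VaFields {
    opcd: u32, // bits 0-5
    vd: u32,   // bits 6-10
    va: u32,   // bits 11-15
    vb: u32,   // bits 16-20
    vc: u32,   // bits 21-25
    xo: u32,   // bits 26-31
}

/// Splits an instruction word into VA-form fields (big-endian bit numbering).
fn decode_va(word: u32) -> VaFields {
    VaFields {
        opcd: word >> 26,
        vd: (word >> 21) & 0x1f,
        va: (word >> 16) & 0x1f,
        vb: (word >> 11) & 0x1f,
        vc: (word >> 6) & 0x1f,
        xo: word & 0x3f,
    }
}
```

Under the layout in the table, the word `0x10221921` decodes to primary opcode 4, `VD = 1`, `VA = 2`, `VB = 3`, `VC = 4` and extended opcode 33.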
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmhraddshs: read | Source A vector register. |
|
||||
| `VB` | vmhraddshs: read | Source B vector register. |
|
||||
| `VC` | vmhraddshs: read | Source C vector register (the per-lane addend). |
|
||||
| `VD` | vmhraddshs: write | Destination vector register. |
|
||||
| `VSCR` | vmhraddshs: write | Vector Status and Control Register (SAT bit, set when any lane saturates). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmhraddshs`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`, `VC`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vmhraddshs`: **VSCR[SAT]** may be stickied on saturating vector operations.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmhraddshs`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmhraddshs"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:888`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L888)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:102`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L102)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:577`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L577)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3534-3548`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3534-L3548)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmhraddshs => {
|
||||
// Rounded multiply-add: (vA[i]*vB[i] + 0x4000) >> 15 + vC[i], saturating.
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
|
||||
let c = crate::vmx::as_i16x8(ctx.vr[instr.rc()]);
|
||||
let mut r = [0i16; 8]; let mut sat = false;
|
||||
for i in 0..8 {
|
||||
let prod = (a[i] as i32 * b[i] as i32 + 0x4000) >> 15;
|
||||
let (v, s) = crate::vmx::sat_i32_to_i16(prod + c[i] as i32);
|
||||
r[i] = v; sat |= s;
|
||||
}
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Rounded Q15 fixed-point multiply-add, saturating.** Eight half-word lanes; per lane:
|
||||
```
|
||||
prod = (int16(VA[i]) * int16(VB[i]) + 0x4000) >> 15 ; round half-up
|
||||
VD[i] = clamp(prod + int16(VC[i]), -32768, +32767)
|
||||
```
|
||||
Identical to [`vmhaddshs`](vmhaddshs.md) except for the `+0x4000` rounding bias before the shift.
|
||||
- **Round half up.** The `+0x4000` bias followed by the arithmetic shift rounds the discarded low 15 bits to the nearest representable value, with exact ties going toward +∞ (a product of `-0x4000`, i.e. −0.5 in output units, rounds to `0`, not `-1`). For most DSP work this is the desired behaviour and gives lower mean error than the truncating variant.
|
||||
- **`VSCR[SAT]` is sticky-set** if the final sum overflows `int16`. The rounding bias can itself push a lane that was at `+32767` past the cap — important for tight Q15 audio where the truncating form might not have saturated.
|
||||
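- **Same `0x8000 * 0x8000 >> 15` gotcha** as `vmhaddshs`: the product `0x4000_0000` is an exact multiple of `0x8000`, so the bias changes nothing and the rounded product is still `+32768`, which saturates to `+32767` whenever `VC[i] >= 0`.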
|
||||
- **Big-endian half lanes.** Lane 0 is the most-significant half.
|
||||
- **No XER changes, no exceptions.**
|
||||
- **No VMX128 sibling.**
|
||||
- **Common usage.** High-quality Q15 audio filter taps where round-to-nearest is preferred over the truncating (floor) behaviour of `vmhaddshs`.
|
||||
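The difference between the truncating and rounding forms shows up most clearly on small products near a tie. A minimal sketch, assuming the hypothetical helper names `mulhi_trunc` and `mulhi_round` for the two product stages (the accumulate and saturate step is omitted):

```rust
/// Product stage of `vmhaddshs`: arithmetic shift with no bias (floors).
fn mulhi_trunc(a: i16, b: i16) -> i32 {
    (a as i32 * b as i32) >> 15
}

/// Product stage of `vmhraddshs`: add half an output LSB, then shift.
fn mulhi_round(a: i16, b: i16) -> i32 {
    (a as i32 * b as i32 + 0x4000) >> 15
}
```

For `a = 0x4000, b = -1` the exact product is −0.5 output units: the truncating form floors it to `-1`, while the rounding form returns `0`. For `a = b = i16::MIN` both forms return `32768`.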
|
||||
## Related Instructions
|
||||
|
||||
- [`vmhaddshs`](vmhaddshs.md) — same op without the rounding bias.
|
||||
- [`vmladduhm`](vmladduhm.md) — same shape, modulo (no shift, no saturate), unsigned.
|
||||
- [`vmsumshs`](vmsumshs.md), [`vmsumshm`](vmsumshm.md) — multiply-sum across pairs of lanes.
|
||||
- [`vaddshs`](vaddshs.md), [`vmaxsh`](vmaxsh.md) — saturating add and max at same lane width.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmhraddshs` (Vector Multiply-High Round and Add Signed Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmhraddshs-vector-multiply-high-round-add-signed-half-word-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
182
migration/project-root/ppc-manual/vmx/vminfp.md
Normal file
@@ -0,0 +1,182 @@
|
||||
# `vminfp` — Vector Minimum Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000044a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vminfp` | `vminfp` | — | Vector Minimum Floating Point |
|
||||
| `vminfp128` | `vminfp128` | — | Vector128 Minimum Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vminfp [VD], [VA], [VB]
|
||||
vminfp128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vminfp` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000044a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1098`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vminfp128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x180002c0`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `704`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
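The VX128 form scatters each 7-bit register number across the word. The sketch below reassembles them from the bit positions in the table. It assumes the combination convention used by xenia's VX128 accessors (`VA128 = l | h << 5 | H << 6`, and `l | h << 5` for `VD128` and `VB128`); the table itself does not spell this out, so treat the formulas as an assumption. `decode_vx128_regs` is a hypothetical name.

```rust
/// Hypothetical helper: returns (vd, va, vb) register numbers (0..=127)
/// from a VX128-form instruction word, using big-endian bit numbering.
fn decode_vx128_regs(word: u32) -> (u32, u32, u32) {
    let vd_lo = (word >> 21) & 0x1f; // bits 6-10
    let va_lo = (word >> 16) & 0x1f; // bits 11-15
    let vb_lo = (word >> 11) & 0x1f; // bits 16-20
    let va_top = (word >> 10) & 0x1; // bit 21 (VA128H)
    let va_mid = (word >> 5) & 0x1; // bit 26 (VA128h)
    let vd_hi = (word >> 2) & 0x3; // bits 28-29
    let vb_hi = word & 0x3; // bits 30-31
    (
        vd_lo | (vd_hi << 5),
        va_lo | (va_mid << 5) | (va_top << 6),
        vb_lo | (vb_hi << 5),
    )
}
```

The bare opcode word `0x180002c0` decodes to registers `(0, 0, 0)` because none of the register bits overlap the extended-opcode bits.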
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vminfp: read; vminfp128: read | Source A vector register. |
|
||||
| `VB` | vminfp: read; vminfp128: read | Source B vector register. |
|
||||
| `VD` | vminfp: write; vminfp128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vminfp`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vminfp128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vminfp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminfp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:899`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L899)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:527`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L527)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2137-2144`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2137-L2144)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vminfp => {
|
||||
let a = ctx.vr[instr.ra()].as_f32x4();
|
||||
let b = ctx.vr[instr.rb()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = vmx::min_nan(a[i], b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vminfp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:902`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L902)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:697`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L697)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2145-2152`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2145-L2152)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vminfp128 => {
|
||||
let a = ctx.vr[instr.va128()].as_f32x4();
|
||||
let b = ctx.vr[instr.vb128()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = vmx::min_nan(a[i], b[i]); }
|
||||
ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
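- **Per-lane single-precision min.** Four word lanes; for ordinary operands `VD[i] = (VA[i] < VB[i]) ? VA[i] : VB[i]`. NaNs and signed zeros need separate treatment, as the next two items describe.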
|
||||
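- **NaN propagation surprise.** A plain `if a < b { a } else { b }` comparator returns `VB` whenever either input is NaN, because every NaN comparison evaluates false. The AltiVec specification instead defines the minimum of any value and a NaN as a QNaN, so hardware yields `vminfp(NaN, x) = NaN` where the plain comparator yields `x`. The frozen snapshot above delegates to `vmx::min_nan`, whose body is not reproduced here. **Worth checking `vmx.rs` to confirm which behaviour it implements before any future correctness fixes.**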
|
||||
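- **Sign of zero.** The AltiVec specification defines the minimum of `+0` and `-0` as `-0` regardless of operand order. A plain `a < b` comparator returns `-0` for `vminfp(+0, -0)` (the comparison is false, so `b` is returned) but `+0` for `vminfp(-0, +0)`, because IEEE comparison treats the two zeros as equal.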
|
||||
- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs are flushed to `±0` before comparison.
|
||||
- **No `VSCR[SAT]` change, no XER change, no exceptions.**
|
||||
- **Big-endian word lanes.** Lane 0 is the most-significant word.
|
||||
- **Aliasing legal.** `vminfp v3, v3, v4` clamps `v3` from above by `v4`.
|
||||
- **VMX128 sibling (`vminfp128`).** Identical comparator semantics with the extended encoding.
|
||||
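A comparator that honours NaN propagation and signed zeros could look like the sketch below. It assumes the AltiVec rules that the minimum of any value and a NaN is a QNaN and that the minimum of `+0` and `-0` is `-0`. This is an illustration, not the body of `vmx::min_nan`: the name `vminfp_lane_spec` is hypothetical, the returned NaN is Rust's default quiet NaN rather than a payload-preserving one, and `VSCR[NJ]` denormal flushing is not modelled.

```rust
/// Hypothetical spec-style lane minimum: any NaN input yields a quiet NaN,
/// and the minimum of +0.0 and -0.0 is -0.0 in either operand order.
fn vminfp_lane_spec(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() {
        return f32::NAN;
    }
    if a == 0.0 && b == 0.0 {
        // Both inputs are zeros; OR-ing the bit patterns keeps a set sign bit.
        return f32::from_bits(a.to_bits() | b.to_bits());
    }
    if a < b { a } else { b }
}
```

Comparing results with `to_bits()` rather than `==` is what distinguishes `-0.0` from `+0.0` in a test harness.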
|
||||
## Related Instructions
|
||||
|
||||
- [`vmaxfp`](vmaxfp.md) — the per-lane maximum.
|
||||
- [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — combine masks with arbitrary alternatives.
|
||||
- [`vmaddfp`](vmaddfp.md) — fused multiply-add when the min is part of a polynomial.
|
||||
- [`vminsw`](vminsw.md) — integer-word min if the lanes are signed integers.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vminfp` (Vector Minimum Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminfp-vector-minimum-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vminsb.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vminsb` — Vector Minimum Signed Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000302`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vminsb` | `vminsb` | — | Vector Minimum Signed Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vminsb [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vminsb` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000302`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `770`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vminsb: read | Source A vector register. |
|
||||
| `VB` | vminsb: read | Source B vector register. |
|
||||
| `VD` | vminsb: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vminsb`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vminsb`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminsb"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:906`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L906)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:499`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L499)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4415-4422`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4415-L4422)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vminsb => {
|
||||
let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i8; 16];
|
||||
for i in 0..16 { r[i] = a[i].min(b[i]); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-byte signed min.** Sixteen byte lanes; `VD[i] = min(int8(VA[i]), int8(VB[i]))`. Lane 0 is the most-significant byte.
|
||||
- **Sign-aware ordering.** `vminsb(0xFF, 0x01) = 0xFF` (i.e. `min(-1, 1) = -1`), versus [`vminub`](vminub.md) which returns `0x01`.
|
||||
- **No `VSCR` interaction, no XER, no exceptions.**
|
||||
- **Common usage.** Pair with [`vmaxsb`](vmaxsb.md) for branchless `clamp(x, lo, hi)`.
|
||||
- **Aliasing legal.**
|
||||
- **No VMX128 sibling.**
|
||||
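The clamp idiom mentioned above can be sketched per lane as a max followed by a min. `clamp_i8x16` is a hypothetical helper operating on plain arrays, standing in for a `vmaxsb` / `vminsb` pair; it assumes `lo[i] <= hi[i]` in every lane.

```rust
/// Hypothetical model of `vmaxsb` followed by `vminsb` on sixteen signed bytes.
fn clamp_i8x16(x: [i8; 16], lo: [i8; 16], hi: [i8; 16]) -> [i8; 16] {
    let mut r = [0i8; 16];
    for i in 0..16 {
        // The max raises the lane to at least lo; the min then caps it at hi.
        r[i] = x[i].max(lo[i]).min(hi[i]);
    }
    r
}
```

No lane needs a branch on the data, which is why the two-instruction form is attractive in vector code.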
|
||||
## Related Instructions
|
||||
|
||||
- [`vmaxsb`](vmaxsb.md) — the matching maximum.
|
||||
- [`vminub`](vminub.md) — same width, unsigned min.
|
||||
- [`vminsh`](vminsh.md), [`vminsw`](vminsw.md) — signed min at half / word width.
|
||||
- [`vcmpgtsb`](vcmpgtsb.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vminsb` (Vector Minimum Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminsb-vector-minimum-signed-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vminsh.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vminsh` — Vector Minimum Signed Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000342`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vminsh` | `vminsh` | — | Vector Minimum Signed Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vminsh [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vminsh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000342`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `834`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vminsh: read | Source A vector register. |
|
||||
| `VB` | vminsh: read | Source B vector register. |
|
||||
| `VD` | vminsh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vminsh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vminsh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminsh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:913`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L913)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:506`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L506)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4447-4454`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4447-L4454)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vminsh => {
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i16; 8];
|
||||
for i in 0..8 { r[i] = a[i].min(b[i]); }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-half signed min.** Eight half-word lanes; `VD[i] = min(int16(VA[i]), int16(VB[i]))`. Lane 0 is the most-significant half.
|
||||
- **Sign-aware ordering.** `vminsh(0x8000, 0x0001) = 0x8000` (i.e. `min(-32768, 1) = -32768`).
|
||||
- **No `VSCR` interaction, no XER, no exceptions.**
|
||||
- **Common usage.** Q15 audio noise-floor computation; signed image-processing kernels.
|
||||
- **Aliasing legal.**
|
||||
- **No VMX128 sibling.**
|
||||
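The sign-aware ordering depends only on how the same 16 bits are interpreted. A minimal sketch with hypothetical helper names, contrasting the signed view used by `vminsh` with the unsigned view used by `vminuh`:

```rust
/// Signed half-word minimum on raw 16-bit patterns (the `vminsh` view).
fn min_signed_half(a: u16, b: u16) -> u16 {
    // `as i16` reinterprets the bits, so 0x8000 becomes -32768.
    (a as i16).min(b as i16) as u16
}

/// Unsigned half-word minimum on the same patterns (the `vminuh` view).
fn min_unsigned_half(a: u16, b: u16) -> u16 {
    a.min(b)
}
```

For the inputs `0x8000` and `0x0001`, the signed view picks `0x8000` and the unsigned view picks `0x0001`.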
|
||||
## Related Instructions
|
||||
|
||||
- [`vmaxsh`](vmaxsh.md) — the matching maximum.
|
||||
- [`vminuh`](vminuh.md) — same width, unsigned min.
|
||||
- [`vminsb`](vminsb.md), [`vminsw`](vminsw.md) — signed min at byte / word width.
|
||||
- [`vcmpgtsh`](vcmpgtsh.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vminsh` (Vector Minimum Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminsh-vector-minimum-signed-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vminsw.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vminsw` — Vector Minimum Signed Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000382`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vminsw` | `vminsw` | — | Vector Minimum Signed Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vminsw [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vminsw` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000382`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `898`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vminsw: read | Source A vector register. |
|
||||
| `VB` | vminsw: read | Source B vector register. |
|
||||
| `VD` | vminsw: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vminsw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vminsw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminsw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:920`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L920)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:513`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L513)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4479-4486`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4479-L4486)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vminsw => {
    let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]);
    let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]);
    let mut r = [0i32; 4];
    for i in 0..4 { r[i] = a[i].min(b[i]); }
    ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-word signed min.** Four word lanes; `VD[i] = min(int32(VA[i]), int32(VB[i]))`. Lane 0 is the most-significant word.
- **Sign-aware ordering.** `vminsw(0x8000_0000, 0x0000_0001) = 0x8000_0000` (i.e. `min(INT32_MIN, 1) = INT32_MIN`).
- **No `VSCR` interaction, no XER, no exceptions.**
- **Common usage.** Z-buffer "keep nearest" updates (when a smaller depth value means closer); capping signed counters at an upper limit.
- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`.
- **No VMX128 sibling.**
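
The lane rule above can be sketched in a few lines of Rust. This is a minimal illustration rather than the xenia-rs code: the name `vminsw_lanes` and the `[i32; 4]` lane array (index 0 standing for the most-significant word) are assumptions made for the example.

```rust
/// Hypothetical stand-alone model of `vminsw`.
/// Index 0 of each array stands for lane 0, the most-significant word.
fn vminsw_lanes(va: [i32; 4], vb: [i32; 4]) -> [i32; 4] {
    let mut vd = [0i32; 4];
    for i in 0..4 {
        // Signed comparison: the bit pattern 0x8000_0000 is INT32_MIN,
        // so it is smaller than every other lane value.
        vd[i] = va[i].min(vb[i]);
    }
    vd
}
```

With `va = [i32::MIN, 5, -1, 0]` and `vb = [1, -5, 0, 0]` the sketch returns `[i32::MIN, -5, -1, 0]`.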
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmaxsw`](vmaxsw.md) — the matching maximum.
|
||||
- [`vminuw`](vminuw.md) — same width, unsigned min.
|
||||
- [`vminsb`](vminsb.md), [`vminsh`](vminsh.md) — signed min at byte / half width.
|
||||
- [`vcmpgtsw`](vcmpgtsw.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vminsw` (Vector Minimum Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminsw-vector-minimum-signed-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vminub.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vminub` — Vector Minimum Unsigned Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000202`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vminub` | `vminub` | — | Vector Minimum Unsigned Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vminub [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vminub` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000202`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `514`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |
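
As a rough illustration of the field layout in the table, the sketch below splits a VX-form instruction word into its fields. `VxFields` and `decode_vx` are hypothetical names chosen for this example, not the xenia-rs decoder API; bit 0 is the most-significant bit, following IBM numbering.

```rust
/// Hypothetical container for the VX-form fields listed in the table.
#[derive(Debug, PartialEq)]
struct VxFields {
    opcd: u32,
    vd: u32,
    va: u32,
    vb: u32,
    xo: u32,
}

/// Split a 32-bit instruction word using IBM bit numbering (bit 0 = MSB).
fn decode_vx(word: u32) -> VxFields {
    VxFields {
        opcd: word >> 26,         // bits 0-5
        vd: (word >> 21) & 0x1F,  // bits 6-10
        va: (word >> 16) & 0x1F,  // bits 11-15
        vb: (word >> 11) & 0x1F,  // bits 16-20
        xo: word & 0x7FF,         // bits 21-31
    }
}
```

For example, `vminub v3, v3, v4` assembles to `0x10632202`, which the sketch decodes to `OPCD = 4`, `VD = 3`, `VA = 3`, `VB = 4`, `XO = 514`.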
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
| --- | --- | --- |
| `VA` | vminub: read | Source A vector register. |
| `VB` | vminub: read | Source B vector register. |
| `VD` | vminub: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vminub`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vminub`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminub"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:927`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L927)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:476`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L476)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4399-4406`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4399-L4406)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vminub => {
    let a = ctx.vr[instr.ra()].as_bytes();
    let b = ctx.vr[instr.rb()].as_bytes();
    let mut r = [0u8; 16];
    for i in 0..16 { r[i] = a[i].min(b[i]); }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-byte unsigned min.** Sixteen byte lanes; `VD[i] = min(uint8(VA[i]), uint8(VB[i]))`. Lane 0 is the most-significant byte.
- **Unsigned ordering.** `vminub(0xFF, 0x01) = 0x01`; [`vminsb`](vminsb.md) treats `0xFF` as -1 and returns `0xFF` for the same inputs.
- **No `VSCR` interaction, no XER, no exceptions.**
- **Common usage.** Pixel "darker of two" channel selection; alpha mask intersection.
- **Aliasing legal.** `vminub v3, v3, v4` caps each byte of `v3` at the corresponding byte of `v4`.
- **No VMX128 sibling.**
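
To make the signed/unsigned contrast concrete, here is a small sketch that runs both orderings over the same bytes. The function names and the plain `[u8; 16]` register image are assumptions for illustration; `vminsb_lanes` only models the signed sibling and is not taken from the xenia-rs source.

```rust
/// Hypothetical model of `vminub`: unsigned per-byte minimum.
fn vminub_lanes(va: [u8; 16], vb: [u8; 16]) -> [u8; 16] {
    let mut vd = [0u8; 16];
    for i in 0..16 {
        vd[i] = va[i].min(vb[i]);
    }
    vd
}

/// Hypothetical model of the signed sibling `vminsb`, for comparison only.
fn vminsb_lanes(va: [u8; 16], vb: [u8; 16]) -> [u8; 16] {
    let mut vd = [0u8; 16];
    for i in 0..16 {
        // Reinterpret the same bits as i8 before comparing.
        vd[i] = (va[i] as i8).min(vb[i] as i8) as u8;
    }
    vd
}
```

Feeding `0xFF` and `0x01` into every lane gives `0x01` from the unsigned model and `0xFF` from the signed one.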
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmaxub`](vmaxub.md) — the matching maximum.
|
||||
- [`vminsb`](vminsb.md) — same width, signed min.
|
||||
- [`vminuh`](vminuh.md), [`vminuw`](vminuw.md) — unsigned min at half / word width.
|
||||
- [`vcmpgtub`](vcmpgtub.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vminub` (Vector Minimum Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminub-vector-minimum-unsigned-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vminuh.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vminuh` — Vector Minimum Unsigned Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000242`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vminuh` | `vminuh` | — | Vector Minimum Unsigned Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vminuh [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vminuh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000242`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `578`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
| --- | --- | --- |
| `VA` | vminuh: read | Source A vector register. |
| `VB` | vminuh: read | Source B vector register. |
| `VD` | vminuh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vminuh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vminuh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminuh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:935`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L935)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:483`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L483)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4431-4438`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4431-L4438)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vminuh => {
    let a = ctx.vr[instr.ra()].as_u16x8();
    let b = ctx.vr[instr.rb()].as_u16x8();
    let mut r = [0u16; 8];
    for i in 0..8 { r[i] = a[i].min(b[i]); }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-half unsigned min.** Eight half-word lanes; `VD[i] = min(uint16(VA[i]), uint16(VB[i]))`. Lane 0 is the most-significant half.
- **Unsigned ordering.** `vminuh(0xFFFF, 0x0001) = 0x0001`; [`vminsh`](vminsh.md) treats `0xFFFF` as -1 and returns `0xFFFF` for the same inputs.
- **No `VSCR` interaction, no XER, no exceptions.**
- **Common usage.** Capping audio sample magnitudes or UTF-16 code units at an upper limit.
- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`.
- **No VMX128 sibling.**
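
One way to read the usage note: taking the minimum against a splatted constant caps every lane at that constant. The sketch below assumes a `[u16; 8]` lane array with index 0 as the most-significant half; `vminuh_lanes` and `cap_u16x8` are hypothetical helper names, not part of xenia-rs.

```rust
/// Hypothetical model of `vminuh`: unsigned per-halfword minimum.
fn vminuh_lanes(va: [u16; 8], vb: [u16; 8]) -> [u16; 8] {
    let mut vd = [0u16; 8];
    for i in 0..8 {
        vd[i] = va[i].min(vb[i]);
    }
    vd
}

/// Cap every lane at `limit` by taking the min against a splatted vector.
fn cap_u16x8(samples: [u16; 8], limit: u16) -> [u16; 8] {
    vminuh_lanes(samples, [limit; 8])
}
```

Lanes already at or below `limit` pass through unchanged; anything larger, including `0xFFFF`, comes out as `limit`.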
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmaxuh`](vmaxuh.md) — the matching maximum.
|
||||
- [`vminsh`](vminsh.md) — same width, signed min.
|
||||
- [`vminub`](vminub.md), [`vminuw`](vminuw.md) — unsigned min at byte / word width.
|
||||
- [`vcmpgtuh`](vcmpgtuh.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vminuh` (Vector Minimum Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminuh-vector-minimum-unsigned-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vminuw.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vminuw` — Vector Minimum Unsigned Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000282`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vminuw` | `vminuw` | — | Vector Minimum Unsigned Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vminuw [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vminuw` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000282`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `642`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
| --- | --- | --- |
| `VA` | vminuw: read | Source A vector register. |
| `VB` | vminuw: read | Source B vector register. |
| `VD` | vminuw: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vminuw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vminuw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminuw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:943`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L943)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:490`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L490)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4463-4470`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4463-L4470)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vminuw => {
    let a = ctx.vr[instr.ra()].as_u32x4();
    let b = ctx.vr[instr.rb()].as_u32x4();
    let mut r = [0u32; 4];
    for i in 0..4 { r[i] = a[i].min(b[i]); }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-word unsigned min.** Four word lanes; `VD[i] = min(uint32(VA[i]), uint32(VB[i]))`. Lane 0 is the most-significant word.
- **Unsigned ordering.** `vminuw(0x8000_0000, 0x0000_0001) = 0x0000_0001`; [`vminsw`](vminsw.md) returns `0x8000_0000` for the same inputs.
- **No `VSCR` interaction, no XER, no exceptions.**
- **Common usage.** Capping hash-table bucket counts or packed 32-bit IDs at an upper limit.
- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`.
- **No VMX128 sibling.**
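
The same result can also be reached through a compare followed by a select, which is the two-instruction path that `vcmpgtuw` and `vsel` provide. The sketch below checks that equivalence on plain lane arrays; all three function names are hypothetical models written for this example, and the `vsel` model assumes the usual rule that a mask bit of 1 picks the second source.

```rust
/// Model of `vcmpgtuw`: all-ones in lanes where a > b (unsigned), else zero.
fn vcmpgtuw_lanes(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut mask = [0u32; 4];
    for i in 0..4 {
        mask[i] = if a[i] > b[i] { u32::MAX } else { 0 };
    }
    mask
}

/// Model of `vsel`: take bits from `b` where the mask bit is 1, else from `a`.
fn vsel_lanes(a: [u32; 4], b: [u32; 4], mask: [u32; 4]) -> [u32; 4] {
    let mut d = [0u32; 4];
    for i in 0..4 {
        d[i] = (a[i] & !mask[i]) | (b[i] & mask[i]);
    }
    d
}

/// Model of `vminuw`: unsigned per-word minimum.
fn vminuw_lanes(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut d = [0u32; 4];
    for i in 0..4 {
        d[i] = a[i].min(b[i]);
    }
    d
}
```

`vsel_lanes(a, b, vcmpgtuw_lanes(a, b))` picks `b` exactly where `a` is larger, so it agrees with `vminuw_lanes(a, b)` lane for lane; the single instruction simply saves the mask register.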
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmaxuw`](vmaxuw.md) — the matching maximum.
|
||||
- [`vminsw`](vminsw.md) — same width, signed min.
|
||||
- [`vminub`](vminub.md), [`vminuh`](vminuh.md) — unsigned min at byte / half width.
|
||||
- [`vcmpgtuw`](vcmpgtuw.md) — separate compare-and-mask path.
|
||||
- [`vsel`](vsel.md) — alternative selection.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vminuw` (Vector Minimum Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminuw-vector-minimum-unsigned-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
141
migration/project-root/ppc-manual/vmx/vmladduhm.md
Normal file
@@ -0,0 +1,141 @@
|
||||
# `vmladduhm` — Vector Multiply-Low and Add Unsigned Half Word Modulo
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000022`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmladduhm` | `vmladduhm` | — | Vector Multiply-Low and Add Unsigned Half Word Modulo |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmladduhm [VD], [VA], [VB], [VC]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmladduhm` — form `VA`
|
||||
|
||||
- **Opcode word:** `0x10000022`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `34`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT` | destination vector register |
| 11–15 | `VRA` | source A vector register |
| 16–20 | `VRB` | source B vector register |
| 21–25 | `VRC` | source C vector register |
| 26–31 | `XO` | extended opcode (6 bits) |
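
The VA form carries a fourth register field and a shorter extended opcode. As a hedged sketch of the layout in the table, the code below extracts all six fields; `VaFields` and `decode_va` are hypothetical names for this example, not the xenia-rs decoder API, and bit 0 is the most-significant bit.

```rust
/// Hypothetical container for the VA-form fields listed in the table.
#[derive(Debug, PartialEq)]
struct VaFields {
    opcd: u32,
    vd: u32,
    va: u32,
    vb: u32,
    vc: u32,
    xo: u32,
}

/// Split a 32-bit instruction word using IBM bit numbering (bit 0 = MSB).
fn decode_va(word: u32) -> VaFields {
    VaFields {
        opcd: word >> 26,         // bits 0-5
        vd: (word >> 21) & 0x1F,  // bits 6-10
        va: (word >> 16) & 0x1F,  // bits 11-15
        vb: (word >> 11) & 0x1F,  // bits 16-20
        vc: (word >> 6) & 0x1F,   // bits 21-25
        xo: word & 0x3F,          // bits 26-31
    }
}
```

For example, `vmladduhm v3, v4, v5, v3` assembles to `0x106428E2`, which the sketch decodes to `VD = 3`, `VA = 4`, `VB = 5`, `VC = 3`, `XO = 34`.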
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmladduhm: read | Source A vector register. |
| `VB` | vmladduhm: read | Source B vector register. |
| `VC` | vmladduhm: read | Source C vector register (the addend). |
| `VD` | vmladduhm: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmladduhm`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`, `VC`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmladduhm`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmladduhm"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:951`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L951)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:104`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L104)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:578`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L578)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3549-3560`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3549-L3560)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vmladduhm => {
    // Multiply-low add (modulo): vD[i] = u16(vA[i] * vB[i] + vC[i]).
    let a = ctx.vr[instr.ra()].as_u16x8();
    let b = ctx.vr[instr.rb()].as_u16x8();
    let c = ctx.vr[instr.rc()].as_u16x8();
    let mut r = [0u16; 8];
    for i in 0..8 {
        r[i] = a[i].wrapping_mul(b[i]).wrapping_add(c[i]);
    }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Modulo multiply-low add.** Eight half-word lanes; per lane:
  ```
  VD[i] = (uint16(VA[i]) * uint16(VB[i]) + uint16(VC[i])) mod 2^16
  ```
  Only the **low** 16 bits of the 32-bit product survive; the "ml" in the mnemonic stands for "multiply low" (versus `vmh*` for "multiply high"). It is the simplest member of the multiply-add family: nothing saturates and nothing rounds.
- **Sign-agnostic.** Modulo multiply-add on signed `int16` and unsigned `u16` operands is bit-identical in the low 16 bits, so this single instruction serves both.
- **No `VSCR[SAT]` change.** Wrap is silent.
- **No XER, no exceptions.**
- **Big-endian half lanes.** Lane 0 is the most-significant half.
- **Aliasing legal.** `VD` may name the same register as any source. `vmladduhm v3, v4, v5, v3` accumulates in place (`VD` and `VC` share a register); `vmladduhm v3, v3, v4, v3` computes `v3 * v4 + v3`.
- **No VMX128 sibling.**
- **Common usage.** Stride / index computation in vector loops, RGBA8 component recombination after a [`vupkhsb`](vupkhsb.md), per-element polynomial evaluation at 16-bit integer precision.
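
The wrap-around and sign-agnostic behaviour described above can be checked with a short sketch. It assumes a `[u16; 8]` lane array with index 0 as the most-significant half; `vmladduhm_lanes` and `mladd_signed_lane` are hypothetical names for this illustration, and the signed variant exists only to show that the low 16 bits agree.

```rust
/// Hypothetical model of `vmladduhm`: (a * b + c) mod 2^16 in each lane.
fn vmladduhm_lanes(a: [u16; 8], b: [u16; 8], c: [u16; 8]) -> [u16; 8] {
    let mut d = [0u16; 8];
    for i in 0..8 {
        // wrapping_mul keeps only the low 16 bits of the product;
        // wrapping_add then wraps the sum without any saturation.
        d[i] = a[i].wrapping_mul(b[i]).wrapping_add(c[i]);
    }
    d
}

/// The same lane computed on the bits reinterpreted as signed i16.
fn mladd_signed_lane(a: u16, b: u16, c: u16) -> u16 {
    (a as i16).wrapping_mul(b as i16).wrapping_add(c as i16) as u16
}
```

For instance, `0xFFFF * 0xFFFF + 1` yields `0x0002` in both views: the unsigned product is `0xFFFE_0001`, whose low half is `0x0001`, while the signed reading is `(-1) * (-1) + 1 = 2`.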
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmhaddshs`](vmhaddshs.md) — saturating high-half signed multiply-add (Q15).
|
||||
- [`vmhraddshs`](vmhraddshs.md) — same, with rounding.
|
||||
- [`vmsumuhm`](vmsumuhm.md), [`vmsummbm`](vmsummbm.md) — multiply-sum across pairs of lanes.
|
||||
- [`vadduhm`](vadduhm.md), [`vmaxuh`](vmaxuh.md) — companion modulo / max ops at half width.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmladduhm` (Vector Multiply-Low and Add Unsigned Half Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmladduhm-vector-multiply-low-add-unsigned-half-word-modulo-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
131
migration/project-root/ppc-manual/vmx/vmrghb.md
Normal file
@@ -0,0 +1,131 @@
|
||||
# `vmrghb` — Vector Merge High Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000000c`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmrghb` | `vmrghb` | — | Vector Merge High Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmrghb [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmrghb` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000000c`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `12`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmrghb: read | Source A vector register. |
| `VB` | vmrghb: read | Source B vector register. |
| `VD` | vmrghb: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmrghb`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmrghb`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrghb"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:956`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L956)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:439`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L439)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3982-3989`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3982-L3989)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vmrghb => {
    let a = ctx.vr[instr.ra()].as_bytes();
    let b = ctx.vr[instr.rb()].as_bytes();
    let mut r = [0u8; 16];
    for i in 0..8 { r[2*i] = a[i]; r[2*i+1] = b[i]; }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Interleave the high (most-significant) eight bytes** of two vectors. After execution, `VD = {VA[0], VB[0], VA[1], VB[1], …, VA[7], VB[7]}`. Because lane 0 is the most-significant byte (big-endian indexing), "high" means the bytes that land at the lowest addresses after `stvx`.
- **Pairs with [`vmrglb`](vmrglb.md).** Together they consume all 32 input bytes: `vmrghb` interleaves bytes 0..7 of each source, `vmrglb` bytes 8..15. Storing both outputs with [`stvx`](stvx.md) yields the AoS-from-SoA interleave of two byte streams.
- **Useful for interleaving 8-bit channels.** `vmrghb vRG, vR, vG` and `vmrghb vBA, vB, vA` build byte pairs; a halfword merge, [`vmrghh`](vmrghh.md) `vRGBA, vRG, vBA`, then interleaves the pairs into RGBA pixels. A second `vmrghb` on `vRG` and `vBA` would produce `R B G A` order instead.
- **No `VSCR` interaction, no XER, no exceptions.** Pure permute.
- **Aliasing legal.** `vmrghb v3, v3, v3` duplicates each of the eight high bytes of `v3`.
- **No VMX128 sibling.**
- **x86 counterpart.** On vectors held in memory (byte-address) order this is `_mm_unpacklo_epi8(a, b)`: the bytes PPC calls "high" are the lowest-addressed ones, which x86 calls "low". If the host keeps the register fully byte-reversed, the match is `_mm_unpackhi_epi8(b, a)`, with the operands swapped.
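
The interleave and the four-channel use case can be sketched on plain byte arrays (index 0 = most-significant byte). All names below are hypothetical helpers for this illustration. Note one assumption: the final step uses a halfword merge (modelled as `vmrghh_bytes`) so that each `R G` pair stays next to its `B A` pair; a byte merge at that step would scramble the channel order.

```rust
/// Model of `vmrghb`: interleave bytes 0..7 of `a` and `b`.
fn vmrghb_bytes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..8 {
        r[2 * i] = a[i];
        r[2 * i + 1] = b[i];
    }
    r
}

/// Model of `vmrghh`: interleave halfwords 0..3 of `a` and `b`.
fn vmrghh_bytes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..4 {
        r[4 * i] = a[2 * i];
        r[4 * i + 1] = a[2 * i + 1];
        r[4 * i + 2] = b[2 * i];
        r[4 * i + 3] = b[2 * i + 1];
    }
    r
}

/// Build the first four RGBA pixels from four planar channel vectors.
fn interleave_rgba_high(r: [u8; 16], g: [u8; 16], b: [u8; 16], a: [u8; 16]) -> [u8; 16] {
    let rg = vmrghb_bytes(r, g); // R0 G0 R1 G1 ...
    let ba = vmrghb_bytes(b, a); // B0 A0 B1 A1 ...
    vmrghh_bytes(rg, ba)         // R0 G0 B0 A0 R1 G1 B1 A1 ...
}

/// Test helper: bytes base, base+1, ..., base+15.
fn ramp(base: u8) -> [u8; 16] {
    let mut v = [0u8; 16];
    for i in 0..16 {
        v[i] = base + i as u8;
    }
    v
}
```

With channels `ramp(0x10)`, `ramp(0x20)`, `ramp(0x30)` and `ramp(0x40)`, the output begins `10 20 30 40 11 21 31 41`, i.e. pixel 0 followed by pixel 1.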

## Related Instructions

- [`vmrglb`](vmrglb.md) — the "low half" mirror.
- [`vmrghh`](vmrghh.md), [`vmrghw`](vmrghw.md) — high-half merge at half / word width.
- [`vperm`](vperm.md) — fully programmable permute when neither merge half fits.
- [`vsldoi`](vsldoi.md) — static-offset shift-double, often paired with `vmrg*` for AoS↔SoA conversions.
- [`vupkhsb`](vupkhsb.md) — sign-extending unpack of the high half.

## IBM Reference

- [AIX 7.3 — `vmrghb` (Vector Merge High Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrghb-vector-merge-high-byte-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
131
migration/project-root/ppc-manual/vmx/vmrghh.md
Normal file
@@ -0,0 +1,131 @@
# `vmrghh` — Vector Merge High Half Word

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000004c`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmrghh` | `vmrghh` | — | Vector Merge High Half Word |

## Syntax

```asm
vmrghh [VD], [VA], [VB]
```

## Encoding

### `vmrghh` — form `VX`

- **Opcode word:** `0x1000004c`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `76`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmrghh: read | Source A vector register. |
| `VB` | vmrghh: read | Source B vector register. |
| `VD` | vmrghh: write | Destination vector register. |

## Register Effects

### `vmrghh`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description field above.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
do i = 0 to 3
    VD.halfword[2*i]     <- VA.halfword[i]
    VD.halfword[2*i + 1] <- VB.halfword[i]
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
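
As one possible reading of the interpreter arm, the merge can be written as a small C function. This is a sketch under stated assumptions: `vec_u16x8` is a hypothetical register model (not a xenia type) whose `h[0]` is the most-significant halfword, and the operands are passed by value so the aliased form (`VD` equal to `VA` or `VB`) behaves as expected.

```c
#include <stdint.h>

/* Hypothetical register model: h[0] is the most-significant halfword. */
typedef struct { uint16_t h[8]; } vec_u16x8;

/* vmrghh VD, VA, VB: interleave halfwords 0..3 of VA and VB. */
vec_u16x8 vmrghh(vec_u16x8 va, vec_u16x8 vb) {
    vec_u16x8 vd;
    for (int i = 0; i < 4; i++) {
        vd.h[2 * i]     = va.h[i];  /* even lanes come from VA */
        vd.h[2 * i + 1] = vb.h[i];  /* odd lanes come from VB  */
    }
    return vd;
}
```

No status register is touched, which matches the Status-Register Effects section; the program-counter update of the interpreter arm is left to the caller.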

## Implementation References

**`vmrghh`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrghh"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:968`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L968)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:446`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L446)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3998-4005`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3998-L4005)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmrghh => {
    let a = ctx.vr[instr.ra()].as_u16x8();
    let b = ctx.vr[instr.rb()].as_u16x8();
    let mut r = [0u16; 8];
    for i in 0..4 { r[2*i] = a[i]; r[2*i+1] = b[i]; }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Interleave the high (most-significant) four halfwords** of two vectors: `VD = {VA[0], VB[0], VA[1], VB[1], VA[2], VB[2], VA[3], VB[3]}`, with halfword lanes numbered from the most-significant end.
- **Pairs with [`vmrglh`](vmrglh.md)** to cover all eight halfwords of each source. Together the two instructions interleave two halfword streams.
- **Common usage.** Interleave Q15 stereo audio: `vmrghh vL_R_high, vLeft, vRight` then `vmrglh vL_R_low, vLeft, vRight`; storing the high result followed by the low result yields the natural L/R/L/R ordering.
- **Useful for half-precision colour data.** Interleaves the first four 16-bit values of two half-precision channel streams.
- **No `VSCR` interaction, no XER, no exceptions.** Pure permute.
- **Aliasing legal.** `vmrghh v3, v3, v3` duplicates each of the four high halfwords of `v3`.
- **No VMX128 sibling.**
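
The stereo interleave mentioned in the list can be modelled in a few lines of C. The sketch assumes a hypothetical `vec_s16x8` register model with `h[0]` as the most-significant halfword; `vmrghh_model`, `vmrglh_model`, and `interleave_stereo` are illustrative names, and the final copy loop stands in for two `stvx` stores to consecutive 16-byte slots.

```c
#include <stdint.h>

/* Hypothetical register model: h[0] is the most-significant halfword. */
typedef struct { int16_t h[8]; } vec_s16x8;

/* vmrghh: interleave halfwords 0..3 of a and b. */
vec_s16x8 vmrghh_model(vec_s16x8 a, vec_s16x8 b) {
    vec_s16x8 r;
    for (int i = 0; i < 4; i++) { r.h[2 * i] = a.h[i]; r.h[2 * i + 1] = b.h[i]; }
    return r;
}

/* vmrglh: interleave halfwords 4..7 of a and b. */
vec_s16x8 vmrglh_model(vec_s16x8 a, vec_s16x8 b) {
    vec_s16x8 r;
    for (int i = 0; i < 4; i++) { r.h[2 * i] = a.h[4 + i]; r.h[2 * i + 1] = b.h[4 + i]; }
    return r;
}

/* Write L0 R0 L1 R1 ... L7 R7 to out (16 samples). */
void interleave_stereo(vec_s16x8 left, vec_s16x8 right, int16_t out[16]) {
    vec_s16x8 hi = vmrghh_model(left, right);  /* samples 0..3 of each channel */
    vec_s16x8 lo = vmrglh_model(left, right);  /* samples 4..7 of each channel */
    for (int i = 0; i < 8; i++) {
        out[i]     = hi.h[i];  /* first store  */
        out[8 + i] = lo.h[i];  /* second store */
    }
}
```

The high result has to be stored first; swapping the two stores would place samples 4..7 ahead of samples 0..3.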

## Related Instructions

- [`vmrglh`](vmrglh.md) — the "low half" mirror.
- [`vmrghb`](vmrghb.md), [`vmrghw`](vmrghw.md) — high-half merge at byte / word width.
- [`vperm`](vperm.md) — programmable permute.
- [`vsldoi`](vsldoi.md) — static-offset shift-double.
- [`vupkhsh`](vupkhsh.md) — sign-extending unpack of the high half (4 halves → 4 words).

## IBM Reference

- [AIX 7.3 — `vmrghh` (Vector Merge High Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrghh-vector-merge-high-half-word-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
180
migration/project-root/ppc-manual/vmx/vmrghw.md
Normal file
@@ -0,0 +1,180 @@
# `vmrghw` — Vector Merge High Word

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000008c`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmrghw` | `vmrghw` | — | Vector Merge High Word |
| `vmrghw128` | `vmrghw128` | — | Vector128 Merge High Word |

## Syntax

```asm
vmrghw [VD], [VA], [VB]
vmrghw128 [VD], [VA], [VB]
```

## Encoding

### `vmrghw` — form `VX`

- **Opcode word:** `0x1000008c`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `140`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |

### `vmrghw128` — form `VX128`

- **Opcode word:** `0x18000300`
- **Primary opcode (bits 0–5):** `6`
- **Extended opcode:** `768`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (6) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | reserved |
| 23–25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `—` | reserved |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmrghw: read; vmrghw128: read | Source A vector register. |
| `VB` | vmrghw: read; vmrghw128: read | Source B vector register. |
| `VD` | vmrghw: write; vmrghw128: write | Destination vector register. |

## Register Effects

### `vmrghw`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

### `vmrghw128`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description field above.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
VD.word[0] <- VA.word[0]
VD.word[1] <- VB.word[0]
VD.word[2] <- VA.word[1]
VD.word[3] <- VB.word[1]
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
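
A minimal C sketch of the merge, read off the interpreter arm, might look like the following. `vec_u32x4` is a hypothetical register model (not a xenia type) with `w[0]` as the most-significant word. The same function would serve `vmrghw128`, since the two mnemonics differ only in how the register numbers are decoded.

```c
#include <stdint.h>

/* Hypothetical register model: w[0] is the most-significant word. */
typedef struct { uint32_t w[4]; } vec_u32x4;

/* vmrghw VD, VA, VB: interleave words 0 and 1 of VA and VB. */
vec_u32x4 vmrghw(vec_u32x4 va, vec_u32x4 vb) {
    vec_u32x4 vd;
    vd.w[0] = va.w[0];
    vd.w[1] = vb.w[0];
    vd.w[2] = va.w[1];
    vd.w[3] = vb.w[1];
    return vd;
}
```

Passing the operands by value keeps the aliased form (`VD` equal to a source) well defined.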

## Implementation References

**`vmrghw`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrghw"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:989`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L989)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:451`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L451)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2378-2385`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2378-L2385)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmrghw | PpcOpcode::vmrghw128 => {
    let (va, vb, vd) = vmx_reg_triple(instr);
    let a = ctx.vr[va].as_u32x4();
    let b = ctx.vr[vb].as_u32x4();
    // Merge high words: [a0, b0, a1, b1]
    ctx.vr[vd] = xenia_types::Vec128::from_u32x4(a[0], b[0], a[1], b[1]);
    ctx.pc += 4;
}
```
</details>

**`vmrghw128`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrghw128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:992`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L992)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:698`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L698)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2378-2385`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2378-L2385)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmrghw | PpcOpcode::vmrghw128 => {
    let (va, vb, vd) = vmx_reg_triple(instr);
    let a = ctx.vr[va].as_u32x4();
    let b = ctx.vr[vb].as_u32x4();
    // Merge high words: [a0, b0, a1, b1]
    ctx.vr[vd] = xenia_types::Vec128::from_u32x4(a[0], b[0], a[1], b[1]);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Interleave the high (most-significant) two words** of two vectors: `VD = {VA[0], VB[0], VA[1], VB[1]}`. Lane 0 is the most-significant word and occupies bytes 0..3 in memory after `stvx`.
- **Pairs with [`vmrglw`](vmrglw.md)** to cover all four words of each source. Together the two instructions interleave two word streams.
- **Common usage.** Building block of a 4×4 transpose: two rounds of `vmrghw`/`vmrglw` (four pairs, eight merges in total) swap the rows and columns of a 4×4 packed-float matrix.
- **No `VSCR` interaction, no XER, no exceptions.** Pure permute.
- **Aliasing legal.** `vmrghw v3, v3, v3` duplicates each of the two high words of `v3`.
- **VMX128 sibling (`vmrghw128`).** Identical semantics with the extended register encoding; xenia routes both through `vmx_reg_triple`.
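
The 4×4 transpose can be sketched with the two word merges alone. The code below is an illustrative model, not xenia's implementation: `vec_u32x4`, `vmrghw_model`, `vmrglw_model`, and `transpose4x4` are hypothetical names, and words are held as `uint32_t` because the merges only move bits, so packed floats behave identically.

```c
#include <stdint.h>

/* Hypothetical register model: w[0] is the most-significant word. */
typedef struct { uint32_t w[4]; } vec_u32x4;

/* vmrghw: [a0, b0, a1, b1] */
vec_u32x4 vmrghw_model(vec_u32x4 a, vec_u32x4 b) {
    vec_u32x4 r = { { a.w[0], b.w[0], a.w[1], b.w[1] } };
    return r;
}

/* vmrglw: [a2, b2, a3, b3] */
vec_u32x4 vmrglw_model(vec_u32x4 a, vec_u32x4 b) {
    vec_u32x4 r = { { a.w[2], b.w[2], a.w[3], b.w[3] } };
    return r;
}

/* Transpose the four rows m[0..3] in place using eight merges. */
void transpose4x4(vec_u32x4 m[4]) {
    /* Stage 1: merge row 0 with row 2, and row 1 with row 3. */
    vec_u32x4 t0 = vmrghw_model(m[0], m[2]);  /* m00 m20 m01 m21 */
    vec_u32x4 t1 = vmrghw_model(m[1], m[3]);  /* m10 m30 m11 m31 */
    vec_u32x4 t2 = vmrglw_model(m[0], m[2]);  /* m02 m22 m03 m23 */
    vec_u32x4 t3 = vmrglw_model(m[1], m[3]);  /* m12 m32 m13 m33 */
    /* Stage 2: merge the intermediates to form the columns. */
    m[0] = vmrghw_model(t0, t1);              /* m00 m10 m20 m30 */
    m[1] = vmrglw_model(t0, t1);              /* m01 m11 m21 m31 */
    m[2] = vmrghw_model(t2, t3);              /* m02 m12 m22 m32 */
    m[3] = vmrglw_model(t2, t3);              /* m03 m13 m23 m33 */
}
```

Pairing rows 0/2 and 1/3 in the first stage is what lets the second stage finish the job with plain high/low merges and no further shuffles.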

## Related Instructions

- [`vmrglw`](vmrglw.md) — the "low half" mirror.
- [`vmrghb`](vmrghb.md), [`vmrghh`](vmrghh.md) — high-half merge at byte / half width.
- [`vperm`](vperm.md), [`vsldoi`](vsldoi.md) — programmable / static permute primitives.
- [`vupkhsh`](vupkhsh.md) — sign-extending unpack of the high half (4 halves → 4 words).
- [`vspltw`](vspltw.md) — broadcast a single word for blending tasks.

## IBM Reference

- [AIX 7.3 — `vmrghw` (Vector Merge High Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrghw-vector-merge-high-word-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
130
migration/project-root/ppc-manual/vmx/vmrglb.md
Normal file
@@ -0,0 +1,130 @@
# `vmrglb` — Vector Merge Low Byte

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000010c`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmrglb` | `vmrglb` | — | Vector Merge Low Byte |

## Syntax

```asm
vmrglb [VD], [VA], [VB]
```

## Encoding

### `vmrglb` — form `VX`

- **Opcode word:** `0x1000010c`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `268`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmrglb: read | Source A vector register. |
| `VB` | vmrglb: read | Source B vector register. |
| `VD` | vmrglb: write | Destination vector register. |

## Register Effects

### `vmrglb`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description field above.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
do i = 0 to 7
    VD.byte[2*i]     <- VA.byte[8 + i]
    VD.byte[2*i + 1] <- VB.byte[8 + i]
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
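
One way to render the interpreter arm in C is shown below. It is a sketch under stated assumptions: `vec_u8x16` is a hypothetical register model (not a xenia type) in which `b[0]` is the most-significant byte, so the "low" bytes are indices 8..15.

```c
#include <stdint.h>

/* Hypothetical register model: b[0] is the most-significant byte. */
typedef struct { uint8_t b[16]; } vec_u8x16;

/* vmrglb VD, VA, VB: interleave bytes 8..15 of VA and VB. */
vec_u8x16 vmrglb(vec_u8x16 va, vec_u8x16 vb) {
    vec_u8x16 vd;
    for (int i = 0; i < 8; i++) {
        vd.b[2 * i]     = va.b[8 + i];  /* even lanes come from VA */
        vd.b[2 * i + 1] = vb.b[8 + i];  /* odd lanes come from VB  */
    }
    return vd;
}
```

The only difference from a `vmrghb` model is the `8 +` offset on the source index.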

## Implementation References

**`vmrglb`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrglb"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:996`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L996)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:458`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L458)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3990-3997`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3990-L3997)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmrglb => {
    let a = ctx.vr[instr.ra()].as_bytes();
    let b = ctx.vr[instr.rb()].as_bytes();
    let mut r = [0u8; 16];
    for i in 0..8 { r[2*i] = a[8+i]; r[2*i+1] = b[8+i]; }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Interleave the low (least-significant) eight bytes** of two vectors: `VD = {VA[8], VB[8], VA[9], VB[9], …, VA[15], VB[15]}`. "Low" in PPC big-endian terms means the eight bytes at the higher addresses after `stvx`.
- **Pairs with [`vmrghb`](vmrghb.md).** Together they cover all 32 input bytes: one `vmrghb` plus one `vmrglb`, each followed by a store, is the standard interleave-then-store pattern for two 16-byte sources.
- **Common usage.** Second half of an AoS-from-SoA transpose for 8-bit channels (the high half is produced by `vmrghb`, the low half by `vmrglb`).
- **No `VSCR` interaction, no XER, no exceptions.** Pure permute.
- **Aliasing legal.** `vmrglb v3, v3, v3` duplicates each of the eight low bytes of `v3`.
- **No VMX128 sibling.**
- **Equivalent to x86 `_mm_unpacklo_epi8`** with the operands swapped, when registers are compared by significance: `vmrglb VD, VA, VB` matches `_mm_unpacklo_epi8(VB, VA)`. Compared by memory order (the bytes `stvx` writes), it instead matches `_mm_unpackhi_epi8(VA, VB)`, because the two architectures number lanes from opposite ends.

## Related Instructions

- [`vmrghb`](vmrghb.md) — the "high half" mirror.
- [`vmrglh`](vmrglh.md), [`vmrglw`](vmrglw.md) — low-half merge at half / word width.
- [`vperm`](vperm.md), [`vsldoi`](vsldoi.md) — programmable / static permute primitives.
- [`vupklsb`](vupklsb.md) — sign-extending unpack of the low half.

## IBM Reference

- [AIX 7.3 — `vmrglb` (Vector Merge Low Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrglb-vector-merge-low-byte-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
129
migration/project-root/ppc-manual/vmx/vmrglh.md
Normal file
@@ -0,0 +1,129 @@
# `vmrglh` — Vector Merge Low Half Word

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000014c`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmrglh` | `vmrglh` | — | Vector Merge Low Half Word |

## Syntax

```asm
vmrglh [VD], [VA], [VB]
```

## Encoding

### `vmrglh` — form `VX`

- **Opcode word:** `0x1000014c`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `332`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmrglh: read | Source A vector register. |
| `VB` | vmrglh: read | Source B vector register. |
| `VD` | vmrglh: write | Destination vector register. |

## Register Effects

### `vmrglh`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description field above.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
do i = 0 to 3
    VD.halfword[2*i]     <- VA.halfword[4 + i]
    VD.halfword[2*i + 1] <- VB.halfword[4 + i]
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
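
Read off the interpreter arm, the merge could be translated to C roughly as follows. `vec_u16x8` is a hypothetical register model (not a xenia type) with `h[0]` as the most-significant halfword, so the "low" halfwords are indices 4..7.

```c
#include <stdint.h>

/* Hypothetical register model: h[0] is the most-significant halfword. */
typedef struct { uint16_t h[8]; } vec_u16x8;

/* vmrglh VD, VA, VB: interleave halfwords 4..7 of VA and VB. */
vec_u16x8 vmrglh(vec_u16x8 va, vec_u16x8 vb) {
    vec_u16x8 vd;
    for (int i = 0; i < 4; i++) {
        vd.h[2 * i]     = va.h[4 + i];  /* even lanes come from VA */
        vd.h[2 * i + 1] = vb.h[4 + i];  /* odd lanes come from VB  */
    }
    return vd;
}
```

Operands are taken by value, so calling it with the same vector for both sources simply duplicates each low halfword.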

## Implementation References

**`vmrglh`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrglh"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1008`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1008)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:464`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L464)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4006-4013`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4006-L4013)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmrglh => {
    let a = ctx.vr[instr.ra()].as_u16x8();
    let b = ctx.vr[instr.rb()].as_u16x8();
    let mut r = [0u16; 8];
    for i in 0..4 { r[2*i] = a[4+i]; r[2*i+1] = b[4+i]; }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Interleave the low (least-significant) four halfwords** of two vectors: `VD = {VA[4], VB[4], VA[5], VB[5], VA[6], VB[6], VA[7], VB[7]}`, with halfword lanes numbered from the most-significant end.
- **Pairs with [`vmrghh`](vmrghh.md)** to interleave the entire 8-halfword source range. The two instructions plus a [`stvx`](stvx.md) of each result produce an interleaved 16-halfword stream from two 8-halfword streams.
- **Common usage.** Second half of a stereo Q15 audio interleave (samples 4..7 of each channel); often seen alongside `vupklsh`, which sign-extends the same four low halfwords to words.
- **No `VSCR` interaction, no XER, no exceptions.** Pure permute.
- **Aliasing legal.** `vmrglh v3, v3, v3` duplicates each of the four low halfwords of `v3`.
- **No VMX128 sibling.**

## Related Instructions

- [`vmrghh`](vmrghh.md) — the "high half" mirror.
- [`vmrglb`](vmrglb.md), [`vmrglw`](vmrglw.md) — low-half merge at byte / word width.
- [`vperm`](vperm.md), [`vsldoi`](vsldoi.md) — programmable / static permute primitives.
- [`vupklsh`](vupklsh.md) — sign-extending unpack of the low half (4 halves → 4 words).

## IBM Reference

- [AIX 7.3 — `vmrglh` (Vector Merge Low Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrglh-vector-merge-low-half-word-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
179
migration/project-root/ppc-manual/vmx/vmrglw.md
Normal file
@@ -0,0 +1,179 @@
# `vmrglw` — Vector Merge Low Word

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000018c`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmrglw` | `vmrglw` | — | Vector Merge Low Word |
| `vmrglw128` | `vmrglw128` | — | Vector128 Merge Low Word |

## Syntax

```asm
vmrglw [VD], [VA], [VB]
vmrglw128 [VD], [VA], [VB]
```

## Encoding

### `vmrglw` — form `VX`

- **Opcode word:** `0x1000018c`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `396`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |

### `vmrglw128` — form `VX128`

- **Opcode word:** `0x18000340`
- **Primary opcode (bits 0–5):** `6`
- **Extended opcode:** `832`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (6) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | reserved |
| 23–25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `—` | reserved |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmrglw: read; vmrglw128: read | Source A vector register. |
| `VB` | vmrglw: read; vmrglw128: read | Source B vector register. |
| `VD` | vmrglw: write; vmrglw128: write | Destination vector register. |

## Register Effects

### `vmrglw`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

### `vmrglw128`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description field above.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
VD.word[0] <- VA.word[2]
VD.word[1] <- VB.word[2]
VD.word[2] <- VA.word[3]
VD.word[3] <- VB.word[3]
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
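
A minimal C sketch of the merge, read off the interpreter arm, might look like the following. `vec_u32x4` is a hypothetical register model (not a xenia type) with `w[0]` as the most-significant word. The same function would serve `vmrglw128`, since only the register decoding differs between the two mnemonics.

```c
#include <stdint.h>

/* Hypothetical register model: w[0] is the most-significant word. */
typedef struct { uint32_t w[4]; } vec_u32x4;

/* vmrglw VD, VA, VB: interleave words 2 and 3 of VA and VB. */
vec_u32x4 vmrglw(vec_u32x4 va, vec_u32x4 vb) {
    vec_u32x4 vd;
    vd.w[0] = va.w[2];
    vd.w[1] = vb.w[2];
    vd.w[2] = va.w[3];
    vd.w[3] = vb.w[3];
    return vd;
}
```

Passing the operands by value keeps the aliased form (`VD` equal to a source) well defined.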

## Implementation References

**`vmrglw`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrglw"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1030`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1030)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:470`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L470)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2386-2393`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2386-L2393)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmrglw | PpcOpcode::vmrglw128 => {
    let (va, vb, vd) = vmx_reg_triple(instr);
    let a = ctx.vr[va].as_u32x4();
    let b = ctx.vr[vb].as_u32x4();
    // Merge low words: [a2, b2, a3, b3]
    ctx.vr[vd] = xenia_types::Vec128::from_u32x4(a[2], b[2], a[3], b[3]);
    ctx.pc += 4;
}
```
</details>

**`vmrglw128`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrglw128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1033`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1033)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:699`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L699)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2386-2393`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2386-L2393)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmrglw | PpcOpcode::vmrglw128 => {
    let (va, vb, vd) = vmx_reg_triple(instr);
    let a = ctx.vr[va].as_u32x4();
    let b = ctx.vr[vb].as_u32x4();
    // Merge low words: [a2, b2, a3, b3]
    ctx.vr[vd] = xenia_types::Vec128::from_u32x4(a[2], b[2], a[3], b[3]);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Interleave the low (least-significant) two words** of two vectors: `VD = {VA[2], VB[2], VA[3], VB[3]}`. Lane 0 is the most-significant word.
- **Pairs with [`vmrghw`](vmrghw.md)** to cover all four words of each source: `vmrghw` interleaves words 0 and 1, `vmrglw` interleaves words 2 and 3. The pair is the two-instruction building block of a word-level transpose.
- **Common usage.** Bottom half of a 4×4 packed-float matrix transpose; second-half RGBA pixel re-pack after a `vmrghw`.
- **No `VSCR` interaction, no XER, no exceptions.** Pure permute.
- **Aliasing legal.** `VD` may name the same register as `VA` or `VB`; both sources are read in full before the destination is written.
- **VMX128 sibling (`vmrglw128`).** Identical semantics with the extended encoding; xenia-rs handles both mnemonics in one interpreter arm via `vmx_reg_triple`.
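
As a rough illustration of the merge pattern described above, the following Rust sketch models `vmrglw` and `vmrghw` on plain `[u32; 4]` arrays (index 0 is the most-significant word) and chains them into a 4×4 word transpose. The names `Words`, `merge_low_words`, `merge_high_words`, and `transpose4x4` are hypothetical helpers invented for this example; they are not part of xenia-rs or xenia-canary.

```rust
/// Word lanes in big-endian order: index 0 is the most-significant word.
type Words = [u32; 4];

/// Hypothetical stand-in for `vmrghw`: interleave words 0 and 1.
fn merge_high_words(a: Words, b: Words) -> Words {
    [a[0], b[0], a[1], b[1]]
}

/// Hypothetical stand-in for `vmrglw`: interleave words 2 and 3.
fn merge_low_words(a: Words, b: Words) -> Words {
    [a[2], b[2], a[3], b[3]]
}

/// 4x4 word transpose built from two rounds of high/low merges.
fn transpose4x4(rows: [Words; 4]) -> [Words; 4] {
    // Round 1: pair row 0 with row 2, and row 1 with row 3.
    let t0 = merge_high_words(rows[0], rows[2]);
    let t1 = merge_high_words(rows[1], rows[3]);
    let t2 = merge_low_words(rows[0], rows[2]);
    let t3 = merge_low_words(rows[1], rows[3]);
    // Round 2: merging the intermediates yields the four columns.
    [
        merge_high_words(t0, t1),
        merge_low_words(t0, t1),
        merge_high_words(t2, t3),
        merge_low_words(t2, t3),
    ]
}
```

In this model a full transpose takes eight merges in two rounds; each round issues the high and low merge as a pair, which is why the two instructions usually appear together.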

## Related Instructions

- [`vmrghw`](vmrghw.md) — the "high half" mirror.
- [`vmrglb`](vmrglb.md), [`vmrglh`](vmrglh.md) — low-half merge at byte / half width.
- [`vperm`](vperm.md), [`vsldoi`](vsldoi.md) — programmable / static permute primitives.
- [`vspltw`](vspltw.md) — broadcast a single word for blending tasks.

## IBM Reference

- [AIX 7.3 — `vmrglw` (Vector Merge Low Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrglw-vector-merge-low-word-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
146
migration/project-root/ppc-manual/vmx/vmsummbm.md
Normal file
@@ -0,0 +1,146 @@
# `vmsummbm` — Vector Multiply-Sum Mixed-Sign Byte Modulo

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000025`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmsummbm` | `vmsummbm` | — | Vector Multiply-Sum Mixed-Sign Byte Modulo |

## Syntax

```asm
vmsummbm [VD], [VA], [VB], [VC]
```

## Encoding

### `vmsummbm` — form `VA`

- **Opcode word:** `0x10000025`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `37`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT` | destination vector register |
| 11–15 | `VRA` | source A |
| 16–20 | `VRB` | source B |
| 21–25 | `VRC` | source C (accumulator) |
| 26–31 | `XO` | extended opcode (6 bits) |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmsummbm: read | Source A vector register. |
| `VB` | vmsummbm: read | Source B vector register. |
| `VC` | vmsummbm: read | Source C vector register (word accumulator). |
| `VD` | vmsummbm: write | Destination vector register. |

## Register Effects

### `vmsummbm`

- **Reads (always):** `VA`, `VB`, `VC`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description column of the Assembler Mnemonics table.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in            */
/* Implementation References is the authoritative semantic         */
/* snapshot. Translate it line-by-line:                            */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs)                */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be  */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)    */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO    */
/* The Register Effects and Status-Register Effects tables above   */
/* enumerate every side effect a faithful translation must emit.   */
```

## Implementation References

**`vmsummbm`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsummbm"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1037`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1037)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:580`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L580)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3579-3594`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3579-L3594)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmsummbm => {
    // signed bytes × unsigned bytes, signed accumulator
    let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]);
    let b = ctx.vr[instr.rb()].as_bytes();
    let c = crate::vmx::as_i32x4(ctx.vr[instr.rc()]);
    let mut r = [0i32; 4];
    for i in 0..4 {
        let mut s = c[i];
        for j in 0..4 {
            s = s.wrapping_add(a[4*i+j] as i32 * b[4*i+j] as i32);
        }
        r[i] = s;
    }
    ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Mixed signed×unsigned multiply-sum, modulo.** The "m" / "b" / "m" of `vmsummbm` decode as: `m`=mixed (signed `VA` × unsigned `VB`), `b`=byte lanes, `m`=modulo accumulator. Per word lane:
  ```
  VD[i] = (VC[i] + Σ_{j=0..3} int8(VA[4*i + j]) * uint8(VB[4*i + j])) mod 2^32
  ```
  Four signed-byte × unsigned-byte products are added to the signed-word accumulator from `VC`, giving one signed word per lane.
- **Mixed signedness is unique to this instruction.** It is the canonical "signed pixel weight × unsigned pixel value" combination for filter convolution.
- **No `VSCR[SAT]` change.** The sum wraps modulo 2^32; no saturating sibling exists for byte lanes (AltiVec only provides saturating multiply-sums at half-word width).
- **Big-endian byte lanes.** Lane 0 is the most-significant byte; the four contributing bytes for output word `i` are bytes `4*i .. 4*i+3`.
- **No XER, no exceptions.**
- **Aliasing legal.** `VD` may name the same register as `VA`, `VB`, or `VC`; all sources are read before the destination is written.
- **No VMX128 sibling.**
- **Common usage.** Per-tile signed-weight pixel sums; H.264-style 4-tap signed filter on byte data.
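
A minimal sketch of the mixed-sign rule, assuming the register contents are available as plain byte and word arrays in big-endian lane order. `vmsummbm_ref` is a hypothetical reference model written for this page, not the xenia-rs implementation; the frozen snapshot under Implementation References remains the authoritative version.

```rust
/// Hypothetical reference model of `vmsummbm` on plain arrays.
/// `va` bytes are read as signed, `vb` bytes as unsigned; index 0 is the
/// most-significant lane, matching the big-endian layout described above.
fn vmsummbm_ref(va: [u8; 16], vb: [u8; 16], vc: [i32; 4]) -> [i32; 4] {
    let mut vd = [0i32; 4];
    for i in 0..4 {
        let mut sum = vc[i];
        for j in 0..4 {
            let a = va[4 * i + j] as i8 as i32; // sign-extend the VA byte
            let b = vb[4 * i + j] as i32; // zero-extend the VB byte
            sum = sum.wrapping_add(a * b); // modulo 2^32, never saturates
        }
        vd[i] = sum;
    }
    vd
}
```

With every byte set to `0xFF`, the `VA` lanes read as -1 and the `VB` lanes as 255, so each word lane adds 4 × (-255) = -1020 to its accumulator; an accumulator of `i32::MIN` wraps around instead of clamping.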

## Related Instructions

- [`vmsumubm`](vmsumubm.md) — same shape, both sources unsigned (no signed weights).
- [`vmsumshm`](vmsumshm.md) / [`vmsumshs`](vmsumshs.md) — half-word × half-word multiply-sum (modulo / saturate).
- [`vmsumuhm`](vmsumuhm.md) / [`vmsumuhs`](vmsumuhs.md) — unsigned half-word multiply-sum.
- [`vmladduhm`](vmladduhm.md) — per-lane multiply-add at half width (no horizontal reduction).
- [`vsumsws`](vsumsws.md), [`vsum4sbs`](vsum4sbs.md) — pure horizontal sums.

## IBM Reference

- [AIX 7.3 — `vmsummbm` (Vector Multiply-Sum Mixed-Sign Byte Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsummbm-vector-multiply-sum-mixed-sign-byte-modulo-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
144
migration/project-root/ppc-manual/vmx/vmsumshm.md
Normal file
@@ -0,0 +1,144 @@
# `vmsumshm` — Vector Multiply-Sum Signed Half Word Modulo

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000028`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmsumshm` | `vmsumshm` | — | Vector Multiply-Sum Signed Half Word Modulo |

## Syntax

```asm
vmsumshm [VD], [VA], [VB], [VC]
```

## Encoding

### `vmsumshm` — form `VA`

- **Opcode word:** `0x10000028`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `40`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT` | destination vector register |
| 11–15 | `VRA` | source A |
| 16–20 | `VRB` | source B |
| 21–25 | `VRC` | source C (accumulator) |
| 26–31 | `XO` | extended opcode (6 bits) |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmsumshm: read | Source A vector register. |
| `VB` | vmsumshm: read | Source B vector register. |
| `VC` | vmsumshm: read | Source C vector register (word accumulator). |
| `VD` | vmsumshm: write | Destination vector register. |

## Register Effects

### `vmsumshm`

- **Reads (always):** `VA`, `VB`, `VC`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description column of the Assembler Mnemonics table.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in            */
/* Implementation References is the authoritative semantic         */
/* snapshot. Translate it line-by-line:                            */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs)                */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be  */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)    */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO    */
/* The Register Effects and Status-Register Effects tables above   */
/* enumerate every side effect a faithful translation must emit.   */
```

## Implementation References

**`vmsumshm`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumshm"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1042`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1042)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:583`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L583)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3625-3638`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3625-L3638)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmsumshm => {
    let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
    let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
    let c = crate::vmx::as_i32x4(ctx.vr[instr.rc()]);
    let mut r = [0i32; 4];
    for i in 0..4 {
        let s = (a[2*i] as i32 * b[2*i] as i32)
            .wrapping_add(a[2*i+1] as i32 * b[2*i+1] as i32)
            .wrapping_add(c[i]);
        r[i] = s;
    }
    ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Signed half-word multiply-sum, modulo.** Per word lane:
  ```
  VD[i] = (VC[i] + int16(VA[2*i])   * int16(VB[2*i])
                 + int16(VA[2*i+1]) * int16(VB[2*i+1])) mod 2^32
  ```
  Two signed-half × signed-half products plus a signed-word accumulator → one signed word per output lane.
- **Modulo wrap, never saturates.** **`VSCR[SAT]` is not touched**; overflow wraps around silently. Use [`vmsumshs`](vmsumshs.md) for the saturating variant.
- **Big-endian half lanes.** Lane 0 is the most-significant half; output word `i` consumes halves `2*i` and `2*i+1`.
- **No XER, no exceptions.**
- **Aliasing legal.** `VD` may name the same register as `VA`, `VB`, or `VC`; all sources are read before the destination is written.
- **No VMX128 sibling.**
- **Common usage.** Q15 dot products of paired audio samples; 2-tap signed FIR filters.
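
The wrap-around behaviour is easiest to see on concrete values. The sketch below assumes the half-word and word lanes are already unpacked into arrays in big-endian lane order; `vmsumshm_ref` is a hypothetical reference model for illustration, not code from xenia-rs.

```rust
/// Hypothetical reference model of `vmsumshm` on plain arrays.
/// Index 0 is the most-significant lane of each vector.
fn vmsumshm_ref(va: [i16; 8], vb: [i16; 8], vc: [i32; 4]) -> [i32; 4] {
    let mut vd = [0i32; 4];
    for i in 0..4 {
        // An i16 × i16 product always fits in an i32.
        let p0 = va[2 * i] as i32 * vb[2 * i] as i32;
        let p1 = va[2 * i + 1] as i32 * vb[2 * i + 1] as i32;
        // Every addition wraps modulo 2^32; nothing is clamped.
        vd[i] = vc[i].wrapping_add(p0).wrapping_add(p1);
    }
    vd
}
```

The worst case is a lane where all four inputs are `i16::MIN`: the two products are 2^30 each, their sum is 2^31, and the result wraps to `i32::MIN` with no status flag raised.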

## Related Instructions

- [`vmsumshs`](vmsumshs.md) — same shape with saturating output.
- [`vmsumuhm`](vmsumuhm.md) / [`vmsumuhs`](vmsumuhs.md) — unsigned half multiply-sum.
- [`vmsumubm`](vmsumubm.md), [`vmsummbm`](vmsummbm.md) — multiply-sum at byte width.
- [`vmhaddshs`](vmhaddshs.md), [`vmhraddshs`](vmhraddshs.md) — per-lane multiply-add (no horizontal reduction).
- [`vsum2sws`](vsum2sws.md), [`vsumsws`](vsumsws.md) — pure horizontal sums.

## IBM Reference

- [AIX 7.3 — `vmsumshm` (Vector Multiply-Sum Signed Half Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumshm-vector-multiply-sum-signed-half-word-modulo-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
149
migration/project-root/ppc-manual/vmx/vmsumshs.md
Normal file
@@ -0,0 +1,149 @@
# `vmsumshs` — Vector Multiply-Sum Signed Half Word Saturate

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000029`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmsumshs` | `vmsumshs` | — | Vector Multiply-Sum Signed Half Word Saturate |

## Syntax

```asm
vmsumshs [VD], [VA], [VB], [VC]
```

## Encoding

### `vmsumshs` — form `VA`

- **Opcode word:** `0x10000029`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `41`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT` | destination vector register |
| 11–15 | `VRA` | source A |
| 16–20 | `VRB` | source B |
| 21–25 | `VRC` | source C (accumulator) |
| 26–31 | `XO` | extended opcode (6 bits) |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmsumshs: read | Source A vector register. |
| `VB` | vmsumshs: read | Source B vector register. |
| `VC` | vmsumshs: read | Source C vector register (word accumulator). |
| `VD` | vmsumshs: write | Destination vector register. |
| `VSCR` | vmsumshs: write | Vector Status and Control Register (`SAT` bit, set only when a lane saturates). |

## Register Effects

### `vmsumshs`

- **Reads (always):** `VA`, `VB`, `VC`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** `VSCR[SAT]` (only when at least one lane saturates)

## Status-Register Effects

- `vmsumshs`: **VSCR[SAT]** is set, and stays set, if any word lane saturates; this instruction never clears it.

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description column of the Assembler Mnemonics table.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in            */
/* Implementation References is the authoritative semantic         */
/* snapshot. Translate it line-by-line:                            */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs)                */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be  */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)    */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO    */
/* The Register Effects and Status-Register Effects tables above   */
/* enumerate every side effect a faithful translation must emit.   */
```

## Implementation References

**`vmsumshs`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumshs"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1047`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1047)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:584`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L584)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3639-3655`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3639-L3655)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmsumshs => {
    let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
    let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
    let c = crate::vmx::as_i32x4(ctx.vr[instr.rc()]);
    let mut r = [0i32; 4]; let mut sat = false;
    for i in 0..4 {
        // Running-sum saturation: accumulate in i64, clamp once at end.
        let s = (a[2*i] as i64 * b[2*i] as i64)
            + (a[2*i+1] as i64 * b[2*i+1] as i64)
            + c[i] as i64;
        let (v, o) = crate::vmx::sat_i64_to_i32(s);
        r[i] = v; sat |= o;
    }
    if sat { ctx.set_vscr_sat(true); }
    ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Signed half-word multiply-sum, saturating.** Per word lane:
  ```
  VD[i] = clamp(VC[i] + int16(VA[2*i])   * int16(VB[2*i])
                      + int16(VA[2*i+1]) * int16(VB[2*i+1]), INT32_MIN, INT32_MAX)
  ```
  Two signed-half × signed-half products plus a signed-word accumulator, clamped to `int32`.
- **Wide-then-clamp ordering.** Xenia accumulates both products and the `VC` word in `i64` first and clamps only the *final* sum to `int32`, exactly matching the IBM specification (the clamp helper lives in [`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). A single `int16 × int16` product always fits in `int32`, so the hazard is the partial sum: clamping after each addition could saturate, and set `SAT`, even when the final result is representable.
- **`VSCR[SAT]` is sticky-set** if any of the four lane sums saturates. Cleared only via [`mtvscr`](mtvscr.md).
- **Big-endian half lanes.** Lane 0 is the most-significant half.
- **No XER, no exceptions.**
- **Aliasing legal.** `VD` may name the same register as `VA`, `VB`, or `VC`; all sources are read before the destination is written.
- **No VMX128 sibling.**
- **Common usage.** High-precision dot products, audio FIR taps with overflow detection, signed-pixel filter convolution.
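
The difference between clamping once and clamping at every step shows up on a single extreme lane. The sketch below is a hypothetical reference model (`vmsumshs_ref` is a name made up for this example, and the `bool` it returns stands in for "would set `VSCR[SAT]`"); it assumes the lanes are already unpacked into arrays in big-endian order.

```rust
/// Hypothetical reference model of `vmsumshs` on plain arrays.
/// Returns the result lanes and whether any lane saturated.
fn vmsumshs_ref(va: [i16; 8], vb: [i16; 8], vc: [i32; 4]) -> ([i32; 4], bool) {
    let mut vd = [0i32; 4];
    let mut sat = false;
    for i in 0..4 {
        // Accumulate exactly in i64, then clamp the final sum once.
        let sum = va[2 * i] as i64 * vb[2 * i] as i64
            + va[2 * i + 1] as i64 * vb[2 * i + 1] as i64
            + vc[i] as i64;
        let clamped = sum.clamp(i32::MIN as i64, i32::MAX as i64);
        sat |= clamped != sum;
        vd[i] = clamped as i32;
    }
    (vd, sat)
}
```

With all four half-words of a lane equal to `i16::MIN` and an accumulator of -1, the two products sum to 2^31 but the final total is exactly `i32::MAX`, so nothing saturates. A model that clamped after the first addition would return `i32::MAX - 1` and raise the flag.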

## Related Instructions

- [`vmsumshm`](vmsumshm.md) — same shape, modulo (no clamp, no SAT flag).
- [`vmsumuhs`](vmsumuhs.md) — unsigned half multiply-sum, saturating.
- [`vmsummbm`](vmsummbm.md), [`vmsumubm`](vmsumubm.md) — multiply-sum at byte width.
- [`vaddsws`](vaddsws.md) — saturating word add for further accumulation.
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear `VSCR[SAT]`.

## IBM Reference

- [AIX 7.3 — `vmsumshs` (Vector Multiply-Sum Signed Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumshs-vector-multiply-sum-signed-half-word-saturate-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
143
migration/project-root/ppc-manual/vmx/vmsumubm.md
Normal file
@@ -0,0 +1,143 @@
# `vmsumubm` — Vector Multiply-Sum Unsigned Byte Modulo

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000024`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmsumubm` | `vmsumubm` | — | Vector Multiply-Sum Unsigned Byte Modulo |

## Syntax

```asm
vmsumubm [VD], [VA], [VB], [VC]
```

## Encoding

### `vmsumubm` — form `VA`

- **Opcode word:** `0x10000024`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `36`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT` | destination vector register |
| 11–15 | `VRA` | source A |
| 16–20 | `VRB` | source B |
| 21–25 | `VRC` | source C (accumulator) |
| 26–31 | `XO` | extended opcode (6 bits) |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmsumubm: read | Source A vector register. |
| `VB` | vmsumubm: read | Source B vector register. |
| `VC` | vmsumubm: read | Source C vector register (word accumulator). |
| `VD` | vmsumubm: write | Destination vector register. |

## Register Effects

### `vmsumubm`

- **Reads (always):** `VA`, `VB`, `VC`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description column of the Assembler Mnemonics table.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in            */
/* Implementation References is the authoritative semantic         */
/* snapshot. Translate it line-by-line:                            */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs)                */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be  */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)    */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO    */
/* The Register Effects and Status-Register Effects tables above   */
/* enumerate every side effect a faithful translation must emit.   */
```

## Implementation References

**`vmsumubm`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumubm"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1052`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1052)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:579`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L579)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3564-3578`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3564-L3578)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmsumubm => {
    let a = ctx.vr[instr.ra()].as_bytes();
    let b = ctx.vr[instr.rb()].as_bytes();
    let c = ctx.vr[instr.rc()].as_u32x4();
    let mut r = [0u32; 4];
    for i in 0..4 {
        let mut s = c[i];
        for j in 0..4 {
            s = s.wrapping_add(a[4*i+j] as u32 * b[4*i+j] as u32);
        }
        r[i] = s;
    }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Unsigned byte multiply-sum, modulo.** Per word lane:
  ```
  VD[i] = (VC[i] + Σ_{j=0..3} uint8(VA[4*i + j]) * uint8(VB[4*i + j])) mod 2^32
  ```
  Four unsigned-byte × unsigned-byte products and an unsigned-word accumulator from `VC`, summed into one unsigned word per lane.
- **Modulo wrap, never saturates.** **`VSCR[SAT]` is not touched**; overflow wraps around silently.
- **Big-endian byte lanes.** Lane 0 is the most-significant byte; output word `i` consumes bytes `4*i .. 4*i+3`.
- **No XER, no exceptions.**
- **Aliasing legal.** `VD` may name the same register as `VA`, `VB`, or `VC`; all sources are read before the destination is written.
- **No VMX128 sibling.**
- **Common usage.** Pixel-component dot products (RGBA × weights packed as bytes); 4-tap unsigned convolution; per-pixel "intensity sum" where the weights are byte-quantised.
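
To make the pixel dot-product use case concrete, here is a small sketch under the assumption that the vectors are already unpacked into byte and word arrays in big-endian lane order. `vmsumubm_ref` is a hypothetical reference model for this page only; the luma-style weights in the usage note are illustrative values, not taken from any xenia source.

```rust
/// Hypothetical reference model of `vmsumubm` on plain arrays.
/// Index 0 is the most-significant lane of each vector.
fn vmsumubm_ref(va: [u8; 16], vb: [u8; 16], vc: [u32; 4]) -> [u32; 4] {
    let mut vd = [0u32; 4];
    for i in 0..4 {
        let mut sum = vc[i];
        for j in 0..4 {
            // A u8 × u8 product is at most 65025 and cannot overflow u32.
            let product = va[4 * i + j] as u32 * vb[4 * i + j] as u32;
            sum = sum.wrapping_add(product); // modulo 2^32, never saturates
        }
        vd[i] = sum;
    }
    vd
}
```

For an RGBA pixel `(200, 100, 50, 255)` and byte weights `(77, 150, 29, 0)`, the lane result is 15400 + 15000 + 1450 + 0 = 31850. Only a very large accumulator can push a lane past 2^32, because the four products contribute at most 260100.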

## Related Instructions

- [`vmsummbm`](vmsummbm.md) — same shape, signed × unsigned (mixed-sign).
- [`vmsumuhm`](vmsumuhm.md) / [`vmsumuhs`](vmsumuhs.md) — unsigned half multiply-sum (modulo / saturate).
- [`vmsumshm`](vmsumshm.md) / [`vmsumshs`](vmsumshs.md) — signed half multiply-sum.
- [`vsum4ubs`](vsum4ubs.md) — pure horizontal sum of bytes into words.

## IBM Reference

- [AIX 7.3 — `vmsumubm` (Vector Multiply-Sum Unsigned Byte Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumubm-vector-multiply-sum-unsigned-byte-modulo-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
143
migration/project-root/ppc-manual/vmx/vmsumuhm.md
Normal file
@@ -0,0 +1,143 @@
# `vmsumuhm` — Vector Multiply-Sum Unsigned Half Word Modulo

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000026`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vmsumuhm` | `vmsumuhm` | — | Vector Multiply-Sum Unsigned Half Word Modulo |

## Syntax

```asm
vmsumuhm [VD], [VA], [VB], [VC]
```

## Encoding

### `vmsumuhm` — form `VA`

- **Opcode word:** `0x10000026`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `38`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT` | destination vector register |
| 11–15 | `VRA` | source A |
| 16–20 | `VRB` | source B |
| 21–25 | `VRC` | source C (accumulator) |
| 26–31 | `XO` | extended opcode (6 bits) |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vmsumuhm: read | Source A vector register. |
| `VB` | vmsumuhm: read | Source B vector register. |
| `VC` | vmsumuhm: read | Source C vector register (word accumulator). |
| `VD` | vmsumuhm: write | Destination vector register. |

## Register Effects

### `vmsumuhm`

- **Reads (always):** `VA`, `VB`, `VC`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
;   - Read source operands from the fields listed under Operands.
;   - Apply the arithmetic / logical / memory action described
;     in the Description column of the Assembler Mnemonics table.
;   - Write results to the destination register(s); update any
;     status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in            */
/* Implementation References is the authoritative semantic         */
/* snapshot. Translate it line-by-line:                            */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs)                */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be  */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v)    */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO    */
/* The Register Effects and Status-Register Effects tables above   */
/* enumerate every side effect a faithful translation must emit.   */
```

## Implementation References

**`vmsumuhm`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumuhm"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1057`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1057)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:581`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L581)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3595-3608`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3595-L3608)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vmsumuhm => {
    let a = ctx.vr[instr.ra()].as_u16x8();
    let b = ctx.vr[instr.rb()].as_u16x8();
    let c = ctx.vr[instr.rc()].as_u32x4();
    let mut r = [0u32; 4];
    for i in 0..4 {
        let s = (a[2*i] as u32 * b[2*i] as u32)
            .wrapping_add(a[2*i+1] as u32 * b[2*i+1] as u32)
            .wrapping_add(c[i]);
        r[i] = s;
    }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Unsigned half-word multiply-sum, modulo.** Per word lane:
|
||||
```
|
||||
VD[i] = (VC[i] + uint16(VA[2*i]) * uint16(VB[2*i])
|
||||
+ uint16(VA[2*i+1]) * uint16(VB[2*i+1])) mod 2^32
|
||||
```
|
||||
Two unsigned-half × unsigned-half products and an unsigned-word accumulator → one unsigned word per output lane.
|
||||
- **Modulo wrap, never saturates.** **`VSCR[SAT]` is not touched.**
|
||||
- **Big-endian half lanes.** Lane 0 is the most-significant half.
|
||||
- **No XER, no exceptions.**
|
||||
- **Aliasing legal.**
|
||||
- **No VMX128 sibling.**
|
||||
- **Common usage.** Unsigned 16-bit FIR taps; pair-wise component sums for 16-bit-per-channel colour data.
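
The per-lane formula above can be modelled on plain arrays. This is a minimal sketch, not the xenia-rs code: the function name `vmsumuhm_model` is hypothetical, and `[u16; 8]` / `[u32; 4]` stand in for the vector registers with index 0 as the most-significant lane.

```rust
/// Hypothetical standalone model of `vmsumuhm` (modulo multiply-sum).
/// Lane 0 is the most-significant lane, matching the big-endian numbering.
fn vmsumuhm_model(va: [u16; 8], vb: [u16; 8], vc: [u32; 4]) -> [u32; 4] {
    let mut vd = [0u32; 4];
    for i in 0..4 {
        // Each u16 x u16 product is at most 0xFFFE_0001, so it fits in u32.
        let p0 = va[2 * i] as u32 * vb[2 * i] as u32;
        let p1 = va[2 * i + 1] as u32 * vb[2 * i + 1] as u32;
        // The additions may exceed 32 bits; wrapping_add gives the mod 2^32 result.
        vd[i] = p0.wrapping_add(p1).wrapping_add(vc[i]);
    }
    vd
}
```

With every source half set to `0xFFFF`, the two products sum to `0x1_FFFC_0002`, which wraps to `0xFFFC_0002` before the accumulator is added; no flag records the wrap.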
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmsumuhs`](vmsumuhs.md) — same shape, saturating output.
|
||||
- [`vmsumshm`](vmsumshm.md) / [`vmsumshs`](vmsumshs.md) — signed half multiply-sum.
|
||||
- [`vmsumubm`](vmsumubm.md), [`vmsummbm`](vmsummbm.md) — multiply-sum at byte width.
|
||||
- [`vmladduhm`](vmladduhm.md) — per-lane multiply-add at half width.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmsumuhm` (Vector Multiply-Sum Unsigned Half Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumuhm-vector-multiply-sum-unsigned-half-word-modulo-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
148
migration/project-root/ppc-manual/vmx/vmsumuhs.md
Normal file
@@ -0,0 +1,148 @@
|
||||
# `vmsumuhs` — Vector Multiply-Sum Unsigned Half Word Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000027`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmsumuhs` | `vmsumuhs` | — | Vector Multiply-Sum Unsigned Half Word Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmsumuhs [VD], [VA], [VB], [VC]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmsumuhs` — form `VA`
|
||||
|
||||
- **Opcode word:** `0x10000027`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `39`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21–25 | `VRC` | source C (word accumulator) |
|
||||
| 26–31 | `XO` | extended opcode (6 bits) |
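
The field table can be read as shift-and-mask operations on the 32-bit instruction word. The sketch below assumes the table's big-endian bit numbering (bit 0 is the most-significant bit); `VaFields` and `decode_va` are hypothetical names for illustration, not the xenia-rs decoder API.

```rust
/// Fields of a VA-form instruction word (hypothetical struct, illustration only).
#[derive(Debug, PartialEq)]
struct VaFields {
    opcd: u32,
    vrt: u32,
    vra: u32,
    vrb: u32,
    vrc: u32,
    xo: u32,
}

/// Split a 32-bit word into VA-form fields.
/// Big-endian bit N sits at little-endian shift (31 - N).
fn decode_va(word: u32) -> VaFields {
    VaFields {
        opcd: (word >> 26) & 0x3F, // bits 0-5
        vrt: (word >> 21) & 0x1F,  // bits 6-10
        vra: (word >> 16) & 0x1F,  // bits 11-15
        vrb: (word >> 11) & 0x1F,  // bits 16-20
        vrc: (word >> 6) & 0x1F,   // bits 21-25
        xo: word & 0x3F,           // bits 26-31
    }
}
```

For example, OR-ing register numbers 1, 2, 3, 4 into the opcode word `0x10000027` at shifts 21, 16, 11 and 6 gives a word that decodes back to primary opcode 4 and extended opcode 39.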
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmsumuhs: read | Source A vector register. |
|
||||
| `VB` | vmsumuhs: read | Source B vector register. |
|
||||
| `VC` | vmsumuhs: read | Source C vector register (unsigned-word accumulator). |
|
||||
| `VD` | vmsumuhs: write | Destination vector register. |
|
||||
| `VSCR` | vmsumuhs: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmsumuhs`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`, `VC`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vmsumuhs`: **VSCR[SAT]** is set when any word lane saturates and is sticky: this instruction never clears it.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmsumuhs`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumuhs"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1062`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1062)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:582`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L582)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3609-3624`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3609-L3624)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmsumuhs => {
|
||||
let a = ctx.vr[instr.ra()].as_u16x8();
|
||||
let b = ctx.vr[instr.rb()].as_u16x8();
|
||||
let c = ctx.vr[instr.rc()].as_u32x4();
|
||||
let mut r = [0u32; 4]; let mut sat = false;
|
||||
for i in 0..4 {
|
||||
let s = (a[2*i] as u64 * b[2*i] as u64)
|
||||
+ (a[2*i+1] as u64 * b[2*i+1] as u64)
|
||||
+ c[i] as u64;
|
||||
let (v, overflow) = if s > u32::MAX as u64 { (u32::MAX, true) } else { (s as u32, false) };
|
||||
r[i] = v; sat |= overflow;
|
||||
}
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Unsigned half-word multiply-sum, saturating.** Per word lane:
|
||||
```
|
||||
VD[i] = clamp(VC[i] + uint16(VA[2*i]) * uint16(VB[2*i])
|
||||
+ uint16(VA[2*i+1]) * uint16(VB[2*i+1]), 0, UINT32_MAX)
|
||||
```
|
||||
Two unsigned-half × unsigned-half products plus an unsigned-word accumulator, clamped to `uint32`.
|
||||
- **Wide-then-clamp ordering.** Xenia accumulates into `u64` first and clamps the *final* sum to `u32` ([`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)) — matches the IBM spec.
|
||||
- **`VSCR[SAT]` is sticky-set** if any lane clamps. Only the upper bound `0xFFFF_FFFF` can trigger it; every term is unsigned, so the sum can never fall below zero.
|
||||
- **Big-endian half lanes.** Lane 0 is the most-significant half.
|
||||
- **No XER, no exceptions.**
|
||||
- **Aliasing legal.**
|
||||
- **No VMX128 sibling.**
|
||||
- **Common usage.** Per-pixel summed-area calculations with overflow detection; high-precision unsigned-half FIR convolution.
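
The wide-then-clamp behaviour described above can be sketched on plain arrays. This is an illustrative model only: `vmsumuhs_model` is a hypothetical name, and the returned `bool` stands in for the value that would be OR-ed into `VSCR[SAT]`.

```rust
/// Hypothetical standalone model of `vmsumuhs` (saturating multiply-sum).
/// Returns the result lanes and whether any lane clamped.
fn vmsumuhs_model(va: [u16; 8], vb: [u16; 8], vc: [u32; 4]) -> ([u32; 4], bool) {
    let mut vd = [0u32; 4];
    let mut sat = false;
    for i in 0..4 {
        // Accumulate in u64 so the full sum is available before clamping.
        let sum = va[2 * i] as u64 * vb[2 * i] as u64
            + va[2 * i + 1] as u64 * vb[2 * i + 1] as u64
            + vc[i] as u64;
        if sum > u32::MAX as u64 {
            vd[i] = u32::MAX; // clamp to the upper bound
            sat = true;       // sticky: a caller would OR this into VSCR[SAT]
        } else {
            vd[i] = sum as u32;
        }
    }
    (vd, sat)
}
```

With every source half set to `0xFFFF`, each lane's sum is `0x1_FFFC_0002`, so all four lanes clamp to `0xFFFF_FFFF` and the flag is raised; small inputs pass through unchanged with the flag clear.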
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmsumuhm`](vmsumuhm.md) — same shape, modulo (no clamp, no SAT flag).
|
||||
- [`vmsumshs`](vmsumshs.md) — signed half multiply-sum, saturating.
|
||||
- [`vmsumubm`](vmsumubm.md), [`vmsummbm`](vmsummbm.md) — multiply-sum at byte width.
|
||||
- [`vadduws`](vadduws.md) — unsigned saturating word add for further accumulation.
|
||||
- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear `VSCR[SAT]`.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmsumuhs` (Vector Multiply-Sum Unsigned Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumuhs-vector-multiply-sum-unsigned-half-word-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
136
migration/project-root/ppc-manual/vmx/vmulesb.md
Normal file
@@ -0,0 +1,136 @@
|
||||
# `vmulesb` — Vector Multiply Even Signed Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000308`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmulesb` | `vmulesb` | — | Vector Multiply Even Signed Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmulesb [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmulesb` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000308`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `776`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
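
Because the table numbers bits big-endian (bit 0 is the most-significant bit), a small range extractor makes the mapping to shifts explicit. The helper below is a sketch under that assumption; `be_bits` is a hypothetical name and not part of xenia-rs.

```rust
/// Extract bits `first..=last` of `word`, numbered big-endian (bit 0 = MSB),
/// exactly as the ranges appear in the encoding table.
fn be_bits(word: u32, first: u32, last: u32) -> u32 {
    assert!(first <= last && last <= 31);
    let width = last - first + 1;
    // Build the mask in u64 so a full 32-bit range does not overflow the shift.
    let mask = ((1u64 << width) - 1) as u32;
    (word >> (31 - last)) & mask
}
```

Applied to a VX-form word, `be_bits(word, 21, 31)` yields the 11-bit extended opcode (776 for `vmulesb`) and `be_bits(word, 6, 10)` yields the destination register number.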
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmulesb: read | Source A vector register. |
|
||||
| `VB` | vmulesb: read | Source B vector register. |
|
||||
| `VD` | vmulesb: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmulesb`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmulesb`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulesb"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1086`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1086)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:501`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L501)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3469-3476`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3469-L3476)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmulesb => {
|
||||
let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i16; 8];
|
||||
for i in 0..8 { r[i] = a[2 * i] as i16 * b[2 * i] as i16; }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Even-byte signed multiply, half-word result.** Per output half lane:
|
||||
```
|
||||
VD[i] = int16(int8(VA[2*i]) * int8(VB[2*i])) ; for i = 0..7
|
||||
```
|
||||
Only the eight **even-indexed** byte lanes (lanes 0, 2, 4, …, 14 in big-endian) are read from each source. Each `int8 × int8` product is widened to `int16`, producing eight half-word results that fill all of `VD`.
|
||||
- **No saturation, no `VSCR[SAT]`.** The product of two signed bytes always fits in `int16`: the range runs from `(-128) * (+127) = -16256` to `(-128) * (-128) = +16384`, well inside `-32768 .. +32767`, so no clipping is needed.
|
||||
- **Pairs with [`vmulosb`](vmulosb.md)** (odd-byte sibling). Together they consume all 16 byte lanes; two instructions are needed to multiply every lane, giving sixteen 8×8-bit products as 16-bit results.
|
||||
- **Big-endian byte indexing.** Even byte indices `0, 2, 4, …, 14` are the high-order bytes of each half-word slot.
|
||||
- **No XER, no exceptions.**
|
||||
- **Aliasing legal.**
|
||||
- **No VMX128 sibling.**
|
||||
- **Common usage.** Signed-coefficient byte multiply for image filters; first half of a 16-byte signed multiply when paired with `vmulosb`.
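
The even-lane selection and the product range can be checked with a small array model. This is a sketch, not the interpreter code: `vmulesb_model` is a hypothetical name and `[i8; 16]` stands in for the source registers with index 0 as the most-significant byte.

```rust
/// Hypothetical standalone model of `vmulesb`:
/// multiply the even-indexed signed bytes, widening each product to i16.
fn vmulesb_model(va: [i8; 16], vb: [i8; 16]) -> [i16; 8] {
    let mut vd = [0i16; 8];
    for i in 0..8 {
        // Widen before multiplying so the product is computed in 16 bits.
        // Odd lanes (2*i + 1) are never read.
        vd[i] = va[2 * i] as i16 * vb[2 * i] as i16;
    }
    vd
}
```

Feeding the extreme pairs `(-128, -128)`, `(-128, 127)` and `(127, 127)` through the even lanes gives `16384`, `-16256` and `16129`, all representable in `i16`.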
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmulosb`](vmulosb.md) — odd-byte sibling (lanes 1, 3, …, 15).
|
||||
- [`vmuleub`](vmuleub.md), [`vmuloub`](vmuloub.md) — same split, unsigned.
|
||||
- [`vmulesh`](vmulesh.md), [`vmulosh`](vmulosh.md) — same family at half width (→ word results).
|
||||
- [`vmladduhm`](vmladduhm.md) — per-lane modulo multiply-add (low half only).
|
||||
- [`vpkshus`](vpkshus.md) — saturating pack down from `int16` halves to `uint8` bytes (combine with `vmule*` for "scale + clamp" pipelines).
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmulesb` (Vector Multiply Even Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulesb-vector-multiply-even-signed-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
136
migration/project-root/ppc-manual/vmx/vmulesh.md
Normal file
@@ -0,0 +1,136 @@
|
||||
# `vmulesh` — Vector Multiply Even Signed Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000348`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmulesh` | `vmulesh` | — | Vector Multiply Even Signed Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmulesh [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmulesh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000348`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `840`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmulesh: read | Source A vector register. |
|
||||
| `VB` | vmulesh: read | Source B vector register. |
|
||||
| `VD` | vmulesh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmulesh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmulesh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulesh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1091`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1091)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:508`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L508)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3501-3508`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3501-L3508)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmulesh => {
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
|
||||
let mut r = [0i32; 4];
|
||||
for i in 0..4 { r[i] = a[2 * i] as i32 * b[2 * i] as i32; }
|
||||
ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Even-half signed multiply, word result.** Per output word lane:
|
||||
```
|
||||
VD[i] = int32(int16(VA[2*i]) * int16(VB[2*i])) ; for i = 0..3
|
||||
```
|
||||
Only the four **even-indexed** half lanes (lanes 0, 2, 4, 6 in big-endian) are read. Each `int16 × int16` product is widened to `int32`, producing four word results.
|
||||
- **No saturation, no `VSCR[SAT]`.** The full 32-bit product of two signed `int16` always fits — even `(-32768) * (-32768) = +1_073_741_824` is well within `INT32_MAX`.
|
||||
- **Pairs with [`vmulosh`](vmulosh.md)** (odd-half sibling). Two instructions to cover all eight half lanes.
|
||||
- **Big-endian half indexing.** Even half indices `0, 2, 4, 6` are the high-order half-words of each word slot.
|
||||
- **No XER, no exceptions.**
|
||||
- **Aliasing legal.**
|
||||
- **No VMX128 sibling.**
|
||||
- **Common usage.** Q15 × Q15 dot products with full 32-bit precision; signed-half-coefficient FIR taps; first half of a "multiply every half lane" sequence when paired with `vmulosh`.
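
The same even-lane pattern at half-word width can be modelled as follows. This is an illustrative sketch; `vmulesh_model` is a hypothetical name and `[i16; 8]` stands in for the source registers with index 0 as the most-significant half-word.

```rust
/// Hypothetical standalone model of `vmulesh`:
/// multiply the even-indexed signed half-words, widening each product to i32.
fn vmulesh_model(va: [i16; 8], vb: [i16; 8]) -> [i32; 4] {
    let mut vd = [0i32; 4];
    for i in 0..4 {
        // Widen before multiplying; the largest product, (-32768)^2 = 2^30,
        // is below i32::MAX, so the multiplication cannot overflow.
        vd[i] = va[2 * i] as i32 * vb[2 * i] as i32;
    }
    vd
}
```

The extreme pair `(-32768, -32768)` produces `1_073_741_824`, and `(-32768, 32767)` produces `-1_073_709_056`; neither needs clamping.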
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmulosh`](vmulosh.md) — odd-half sibling (lanes 1, 3, 5, 7).
|
||||
- [`vmuleuh`](vmuleuh.md), [`vmulouh`](vmulouh.md) — same split, unsigned.
|
||||
- [`vmulesb`](vmulesb.md), [`vmulosb`](vmulosb.md) — same family at byte width (→ half-word results).
|
||||
- [`vmsumshm`](vmsumshm.md), [`vmsumshs`](vmsumshs.md) — signed half multiply-sum across pairs (different shape).
|
||||
- [`vmladduhm`](vmladduhm.md) — per-lane modulo multiply-add (low half only).
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmulesh` (Vector Multiply Even Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulesh-vector-multiply-even-signed-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
131
migration/project-root/ppc-manual/vmx/vmuleub.md
Normal file
@@ -0,0 +1,131 @@
|
||||
# `vmuleub` — Vector Multiply Even Unsigned Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000208`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmuleub` | `vmuleub` | — | Vector Multiply Even Unsigned Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmuleub [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmuleub` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000208`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `520`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmuleub: read | Source A vector register. |
|
||||
| `VB` | vmuleub: read | Source B vector register. |
|
||||
| `VD` | vmuleub: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmuleub`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmuleub`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmuleub"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1096`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1096)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:478`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L478)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3453-3460`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3453-L3460)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vmuleub => {
|
||||
let a = ctx.vr[instr.ra()].as_bytes();
|
||||
let b = ctx.vr[instr.rb()].as_bytes();
|
||||
let mut r = [0u16; 8];
|
||||
for i in 0..8 { r[i] = a[2 * i] as u16 * b[2 * i] as u16; }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Even-lane multiply.** Only the *even-indexed* bytes of `VA` and `VB` participate — lanes 0, 2, 4, 6, 8, 10, 12, 14 (big-endian indexing, MSB-first). Each unsigned-byte × unsigned-byte product widens to an unsigned 16-bit half-word and is written to the corresponding half-word of `VD`. The odd lanes are ignored.
|
||||
- **Lane-count reduction.** Input has 16 byte lanes; output has 8 half-word lanes. The pairing is `VD.h[i] = VA.b[2*i] * VB.b[2*i]` for `i ∈ 0..7`.
|
||||
- **No overflow possible.** 8-bit × 8-bit unsigned ≤ `0xFF * 0xFF = 0xFE01`, which fits in 16 bits. `VSCR[SAT]` is **not** touched; the result is exact, so modulo and saturating arithmetic would give the same value.
|
||||
- **Pair with [`vmuloub`](vmuloub.md) to get all 16 products.** Software that wants every byte × byte product typically issues `vmuleub` + `vmuloub` and then interleaves the two half-word vectors (`vmrghh`/`vmrglh`). When only per-word sums of the products are needed, [`vmsumubm`](vmsumubm.md) computes them directly.
|
||||
- **No `Rc`, no XER, no FPSCR.** This instruction never touches CR, XER (`CA`/`OV`), FPSCR, or VSCR.
|
||||
- **No VMX128 sibling.** Xbox 360 code that needs this pattern typically goes through [`vmsumubm`](vmsumubm.md) instead.
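
The even/odd pairing described above can be sketched on plain arrays. All three function names are hypothetical; the odd-lane model is an assumption based on the sibling's description (bytes 1, 3, …, 15), and the interleave loop models only the net effect of the merge step, not the `vmrghh`/`vmrglh` instructions themselves.

```rust
/// Hypothetical model of `vmuleub`: products of the even-indexed bytes.
fn vmuleub_model(va: [u8; 16], vb: [u8; 16]) -> [u16; 8] {
    let mut vd = [0u16; 8];
    for i in 0..8 {
        vd[i] = va[2 * i] as u16 * vb[2 * i] as u16;
    }
    vd
}

/// Assumed model of the odd-lane sibling `vmuloub` (bytes 1, 3, ..., 15).
fn vmuloub_model(va: [u8; 16], vb: [u8; 16]) -> [u16; 8] {
    let mut vd = [0u16; 8];
    for i in 0..8 {
        vd[i] = va[2 * i + 1] as u16 * vb[2 * i + 1] as u16;
    }
    vd
}

/// Interleave the even and odd results so that out[k] = va[k] * vb[k].
fn all_byte_products(va: [u8; 16], vb: [u8; 16]) -> [u16; 16] {
    let even = vmuleub_model(va, vb);
    let odd = vmuloub_model(va, vb);
    let mut all = [0u16; 16];
    for i in 0..8 {
        all[2 * i] = even[i];
        all[2 * i + 1] = odd[i];
    }
    all
}
```

Sixteen 16-bit products need 256 bits, so on real hardware the interleaved result occupies two vector registers; the single 16-element array here is a convenience of the model.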
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmuloub`](vmuloub.md) — odd-lane twin (bytes 1, 3, …, 15).
|
||||
- [`vmulesb`](vmulesb.md), [`vmulosb`](vmulosb.md) — signed-byte even/odd multiplies.
|
||||
- [`vmuleuh`](vmuleuh.md), [`vmulouh`](vmulouh.md) — unsigned-half-word even/odd multiplies (→ word lanes).
|
||||
- [`vmulesh`](vmulesh.md), [`vmulosh`](vmulosh.md) — signed-half-word even/odd.
|
||||
- [`vmsumubm`](vmsumubm.md) — fused multiply-sum unsigned-byte-modulo; often replaces the even/odd pair when the caller only needs the sum.
|
||||
- [`vmrghh`](vmrghh.md), [`vmrglh`](vmrglh.md) — interleave the even/odd half-word results.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmuleub` (Vector Multiply Even Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmuleub-vector-multiply-even-unsigned-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmuleuh.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmuleuh` — Vector Multiply Even Unsigned Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000248`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmuleuh` | `vmuleuh` | — | Vector Multiply Even Unsigned Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vmuleuh [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmuleuh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000248`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `584`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmuleuh: read | Source A vector register. |
|
||||
| `VB` | vmuleuh: read | Source B vector register. |
|
||||
| `VD` | vmuleuh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmuleuh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmuleuh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmuleuh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1101`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1101)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:485`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L485)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3485-3492`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3485-L3492)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vmuleuh => {
    let a = ctx.vr[instr.ra()].as_u16x8();
    let b = ctx.vr[instr.rb()].as_u16x8();
    let mut r = [0u32; 4];
    for i in 0..4 { r[i] = a[2 * i] as u32 * b[2 * i] as u32; }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Even-lane half-word multiply.** Only half-word lanes 0, 2, 4, 6 of `VA` and `VB` participate (big-endian indexing). Each 16×16 unsigned product widens to an unsigned 32-bit word and is written to the corresponding word lane of `VD`. The odd half-words are ignored.
|
||||
- **Lane-count reduction.** 8 half-word input lanes → 4 word output lanes. Pairing is `VD.w[i] = VA.h[2*i] * VB.h[2*i]` for `i ∈ 0..3`.
|
||||
- **No overflow possible.** `0xFFFF * 0xFFFF = 0xFFFE0001` — fits in 32 bits. `VSCR[SAT]` is untouched.
|
||||
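- **Pair with [`vmulouh`](vmulouh.md)** to multiply every half-word lane. Interleave the two result vectors with `vmrghw`/`vmrglw` (word granularity) to restore the full element order. When only the sum of each even/odd product pair is needed, [`vmsumuhm`](vmsumuhm.md) and its saturating variant perform the multiplies and the add in one instruction.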
|
||||
- **No `Rc`, no XER, no FPSCR.**
|
||||
- **No VMX128 sibling.** Xenon code that needs 16-bit lane multiplies usually goes through [`vmsumuhm`](vmsumuhm.md) / [`vmsumuhs`](vmsumuhs.md).
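
The even/odd pairing and the word-granularity merge described in this list can be modeled on plain arrays. The following is a minimal Rust sketch, not xenia-rs code: all function names are hypothetical, lanes are indexed in big-endian element order, and the merge helpers assume that `vmrghw` and `vmrglw` interleave the first two and the last two words of their operands respectively.

```rust
/// Even half-word products: word i = a[2i] * b[2i] (models `vmuleuh`).
fn mul_even_u16(a: [u16; 8], b: [u16; 8]) -> [u32; 4] {
    std::array::from_fn(|i| a[2 * i] as u32 * b[2 * i] as u32)
}

/// Odd half-word products: word i = a[2i+1] * b[2i+1] (models `vmulouh`).
fn mul_odd_u16(a: [u16; 8], b: [u16; 8]) -> [u32; 4] {
    std::array::from_fn(|i| a[2 * i + 1] as u32 * b[2 * i + 1] as u32)
}

/// Interleave the first two words of each input (assumed `vmrghw` behavior).
fn merge_high_w(x: [u32; 4], y: [u32; 4]) -> [u32; 4] {
    [x[0], y[0], x[1], y[1]]
}

/// Interleave the last two words of each input (assumed `vmrglw` behavior).
fn merge_low_w(x: [u32; 4], y: [u32; 4]) -> [u32; 4] {
    [x[2], y[2], x[3], y[3]]
}

/// All eight widened products in element order, split over two word vectors.
fn mul_all_u16(a: [u16; 8], b: [u16; 8]) -> ([u32; 4], [u32; 4]) {
    let even = mul_even_u16(a, b);
    let odd = mul_odd_u16(a, b);
    (merge_high_w(even, odd), merge_low_w(even, odd))
}
```

The first returned vector holds products 0..3 and the second holds products 4..7, so two multiplies and two merges cover all eight half-word lanes.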
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmulouh`](vmulouh.md) — odd-half-word twin.
|
||||
- [`vmulesh`](vmulesh.md), [`vmulosh`](vmulosh.md) — signed-half-word even/odd.
|
||||
- [`vmuleub`](vmuleub.md), [`vmuloub`](vmuloub.md) — byte-granularity even/odd (→ half-word lanes).
|
||||
- [`vmsumuhm`](vmsumuhm.md), [`vmsumuhs`](vmsumuhs.md) — fused multiply-sum unsigned-half-word (modulo / saturating).
|
||||
- [`vmrghw`](vmrghw.md), [`vmrglw`](vmrglw.md) — interleave results.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmuleuh` (Vector Multiply Even Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmuleuh-vector-multiply-even-unsigned-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmulosb.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmulosb` — Vector Multiply Odd Signed Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000108`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmulosb` | `vmulosb` | — | Vector Multiply Odd Signed Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
vmulosb [VD], [VA], [VB]
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmulosb` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000108`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `264`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
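
As an illustration of the bit layout in the table, a VX-form word can be split with shifts and masks. This is a sketch under the table's IBM bit numbering (bit 0 is the most significant bit); `VxFields` and `decode_vx` are hypothetical names and do not reflect the xenia-rs decoder API.

```rust
/// Fields of a VX-form instruction word (IBM bit 0 = most significant bit).
#[derive(Debug, PartialEq)]
struct VxFields {
    opcd: u32, // bits 0-5
    vd: u32,   // bits 6-10
    va: u32,   // bits 11-15
    vb: u32,   // bits 16-20
    xo: u32,   // bits 21-31
}

/// Split a 32-bit instruction word into its VX-form fields.
fn decode_vx(word: u32) -> VxFields {
    VxFields {
        opcd: word >> 26,
        vd: (word >> 21) & 0x1F,
        va: (word >> 16) & 0x1F,
        vb: (word >> 11) & 0x1F,
        xo: word & 0x7FF,
    }
}
```

Decoding the bare opcode word `0x10000108` gives primary opcode 4 and extended opcode 264 with all register fields zero, matching the values listed above.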
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmulosb: read | Source A vector register. |
|
||||
| `VB` | vmulosb: read | Source B vector register. |
|
||||
| `VD` | vmulosb: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmulosb`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
; Derived from the xenia-rs interpreter arm (see Implementation References).
; sext16(x) sign-extends an 8-bit lane to 16 bits.
for each 16-bit half-word lane i in 0..7:
    VD.h[i] <- sext16(VA.b[2*i+1]) * sext16(VB.b[2*i+1])
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmulosb`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulosb"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1106`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1106)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:109`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L109)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:456`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L456)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3477-3484`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3477-L3484)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vmulosb => {
    let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]);
    let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]);
    let mut r = [0i16; 8];
    for i in 0..8 { r[i] = a[2 * i + 1] as i16 * b[2 * i + 1] as i16; }
    ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Odd-lane signed-byte multiply.** Only the odd-indexed byte lanes (1, 3, 5, 7, 9, 11, 13, 15 in big-endian numbering) of `VA` and `VB` participate. Each byte is sign-extended to 16 bits and the pair is multiplied, giving a signed 16-bit product in the corresponding half-word of `VD`. Pairing: `VD.h[i] = (int8)VA.b[2*i+1] * (int8)VB.b[2*i+1]` for `i ∈ 0..7`.
|
||||
- **Lane-count reduction.** 16 byte lanes → 8 half-word lanes.
|
||||
- **No overflow.** `(-128) * (-128) = 0x4000`, `(127) * (127) = 0x3F01` — both fit in int16. `VSCR[SAT]` is **not** set.
|
||||
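- **Pair with [`vmulesb`](vmulesb.md)** to get every signed byte × byte product; interleave via `vmrghh`/`vmrglh`. Note that [`vmsummbm`](vmsummbm.md) is not a drop-in fused replacement: it is a mixed-sign multiply-sum that treats the bytes of `VA` as signed and the bytes of `VB` as unsigned.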
|
||||
- **Signed vs. unsigned distinction.** The `s` in `vmulosb` marks both operands as signed: each byte is sign-extended before the multiply, so `0xFF` means -1. Compare with [`vmuloub`](vmuloub.md), which zero-extends and reads `0xFF` as 255.
|
||||
- **No `Rc`, no XER, no VSCR side-effect.** No VMX128 sibling.
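
The signed/unsigned distinction and the no-overflow bounds from this list can be checked on a single lane. The sketch below uses hypothetical helper names and plain integers rather than xenia-rs vector types; it only models how the same byte pattern is interpreted by the two instruction families.

```rust
/// One `vmulosb`-style lane: both bytes are reinterpreted as signed 8-bit values.
fn mul_signed_byte(a: u8, b: u8) -> i16 {
    (a as i8 as i16) * (b as i8 as i16)
}

/// One `vmuloub`-style lane: both bytes are zero-extended.
fn mul_unsigned_byte(a: u8, b: u8) -> u16 {
    (a as u16) * (b as u16)
}
```

With both inputs set to `0xFF`, the signed lane computes (-1) × (-1) and the unsigned lane computes 255 × 255, so the two results differ even though the input bits are identical.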
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmulesb`](vmulesb.md) — even-lane signed byte multiply.
|
||||
- [`vmuloub`](vmuloub.md), [`vmuleub`](vmuleub.md) — unsigned byte twins.
|
||||
- [`vmulosh`](vmulosh.md), [`vmulesh`](vmulesh.md) — signed half-word even/odd.
|
||||
- [`vmsummbm`](vmsummbm.md) — fused mixed-sign (signed `VA` × unsigned `VB`) byte multiply-sum modulo.
|
||||
- [`vmrghh`](vmrghh.md), [`vmrglh`](vmrglh.md) — interleave the even/odd half-word results.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmulosb` (Vector Multiply Odd Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulosb-vector-multiply-odd-signed-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmulosh.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmulosh` — Vector Multiply Odd Signed Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000148`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmulosh` | `vmulosh` | — | Vector Multiply Odd Signed Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
vmulosh [VD], [VA], [VB]
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmulosh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000148`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `328`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmulosh: read | Source A vector register. |
|
||||
| `VB` | vmulosh: read | Source B vector register. |
|
||||
| `VD` | vmulosh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmulosh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
; Derived from the xenia-rs interpreter arm (see Implementation References).
; sext32(x) sign-extends a 16-bit lane to 32 bits.
for each 32-bit word lane i in 0..3:
    VD.w[i] <- sext32(VA.h[2*i+1]) * sext32(VB.h[2*i+1])
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmulosh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulosh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1111`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1111)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:109`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L109)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:462`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L462)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3509-3516`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3509-L3516)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vmulosh => {
    let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]);
    let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]);
    let mut r = [0i32; 4];
    for i in 0..4 { r[i] = a[2 * i + 1] as i32 * b[2 * i + 1] as i32; }
    ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Odd-lane signed half-word multiply.** Only half-word lanes 1, 3, 5, 7 of `VA` and `VB` (big-endian numbering) participate. Each half-word is sign-extended to 32 bits and the pair is multiplied, giving a signed 32-bit product in the corresponding word of `VD`. Pairing: `VD.w[i] = (int16)VA.h[2*i+1] * (int16)VB.h[2*i+1]` for `i ∈ 0..3`.
|
||||
- **Lane-count reduction.** 8 half-word lanes → 4 word lanes.
|
||||
- **No overflow.** `(-32768)*(-32768) = 0x40000000` — fits in int32. `VSCR[SAT]` is untouched.
|
||||
- **Pair with [`vmulesh`](vmulesh.md)** for all eight products, then interleave with `vmrghw`/`vmrglw`. When only the sum of each even/odd product pair is needed, [`vmsumshm`](vmsumshm.md)/[`vmsumshs`](vmsumshs.md) perform the multiplies and the accumulation in one instruction.
|
||||
- **Signed arithmetic.** Negative inputs sign-extend before multiplication; contrast with [`vmulouh`](vmulouh.md).
|
||||
- **No `Rc`, no XER.** No VMX128 sibling.
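
The relationship between the even/odd multiplies and the multiply-sum instruction mentioned in this list can be sketched on plain arrays. This is an illustration under an assumption taken from the architecture description of `vmsumshm`: each result word is the wrapping sum of the two half-word products in that word plus an accumulator word. The function names are hypothetical.

```rust
/// Even signed half-word products (models `vmulesh`).
fn mul_even_i16(a: [i16; 8], b: [i16; 8]) -> [i32; 4] {
    std::array::from_fn(|i| a[2 * i] as i32 * b[2 * i] as i32)
}

/// Odd signed half-word products (models `vmulosh`).
fn mul_odd_i16(a: [i16; 8], b: [i16; 8]) -> [i32; 4] {
    std::array::from_fn(|i| a[2 * i + 1] as i32 * b[2 * i + 1] as i32)
}

/// Per-word sum of both products plus an accumulator, wrapping modulo 2^32
/// (assumed `vmsumshm` semantics).
fn msum_i16_modulo(a: [i16; 8], b: [i16; 8], acc: [i32; 4]) -> [i32; 4] {
    let (even, odd) = (mul_even_i16(a, b), mul_odd_i16(a, b));
    std::array::from_fn(|i| even[i].wrapping_add(odd[i]).wrapping_add(acc[i]))
}
```

Each individual product fits in 32 bits, but the sum of two products may not: with every lane set to -32768 the two `0x40000000` products wrap to the most negative 32-bit value, which is why the modulo and saturating multiply-sum variants differ while the plain multiplies never overflow.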
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmulesh`](vmulesh.md) — even-lane signed half-word multiply.
|
||||
- [`vmulouh`](vmulouh.md), [`vmuleuh`](vmuleuh.md) — unsigned half-word twins.
|
||||
- [`vmulosb`](vmulosb.md), [`vmulesb`](vmulesb.md) — signed byte even/odd.
|
||||
- [`vmhaddshs`](vmhaddshs.md), [`vmhraddshs`](vmhraddshs.md) — fused half-word fixed-point MAC variants.
|
||||
- [`vmsumshm`](vmsumshm.md), [`vmsumshs`](vmsumshs.md) — signed multiply-sum modulo / saturating.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmulosh` (Vector Multiply Odd Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulosh-vector-multiply-odd-signed-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmuloub.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmuloub` — Vector Multiply Odd Unsigned Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000008`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmuloub` | `vmuloub` | — | Vector Multiply Odd Unsigned Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
vmuloub [VD], [VA], [VB]
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmuloub` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000008`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `8`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmuloub: read | Source A vector register. |
|
||||
| `VB` | vmuloub: read | Source B vector register. |
|
||||
| `VD` | vmuloub: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmuloub`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
; Derived from the xenia-rs interpreter arm (see Implementation References).
; zext16(x) zero-extends an 8-bit lane to 16 bits.
for each 16-bit half-word lane i in 0..7:
    VD.h[i] <- zext16(VA.b[2*i+1]) * zext16(VB.b[2*i+1])
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmuloub`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmuloub"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1116`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1116)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:109`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L109)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:437`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L437)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3461-3468`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3461-L3468)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vmuloub => {
    let a = ctx.vr[instr.ra()].as_bytes();
    let b = ctx.vr[instr.rb()].as_bytes();
    let mut r = [0u16; 8];
    for i in 0..8 { r[i] = a[2 * i + 1] as u16 * b[2 * i + 1] as u16; }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Odd-lane unsigned byte multiply.** Only the odd-indexed byte lanes (1, 3, 5, 7, 9, 11, 13, 15 — big-endian) of `VA` and `VB` participate. Each 8×8 unsigned product widens to an unsigned 16-bit half-word in the corresponding half-word of `VD`. Pairing: `VD.h[i] = VA.b[2*i+1] * VB.b[2*i+1]` for `i ∈ 0..7`.
|
||||
- **Lane-count reduction.** 16 byte lanes → 8 half-word lanes.
|
||||
- **No overflow.** `0xFF * 0xFF = 0xFE01` fits in 16 bits. `VSCR[SAT]` is not touched.
|
||||
- **Pair with [`vmuleub`](vmuleub.md)** to get every byte product; re-interleave with `vmrghh`/`vmrglh`, or use [`vmsumubm`](vmsumubm.md) for a fused multiply-accumulate.
|
||||
- **Unsigned arithmetic.** Every byte is zero-extended and treated as a value in 0..255, so `0xFF` means 255 rather than -1. Contrast with [`vmulosb`](vmulosb.md).
|
||||
- **No `Rc`, no XER.** No VMX128 sibling.
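
At byte granularity the lane geometry of the even/odd pairing is easy to get wrong, so here is a compact sketch of the full sequence. It assumes that `vmrghh` interleaves the first four half-words of its operands and `vmrglh` the last four; `mul_all_u8` is a hypothetical name and the code works on plain arrays, not xenia-rs vector types.

```rust
/// Widened products of all 16 unsigned byte pairs, in element order.
fn mul_all_u8(a: [u8; 16], b: [u8; 16]) -> ([u16; 8], [u16; 8]) {
    // Even-lane and odd-lane products (model `vmuleub` and `vmuloub`).
    let even: [u16; 8] = std::array::from_fn(|i| a[2 * i] as u16 * b[2 * i] as u16);
    let odd: [u16; 8] = std::array::from_fn(|i| a[2 * i + 1] as u16 * b[2 * i + 1] as u16);
    // Interleave the first four half-words of each (assumed `vmrghh` behavior).
    let hi: [u16; 8] =
        std::array::from_fn(|k| if k % 2 == 0 { even[k / 2] } else { odd[k / 2] });
    // Interleave the last four half-words of each (assumed `vmrglh` behavior).
    let lo: [u16; 8] =
        std::array::from_fn(|k| if k % 2 == 0 { even[4 + k / 2] } else { odd[4 + k / 2] });
    (hi, lo)
}
```

The first returned vector holds the products of bytes 0..7 and the second those of bytes 8..15.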
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmuleub`](vmuleub.md) — even-lane unsigned byte twin.
|
||||
- [`vmulosb`](vmulosb.md), [`vmulesb`](vmulesb.md) — signed byte even/odd.
|
||||
- [`vmulouh`](vmulouh.md), [`vmuleuh`](vmuleuh.md) — unsigned half-word even/odd.
|
||||
- [`vmsumubm`](vmsumubm.md) — fused unsigned-byte multiply-sum.
|
||||
- [`vmrghh`](vmrghh.md), [`vmrglh`](vmrglh.md) — interleave even/odd half-word products.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmuloub` (Vector Multiply Odd Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmuloub-vector-multiply-odd-unsigned-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vmulouh.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vmulouh` — Vector Multiply Odd Unsigned Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000048`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vmulouh` | `vmulouh` | — | Vector Multiply Odd Unsigned Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
vmulouh [VD], [VA], [VB]
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vmulouh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000048`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `72`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vmulouh: read | Source A vector register. |
|
||||
| `VB` | vmulouh: read | Source B vector register. |
|
||||
| `VD` | vmulouh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vmulouh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
; Derived from the xenia-rs interpreter arm (see Implementation References).
; zext32(x) zero-extends a 16-bit lane to 32 bits.
for each 32-bit word lane i in 0..3:
    VD.w[i] <- zext32(VA.h[2*i+1]) * zext32(VB.h[2*i+1])
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vmulouh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulouh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1121`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1121)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:109`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L109)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:444`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L444)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3493-3500`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3493-L3500)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
PpcOpcode::vmulouh => {
    let a = ctx.vr[instr.ra()].as_u16x8();
    let b = ctx.vr[instr.rb()].as_u16x8();
    let mut r = [0u32; 4];
    for i in 0..4 { r[i] = a[2 * i + 1] as u32 * b[2 * i + 1] as u32; }
    ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r);
    ctx.pc += 4;
}
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Odd-lane unsigned half-word multiply.** Only half-word lanes 1, 3, 5, 7 (big-endian) of `VA` and `VB` participate. Each 16×16 unsigned product widens to a 32-bit word in `VD`. Pairing: `VD.w[i] = VA.h[2*i+1] * VB.h[2*i+1]` for `i ∈ 0..3`.
|
||||
- **Lane-count reduction.** 8 half-word lanes → 4 word lanes.
|
||||
- **No overflow.** `0xFFFF * 0xFFFF = 0xFFFE0001` fits in uint32. `VSCR[SAT]` is untouched.
|
||||
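- **Pair with [`vmuleuh`](vmuleuh.md)** to multiply every half-word; interleave via `vmrghw`/`vmrglw`. When only the sum of each even/odd product pair is needed, [`vmsumuhm`](vmsumuhm.md)/[`vmsumuhs`](vmsumuhs.md) perform the multiplies and the add in one instruction.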
|
||||
- **Unsigned arithmetic.** Both operands are zero-extended to 32 bits before the multiply; contrast with [`vmulosh`](vmulosh.md), which sign-extends.
|
||||
- **No `Rc`, no XER.** No VMX128 sibling.
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmuleuh`](vmuleuh.md) — even-lane unsigned half-word twin.
|
||||
- [`vmulosh`](vmulosh.md), [`vmulesh`](vmulesh.md) — signed half-word even/odd.
|
||||
- [`vmuloub`](vmuloub.md), [`vmuleub`](vmuleub.md) — byte-granularity even/odd.
|
||||
- [`vmsumuhm`](vmsumuhm.md), [`vmsumuhs`](vmsumuhs.md) — fused unsigned multiply-sum modulo / saturating.
|
||||
- [`vmrghw`](vmrghw.md), [`vmrglw`](vmrglw.md) — interleave word results.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vmulouh` (Vector Multiply Odd Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulouh-vector-multiply-odd-unsigned-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
194
migration/project-root/ppc-manual/vmx/vnmsubfp.md
Normal file
@@ -0,0 +1,194 @@
|
||||
# `vnmsubfp` — Vector Negative Multiply-Subtract Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002f`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vnmsubfp` | `vnmsubfp` | — | Vector Negative Multiply-Subtract Floating Point |
|
||||
| `vnmsubfp128` | `vnmsubfp128` | — | Vector128 Negative Multiply-Subtract Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
vnmsubfp [VD], [VA], [VC], [VB]
vnmsubfp128 [VD], [VA], [VD], [VB]
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vnmsubfp` — form `VA`
|
||||
|
||||
- **Opcode word:** `0x1000002f`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `47`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21–25 | `VRC` | source C / shift |
|
||||
| 26–31 | `XO` | extended opcode (6 bits) |
|
||||
|
||||
### `vnmsubfp128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000150`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `336`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
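
The split register fields in this table combine into 7-bit register numbers (0..127). The sketch below shows one way to reassemble them under the table's IBM bit numbering (bit 0 is the most significant bit). How the high parts stack on top of the low five bits is an assumption here (`VD128h` and `VB128h` as bits 5-6, `VA128h` as bit 5 and `VA128H` as bit 6), and `decode_vx128_regs` is a hypothetical name; verify it against the decoder source before relying on it.

```rust
/// Reassemble the 7-bit VD, VA and VB register numbers of a VX128-form word.
/// Assumption: each high part sits directly above the 5-bit low part,
/// with VA128H as the top bit of VA.
fn decode_vx128_regs(word: u32) -> (u32, u32, u32) {
    let vd_lo = (word >> 21) & 0x1F; // bits 6-10
    let va_lo = (word >> 16) & 0x1F; // bits 11-15
    let vb_lo = (word >> 11) & 0x1F; // bits 16-20
    let va_top = (word >> 10) & 0x1; // bit 21 (VA128H)
    let va_mid = (word >> 5) & 0x1;  // bit 26 (VA128h)
    let vd_hi = (word >> 2) & 0x3;   // bits 28-29
    let vb_hi = word & 0x3;          // bits 30-31
    (
        vd_lo | (vd_hi << 5),
        va_lo | (va_mid << 5) | (va_top << 6),
        vb_lo | (vb_hi << 5),
    )
}
```

For the bare opcode word `0x14000150` all three register numbers decode to zero, since none of the opcode bits overlap the register fields.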
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vnmsubfp: read; vnmsubfp128: read | Source A vector register. |
|
||||
| `VC` | vnmsubfp: read | Source C vector register / 3-bit selector. |
|
||||
| `VB` | vnmsubfp: read; vnmsubfp128: read | Source B vector register. |
|
||||
| `VD` | vnmsubfp: write; vnmsubfp128: read; vnmsubfp128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vnmsubfp`
|
||||
|
||||
- **Reads (always):** `VA`, `VC`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vnmsubfp128`
|
||||
|
||||
- **Reads (always):** `VA`, `VD`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
for each 32-bit float lane i in 0..3:
    VD[i] <- -((VA[i] * VC[i]) - VB[i])
```
|
||||
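
The per-lane arithmetic can be tried out on scalars. The following sketch mirrors the frozen interpreter snapshot further down this page (denormal flush on inputs and result, one fused rounding), but it is only an approximation: `flush_denorm` here is a stand-in that assumes denormals flush to a zero of the same sign, which may differ from the real `vmx::flush_denorm`, and `vnmsubfp_lane` is a hypothetical name.

```rust
/// Stand-in for `vmx::flush_denorm`: assumes denormals become a same-signed zero.
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() { 0.0f32.copysign(x) } else { x }
}

/// One lane of `vnmsubfp`: -(a * c - b), rounded once via a fused multiply-add.
fn vnmsubfp_lane(a: f32, b: f32, c: f32) -> f32 {
    let (a, b, c) = (flush_denorm(a), flush_denorm(b), flush_denorm(c));
    flush_denorm(-a.mul_add(c, -b))
}
```

The single rounding is observable. With `a = c = 1 + 2^-12` and `b = 1 + 2^-11`, the exact product is `1 + 2^-11 + 2^-24`; a separate multiply rounds that to `b`, so the two-step form `b - a * c` yields zero, while the fused form keeps the `2^-24` term and returns `-2^-24`.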
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/*   - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/*   - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/*   - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/*   - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vnmsubfp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnmsubfp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1154`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1154)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:589`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L589)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2074-2089`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2074-L2089)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vnmsubfp => {
|
||||
// vD = -(vA * vC - vB) = vB - vA * vC. Same denorm-flush rule as vmaddfp.
|
||||
let a = ctx.vr[instr.ra()].as_f32x4();
|
||||
let b = ctx.vr[instr.rb()].as_f32x4();
|
||||
let c = ctx.vr[instr.rc()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 {
|
||||
let ai = vmx::flush_denorm(a[i]);
|
||||
let bi = vmx::flush_denorm(b[i]);
|
||||
let ci = vmx::flush_denorm(c[i]);
|
||||
// PPCBUG-426: single FMA rounding instead of two-step (b - a*c).
|
||||
r[i] = vmx::flush_denorm(-ai.mul_add(ci, -bi));
|
||||
}
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vnmsubfp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnmsubfp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1157`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1157)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:615`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L615)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2090-2107`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2090-L2107)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vnmsubfp128 => {
|
||||
// VMX128 form: vD <- -((vA * vB) - vD) = vD - (vA * vB). Canary
|
||||
// routes through `InstrEmit_vnmsubfp_` with the same arg-swap,
|
||||
// which flushes all inputs unconditionally.
|
||||
let a = ctx.vr[instr.va128()].as_f32x4();
|
||||
let b = ctx.vr[instr.vb128()].as_f32x4();
|
||||
let d = ctx.vr[instr.vd128()].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 {
|
||||
let ai = vmx::flush_denorm(a[i]);
|
||||
let bi = vmx::flush_denorm(b[i]);
|
||||
let di = vmx::flush_denorm(d[i]);
|
||||
// PPCBUG-427: single FMA rounding.
|
||||
r[i] = vmx::flush_denorm(-ai.mul_add(bi, -di));
|
||||
}
|
||||
ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Lane-wise negative multiply-subtract.** Each of the four lanes computes `VD[i] = −((VA[i] × VC[i]) − VB[i])`, i.e. `VB[i] − VA[i] × VC[i]`. The PowerPC ISA specifies the multiply and subtract as fused, with a single rounding of the final result, and Xenon hardware rounds only once. The xenia-rs snapshot above matches this: it uses `mul_add` (one rounding, per the PPCBUG-426 comment) rather than a separate multiply, subtract, and negate.
|
||||
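- **IEEE-754 binary32 lanes.** Architecturally the instruction follows `VSCR[NJ]`: denormal inputs/outputs flush to zero when `NJ = 1`. The xenia-rs snapshot above calls `vmx::flush_denorm` on every input and on the result, with no visible `NJ` check.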
|
||||
- **No VSCR[SAT] update.** VMX float ops never set saturation.
|
||||
- **No FPSCR effect.** Unlike scalar `fnmsub[s]`, `vnmsubfp` does not touch FPSCR.
|
||||
- **NaN propagation.** A NaN in any of `VA`, `VB`, or `VC` yields a NaN in the corresponding lane. Sign-of-NaN is unspecified but stable in xenia (matches the x86 host's `vfnmadd`-family output).
|
||||
- **Big-endian lane indexing.** Lane 0 is the MSB-most 4 bytes.
|
||||
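- **VMX128 sibling: [`vnmsubfp128`](vnmsubfp128.md).** Same arithmetic with access to `v0..v127`, but the operand roles differ: there is no `VC`, and `VD` is both the addend and the destination, so each lane computes `VD[i] = VD[i] − VA[i] × VB[i]`.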
|
||||
- **No `Rc` bit** on this opcode; it never touches CR.
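
The lane arithmetic in the list above can be sketched in standalone Rust. This is a minimal illustration, not the xenia-rs code: `flush_denorm`, `vnmsubfp`, and `vnmsubfp128` are hypothetical free functions over plain `[f32; 4]` arrays, and the flush is applied unconditionally, as the frozen snapshot appears to do.

```rust
/// Hypothetical helper: replace a subnormal value with a zero of the same sign.
fn flush_denorm(x: f32) -> f32 {
    if x.is_subnormal() { 0.0f32.copysign(x) } else { x }
}

/// Sketch of `vnmsubfp`: each lane yields -(a*c - b) with a single rounding.
fn vnmsubfp(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut r = [0f32; 4];
    for i in 0..4 {
        let (ai, bi, ci) = (flush_denorm(a[i]), flush_denorm(b[i]), flush_denorm(c[i]));
        // mul_add computes ai*ci + (-bi) with one rounding; the negation is exact.
        r[i] = flush_denorm(-ai.mul_add(ci, -bi));
    }
    r
}

/// Sketch of the VMX128 form: the destination register doubles as the addend,
/// so the result is -(a*b - d) = d - a*b.
fn vnmsubfp128(a: [f32; 4], b: [f32; 4], d: [f32; 4]) -> [f32; 4] {
    vnmsubfp(a, d, b)
}
```

Calling `vnmsubfp(a, b, c)` with `a = [1, 2, 3, 4]`, `b = [10; 4]`, `c = [2; 4]` gives `b - a*c` in every lane; the 128 form takes the same values with the addend passed as the last argument.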
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vmaddfp`](vmaddfp.md) — the non-negated fused multiply-add `(VA × VC) + VB`.
|
||||
- [`vaddfp`](vaddfp.md), [`vsubfp`](vsubfp.md) — the underlying adds/subs.
|
||||
- [`vmulfp`](vmulfp.md) — xenia-convenience lane-wise float multiply (no native Altivec form; usually encoded as `vmaddfp VD, VA, VC, v0_zero`).
|
||||
- [`vrefp`](vrefp.md), [`vrsqrtefp`](vrsqrtefp.md) — Newton iterations that pair with `vnmsubfp`.
|
||||
- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — the other float-arithmetic primitives.
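
The Newton-iteration pairing mentioned in the list above is worth one worked lane. The sketch below is an assumption-laden illustration, not code from xenia: `refine_recip` is a hypothetical scalar helper that performs one Newton-Raphson step on a reciprocal estimate, using the `vnmsubfp` shape for the error term and the `vmaddfp` shape for the correction.

```rust
/// One Newton-Raphson step for 1/a, starting from the estimate x0
/// (the kind of value a reciprocal-estimate instruction would supply).
fn refine_recip(a: f32, x0: f32) -> f32 {
    // vnmsubfp shape with VB = 1.0: -(a*x0 - 1) = 1 - a*x0, the relative error.
    let e = -a.mul_add(x0, -1.0);
    // vmaddfp shape: x0*e + x0 moves the estimate toward 1/a.
    x0.mul_add(e, x0)
}
```

For `a = 4.0` and a rough estimate `x0 = 0.24`, the error term is about `0.04` and the refined value is about `0.2496`, noticeably closer to `0.25`.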
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vnmsubfp` (Vector Negative Multiply-Subtract Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vnmsubfp-vector-negative-multiply-subtract-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
182
migration/project-root/ppc-manual/vmx/vnor.md
Normal file
@@ -0,0 +1,182 @@
|
||||
# `vnor` — Vector Logical NOR
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000504`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vnor` | `vnor` | — | Vector Logical NOR |
|
||||
| `vnor128` | `vnor128` | — | Vector128 Logical NOR |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vnor [VD], [VA], [VB]
|
||||
vnor128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vnor` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000504`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1284`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vnor128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000290`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `656`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
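
The split register fields in the table above can be reassembled into 7-bit register numbers. The following is a sketch under stated assumptions: `decode_vx128` is a hypothetical helper, bit positions use the table's IBM numbering (bit 0 is the most significant bit), and the combination order (low 5 bits, then the "middle" bit, then the "high" bit) is read off the field names rather than taken from the xenia-rs decoder.

```rust
/// Recover (VD, VA, VB) as 7-bit register numbers from a VX128-form word.
fn decode_vx128(word: u32) -> (u32, u32, u32) {
    // Extract `len` bits starting at IBM bit position `start` (bit 0 = MSB).
    let field = |start: u32, len: u32| (word >> (32 - start - len)) & ((1u32 << len) - 1);
    // VD: low 5 bits at 6-10, high 2 bits at 28-29.
    let vd = field(6, 5) | (field(28, 2) << 5);
    // VA: low 5 bits at 11-15, middle bit at 26, high bit at 21.
    let va = field(11, 5) | (field(26, 1) << 5) | (field(21, 1) << 6);
    // VB: low 5 bits at 16-20, high 2 bits at 30-31.
    let vb = field(16, 5) | (field(30, 2) << 5);
    (vd, va, vb)
}
```

Feeding it the bare opcode word `0x14000290` yields register 0 for all three operands, since none of the opcode bits overlap the register fields.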
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vnor: read; vnor128: read | Source A vector register. |
|
||||
| `VB` | vnor: read; vnor128: read | Source B vector register. |
|
||||
| `VD` | vnor: write; vnor128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vnor`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vnor128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
for each 32-bit lane i in 0..3:
    VD[i] <- ~(VA[i] | VB[i])
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vnor`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnor"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1168`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1168)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:534`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L534)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2244-2252`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2244-L2252)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vnor | PpcOpcode::vnor128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = !(a[i] | b[i]); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vnor128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnor128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1171`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1171)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:623`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L623)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2244-2252`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2244-L2252)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vnor | PpcOpcode::vnor128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = !(a[i] | b[i]); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Bitwise NOR across the full 128-bit register.** `VD = ~(VA | VB)`. The operation is lane-agnostic; PPC documents it per-bit, xenia implements it as four 32-bit lanes for convenience but the result is identical to 16-byte or 8-half-word decomposition.
|
||||
- **`vnor VD, VA, VA` is the idiomatic `vnot`** (bitwise complement of `VA`). No dedicated `vnot` exists in base Altivec.
|
||||
- **Aliasing is legal.** `vnor v3, v3, v4` or `vnor v3, v3, v3` are well-defined and common.
|
||||
- **No flags.** No CR, XER, VSCR side-effect.
|
||||
- **VMX128 sibling [`vnor128`](vnor128.md)** provides the same op with access to `v0..v127`; xenia shares the interpreter arm (`vmx_reg_triple` selects the right encoding helper).
|
||||
- **Useful for mask inversion.** When a compare result needs to be inverted (e.g. "where not equal"), `vnor` of the compare result with itself does it in a single instruction; no separate inversion opcode is needed.
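
The `vnot` idiom and the mask-inversion use from the list above can be sketched as follows. This is an illustrative model over plain `[u32; 4]` arrays; `vnor` and `vnot` are hypothetical free functions that mirror the lane loop in the frozen snapshot, not the interpreter itself.

```rust
/// Sketch of `vnor` over four 32-bit lanes; any lane width gives the same bits.
fn vnor(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = !(a[i] | b[i]);
    }
    r
}

/// The `vnot` idiom: NOR a register with itself to get its bitwise complement.
fn vnot(a: [u32; 4]) -> [u32; 4] {
    vnor(a, a)
}
```

Applied to a compare mask such as `[0xFFFF_FFFF, 0, 0xFFFF_FFFF, 0]`, `vnot` flips each lane between all-ones and all-zeros, turning an "equal" mask into a "not equal" mask.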
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vand`](vand.md) — bitwise AND.
|
||||
- [`vandc`](vandc.md) — `VA & ~VB` (useful as a fused inversion on the B side).
|
||||
- [`vor`](vor.md), [`vxor`](vxor.md) — complete the boolean-primitive set.
|
||||
- [`vsel`](vsel.md) — three-input bit-select; often fed by the inverse of a compare mask.
|
||||
- [`vcmpequb`](vcmpequb.md) and related compares — produce the masks `vnor` is typically applied to.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vnor` (Vector Logical NOR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vnor-vector-logical-nor-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 3 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
181
migration/project-root/ppc-manual/vmx/vor.md
Normal file
@@ -0,0 +1,181 @@
|
||||
# `vor` — Vector Logical OR
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000484`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vor` | `vor` | — | Vector Logical OR |
|
||||
| `vor128` | `vor128` | — | Vector128 Logical OR |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vor [VD], [VA], [VB]
|
||||
vor128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vor` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000484`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `1156`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vor128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x140002d0`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `720`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vor: read; vor128: read | Source A vector register. |
|
||||
| `VB` | vor: read; vor128: read | Source B vector register. |
|
||||
| `VD` | vor: write; vor128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vor`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vor128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
for each 32-bit lane i in 0..3:
    VD[i] <- VA[i] | VB[i]
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vor`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vor"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1186`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1186)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:111`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L111)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:531`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L531)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2226-2234`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2226-L2234)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vor | PpcOpcode::vor128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = a[i] | b[i]; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vor128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vor128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1189`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1189)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:111`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L111)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:625`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L625)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2226-2234`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2226-L2234)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vor | PpcOpcode::vor128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 { r[i] = a[i] | b[i]; }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Bitwise OR across the full 128-bit register.** Lane-agnostic; xenia implements it as four 32-bit lanes but the result is identical at any granularity.
|
||||
- **`vor VD, VA, VA` is the idiomatic register move.** No dedicated "vmr" exists in base Altivec; compilers recognise the `vor v3, v4, v4` pattern as a move and schedule accordingly.
|
||||
- **Aliasing is legal.** `vor v3, v3, v4` merges the mask in `v4` into `v3`.
|
||||
- **No flags, no VSCR effect.**
|
||||
- **VMX128 sibling [`vor128`](vor128.md).** Same operation, wider register file.
|
||||
- **Common pattern: ORing a compare mask with a data vector** to force specific lanes to all-ones without needing a select.
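
The move idiom and the mask-forcing pattern from the list above, sketched in standalone Rust. `vor` here is a hypothetical free function over `[u32; 4]` that mirrors the snapshot's lane loop; the data and mask values are made up for illustration.

```rust
/// Sketch of `vor` over four 32-bit lanes.
fn vor(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        r[i] = a[i] | b[i];
    }
    r
}

/// Register move idiom: OR a register with itself.
fn vmove(a: [u32; 4]) -> [u32; 4] {
    vor(a, a)
}

/// Force the lanes selected by an all-ones/all-zeros compare mask to all-ones,
/// leaving the other lanes of `data` untouched.
fn force_ones(data: [u32; 4], mask: [u32; 4]) -> [u32; 4] {
    vor(data, mask)
}
```

With `data = [1, 2, 3, 4]` and `mask = [0, 0xFFFF_FFFF, 0, 0]`, only lane 1 changes; a `vsel` would need an extra all-ones operand to do the same.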
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vand`](vand.md), [`vandc`](vandc.md) — AND / AND-with-complement.
|
||||
- [`vnor`](vnor.md) — NOR, includes the idiom for bitwise NOT.
|
||||
- [`vxor`](vxor.md) — XOR; also a common "zero register" via `vxor vD, vD, vD`.
|
||||
- [`vsel`](vsel.md) — three-operand bit-select, often combined with OR of masks.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vor` (Vector Logical OR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vor-vector-logical-or-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 3 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
216
migration/project-root/ppc-manual/vmx/vperm.md
Normal file
@@ -0,0 +1,216 @@
|
||||
# `vperm` — Vector Permute
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002b`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vperm` | `vperm` | — | Vector Permute |
|
||||
| `vperm128` | `vperm128` | — | Vector128 Permute |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vperm [VD], [VA], [VB], [VC]
|
||||
vperm128 [VD], [VA], [VB], [VC]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vperm` — form `VA`
|
||||
|
||||
- **Opcode word:** `0x1000002b`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `43`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT` | destination vector register |
|
||||
| 11–15 | `VRA` | source A |
|
||||
| 16–20 | `VRB` | source B |
|
||||
| 21–25 | `VRC` | source C / shift |
|
||||
| 26–31 | `XO` | extended opcode (6 bits) |
|
||||
|
||||
### `vperm128` — form `VX128_2`
|
||||
|
||||
- **Opcode word:** `0x14000000`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `0`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 23–25 | `VC` | source C 3-bit field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vperm: read; vperm128: read | Source A vector register. |
|
||||
| `VB` | vperm: read; vperm128: read | Source B vector register. |
|
||||
| `VC` | vperm: read; vperm128: read | Source C vector register / 3-bit selector. |
|
||||
| `VD` | vperm: write; vperm128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vperm`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`, `VC`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vperm128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`, `VC`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
for each byte i in 0..15:
    sel <- VC.b[i] & 0x1F
    VD.b[i] <- (sel < 16) ? VA.b[sel] : VB.b[sel - 16]
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vperm`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vperm"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1199`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1199)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:586`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L586)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2278-2302`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2278-L2302)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vperm | PpcOpcode::vperm128 => {
|
||||
let (va, vb, vd);
|
||||
let vc;
|
||||
if matches!(instr.opcode, PpcOpcode::vperm128) {
|
||||
va = instr.va128();
|
||||
vb = instr.vb128();
|
||||
vd = instr.vd128();
|
||||
vc = instr.vc128_2();
|
||||
} else {
|
||||
va = instr.ra();
|
||||
vb = instr.rb();
|
||||
vd = instr.rd();
|
||||
vc = instr.rc();
|
||||
}
|
||||
let a_bytes = ctx.vr[va].as_bytes();
|
||||
let b_bytes = ctx.vr[vb].as_bytes();
|
||||
let c_bytes = ctx.vr[vc].as_bytes();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 {
|
||||
let idx = (c_bytes[i] & 0x1F) as usize;
|
||||
r[i] = if idx < 16 { a_bytes[idx] } else { b_bytes[idx - 16] };
|
||||
}
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vperm128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vperm128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1202`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1202)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:605`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L605)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2278-2302`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2278-L2302)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vperm | PpcOpcode::vperm128 => {
|
||||
let (va, vb, vd);
|
||||
let vc;
|
||||
if matches!(instr.opcode, PpcOpcode::vperm128) {
|
||||
va = instr.va128();
|
||||
vb = instr.vb128();
|
||||
vd = instr.vd128();
|
||||
vc = instr.vc128_2();
|
||||
} else {
|
||||
va = instr.ra();
|
||||
vb = instr.rb();
|
||||
vd = instr.rd();
|
||||
vc = instr.rc();
|
||||
}
|
||||
let a_bytes = ctx.vr[va].as_bytes();
|
||||
let b_bytes = ctx.vr[vb].as_bytes();
|
||||
let c_bytes = ctx.vr[vc].as_bytes();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 {
|
||||
let idx = (c_bytes[i] & 0x1F) as usize;
|
||||
r[i] = if idx < 16 { a_bytes[idx] } else { b_bytes[idx - 16] };
|
||||
}
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-byte selector drives a cross-vector permute.** Each byte of `VC` is a 5-bit selector (low 5 bits used, upper 3 bits ignored). The top bit of that 5-bit field (value 16, i.e. bit 3 of the byte in big-endian bit numbering) chooses the source: 0 selects from `VA`, 1 selects from `VB`. The low 4 bits index a byte within the chosen 16-byte operand.
|
||||
- **`vperm` is the universal "16-byte reshuffle" primitive.** It can express any byte-level permutation of 32 source bytes (`VA ‖ VB`) down to 16 destination bytes, including duplicates and drops.
|
||||
- **Big-endian byte indexing.** `VC.b[0]` controls `VD.b[0]` (the MSB byte). Selector value 0 picks `VA.b[0]`, value 15 picks `VA.b[15]`, value 16 picks `VB.b[0]`, value 31 picks `VB.b[15]`.
|
||||
- **Upper 3 bits of each `VC` byte are ignored.** Only bits 3..7 (the low 5) are consulted, so values like 0x1F and 0x5F both mean "byte 15 of VB". Software can use those upper bits for its own tagging.
|
||||
- **Pair with [`lvsl`](lvsl.md) / [`lvsr`](lvsr.md) for unaligned 16-byte loads.** `lvsl` produces the selector that shifts "left" by `EA & 0xF` bytes; feeding that into `vperm` with two aligned `lvx` results yields the unaligned 16-byte view.
|
||||
- **Aliasing legal.** `VD` may equal `VA` or `VB`.
|
||||
- **VMX128 sibling [`vperm128`](vperm128.md).** Same shape with the 7-bit register file. The VMX128 encoding carries `VC` in the 3-bit `VC` sub-field of the `VX128_2` form, which only lets `VC` select one of **8** specific registers, not 128. In the xenia-rs snapshot above the accessor is `instr.vc128_2()`.
|
||||
- **No flags, no VSCR side-effect.**
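
The selector rules and the `lvsl` pairing from the list above can be exercised with a small model. This is a sketch, not the interpreter: `vperm` mirrors the byte loop in the frozen snapshot over plain arrays, and `lvsl_selector` is a hypothetical stand-in that builds the byte sequence `sh, sh+1, ..., sh+15` for `sh = EA & 0xF`.

```rust
/// Sketch of `vperm`: pick 16 bytes out of the 32-byte concatenation a || b.
fn vperm(a: [u8; 16], b: [u8; 16], c: [u8; 16]) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..16 {
        // Only the low 5 bits of each selector byte are used.
        let idx = (c[i] & 0x1F) as usize;
        r[i] = if idx < 16 { a[idx] } else { b[idx - 16] };
    }
    r
}

/// Hypothetical stand-in for the selector an `lvsl` at address `ea` yields.
fn lvsl_selector(ea: u32) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    let mut s = [0u8; 16];
    for i in 0..16 {
        s[i] = sh + i as u8; // at most 15 + 15 = 30, so no overflow
    }
    s
}
```

If `a` and `b` hold two consecutive aligned 16-byte blocks, `vperm(a, b, lvsl_selector(ea))` returns the 16 bytes that start `ea & 0xF` bytes into `a`, which is the unaligned view described above.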
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vsldoi`](vsldoi.md) — static-shift-by-`SHB` form; when the shift is a compile-time constant this is cheaper than `lvsl`+`vperm`.
|
||||
- [`lvsl`](lvsl.md), [`lvsr`](lvsr.md) — generate the permute mask from an effective address.
|
||||
- [`vmrghb`](vmrghb.md), [`vmrglb`](vmrglb.md), [`vmrghh`](vmrghh.md), [`vmrglh`](vmrglh.md), [`vmrghw`](vmrghw.md), [`vmrglw`](vmrglw.md) — dedicated merges that are a subset of `vperm`.
|
||||
- [`vspltb`](vspltb.md), [`vsplth`](vsplth.md), [`vspltw`](vspltw.md) — splat-from-lane, also expressible via `vperm` + a constant mask.
|
||||
- [`vpkuhum`](vpkuhum.md) and other `vpk*` — narrower-lane packs whose pattern can also be encoded in `vperm`.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vperm` (Vector Permute)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vperm-vector-permute-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
130
migration/project-root/ppc-manual/vmx/vpkpx.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# `vpkpx` — Vector Pack Pixel
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000030e`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vpkpx` | `vpkpx` | — | Vector Pack Pixel |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vpkpx [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vpkpx` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000030e`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `782`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vpkpx: read | Source A vector register. |
|
||||
| `VB` | vpkpx: read | Source B vector register. |
|
||||
| `VD` | vpkpx: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vpkpx`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vpkpx`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkpx"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1810`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1810)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:504`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L504)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4123-4131`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4123-L4131)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkpx => {
|
||||
let a = ctx.vr[instr.ra()].as_u32x4();
|
||||
let b = ctx.vr[instr.rb()].as_u32x4();
|
||||
let mut r = [0u16; 8];
|
||||
for i in 0..4 { r[i] = crate::vmx::pack_pixel_555(a[i]); }
|
||||
for i in 0..4 { r[4 + i] = crate::vmx::pack_pixel_555(b[i]); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
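- **Pack 4+4 32-bit pixel words → 8 × 16-bit 1-5-5-5 pixels.** For each 32-bit word lane (4 from `VA`, 4 from `VB`), four bit-fields (one alpha bit plus three 5-bit channels) are sampled and concatenated into a 16-bit `1.5.5.5` (A.R.G.B) pixel. The low-order bits of each channel are discarded; nothing saturates.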
|
||||
- **Bit layout of each output half-word (big-endian bit numbering, bit 0 = MSB).** Output bit 0 = bit 7 of the source word (alpha, i.e. the least-significant bit of the word's first byte); the next 5 bits are the top 5 bits of the red byte (bits 8..12 of the source word); then 5 bits of green (bits 16..20); then 5 bits of blue (bits 24..28). Xenia's helper is `vmx::pack_pixel_555` (in `crates/xenia-cpu/src/vmx.rs`).
|
||||
- **No saturation / no rounding.** The op truncates the lower bits of each channel; `VSCR[SAT]` is **not** affected.
|
||||
- **Big-endian lane order.** `VA`'s 4 words produce the first 4 output half-words (`VD.h[0..3]`); `VB`'s 4 words fill `VD.h[4..7]`.
|
||||
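- **Paired with [`vupkhpx`](vupkhpx.md) / [`vupklpx`](vupklpx.md).** These unpack a 1-5-5-5 pixel back into a 32-bit word lane for further arithmetic: the alpha bit is replicated across the top byte and each 5-bit channel is zero-extended into the low bits of its own byte, so a pack/unpack round trip does not restore the original 8-bit channel values.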
|
||||
- **No `Rc`, no XER.** No VMX128 sibling — game code that needs 555-pixel packing on Xenon either uses the scalar path or runs into the `vpkd3d128` family for richer D3D formats.
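
The bit layout described in the list above maps onto plain shifts once the source word is held in a native `u32`. The following is a minimal sketch under that assumption; `pack_pixel_555_sketch` and `vpkpx_sketch` are hypothetical names and are not the body of xenia's `vmx::pack_pixel_555`, which is not reproduced in this page.

```rust
/// Pack one 32-bit pixel word into a 16-bit 1-5-5-5 pixel.
/// Big-endian bit n of the word corresponds to native bit (31 - n).
fn pack_pixel_555_sketch(word: u32) -> u16 {
    let a = (word >> 24) & 0x01; // BE bit 7: LSB of the first byte
    let r = (word >> 19) & 0x1F; // BE bits 8..12: top 5 bits of byte 1
    let g = (word >> 11) & 0x1F; // BE bits 16..20: top 5 bits of byte 2
    let b = (word >> 3) & 0x1F;  // BE bits 24..28: top 5 bits of byte 3
    ((a << 15) | (r << 10) | (g << 5) | b) as u16
}

/// Whole-vector form: VA fills half-words 0..3, VB fills half-words 4..7.
fn vpkpx_sketch(va: [u32; 4], vb: [u32; 4]) -> [u16; 8] {
    let mut vd = [0u16; 8];
    for i in 0..4 {
        vd[i] = pack_pixel_555_sketch(va[i]);
        vd[4 + i] = pack_pixel_555_sketch(vb[i]);
    }
    vd
}
```

Because each channel is truncated rather than clamped, a word such as `0xFE070707` packs to `0x0000`: the alpha bit is clear and the top 5 bits of every colour byte are zero.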
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vupkhpx`](vupkhpx.md), [`vupklpx`](vupklpx.md) — the inverse unpacks.
|
||||
- [`vpkuhum`](vpkuhum.md), [`vpkuhus`](vpkuhus.md), [`vpkuwum`](vpkuwum.md), [`vpkuwus`](vpkuwus.md) — lane-halving packs (unsigned modulo / saturating).
|
||||
- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md), [`vpkswss`](vpkswss.md), [`vpkswus`](vpkswus.md) — signed-input saturating packs.
|
||||
- [`vpkd3d128`](../vmx128/vpkd3d128.md) — VMX128-exclusive D3D-format pack (richer pixel formats than 1-5-5-5).
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vpkpx` (Vector Pack Pixel32)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkpx-vector-pack-pixel32-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
193
migration/project-root/ppc-manual/vmx/vpkshss.md
Normal file
@@ -0,0 +1,193 @@
|
||||
# `vpkshss` — Vector Pack Signed Half Word Signed Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000018e`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vpkshss` | `vpkshss` | — | Vector Pack Signed Half Word Signed Saturate |
|
||||
| `vpkshss128` | `vpkshss128` | — | Vector128 Pack Signed Half Word Signed Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vpkshss [VD], [VA], [VB]
|
||||
vpkshss128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vpkshss` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000018e`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `398`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vpkshss128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000200`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `512`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
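
Going by the field labels in the table above (low 5 bits, middle bit, high bits), the 7-bit register numbers could be reassembled as sketched below. This is an assumption drawn from those labels rather than a copy of xenia's `va128()` / `vb128()` / `vd128()` accessors; `be_field` and `decode_vx128_regs_sketch` are hypothetical names, and bit positions follow the table's big-endian numbering (bit 0 = MSB).

```rust
/// Extract big-endian bits `first..=last` (bit 0 = MSB) from a 32-bit word.
fn be_field(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

/// Hypothetical decoder returning (VD128, VA128, VB128) as 7-bit indices.
fn decode_vx128_regs_sketch(word: u32) -> (u32, u32, u32) {
    // VD128 = VD128l | VD128h << 5
    let vd = be_field(word, 6, 10) | (be_field(word, 28, 29) << 5);
    // VA128 = VA128l | VA128h << 5 | VA128H << 6
    let va = be_field(word, 11, 15)
        | (be_field(word, 26, 26) << 5)
        | (be_field(word, 21, 21) << 6);
    // VB128 = VB128l | VB128h << 5
    let vb = be_field(word, 16, 20) | (be_field(word, 30, 31) << 5);
    (vd, va, vb)
}
```

Applied to the bare opcode word `0x14000200`, the sketch returns `(0, 0, 0)`, since none of the register fields overlap the opcode bits.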
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vpkshss: read; vpkshss128: read | Source A vector register. |
|
||||
| `VB` | vpkshss: read; vpkshss128: read | Source B vector register. |
|
||||
| `VD` | vpkshss: write; vpkshss128: write | Destination vector register. |
|
||||
| `VSCR` | vpkshss: write; vpkshss128: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vpkshss`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vpkshss128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vpkshss`: **VSCR[SAT]** is set (and stays set) if any lane saturates.
|
||||
- `vpkshss128`: **VSCR[SAT]** is set (and stays set) if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vpkshss`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkshss"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1845`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1845)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:471`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L471)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4070-4082`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4070-L4082)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkshss | PpcOpcode::vpkshss128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkshss128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[ra]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[rb]);
|
||||
let mut r = [0i8; 16]; let mut sat = false;
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_i8(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_i8(b[i]); r[8 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = crate::vmx::from_i8x16(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vpkshss128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkshss128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1848`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1848)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:618`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L618)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4070-4082`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4070-L4082)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkshss | PpcOpcode::vpkshss128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkshss128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[ra]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[rb]);
|
||||
let mut r = [0i8; 16]; let mut sat = false;
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_i8(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_i8(b[i]); r[8 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = crate::vmx::from_i8x16(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Signed half-word → signed byte saturating pack.** Each of the 16 input half-word lanes (8 from `VA`, 8 from `VB`) is clamped to the `int8` range `[−128, +127]`. Values outside that range produce the nearest extreme and **set the sticky `VSCR[SAT]`** bit.
|
||||
- **Lane-count doubling.** 8+8 = 16 half-word lanes → 16 byte lanes in `VD`.
|
||||
- **Big-endian ordering.** `VA`'s 8 half-words fill `VD.b[0..7]`; `VB`'s 8 fill `VD.b[8..15]`.
|
||||
- **Signed vs. unsigned output.** `vpkshss` has signed input and signed output. Compare with [`vpkshus`](vpkshus.md), which keeps signed input but clamps to `uint8` (`[0, 255]`).
|
||||
- **`VSCR[SAT]` is sticky.** Once set it remains set until an `mtvscr` clears it. Software that needs a per-block saturation signal must clear `VSCR[SAT]` before the block of vector code runs and read it back (with `mfvscr`) afterwards.
|
||||
- **No `Rc`, no XER / FPSCR.**
|
||||
- **VMX128 sibling [`vpkshss128`](vpkshss128.md).** Same semantics, wider register file.
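
The clamping and lane ordering described in the list above can be sketched as a standalone Rust function. This is an illustration only: `sat_i16_to_i8_sketch` and `vpkshss_sketch` are hypothetical names modelled on the helper names visible in the interpreter snapshot, and the returned `bool` stands in for the value a caller would OR into the sticky `VSCR[SAT]` bit.

```rust
/// Clamp one signed half-word to the int8 range.
/// Returns the clamped value and whether clamping changed it.
fn sat_i16_to_i8_sketch(x: i16) -> (i8, bool) {
    let clamped = x.clamp(i8::MIN as i16, i8::MAX as i16);
    (clamped as i8, clamped != x)
}

/// Pack 8 + 8 signed half-words into 16 signed bytes.
/// VA fills bytes 0..7, VB fills bytes 8..15 (index 0 = most significant).
fn vpkshss_sketch(va: [i16; 8], vb: [i16; 8]) -> ([i8; 16], bool) {
    let mut out = [0i8; 16];
    let mut sat = false;
    for (i, &x) in va.iter().chain(vb.iter()).enumerate() {
        let (v, s) = sat_i16_to_i8_sketch(x);
        out[i] = v;
        sat |= s; // any saturating lane raises the flag
    }
    (out, sat)
}
```

A caller emulating the sticky behaviour would only ever set its saturation flag from the returned `bool` and never clear it here, so that a flag raised by an earlier instruction survives.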
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vpkshus`](vpkshus.md) — signed → unsigned saturating (same half-word input).
|
||||
- [`vpkuhus`](vpkuhus.md), [`vpkuhum`](vpkuhum.md) — unsigned half-word input, unsigned byte output (saturating / modulo).
|
||||
- [`vpkswss`](vpkswss.md), [`vpkswus`](vpkswus.md) — the word → half-word analogues.
|
||||
- [`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md) — the inverse unpacks that sign-extend bytes back to half-words.
|
||||
- [`vaddsbs`](vaddsbs.md), [`vsubsbs`](vsubsbs.md) — other sources of byte-saturating arithmetic.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vpkshss` (Vector Pack Signed Half Word Signed Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkshss-vector-pack-signed-half-word-signed-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
192
migration/project-root/ppc-manual/vmx/vpkshus.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# `vpkshus` — Vector Pack Signed Half Word Unsigned Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000010e`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vpkshus` | `vpkshus` | — | Vector Pack Signed Half Word Unsigned Saturate |
|
||||
| `vpkshus128` | `vpkshus128` | — | Vector128 Pack Signed Half Word Unsigned Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vpkshus [VD], [VA], [VB]
|
||||
vpkshus128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vpkshus` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000010e`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `270`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vpkshus128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000240`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `576`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vpkshus: read; vpkshus128: read | Source A vector register. |
|
||||
| `VB` | vpkshus: read; vpkshus128: read | Source B vector register. |
|
||||
| `VD` | vpkshus: write; vpkshus128: write | Destination vector register. |
|
||||
| `VSCR` | vpkshus: write; vpkshus128: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vpkshus`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vpkshus128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vpkshus`: **VSCR[SAT]** is set (and stays set) if any lane saturates.
|
||||
- `vpkshus128`: **VSCR[SAT]** is set (and stays set) if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vpkshus`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkshus"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1953`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1953)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:459`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L459)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4057-4069`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4057-L4069)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkshus | PpcOpcode::vpkshus128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkshus128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[ra]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[rb]);
|
||||
let mut r = [0u8; 16]; let mut sat = false;
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_u8(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_u8(b[i]); r[8 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vpkshus128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkshus128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1956`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1956)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:620`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L620)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4057-4069`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4057-L4069)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkshus | PpcOpcode::vpkshus128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkshus128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = crate::vmx::as_i16x8(ctx.vr[ra]);
|
||||
let b = crate::vmx::as_i16x8(ctx.vr[rb]);
|
||||
let mut r = [0u8; 16]; let mut sat = false;
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_u8(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_u8(b[i]); r[8 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Signed half-word → unsigned byte saturating pack.** Each of the 16 input half-word lanes (8 from `VA`, 8 from `VB`) is interpreted as `int16` and clamped to `[0, 255]`. Negative values → 0; values above 255 → 255. Clamping any lane sticky-sets `VSCR[SAT]`.
|
||||
- **Lane-count doubling.** 16 half-word lanes → 16 byte lanes, `VA` filling `VD.b[0..7]` and `VB` filling `VD.b[8..15]`.
|
||||
- **Difference from [`vpkshss`](vpkshss.md).** Both take signed half-words; `shss` clamps to `int8`, `shus` clamps to `uint8`. Choose `shus` when negative results are not meaningful and the output is consumed as unsigned bytes (e.g. 8-bit channel values computed in signed 16-bit intermediates).
|
||||
- **`VSCR[SAT]` is sticky.**
|
||||
- **No `Rc`, no XER / FPSCR.**
|
||||
- **VMX128 sibling [`vpkshus128`](vpkshus128.md).** Same behaviour with wider register file.
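
The signed-to-unsigned clamp described in the list above differs from the `vpkshss` case only in its bounds, as the following Rust sketch shows. The names `sat_i16_to_u8_sketch` and `vpkshus_sketch` are hypothetical (modelled on the helper names in the interpreter snapshot), and the returned `bool` stands in for the value a caller would OR into `VSCR[SAT]`.

```rust
/// Clamp one signed half-word to the uint8 range [0, 255].
/// Returns the clamped value and whether clamping changed it.
fn sat_i16_to_u8_sketch(x: i16) -> (u8, bool) {
    let clamped = x.clamp(0, u8::MAX as i16);
    (clamped as u8, clamped != x)
}

/// Pack 8 + 8 signed half-words into 16 unsigned bytes.
/// VA fills bytes 0..7, VB fills bytes 8..15 (index 0 = most significant).
fn vpkshus_sketch(va: [i16; 8], vb: [i16; 8]) -> ([u8; 16], bool) {
    let mut out = [0u8; 16];
    let mut sat = false;
    for (i, &x) in va.iter().chain(vb.iter()).enumerate() {
        let (v, s) = sat_i16_to_u8_sketch(x);
        out[i] = v;
        sat |= s; // negative or > 255 lanes raise the flag
    }
    (out, sat)
}
```

An input lane of 200 passes through unchanged here without raising the flag, whereas the signed `vpkshss` clamp would reduce it to 127 and report saturation.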
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vpkshss`](vpkshss.md) — signed → signed clamp.
|
||||
- [`vpkuhus`](vpkuhus.md) — unsigned input → unsigned byte.
|
||||
- [`vpkuhum`](vpkuhum.md) — unsigned input, truncating (modulo) pack.
|
||||
- [`vpkswus`](vpkswus.md) — the word → half-word signed→unsigned analogue.
|
||||
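- No direct inverse: the standard VMX unpacks ([`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md)) sign-extend, so zero-extending unsigned bytes back to half-words is usually done by merging with a zero vector ([`vmrghb`](vmrghb.md), [`vmrglb`](vmrglb.md)).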
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vpkshus` (Vector Pack Signed Half Word Unsigned Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkshus-vector-pack-signed-half-word-unsigned-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
193
migration/project-root/ppc-manual/vmx/vpkswss.md
Normal file
@@ -0,0 +1,193 @@
|
||||
# `vpkswss` — Vector Pack Signed Word Signed Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100001ce`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vpkswss` | `vpkswss` | — | Vector Pack Signed Word Signed Saturate |
|
||||
| `vpkswss128` | `vpkswss128` | — | Vector128 Pack Signed Word Signed Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vpkswss [VD], [VA], [VB]
|
||||
vpkswss128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vpkswss` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x100001ce`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `462`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vpkswss128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000280`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `640`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vpkswss: read; vpkswss128: read | Source A vector register. |
|
||||
| `VB` | vpkswss: read; vpkswss128: read | Source B vector register. |
|
||||
| `VD` | vpkswss: write; vpkswss128: write | Destination vector register. |
|
||||
| `VSCR` | vpkswss: write; vpkswss128: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vpkswss`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vpkswss128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vpkswss`: **VSCR[SAT]** is set (and stays set) if any lane saturates.
|
||||
- `vpkswss128`: **VSCR[SAT]** is set (and stays set) if any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vpkswss`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkswss"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1867`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1867)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:114`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L114)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:474`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L474)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4109-4121`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4109-L4121)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkswss | PpcOpcode::vpkswss128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkswss128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = crate::vmx::as_i32x4(ctx.vr[ra]);
|
||||
let b = crate::vmx::as_i32x4(ctx.vr[rb]);
|
||||
let mut r = [0i16; 8]; let mut sat = false;
|
||||
for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_i16(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_i16(b[i]); r[4 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = crate::vmx::from_i16x8(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vpkswss128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkswss128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1870`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1870)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:114`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L114)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:622`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L622)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4109-4121`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4109-L4121)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkswss | PpcOpcode::vpkswss128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkswss128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = crate::vmx::as_i32x4(ctx.vr[ra]);
|
||||
let b = crate::vmx::as_i32x4(ctx.vr[rb]);
|
||||
let mut r = [0i16; 8]; let mut sat = false;
|
||||
for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_i16(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_i16(b[i]); r[4 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = crate::vmx::from_i16x8(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Signed word → signed half-word saturating pack.** Each of the 8 input word lanes (4 from `VA`, 4 from `VB`) is clamped to `[−32768, +32767]`. Any lane that has to be clamped sets `VSCR[SAT]`.
- **Lane-count doubling.** 4 + 4 = 8 word lanes → 8 half-word lanes.
- **Ordering.** `VA`'s four words produce `VD.h[0..3]`; `VB`'s produce `VD.h[4..7]`.
- **Signed vs. unsigned output.** `vpkswss` preserves sign; [`vpkswus`](vpkswus.md) clamps the same signed-word input to the `uint16` range `[0, 65535]`.
- **`VSCR[SAT]` is sticky.** The instruction only ever sets the bit; it never clears it.
- **No `Rc`, no XER / FPSCR.**
- **VMX128 sibling [`vpkswss128`](vpkswss128.md).** Same operation, with the registers taken from the VMX128 fields.
|
||||
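The saturation and sticky-flag behaviour listed above can be sketched in stand-alone Rust. This is a minimal sketch under stated assumptions, not the xenia-rs code: `sat_i32_to_i16` is a hypothetical re-implementation of the helper of the same name that the interpreter arm calls (its real body is not shown here), and `vpkswss_model` and `vscr_sat` are illustrative names.

```rust
/// Hypothetical stand-in for `crate::vmx::sat_i32_to_i16`: clamps one word
/// lane to the i16 range and reports whether clamping changed the value.
fn sat_i32_to_i16(x: i32) -> (i16, bool) {
    let clamped = x.clamp(i16::MIN as i32, i16::MAX as i32);
    (clamped as i16, clamped != x)
}

/// Illustrative model of the pack: VA's words fill lanes 0..=3 of the
/// result, VB's words fill lanes 4..=7.
/// `vscr_sat` models VSCR[SAT]; it is only ever OR-ed, so it stays set.
fn vpkswss_model(va: [i32; 4], vb: [i32; 4], vscr_sat: &mut bool) -> [i16; 8] {
    let mut out = [0i16; 8];
    for (i, &w) in va.iter().chain(vb.iter()).enumerate() {
        let (v, saturated) = sat_i32_to_i16(w);
        out[i] = v;
        *vscr_sat |= saturated;
    }
    out
}
```

Running the model twice with the same flag shows the stickiness: once a lane such as `40000` has been clamped, a later pack of all-zero inputs leaves the flag set.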
|
||||
## Related Instructions
|
||||
|
||||
- [`vpkswus`](vpkswus.md) — signed → unsigned saturating.
|
||||
- [`vpkuwus`](vpkuwus.md), [`vpkuwum`](vpkuwum.md) — unsigned word input.
|
||||
- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md) — half-word input → byte output analogues.
|
||||
- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — the signed-half-word → word unpacks that reverse a `vpkswss` pair.
|
||||
- [`vaddsws`](vaddsws.md), [`vsubsws`](vsubsws.md) — word-saturating arithmetic producers.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vpkswss` (Vector Pack Signed Word Signed Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkswss-vector-pack-signed-word-signed-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
192
migration/project-root/ppc-manual/vmx/vpkswus.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# `vpkswus` — Vector Pack Signed Word Unsigned Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000014e`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vpkswus` | `vpkswus` | — | Vector Pack Signed Word Unsigned Saturate |
|
||||
| `vpkswus128` | `vpkswus128` | — | Vector128 Pack Signed Word Unsigned Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vpkswus [VD], [VA], [VB]
|
||||
vpkswus128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vpkswus` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000014e`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `334`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
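The VX field layout in the table above can be checked with a small decoder. This is an illustrative sketch: `ppc_bits`, `VxFields` and `decode_vx` are hypothetical names, and the register numbers used in the usage note (`v3`, `v1`, `v2`) are arbitrary example operands. Bit positions follow the table, where bit 0 is the most significant bit.

```rust
/// Extract bits `start..=end` in PowerPC numbering (bit 0 is the MSB).
/// Valid for fields narrower than 32 bits, which covers every field here.
fn ppc_bits(word: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (word >> (31 - end)) & ((1u32 << width) - 1)
}

#[derive(Debug, PartialEq)]
struct VxFields {
    opcd: u32, // bits 0-5
    vd: u32,   // bits 6-10
    va: u32,   // bits 11-15
    vb: u32,   // bits 16-20
    xo: u32,   // bits 21-31
}

fn decode_vx(word: u32) -> VxFields {
    VxFields {
        opcd: ppc_bits(word, 0, 5),
        vd: ppc_bits(word, 6, 10),
        va: ppc_bits(word, 11, 15),
        vb: ppc_bits(word, 16, 20),
        xo: ppc_bits(word, 21, 31),
    }
}
```

For example, OR-ing `3 << 21`, `1 << 16` and `2 << 11` into the base opcode word `0x1000014e` gives `0x1061114e`, which decodes to primary opcode 4, `VD` = 3, `VA` = 1, `VB` = 2 and extended opcode 334.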
|
||||
### `vpkswus128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x140002c0`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `704`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
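The split register fields in the VX128 table can be reassembled into 7-bit register numbers roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the way the pieces are combined (low 5 bits, then the "middle" bit at position 5, then the "high" bit at position 6 for `VA`; low 5 bits plus 2 high bits for `VD` and `VB`) is inferred from the field descriptions in the table rather than taken from the xenia-rs `va128()` / `vb128()` / `vd128()` accessors, whose bodies are not shown here.

```rust
/// Extract bits `start..=end` in PowerPC numbering (bit 0 is the MSB).
/// Valid for fields narrower than 32 bits.
fn ppc_bits(word: u32, start: u32, end: u32) -> u32 {
    let width = end - start + 1;
    (word >> (31 - end)) & ((1u32 << width) - 1)
}

/// Returns (VD128, VA128, VB128) as full register numbers.
/// Assumed composition, inferred from the table's field descriptions.
fn decode_vx128_regs(word: u32) -> (u32, u32, u32) {
    let vd = ppc_bits(word, 6, 10) | (ppc_bits(word, 28, 29) << 5);
    let va = ppc_bits(word, 11, 15)
        | (ppc_bits(word, 26, 26) << 5) // VA128h, "middle" bit
        | (ppc_bits(word, 21, 21) << 6); // VA128H, "high" bit
    let vb = ppc_bits(word, 16, 20) | (ppc_bits(word, 30, 31) << 5);
    (vd, va, vb)
}
```

Under these assumptions, a word built from the base opcode `0x140002c0` with `VD128l` = 4, `VD128h` = 3, `VA128l` = 6, `VA128H` = 1, `VB128l` = 5 and `VB128h` = 1 yields register numbers 100, 70 and 37.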
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vpkswus: read; vpkswus128: read | Source A vector register. |
|
||||
| `VB` | vpkswus: read; vpkswus128: read | Source B vector register. |
|
||||
| `VD` | vpkswus: write; vpkswus128: write | Destination vector register. |
|
||||
| `VSCR` | vpkswus: write; vpkswus128: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vpkswus`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vpkswus128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vpkswus`: **VSCR[SAT]** is set, and stays set, when any lane saturates.
|
||||
- `vpkswus128`: **VSCR[SAT]** is set, and stays set, when any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vpkswus`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkswus"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1889`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1889)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:114`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L114)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:465`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L465)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4096-4108`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4096-L4108)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkswus | PpcOpcode::vpkswus128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkswus128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = crate::vmx::as_i32x4(ctx.vr[ra]);
|
||||
let b = crate::vmx::as_i32x4(ctx.vr[rb]);
|
||||
let mut r = [0u16; 8]; let mut sat = false;
|
||||
for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_u16(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_u16(b[i]); r[4 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vpkswus128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkswus128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1892`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1892)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:114`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L114)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:624`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L624)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4096-4108`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4096-L4108)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkswus | PpcOpcode::vpkswus128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkswus128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = crate::vmx::as_i32x4(ctx.vr[ra]);
|
||||
let b = crate::vmx::as_i32x4(ctx.vr[rb]);
|
||||
let mut r = [0u16; 8]; let mut sat = false;
|
||||
for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_u16(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_u16(b[i]); r[4 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Signed word → unsigned half-word saturating pack.** Each of the 8 input word lanes is interpreted as `int32` and clamped to `[0, 65535]`: negative values become 0 and values above 65535 become 65535. Any lane that has to be clamped sets `VSCR[SAT]`.
- **Lane-count doubling.** 4 + 4 = 8 word lanes → 8 half-word lanes, ordered as `VA` (`VD.h[0..3]`) then `VB` (`VD.h[4..7]`).
- **Choose over [`vpkswss`](vpkswss.md)** when negative results should not survive, e.g. clamped colour or intensity values that happen to have arrived in `int32` form.
- **`VSCR[SAT]` is sticky.** The instruction only ever sets the bit; it never clears it.
- **No `Rc`, no XER / FPSCR.**
- **VMX128 sibling [`vpkswus128`](vpkswus128.md).** Same operation, with the registers taken from the VMX128 fields.
|
||||
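The per-lane clamp described above can be sketched as a single function. This is a hypothetical re-implementation of the `sat_i32_to_u16` helper that the interpreter arm calls; the real helper's body is not shown in this document, so treat the code as an assumption about its behaviour rather than a copy.

```rust
/// Hypothetical stand-in for `crate::vmx::sat_i32_to_u16`: treats the lane
/// as a signed word, clamps it to 0..=65535, and reports whether the value
/// had to be clamped (which is what would set VSCR[SAT]).
fn sat_i32_to_u16(x: i32) -> (u16, bool) {
    let clamped = x.clamp(0, u16::MAX as i32);
    (clamped as u16, clamped != x)
}
```

With this definition, `sat_i32_to_u16(-5)` returns `(0, true)`, `sat_i32_to_u16(70000)` returns `(65535, true)`, and an in-range value such as `1234` passes through as `(1234, false)`.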
|
||||
## Related Instructions
|
||||
|
||||
- [`vpkswss`](vpkswss.md) — signed → signed clamp.
|
||||
- [`vpkuwus`](vpkuwus.md) — unsigned word input → unsigned half-word.
|
||||
- [`vpkuwum`](vpkuwum.md) — modulo (truncate) pack.
|
||||
- [`vpkshus`](vpkshus.md) — the half-word → byte analogue.
|
||||
- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — signed-half-word unpacks.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vpkswus` (Vector Pack Signed Word Unsigned Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkswus-vector-pack-signed-word-unsigned-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
188
migration/project-root/ppc-manual/vmx/vpkuhum.md
Normal file
@@ -0,0 +1,188 @@
|
||||
# `vpkuhum` — Vector Pack Unsigned Half Word Unsigned Modulo
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000000e`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vpkuhum` | `vpkuhum` | — | Vector Pack Unsigned Half Word Unsigned Modulo |
|
||||
| `vpkuhum128` | `vpkuhum128` | — | Vector128 Pack Unsigned Half Word Unsigned Modulo |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vpkuhum [VD], [VA], [VB]
|
||||
vpkuhum128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vpkuhum` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000000e`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `14`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vpkuhum128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000300`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `768`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vpkuhum: read; vpkuhum128: read | Source A vector register. |
|
||||
| `VB` | vpkuhum: read; vpkuhum128: read | Source B vector register. |
|
||||
| `VD` | vpkuhum: write; vpkuhum128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vpkuhum`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vpkuhum128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vpkuhum`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuhum"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1909`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1909)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:115`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L115)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:440`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L440)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4019-4030`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4019-L4030)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkuhum | PpcOpcode::vpkuhum128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkuhum128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = ctx.vr[ra].as_u16x8();
|
||||
let b = ctx.vr[rb].as_u16x8();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..8 { r[i] = a[i] as u8; }
|
||||
for i in 0..8 { r[8 + i] = b[i] as u8; }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vpkuhum128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuhum128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1912`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1912)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:115`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L115)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:626`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L626)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4019-4030`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4019-L4030)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkuhum | PpcOpcode::vpkuhum128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkuhum128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = ctx.vr[ra].as_u16x8();
|
||||
let b = ctx.vr[rb].as_u16x8();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..8 { r[i] = a[i] as u8; }
|
||||
for i in 0..8 { r[8 + i] = b[i] as u8; }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Unsigned half-word → byte modulo pack.** Each of the 16 input half-word lanes (8 from `VA`, 8 from `VB`) is truncated to its low 8 bits. No saturation; values above 255 wrap modulo 256.
- **`VSCR[SAT]` is never touched.** This is the modulo (`…um`) form. For saturation use [`vpkuhus`](vpkuhus.md).
- **Lane-count doubling.** 16 half-word lanes → 16 byte lanes, `VA`'s 8 half-words into `VD.b[0..7]` and `VB`'s 8 into `VD.b[8..15]`.
- **Cheap "low-byte extract" primitive.** Often used to repack per-channel results after a half-word arithmetic step; one pack replaces a per-lane shift-and-mask sequence.
- **No `Rc`, no XER.**
- **VMX128 sibling [`vpkuhum128`](vpkuhum128.md).** Same operation, with the registers taken from the VMX128 fields.
|
||||
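The modulo behaviour above amounts to a plain truncating cast per lane. The following stand-alone sketch mirrors the structure of the interpreter arm on ordinary arrays; `vpkuhum_model` is an illustrative name, not a xenia-rs function.

```rust
/// Illustrative model of the modulo pack: keep the low byte of every
/// half-word lane. There is no saturation flag to update.
fn vpkuhum_model(va: [u16; 8], vb: [u16; 8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, &h) in va.iter().chain(vb.iter()).enumerate() {
        out[i] = h as u8; // `as` truncates, i.e. h mod 256
    }
    out
}
```

A lane holding `0x1234` packs to `0x34` and `0x0100` packs to `0x00`; nothing records that the high byte was discarded.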
|
||||
## Related Instructions
|
||||
|
||||
- [`vpkuhus`](vpkuhus.md) — the saturating sibling.
|
||||
- [`vpkuwum`](vpkuwum.md) — word → half-word modulo pack.
|
||||
- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md) — signed half-word packs.
|
||||
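- [`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md) — sign-extending byte → half-word unpacks. AltiVec has no zero-extending unpack, so reversing this op for unsigned data is usually done by merging with a zero vector (`vmrghb` / `vmrglb`).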
|
||||
- [`vperm`](vperm.md) — general-purpose alternative when the packing pattern is irregular.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vpkuhum` (Vector Pack Unsigned Half Word Unsigned Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkuhum-vector-pack-unsigned-half-word-unsigned-modulo-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
192
migration/project-root/ppc-manual/vmx/vpkuhus.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# `vpkuhus` — Vector Pack Unsigned Half Word Unsigned Saturate
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000008e`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vpkuhus` | `vpkuhus` | — | Vector Pack Unsigned Half Word Unsigned Saturate |
|
||||
| `vpkuhus128` | `vpkuhus128` | — | Vector128 Pack Unsigned Half Word Unsigned Saturate |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vpkuhus [VD], [VA], [VB]
|
||||
vpkuhus128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vpkuhus` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000008e`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `142`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vpkuhus128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x14000340`
|
||||
- **Primary opcode (bits 0–5):** `5`
|
||||
- **Extended opcode:** `832`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4 or 5) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `VA128l` | source A low 5 bits |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21 | `VA128H` | source A high bit |
|
||||
| 22 | `—` | reserved |
|
||||
| 23–25 | `VC` | optional VC / XO sub-field |
|
||||
| 26 | `VA128h` | source A middle bit |
|
||||
| 27 | `—` | reserved |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vpkuhus: read; vpkuhus128: read | Source A vector register. |
|
||||
| `VB` | vpkuhus: read; vpkuhus128: read | Source B vector register. |
|
||||
| `VD` | vpkuhus: write; vpkuhus128: write | Destination vector register. |
|
||||
| `VSCR` | vpkuhus: write; vpkuhus128: write | Vector Status and Control Register (NJ/SAT bits). |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vpkuhus`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vpkuhus128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`, `VSCR`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
- `vpkuhus`: **VSCR[SAT]** is set, and stays set, when any lane saturates.
|
||||
- `vpkuhus128`: **VSCR[SAT]** is set, and stays set, when any lane saturates.
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vpkuhus`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuhus"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1931`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1931)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:115`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L115)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:452`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L452)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4044-4056`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4044-L4056)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkuhus | PpcOpcode::vpkuhus128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkuhus128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = ctx.vr[ra].as_u16x8();
|
||||
let b = ctx.vr[rb].as_u16x8();
|
||||
let mut r = [0u8; 16]; let mut sat = false;
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_u16_to_u8(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_u16_to_u8(b[i]); r[8 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vpkuhus128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuhus128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1934`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1934)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:115`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L115)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:628`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L628)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4044-4056`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4044-L4056)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vpkuhus | PpcOpcode::vpkuhus128 => {
|
||||
let is_128 = matches!(instr.opcode, PpcOpcode::vpkuhus128);
|
||||
let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
|
||||
else { (instr.ra(), instr.rb(), instr.rd()) };
|
||||
let a = ctx.vr[ra].as_u16x8();
|
||||
let b = ctx.vr[rb].as_u16x8();
|
||||
let mut r = [0u8; 16]; let mut sat = false;
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_u16_to_u8(a[i]); r[i] = v; sat |= s; }
|
||||
for i in 0..8 { let (v, s) = crate::vmx::sat_u16_to_u8(b[i]); r[8 + i] = v; sat |= s; }
|
||||
if sat { ctx.set_vscr_sat(true); }
|
||||
ctx.vr[rd] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Unsigned half-word → unsigned byte saturating pack.** Each of the 16 input half-word lanes (interpreted as `uint16`) is clamped to `[0, 255]`. Values above 255 produce 255 and set `VSCR[SAT]`.
- **Lane-count doubling.** 16 half-word lanes → 16 byte lanes; `VA` fills `VD.b[0..7]`, then `VB` fills `VD.b[8..15]`.
- **Use [`vpkuhum`](vpkuhum.md) instead** when saturation is not desired; it keeps the low byte of each half-word.
- **`VSCR[SAT]` is sticky.** The instruction only ever sets the bit; it never clears it.
- **No `Rc`, no XER.**
- **VMX128 sibling [`vpkuhus128`](vpkuhus128.md).** Same operation, with the registers taken from the VMX128 fields.
|
||||
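The difference between the saturating pack and its modulo counterpart is easiest to see on a single lane. In this sketch `sat_u16_to_u8` is a hypothetical re-implementation of the helper named in the interpreter arm (its real body is not shown here), and `mod_u16_to_u8` is an illustrative name for the truncation that `vpkuhum` performs.

```rust
/// Hypothetical stand-in for `crate::vmx::sat_u16_to_u8`: clamps one
/// unsigned half-word lane to 0..=255 and reports whether it was clamped.
fn sat_u16_to_u8(x: u16) -> (u8, bool) {
    let clamped = x.min(u8::MAX as u16);
    (clamped as u8, clamped != x)
}

/// Modulo counterpart for comparison: keep the low byte, no flag.
fn mod_u16_to_u8(x: u16) -> u8 {
    x as u8
}
```

For the lane value `0x0134`, the saturating form yields `(0xff, true)` while the modulo form yields `0x34`; for values up to 255 the two agree and no saturation is reported.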
|
||||
## Related Instructions
|
||||
|
||||
- [`vpkuhum`](vpkuhum.md) — modulo counterpart.
|
||||
- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md) — signed-input half-word packs.
|
||||
- [`vpkuwus`](vpkuwus.md), [`vpkuwum`](vpkuwum.md) — word → half-word analogues.
|
||||
- [`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md) — sign-extending byte → half-word unpacks.
|
||||
- [`vperm`](vperm.md) — programmable alternative for irregular packs.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vpkuhus` (Vector Pack Unsigned Half Word Unsigned Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkuhus-vector-pack-unsigned-half-word-unsigned-saturate-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
188
migration/project-root/ppc-manual/vmx/vpkuwum.md
Normal file
@@ -0,0 +1,188 @@
|
||||
# `vpkuwum` — Vector Pack Unsigned Word Unsigned Modulo
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000004e`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vpkuwum` | `vpkuwum` | — | Vector Pack Unsigned Word Unsigned Modulo |
|
||||
| `vpkuwum128` | `vpkuwum128` | — | Vector128 Pack Unsigned Word Unsigned Modulo |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vpkuwum [VD], [VA], [VB]
|
||||
vpkuwum128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vpkuwum` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000004e`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `78`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |

### `vpkuwum128` — form `VX128`

- **Opcode word:** `0x14000380`
- **Primary opcode (bits 0–5):** `5`
- **Extended opcode:** `896`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4 or 5) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | reserved |
| 23–25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `—` | reserved |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |
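
The VX128 form scatters each 7-bit register number across several fields. The sketch below reassembles them, assuming that the `...l` field supplies the low 5 bits, that `VD128h` / `VB128h` supply the two bits above those, and that `VA128h` and `VA128H` are bits 5 and 6 of `VA`. That composition is inferred from the "low", "middle", and "high" wording in the table and should be treated as an assumption. The `field` and `decode_vx128_regs` helpers are illustrative names, not the xenia-rs `va128()` / `vb128()` / `vd128()` accessors.

```rust
/// Extract `len` bits starting at IBM bit `start` (bit 0 is the MSB).
fn field(word: u32, start: u32, len: u32) -> u32 {
    (word >> (32 - start - len)) & ((1u32 << len) - 1)
}

/// Reassemble the 7-bit register numbers (VD, VA, VB) from a VX128 word.
pub fn decode_vx128_regs(word: u32) -> (u32, u32, u32) {
    // VD: low 5 bits at 6-10, high 2 bits at 28-29.
    let vd = field(word, 6, 5) | (field(word, 28, 2) << 5);
    // VA: low 5 bits at 11-15, middle bit at 26, high bit at 21.
    let va = field(word, 11, 5) | (field(word, 26, 1) << 5) | (field(word, 21, 1) << 6);
    // VB: low 5 bits at 16-20, high 2 bits at 30-31.
    let vb = field(word, 16, 5) | (field(word, 30, 2) << 5);
    (vd, va, vb)
}
```

With this layout, any of the three operands can name a register in the range 0 to 127, whereas the plain VX form is limited to 0 to 31.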

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vpkuwum: read; vpkuwum128: read | Source A vector register. |
| `VB` | vpkuwum: read; vpkuwum128: read | Source B vector register. |
| `VD` | vpkuwum: write; vpkuwum128: write | Destination vector register. |

## Register Effects

### `vpkuwum`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

### `vpkuwum128`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
; - Read source operands from the fields listed under Operands.
; - Apply the arithmetic / logical / memory action described
; in the Description field above.
; - Write results to the destination register(s); update any
; status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_uN / write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```

## Implementation References

**`vpkuwum`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuwum"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1973`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1973)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:116`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L116)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:447`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L447)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4031-4042`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4031-L4042)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vpkuwum | PpcOpcode::vpkuwum128 => {
    let is_128 = matches!(instr.opcode, PpcOpcode::vpkuwum128);
    let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
                       else { (instr.ra(), instr.rb(), instr.rd()) };
    let a = ctx.vr[ra].as_u32x4();
    let b = ctx.vr[rb].as_u32x4();
    let mut r = [0u16; 8];
    for i in 0..4 { r[i] = a[i] as u16; }
    for i in 0..4 { r[4 + i] = b[i] as u16; }
    ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
</details>

**`vpkuwum128`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuwum128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1976`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1976)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:116`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L116)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:630`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L630)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4031-4042`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4031-L4042)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vpkuwum | PpcOpcode::vpkuwum128 => {
    let is_128 = matches!(instr.opcode, PpcOpcode::vpkuwum128);
    let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
                       else { (instr.ra(), instr.rb(), instr.rd()) };
    let a = ctx.vr[ra].as_u32x4();
    let b = ctx.vr[rb].as_u32x4();
    let mut r = [0u16; 8];
    for i in 0..4 { r[i] = a[i] as u16; }
    for i in 0..4 { r[4 + i] = b[i] as u16; }
    ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Unsigned word → half-word modulo pack.** Each of the 8 input word lanes (4 from `VA`, 4 from `VB`) is truncated to its low 16 bits; the high 16 bits are discarded without any range check.
- **Lane-count doubling.** 4 + 4 word lanes become 8 half-word lanes: `VA`'s 4 words go into `VD.h[0..3]`, `VB`'s into `VD.h[4..7]`.
- **`VSCR[SAT]` is never touched** (modulo variant). Use [`vpkuwus`](vpkuwus.md) when out-of-range lanes should clamp instead of wrap.
- **Cheap low-half extract.** Typically used to narrow a 32-bit lane accumulator back down to 16 bits for storage.
- **No `Rc`, no XER.** The instruction has no record form and does not affect XER.
- **VMX128 sibling [`vpkuwum128`](vpkuwum128.md).**
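
The truncation and lane placement described in the list above can be sketched on plain arrays. This is a standalone illustration rather than the interpreter code; the function name `pack_u32_to_u16_modulo` is hypothetical, and lane 0 is simply the first array element.

```rust
/// Modulo pack: truncate each 32-bit lane to its low 16 bits.
/// `a` supplies result lanes 0..4 and `b` supplies result lanes 4..8.
pub fn pack_u32_to_u16_modulo(a: [u32; 4], b: [u32; 4]) -> [u16; 8] {
    let mut r = [0u16; 8];
    for i in 0..4 {
        r[i] = a[i] as u16; // `as u16` keeps only the low 16 bits
        r[4 + i] = b[i] as u16;
    }
    r
}
```

A lane holding `65536` packs to `0` here, which is exactly the wrap-around that the saturating variant avoids.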

## Related Instructions

- [`vpkuwus`](vpkuwus.md) — saturating counterpart.
- [`vpkuhum`](vpkuhum.md) — half-word → byte modulo.
- [`vpkswss`](vpkswss.md), [`vpkswus`](vpkswus.md) — signed word inputs.
- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — signed-half-word → word unpacks.
- [`vperm`](vperm.md) — alternative for irregular patterns.

## IBM Reference

- [AIX 7.3 — `vpkuwum` (Vector Pack Unsigned Word Unsigned Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkuwum-vector-pack-unsigned-word-unsigned-modulo-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
192
migration/project-root/ppc-manual/vmx/vpkuwus.md
Normal file
@@ -0,0 +1,192 @@
# `vpkuwus` — Vector Pack Unsigned Word Unsigned Saturate

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100000ce`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vpkuwus` | `vpkuwus` | — | Vector Pack Unsigned Word Unsigned Saturate |
| `vpkuwus128` | `vpkuwus128` | — | Vector128 Pack Unsigned Word Unsigned Saturate |

## Syntax

```asm
vpkuwus [VD], [VA], [VB]
vpkuwus128 [VD], [VA], [VB]
```

## Encoding

### `vpkuwus` — form `VX`

- **Opcode word:** `0x100000ce`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `206`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |

### `vpkuwus128` — form `VX128`

- **Opcode word:** `0x140003c0`
- **Primary opcode (bits 0–5):** `5`
- **Extended opcode:** `960`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4 or 5) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | reserved |
| 23–25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `—` | reserved |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vpkuwus: read; vpkuwus128: read | Source A vector register. |
| `VB` | vpkuwus: read; vpkuwus128: read | Source B vector register. |
| `VD` | vpkuwus: write; vpkuwus128: write | Destination vector register. |
| `VSCR` | vpkuwus: write; vpkuwus128: write | Vector Status and Control Register (NJ/SAT bits). |

## Register Effects

### `vpkuwus`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** `VSCR` (the `SAT` bit, only when at least one lane saturates)

### `vpkuwus128`

- **Reads (always):** `VA`, `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** `VSCR` (the `SAT` bit, only when at least one lane saturates)

## Status-Register Effects

- `vpkuwus`: **VSCR[SAT]** is set, and stays set, when any lane saturates.
- `vpkuwus128`: **VSCR[SAT]** is set, and stays set, when any lane saturates.

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
; - Read source operands from the fields listed under Operands.
; - Apply the arithmetic / logical / memory action described
; in the Description field above.
; - Write results to the destination register(s); update any
; status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_uN / write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```

## Implementation References

**`vpkuwus`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuwus"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1995`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1995)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:116`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L116)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:453`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L453)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4083-4095`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4083-L4095)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vpkuwus | PpcOpcode::vpkuwus128 => {
    let is_128 = matches!(instr.opcode, PpcOpcode::vpkuwus128);
    let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
                       else { (instr.ra(), instr.rb(), instr.rd()) };
    let a = ctx.vr[ra].as_u32x4();
    let b = ctx.vr[rb].as_u32x4();
    let mut r = [0u16; 8]; let mut sat = false;
    for i in 0..4 { let (v, s) = crate::vmx::sat_u32_to_u16(a[i]); r[i] = v; sat |= s; }
    for i in 0..4 { let (v, s) = crate::vmx::sat_u32_to_u16(b[i]); r[4 + i] = v; sat |= s; }
    if sat { ctx.set_vscr_sat(true); }
    ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
</details>

**`vpkuwus128`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuwus128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1998`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1998)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:116`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L116)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:632`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L632)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4083-4095`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4083-L4095)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vpkuwus | PpcOpcode::vpkuwus128 => {
    let is_128 = matches!(instr.opcode, PpcOpcode::vpkuwus128);
    let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) }
                       else { (instr.ra(), instr.rb(), instr.rd()) };
    let a = ctx.vr[ra].as_u32x4();
    let b = ctx.vr[rb].as_u32x4();
    let mut r = [0u16; 8]; let mut sat = false;
    for i in 0..4 { let (v, s) = crate::vmx::sat_u32_to_u16(a[i]); r[i] = v; sat |= s; }
    for i in 0..4 { let (v, s) = crate::vmx::sat_u32_to_u16(b[i]); r[4 + i] = v; sat |= s; }
    if sat { ctx.set_vscr_sat(true); }
    ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Unsigned word → unsigned half-word saturating pack.** Each of the 8 input word lanes (interpreted as `uint32`) is clamped to `[0, 65535]`. Because the inputs are unsigned, only the upper bound can trigger: any lane above 65535 becomes `0xFFFF` and sets `VSCR[SAT]`.
- **Lane-count doubling.** 4 + 4 word lanes become 8 half-word lanes: `VA`'s 4 words go into `VD.h[0..3]`, `VB`'s into `VD.h[4..7]`.
- **Use [`vpkuwum`](vpkuwum.md) instead** when a modulo wrap is required.
- **`VSCR[SAT]` is sticky.** The instruction can set the bit but never clears it; a later pack that does not saturate leaves it set.
- **No `Rc`, no XER.**
- **VMX128 sibling [`vpkuwus128`](vpkuwus128.md).**
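
The clamp and the sticky flag described in the list above can be sketched on plain arrays. The body of `saturate_u32_to_u16` is an assumed stand-in for the `sat_u32_to_u16` helper that the snapshot calls but does not show, and the other names are likewise hypothetical.

```rust
/// Clamp one unsigned 32-bit lane to 16 bits.
/// The flag reports whether clamping occurred (assumed helper body).
fn saturate_u32_to_u16(x: u32) -> (u16, bool) {
    if x > u16::MAX as u32 {
        (u16::MAX, true)
    } else {
        (x as u16, false)
    }
}

/// Saturating pack: returns the packed lanes and whether any lane saturated.
/// `a` supplies result lanes 0..4 and `b` supplies result lanes 4..8.
pub fn pack_u32_to_u16_saturate(a: [u32; 4], b: [u32; 4]) -> ([u16; 8], bool) {
    let mut r = [0u16; 8];
    let mut sat = false;
    for (i, &x) in a.iter().chain(b.iter()).enumerate() {
        let (v, s) = saturate_u32_to_u16(x);
        r[i] = v;
        sat |= s;
    }
    (r, sat)
}

/// Sticky update: the flag can be set here but is never cleared.
pub fn update_vscr_sat(vscr_sat: &mut bool, sat: bool) {
    *vscr_sat |= sat;
}
```

Keeping the saturation result separate from the sticky register bit makes it easy to test that a non-saturating pack leaves a previously set flag alone.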

## Related Instructions

- [`vpkuwum`](vpkuwum.md) — modulo counterpart.
- [`vpkswus`](vpkswus.md), [`vpkswss`](vpkswss.md) — signed-word input.
- [`vpkuhus`](vpkuhus.md) — half-word → byte saturating pack.
- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — inverse unpack (sign-extending).
- [`vperm`](vperm.md) — irregular pack.

## IBM Reference

- [AIX 7.3 — `vpkuwus` (Vector Pack Unsigned Word Unsigned Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkuwus-vector-pack-unsigned-word-unsigned-saturate-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
176
migration/project-root/ppc-manual/vmx/vrefp.md
Normal file
@@ -0,0 +1,176 @@
# `vrefp` — Vector Reciprocal Estimate Floating Point

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000010a`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vrefp` | `vrefp` | — | Vector Reciprocal Estimate Floating Point |
| `vrefp128` | `vrefp128` | — | Vector128 Reciprocal Estimate Floating Point |

## Syntax

```asm
vrefp [VD], [VB]
vrefp128 [VD], [VB]
```

## Encoding

### `vrefp` — form `VX`

- **Opcode word:** `0x1000010a`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `266`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |

### `vrefp128` — form `VX128_3`

- **Opcode word:** `0x18000630`
- **Primary opcode (bits 0–5):** `6`
- **Extended opcode:** `1584`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (6) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `IMM` | 5-bit immediate |
| 16–20 | `VB128l` | source B low 5 bits |
| 21–27 | `XO` | extended opcode |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VB` | vrefp: read; vrefp128: read | Source B vector register. |
| `VD` | vrefp: write; vrefp128: write | Destination vector register. |

## Register Effects

### `vrefp`

- **Reads (always):** `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

### `vrefp128`

- **Reads (always):** `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
; - Read source operands from the fields listed under Operands.
; - Apply the arithmetic / logical / memory action described
; in the Description field above.
; - Write results to the destination register(s); update any
; status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_uN / write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```

## Implementation References

**`vrefp`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrefp"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1227`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1227)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:117`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L117)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:457`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L457)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2153-2161`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2153-L2161)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vrefp | PpcOpcode::vrefp128 => {
    let vb = if matches!(instr.opcode, PpcOpcode::vrefp128) { instr.vb128() } else { instr.rb() };
    let vd = if matches!(instr.opcode, PpcOpcode::vrefp128) { instr.vd128() } else { instr.rd() };
    let b = ctx.vr[vb].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 { r[i] = 1.0 / b[i]; }
    ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
</details>

**`vrefp128`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrefp128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1230`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1230)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:117`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L117)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:664`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L664)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2153-2161`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2153-L2161)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vrefp | PpcOpcode::vrefp128 => {
    let vb = if matches!(instr.opcode, PpcOpcode::vrefp128) { instr.vb128() } else { instr.rb() };
    let vd = if matches!(instr.opcode, PpcOpcode::vrefp128) { instr.vd128() } else { instr.rd() };
    let b = ctx.vr[vb].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 { r[i] = 1.0 / b[i]; }
    ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Lane-wise reciprocal *estimate*.** Each 32-bit float lane of `VD` receives an approximation of `1.0 / VB[i]`. The PowerPC spec only requires an **estimate** accurate to about 1/4096 (≈12 bits); xenia-rs instead computes the correctly rounded IEEE-754 quotient by dividing, trading bit-exact fidelity to the hardware estimate for simplicity. The result is at least as accurate as a hardware estimate but not necessarily bit-identical to it. Game code that needs more than the guaranteed precision should Newton-iterate with [`vnmsubfp`](vnmsubfp.md) regardless of which backend computes the seed; refinement shrinks, but does not necessarily remove, differences caused by a different seed.
- **Standard Newton iteration.** `x₁ = x₀ * (2 − VB * x₀)`, expressible as `vnmsubfp t, x₀, VB, two` (giving `t = 2 − x₀ * VB`) followed by `vmaddfp x₁, x₀, t, zero` (or similar), where `two` and `zero` are registers holding the splatted constants `2.0f` and `0.0f`. One iteration roughly doubles the valid bit count.
- **IEEE-754 binary32 lanes; `VSCR[NJ]` honoured** (denormals flush to zero when `NJ = 1`).
- **No `VSCR[SAT]` update, no FPSCR update, no exception.** The reciprocal of ±0 is ±∞ and the reciprocal of ±∞ is ±0, with the sign preserved; a NaN input yields a NaN.
- **Big-endian lane indexing.**
- **VMX128 sibling [`vrefp128`](vrefp128.md).**
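
The Newton step and the special values from the list above can be sketched in scalar form. `coarse_reciprocal` is a deliberately crude stand-in for a low-precision seed and does not model the real hardware estimate; all three function names are hypothetical.

```rust
/// Exact per-lane reciprocal, mirroring the division in the snapshot.
pub fn reciprocal_lanes(b: [f32; 4]) -> [f32; 4] {
    b.map(|x| 1.0 / x)
}

/// Crude seed for illustration only: the exact reciprocal with the low
/// 12 mantissa bits cleared, which leaves about 12 significant bits.
pub fn coarse_reciprocal(b: f32) -> f32 {
    f32::from_bits((1.0 / b).to_bits() & !0xFFF)
}

/// One Newton-Raphson step for 1/b from estimate x0: x1 = x0 * (2 - b * x0).
pub fn refine_reciprocal(b: f32, x0: f32) -> f32 {
    let t = 2.0 - b * x0; // the value vnmsubfp would produce
    x0 * t // the value vmaddfp with a zero addend would produce
}
```

Starting from the coarse seed for `b = 3.0`, a single call to `refine_reciprocal` brings the result much closer to the correctly rounded `1.0 / 3.0`, which illustrates the "roughly doubles the valid bit count" rule of thumb.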

## Related Instructions

- [`vrsqrtefp`](vrsqrtefp.md) — reciprocal *square root* estimate, used with the same Newton scheme.
- [`vmaddfp`](vmaddfp.md), [`vnmsubfp`](vnmsubfp.md) — the building blocks of the Newton iteration.
- [`vexptefp`](vexptefp.md), [`vlogefp`](vlogefp.md) — other "estimate"-style transcendentals.
- [`vaddfp`](vaddfp.md), [`vsubfp`](vsubfp.md) — the float add/sub.

## IBM Reference

- [AIX 7.3 — `vrefp` (Vector Reciprocal Estimate Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrefp-vector-reciprocal-estimate-floating-point-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
177
migration/project-root/ppc-manual/vmx/vrfim.md
Normal file
@@ -0,0 +1,177 @@
# `vrfim` — Vector Round to Floating-Point Integer toward -Infinity

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100002ca`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vrfim` | `vrfim` | — | Vector Round to Floating-Point Integer toward -Infinity |
| `vrfim128` | `vrfim128` | — | Vector128 Round to Floating-Point Integer toward -Infinity |

## Syntax

```asm
vrfim [VD], [VB]
vrfim128 [VD], [VB]
```

## Encoding

### `vrfim` — form `VX`

- **Opcode word:** `0x100002ca`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `714`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT/VD` | destination vector register |
| 11–15 | `VRA/VA` | source A vector register |
| 16–20 | `VRB/VB` | source B vector register |
| 21–31 | `XO` | extended opcode (11 bits) |

### `vrfim128` — form `VX128_3`

- **Opcode word:** `0x18000330`
- **Primary opcode (bits 0–5):** `6`
- **Extended opcode:** `816`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (6) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `IMM` | 5-bit immediate |
| 16–20 | `VB128l` | source B low 5 bits |
| 21–27 | `XO` | extended opcode |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |

## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VB` | vrfim: read; vrfim128: read | Source B vector register. |
| `VD` | vrfim: write; vrfim128: write | Destination vector register. |

## Register Effects

### `vrfim`

- **Reads (always):** `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

### `vrfim128`

- **Reads (always):** `VB`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Operation semantics:
; - Read source operands from the fields listed under Operands.
; - Apply the arithmetic / logical / memory action described
; in the Description field above.
; - Write results to the destination register(s); update any
; status bits enumerated under Status-Register Effects.
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode where xenia's expression is
; terse.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_uN / write_uN -> mem_read_uN_be / mem_write_uN_be (N = bit width) */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```

## Implementation References

**`vrfim`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfim"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1240`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1240)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:496`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L496)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2493-2501`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2493-L2501)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vrfim | PpcOpcode::vrfim128 => {
    let vb = if matches!(instr.opcode, PpcOpcode::vrfim128) { instr.vb128() } else { instr.rb() };
    let vd = if matches!(instr.opcode, PpcOpcode::vrfim128) { instr.vd128() } else { instr.rd() };
    let b = ctx.vr[vb].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 { r[i] = b[i].floor(); }
    ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
</details>

**`vrfim128`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfim128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1243`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1243)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:660`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L660)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2493-2501`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2493-L2501)
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vrfim | PpcOpcode::vrfim128 => {
    let vb = if matches!(instr.opcode, PpcOpcode::vrfim128) { instr.vb128() } else { instr.rb() };
    let vd = if matches!(instr.opcode, PpcOpcode::vrfim128) { instr.vd128() } else { instr.rd() };
    let b = ctx.vr[vb].as_f32x4();
    let mut r = [0f32; 4];
    for i in 0..4 { r[i] = b[i].floor(); }
    ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
    ctx.pc += 4;
}
```
</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Round toward minus-infinity (floor).** Each 32-bit float lane of `VB` is rounded down to the nearest integral value that is less than or equal to it, and the result stays in float format: `3.2 → 3.0`, `−3.2 → −4.0`.
- **IEEE-754 binary32 output; `VSCR[NJ]` honoured** (denormal flush-to-zero).
- **Large-magnitude lanes pass through unchanged:** values with magnitude ≥ 2²³ have no fractional bits in binary32, so they are already integral.
- **NaN and infinity propagation.** NaN input → NaN output. `±∞` → `±∞`.
- **No `VSCR[SAT]`, no FPSCR update.** No "inexact" flag is raised; the rounding direction is fixed by the opcode, so FPSCR's rounding mode is deliberately ignored.
- **Big-endian lane indexing.**
- **VMX128 sibling [`vrfim128`](vrfim128.md).**
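
The edge cases in the list above can be checked against Rust's `f32::floor`, which is what the snapshot calls per lane. The wrapper name `round_toward_minus_infinity` is hypothetical, and the sketch does not model `VSCR[NJ]`.

```rust
/// Round each lane toward minus infinity using `f32::floor`.
pub fn round_toward_minus_infinity(b: [f32; 4]) -> [f32; 4] {
    b.map(f32::floor)
}
```

Because plain `floor` does no flushing, a negative denormal input floors to `-1.0` here, whereas an implementation that flushes denormal inputs to zero would treat it as `-0.0` first. Whether that difference matters depends on how the emulator handles `NJ`, which the snapshot does not show.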

## Related Instructions

- [`vrfin`](vrfin.md) — round to nearest (ties-to-even).
- [`vrfip`](vrfip.md) — round toward +∞ (ceiling).
- [`vrfiz`](vrfiz.md) — round toward zero (truncate).
- [`vctsxs`](vctsxs.md), [`vctuxs`](vctuxs.md) — float → fixed-point integer conversion with explicit scale.

## IBM Reference

- [AIX 7.3 — `vrfim` (Vector Round to Floating-Point Integer toward Minus Infinity)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrfim-vector-round-floating-point-integer-toward-minus-infinity-instruction)
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
181
migration/project-root/ppc-manual/vmx/vrfin.md
Normal file
@@ -0,0 +1,181 @@
|
||||
# `vrfin` — Vector Round to Floating-Point Integer Nearest
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000020a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vrfin` | `vrfin` | — | Vector Round to Floating-Point Integer Nearest |
|
||||
| `vrfin128` | `vrfin128` | — | Vector128 Round to Floating-Point Integer Nearest |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vrfin [VD], [VB]
|
||||
vrfin128 [VD], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vrfin` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000020a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `522`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vrfin128` — form `VX128_3`
|
||||
|
||||
- **Opcode word:** `0x18000370`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `880`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `IMM` | 5-bit immediate |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21–27 | `XO` | extended opcode |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
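
As a rough sketch of how the split register fields in this table fit together, the Rust below extracts them from a 32-bit instruction word. The table numbers bits big-endian (bit 0 is the most significant), so table bit `n` sits at shift `31 - n`. The names `Vx128Fields` and `decode_vx128_3` are hypothetical, and combining the halves as `low | (high << 5)` is an assumption read off the "low 5 bits" / "high 2 bits" wording; the interpreter's `vd128()` / `vb128()` helpers remain authoritative.

```rust
/// Register fields of a VX128_3-form word, following the bit table above.
#[derive(Debug, PartialEq, Eq)]
pub struct Vx128Fields {
    pub opcd: u32, // bits 0-5
    pub vd: u32,   // VD128l | (VD128h << 5)  (assumed combination)
    pub imm: u32,  // bits 11-15
    pub vb: u32,   // VB128l | (VB128h << 5)  (assumed combination)
}

/// Hypothetical decoder: big-endian bit `n` lives at shift `31 - n`.
pub fn decode_vx128_3(word: u32) -> Vx128Fields {
    let vd_lo = (word >> 21) & 0x1f; // bits 6-10
    let imm = (word >> 16) & 0x1f;   // bits 11-15
    let vb_lo = (word >> 11) & 0x1f; // bits 16-20
    let vd_hi = (word >> 2) & 0x3;   // bits 28-29
    let vb_hi = word & 0x3;          // bits 30-31
    Vx128Fields {
        opcd: word >> 26,
        vd: vd_lo | (vd_hi << 5),
        imm,
        vb: vb_lo | (vb_hi << 5),
    }
}
```

With 7-bit register numbers, the two extra high bits are what let the VMX128 forms address registers beyond `v31`.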
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vrfin: read; vrfin128: read | Source B vector register. |
|
||||
| `VD` | vrfin: write; vrfin128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vrfin`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vrfin128`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vrfin`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfin"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1253`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1253)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:479`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L479)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2473-2483`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2473-L2483)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrfin | PpcOpcode::vrfin128 => {
|
||||
// PPCBUG-432: ISA round-to-nearest-even, NOT Rust's `round()`
|
||||
// (which is round-half-away-from-zero).
|
||||
let vb = if matches!(instr.opcode, PpcOpcode::vrfin128) { instr.vb128() } else { instr.rb() };
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::vrfin128) { instr.vd128() } else { instr.rd() };
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].round_ties_even(); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vrfin128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfin128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1256`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1256)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:661`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L661)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2473-2483`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2473-L2483)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrfin | PpcOpcode::vrfin128 => {
|
||||
// PPCBUG-432: ISA round-to-nearest-even, NOT Rust's `round()`
|
||||
// (which is round-half-away-from-zero).
|
||||
let vb = if matches!(instr.opcode, PpcOpcode::vrfin128) { instr.vb128() } else { instr.rb() };
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::vrfin128) { instr.vd128() } else { instr.rd() };
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].round_ties_even(); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Round to nearest integer, ties to even.** Each 32-bit float lane of `VB` is rounded to the nearest integer value representable as a float; halfway cases go to the even neighbour (`2.5 → 2.0`, `3.5 → 4.0`). The frozen xenia-rs snapshot above uses `f32::round_ties_even` for this (PPCBUG-432), not `f32::round`, which rounds halfway cases away from zero.
- **IEEE-754 binary32 output; `VSCR[NJ]` honoured.**
- **Integer-too-big lanes are no-ops** (|x| ≥ 2²³).
- **NaN and ±∞** pass through unchanged.
- **No VSCR[SAT], no FPSCR update.**
- **Big-endian lane indexing.**
- **VMX128 sibling [`vrfin128`](vrfin128.md).**
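
To make the difference between the two rounding rules concrete, here is a small Rust sketch. It spells out ties-to-even by hand on top of `f32::round` purely for illustration; the interpreter snapshot calls the standard library's `f32::round_ties_even` directly. The names `round_half_even` and `vrfin_lanes` are hypothetical, and plain `[f32; 4]` arrays stand in for `Vec128`.

```rust
/// Round to the nearest integer, sending halfway cases to the even neighbour.
/// Illustrative only: `f32::round_ties_even` does this in one call.
pub fn round_half_even(x: f32) -> f32 {
    let r = x.round(); // nearest, but halfway cases go away from zero
    // A halfway case sits exactly 0.5 from the rounded value; if that
    // value is odd, the even neighbour is one step back toward zero.
    if (r - x).abs() == 0.5 && r % 2.0 != 0.0 {
        (r - x.signum()).copysign(x) // copysign keeps -0.0 for x = -0.5
    } else {
        r
    }
}

/// Hypothetical stand-in for the `vrfin` lane loop.
pub fn vrfin_lanes(b: [f32; 4]) -> [f32; 4] {
    b.map(round_half_even)
}
```

For the input lanes `[0.5, 1.5, 2.5, -2.5]`, `f32::round` gives `[1.0, 2.0, 3.0, -3.0]`, while the ties-to-even rule gives `[0.0, 2.0, 2.0, -2.0]`.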
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vrfim`](vrfim.md) — round toward −∞.
|
||||
- [`vrfip`](vrfip.md) — round toward +∞.
|
||||
- [`vrfiz`](vrfiz.md) — round toward zero.
|
||||
- [`vctsxs`](vctsxs.md), [`vctuxs`](vctuxs.md) — float → fixed-point integer.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vrfin` (Vector Round to Floating-Point Integer to Nearest)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrfin-vector-round-floating-point-integer-nearest-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
177
migration/project-root/ppc-manual/vmx/vrfip.md
Normal file
@@ -0,0 +1,177 @@
|
||||
# `vrfip` — Vector Round to Floating-Point Integer toward +Infinity
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000028a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vrfip` | `vrfip` | — | Vector Round to Floating-Point Integer toward +Infinity |
|
||||
| `vrfip128` | `vrfip128` | — | Vector128 Round to Floating-Point Integer toward +Infinity |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vrfip [VD], [VB]
|
||||
vrfip128 [VD], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vrfip` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000028a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `650`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vrfip128` — form `VX128_3`
|
||||
|
||||
- **Opcode word:** `0x180003b0`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `944`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `IMM` | 5-bit immediate |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21–27 | `XO` | extended opcode |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vrfip: read; vrfip128: read | Source B vector register. |
|
||||
| `VD` | vrfip: write; vrfip128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vrfip`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vrfip128`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vrfip`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfip"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1266`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1266)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:492`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L492)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2484-2492`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2484-L2492)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrfip | PpcOpcode::vrfip128 => {
|
||||
let vb = if matches!(instr.opcode, PpcOpcode::vrfip128) { instr.vb128() } else { instr.rb() };
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::vrfip128) { instr.vd128() } else { instr.rd() };
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].ceil(); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vrfip128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfip128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1269`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1269)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:662`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L662)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2484-2492`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2484-L2492)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrfip | PpcOpcode::vrfip128 => {
|
||||
let vb = if matches!(instr.opcode, PpcOpcode::vrfip128) { instr.vb128() } else { instr.rb() };
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::vrfip128) { instr.vd128() } else { instr.rd() };
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].ceil(); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Round toward plus-infinity (ceiling).** Each 32-bit float lane of `VB` is rounded up to the nearest integer value representable as a float: `3.2 → 4.0`, `−3.2 → −3.0`.
- **IEEE-754 binary32 output; `VSCR[NJ]` honoured.**
- **Integer-too-big lanes are no-ops** (|x| ≥ 2²³).
- **NaN and ±∞** pass through unchanged.
- **No VSCR[SAT], no FPSCR update.**
- **Big-endian lane indexing.**
- **VMX128 sibling [`vrfip128`](vrfip128.md).**
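
A short Rust sketch of the ceiling behaviour, together with its mirror-image relation to `vrfim`: rounding up is the same as negating, rounding down, and negating again. The function names are hypothetical, plain `[f32; 4]` arrays stand in for `Vec128`, and `VSCR[NJ]` is not modelled.

```rust
/// Hypothetical stand-in for the `vrfip` lane loop: ceiling of each lane.
pub fn vrfip_lanes(b: [f32; 4]) -> [f32; 4] {
    b.map(|x| x.ceil())
}

/// Hypothetical stand-in for the `vrfim` lane loop: floor of each lane.
pub fn vrfim_lanes(b: [f32; 4]) -> [f32; 4] {
    b.map(|x| x.floor())
}

/// Ceiling rebuilt from floor: ceil(x) == -floor(-x) for every finite lane.
pub fn vrfip_via_vrfim(b: [f32; 4]) -> [f32; 4] {
    vrfim_lanes(b.map(|x| -x)).map(|x| -x)
}
```

Both routes agree on finite inputs, which is a handy cross-check when testing the two interpreter arms against each other.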
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vrfim`](vrfim.md) — round toward −∞ (the symmetric partner).
|
||||
- [`vrfin`](vrfin.md) — round to nearest.
|
||||
- [`vrfiz`](vrfiz.md) — round toward zero.
|
||||
- [`vctsxs`](vctsxs.md), [`vctuxs`](vctuxs.md) — float → fixed-point integer.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vrfip` (Vector Round to Floating-Point Integer toward Plus Infinity)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrfip-vector-round-floating-point-integer-toward-plus-infinity-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
175
migration/project-root/ppc-manual/vmx/vrfiz.md
Normal file
@@ -0,0 +1,175 @@
|
||||
# `vrfiz` — Vector Round to Floating-Point Integer toward Zero
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000024a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vrfiz` | `vrfiz` | — | Vector Round to Floating-Point Integer toward Zero |
|
||||
| `vrfiz128` | `vrfiz128` | — | Vector128 Round to Floating-Point Integer toward Zero |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vrfiz [VD], [VB]
|
||||
vrfiz128 [VD], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vrfiz` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000024a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `586`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vrfiz128` — form `VX128_3`
|
||||
|
||||
- **Opcode word:** `0x180003f0`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `1008`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `IMM` | 5-bit immediate |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21–27 | `XO` | extended opcode |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vrfiz: read; vrfiz128: read | Source B vector register. |
|
||||
| `VD` | vrfiz: write; vrfiz128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vrfiz`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vrfiz128`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vrfiz`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfiz"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1279`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1279)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:486`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L486)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2464-2472`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2464-L2472)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrfiz | PpcOpcode::vrfiz128 => {
|
||||
let vb = if matches!(instr.opcode, PpcOpcode::vrfiz128) { instr.vb128() } else { instr.rb() };
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::vrfiz128) { instr.vd128() } else { instr.rd() };
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].trunc(); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vrfiz128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfiz128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1282`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1282)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:663`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L663)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2464-2472`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2464-L2472)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrfiz | PpcOpcode::vrfiz128 => {
|
||||
let vb = if matches!(instr.opcode, PpcOpcode::vrfiz128) { instr.vb128() } else { instr.rb() };
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::vrfiz128) { instr.vd128() } else { instr.rd() };
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = b[i].trunc(); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Round toward zero (truncate).** Each 32-bit float lane of `VB` has its fractional part dropped: `3.7 → 3.0`, `−3.7 → −3.0`.
- **IEEE-754 binary32 output; `VSCR[NJ]` honoured.**
- **Integer-too-big lanes are no-ops** (|x| ≥ 2²³).
- **NaN and ±∞** pass through unchanged.
- **No VSCR[SAT], no FPSCR update.** `vrfiz` is the VMX analogue of C's `truncf`.
- **Big-endian lane indexing.**
- **VMX128 sibling [`vrfiz128`](vrfiz128.md).**
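
The truncation above can be sketched in Rust, along with how it relates to the two directed modes: it behaves like floor for non-negative lanes and like ceiling for negative ones. The names are hypothetical and plain `[f32; 4]` arrays stand in for `Vec128`.

```rust
/// Hypothetical stand-in for the `vrfiz` lane loop: drop the fraction.
pub fn vrfiz_lanes(b: [f32; 4]) -> [f32; 4] {
    b.map(|x| x.trunc())
}

/// Truncation expressed through the directed modes:
/// floor for non-negative inputs, ceiling for negative ones.
pub fn trunc_via_floor_ceil(x: f32) -> f32 {
    if x < 0.0 {
        x.ceil()
    } else {
        x.floor()
    }
}
```

This is the same per-lane result C's `truncf` would give, applied to all four lanes at once.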
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vrfin`](vrfin.md), [`vrfim`](vrfim.md), [`vrfip`](vrfip.md) — the other three rounding modes.
|
||||
- [`vctsxs`](vctsxs.md), [`vctuxs`](vctuxs.md) — float → signed / unsigned fixed-point (these truncate too, and also apply a `UIMM` power-of-two scale).
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vrfiz` (Vector Round to Floating-Point Integer toward Zero)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrfiz-vector-round-floating-point-integer-toward-zero-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
129
migration/project-root/ppc-manual/vmx/vrlb.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# `vrlb` — Vector Rotate Left Integer Byte
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000004`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vrlb` | `vrlb` | — | Vector Rotate Left Integer Byte |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vrlb [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vrlb` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000004`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `4`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
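
The field table above maps onto a 32-bit word as in the following Rust sketch. Bit numbers in the table are big-endian (bit 0 is the most significant), so table bit `n` sits at shift `31 - n`. `VxFields` and `decode_vx` are hypothetical names used only for this illustration; the real decoder lives in `crates/xenia-cpu/src/decoder.rs`.

```rust
/// Fields of a VX-form word, following the bit table above.
#[derive(Debug, PartialEq, Eq)]
pub struct VxFields {
    pub opcd: u32, // bits 0-5
    pub vd: u32,   // bits 6-10
    pub va: u32,   // bits 11-15
    pub vb: u32,   // bits 16-20
    pub xo: u32,   // bits 21-31
}

/// Hypothetical decoder: big-endian bit `n` lives at shift `31 - n`.
pub fn decode_vx(word: u32) -> VxFields {
    VxFields {
        opcd: word >> 26,
        vd: (word >> 21) & 0x1f,
        va: (word >> 16) & 0x1f,
        vb: (word >> 11) & 0x1f,
        xo: word & 0x7ff,
    }
}
```

Decoding the bare opcode word `0x10000004` yields primary opcode 4 and extended opcode 4 with all three register fields zero, matching the values listed for `vrlb`.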
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vrlb: read | Source A vector register. |
|
||||
| `VB` | vrlb: read | Source B vector register. |
|
||||
| `VD` | vrlb: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vrlb`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
|
||||
/* C translation: the xenia-rs interpreter arm below in */
|
||||
/* Implementation References is the authoritative semantic */
|
||||
/* snapshot. Translate it line-by-line: */
|
||||
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
|
||||
/* - mem.read_uN / mem.write_uN -> mem_read_uN_be / mem_write_uN_be */
|
||||
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
|
||||
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
|
||||
/* The Register Effects and Status-Register Effects tables above */
|
||||
/* enumerate every side effect a faithful translation must emit. */
|
||||
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vrlb`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlb"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1286`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1286)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:436`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L436)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3876-3883`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3876-L3883)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrlb => {
|
||||
let a = ctx.vr[instr.ra()].as_bytes();
|
||||
let b = ctx.vr[instr.rb()].as_bytes();
|
||||
let mut r = [0u8; 16];
|
||||
for i in 0..16 { r[i] = a[i].rotate_left((b[i] & 7) as u32); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-lane left-rotate of bytes.** For each of the 16 byte lanes, `VD.b[i] = rotate_left(VA.b[i], VB.b[i] & 7)`. Only the low 3 bits of each count byte are used; the upper 5 bits are ignored.
- **Rotate counts are per-lane, not scalar.** Altivec's shift and rotate instructions take a whole vector of counts rather than a single scalar or immediate. For a uniform rotate, splat the count first with [`vspltb`](vspltb.md).
- **Big-endian byte lanes.** `VA.b[0]` is the most significant byte.
- **No overflow, no sticky saturation.** Rotation is information-preserving.
- **No `Rc`, no XER, no VSCR side-effect.**
- **No VMX128 sibling.** Xenon software that needs per-byte rotation typically uses `vrlw` on pre-swizzled data.
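
The per-lane rotate and the splat-for-uniform-count idiom can be sketched in Rust as below. The lane loop mirrors the interpreter snapshot but uses plain `[u8; 16]` arrays instead of `Vec128`; `vrlb_lanes` and `splat_count` are hypothetical names, with `splat_count` standing in for a count vector built by `vspltb`.

```rust
/// Hypothetical stand-in for the `vrlb` lane loop: rotate each byte of `a`
/// left by the low 3 bits of the matching byte in `b`.
pub fn vrlb_lanes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut r = [0u8; 16];
    for i in 0..16 {
        r[i] = a[i].rotate_left(u32::from(b[i] & 7));
    }
    r
}

/// Stand-in for a `vspltb`-built count vector: the same count in every lane.
pub fn splat_count(n: u8) -> [u8; 16] {
    [n; 16]
}
```

`vrlb_lanes(a, splat_count(1))` rotates every byte by one bit, and a count of 9 gives the same result because only the low 3 bits are used.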
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vrlh`](vrlh.md), [`vrlw`](vrlw.md) — half-word / word rotates (same "per-lane rotate count" pattern).
|
||||
- [`vslb`](vslb.md), [`vsrb`](vsrb.md), [`vsrab`](vsrab.md) — byte shift-left / logical-right / arithmetic-right.
|
||||
- [`vsl`](vsl.md), [`vsr`](vsr.md), [`vslo`](vslo.md), [`vsro`](vsro.md) — whole-register shifts.
|
||||
- [`vspltb`](vspltb.md) — splat to build a uniform shift-count vector.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vrlb` (Vector Rotate Left Integer Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrlb-vector-rotate-left-integer-byte-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
129
migration/project-root/ppc-manual/vmx/vrlh.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# `vrlh` — Vector Rotate Left Integer Half Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000044`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vrlh` | `vrlh` | — | Vector Rotate Left Integer Half Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vrlh [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vrlh` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000044`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `68`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vrlh: read | Source A vector register. |
|
||||
| `VB` | vrlh: read | Source B vector register. |
|
||||
| `VD` | vrlh: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vrlh`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vrlh`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlh"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1294`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1294)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:443`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L443)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3908-3915`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3908-L3915)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrlh => {
|
||||
let a = ctx.vr[instr.ra()].as_u16x8();
|
||||
let b = ctx.vr[instr.rb()].as_u16x8();
|
||||
let mut r = [0u16; 8];
|
||||
for i in 0..8 { r[i] = a[i].rotate_left((b[i] & 0xF) as u32); }
|
||||
ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-lane left-rotate of half-words.** For each of the 8 half-word lanes, `VD.h[i] = rotate_left(VA.h[i], VB.h[i] & 0xF)`. Only the low 4 bits of each half-word in `VB` are used as the rotate count.
- **Per-lane rotate counts.** Splat first (`vsplth`) if a uniform rotate is needed.
- **Big-endian half-word lanes.** Lane 0 is the most significant pair of bytes.
- **No overflow, no saturation.** Rotation is information-preserving.
- **No `Rc`, no XER, no VSCR effect.**
- **No VMX128 sibling.**
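
The per-lane behaviour listed above can be sketched on plain arrays, without the xenia-rs `Vec128` type. This is an illustrative model only: `vrlh_lanes` and `splat_h` are hypothetical names, and `splat_h` merely stands in for what a `vsplth`-style splat would produce.

```rust
/// Rotate each of the 8 half-word lanes of `a` left by the low 4 bits
/// of the matching lane of `b`.
fn vrlh_lanes(a: [u16; 8], b: [u16; 8]) -> [u16; 8] {
    let mut r = [0u16; 8];
    for i in 0..8 {
        // Bits above the low 4 of the count are ignored.
        r[i] = a[i].rotate_left(u32::from(b[i] & 0xF));
    }
    r
}

/// Copy one lane into all eight lanes, to build a uniform rotate count.
fn splat_h(v: [u16; 8], lane: usize) -> [u16; 8] {
    [v[lane]; 8]
}
```

Because the count is masked to 4 bits, a count of 16 leaves a lane unchanged and a count of `0x11` behaves like a count of 1.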
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vrlb`](vrlb.md), [`vrlw`](vrlw.md) — byte / word rotate siblings.
|
||||
- [`vslh`](vslh.md), [`vsrh`](vsrh.md), [`vsrah`](vsrah.md) — half-word shift-left / logical-right / arithmetic-right.
|
||||
- [`vsl`](vsl.md), [`vsr`](vsr.md) — bit-level whole-register shifts.
|
||||
- [`vsplth`](vsplth.md) — splat to build a uniform shift-count vector.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vrlh` (Vector Rotate Left Integer Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrlh-vector-rotate-left-integer-half-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
189
migration/project-root/ppc-manual/vmx/vrlw.md
Normal file
@@ -0,0 +1,189 @@
|
||||
# `vrlw` — Vector Rotate Left Integer Word
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000084`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vrlw` | `vrlw` | — | Vector Rotate Left Integer Word |
|
||||
| `vrlw128` | `vrlw128` | — | Vector128 Rotate Left Word |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vrlw [VD], [VA], [VB]
|
||||
vrlw128 [VD], [VA], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vrlw` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x10000084`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `132`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vrlw128` — form `VX128`
|
||||
|
||||
- **Opcode word:** `0x18000050`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `80`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (6) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 22 | `—` | reserved |
| 23–25 | `VC` | optional VC / XO sub-field |
| 26 | `VA128h` | source A middle bit |
| 27 | `—` | reserved |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |
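
The VX128 form splits each 7-bit register number across several fields. The sketch below reassembles them from the bit positions in the table, assuming the "low", "middle", and "high" parts concatenate in that order (low 5 bits, then the remaining bits above them). That ordering is inferred from the field names rather than confirmed against the xenia-rs decoder, and `ibm_bits` and `vx128_regs` are hypothetical names.

```rust
/// Extract a field using IBM bit numbering (bit 0 = most significant bit).
/// Assumes the field is narrower than 32 bits.
fn ibm_bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

/// Reassemble (VD, VA, VB) register numbers from a VX128-form word.
/// Assumption: high parts sit directly above the low 5 bits.
fn vx128_regs(word: u32) -> (u32, u32, u32) {
    let vd = ibm_bits(word, 6, 10) | (ibm_bits(word, 28, 29) << 5);
    let va = ibm_bits(word, 11, 15)
        | (ibm_bits(word, 26, 26) << 5)
        | (ibm_bits(word, 21, 21) << 6);
    let vb = ibm_bits(word, 16, 20) | (ibm_bits(word, 30, 31) << 5);
    (vd, va, vb)
}
```

With this reading, the destination and source B numbers range over 0..=127, and source A uses one bit from each of `VA128h` and `VA128H` to reach the same range.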
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VA` | vrlw: read; vrlw128: read | Source A vector register. |
|
||||
| `VB` | vrlw: read; vrlw128: read | Source B vector register. |
|
||||
| `VD` | vrlw: write; vrlw128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vrlw`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vrlw128`
|
||||
|
||||
- **Reads (always):** `VA`, `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vrlw`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlw"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1308`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1308)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:450`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L450)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2450-2461`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2450-L2461)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrlw | PpcOpcode::vrlw128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 {
|
||||
let sh = b[i] & 0x1F;
|
||||
r[i] = a[i].rotate_left(sh);
|
||||
}
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vrlw128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlw128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1311`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1311)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:692`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L692)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2450-2461`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2450-L2461)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrlw | PpcOpcode::vrlw128 => {
|
||||
let (va, vb, vd) = vmx_reg_triple(instr);
|
||||
let a = ctx.vr[va].as_u32x4();
|
||||
let b = ctx.vr[vb].as_u32x4();
|
||||
let mut r = [0u32; 4];
|
||||
for i in 0..4 {
|
||||
let sh = b[i] & 0x1F;
|
||||
r[i] = a[i].rotate_left(sh);
|
||||
}
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Per-lane left-rotate of words.** For each of the 4 word lanes, `VD.w[i] = rotate_left(VA.w[i], VB.w[i] & 0x1F)`. Only the low 5 bits of each word in `VB` are used as the rotate count.
- **Per-lane rotate counts.** Splat with [`vspltw`](vspltw.md) or [`vspltisw`](vspltisw.md) for uniform rotation.
- **Big-endian word lanes.** Lane 0 is the most significant 4 bytes.
- **No overflow, no saturation.**
- **No `Rc`, no XER, no VSCR effect.**
- **VMX128 sibling [`vrlw128`](vrlw128.md).** Same op with the wider register file.
- **Building block for [`vrlimi128`](../vmx128/vrlimi128.md).** VMX128 fuses a rotate with an immediate-masked insert for cheaper bitfield shuffles; `vrlw` is the plain variant that the XDK uses for scalar-style 32-bit rotates.
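
As a rough illustration of the lane numbering and count masking described above, the sketch below views 16 register bytes as four big-endian words and applies the per-lane rotate. It works on plain arrays rather than the xenia-rs `Vec128` type, and `word_lanes_be` and `vrlw_lanes` are hypothetical names.

```rust
/// View 16 register bytes as four big-endian word lanes.
/// Lane 0 is built from the first (most significant) 4 bytes.
fn word_lanes_be(bytes: [u8; 16]) -> [u32; 4] {
    let mut lanes = [0u32; 4];
    for (i, chunk) in bytes.chunks_exact(4).enumerate() {
        lanes[i] = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    lanes
}

/// Rotate each word lane of `a` left by the low 5 bits of the matching lane of `b`.
fn vrlw_lanes(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut r = [0u32; 4];
    for i in 0..4 {
        // Counts of 32 and above wrap: only bits 0..=4 matter.
        r[i] = a[i].rotate_left(b[i] & 0x1F);
    }
    r
}
```

A count of 33 therefore rotates by 1, and a count of 32 leaves the lane unchanged.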
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vrlb`](vrlb.md), [`vrlh`](vrlh.md) — byte / half-word rotate siblings.
|
||||
- [`vslw`](vslw.md), [`vsrw`](vsrw.md), [`vsraw`](vsraw.md) — word shift-left / logical-right / arithmetic-right.
|
||||
- [`vsl`](vsl.md), [`vsr`](vsr.md) — bit-level whole-register shifts.
|
||||
- [`vspltw`](vspltw.md), [`vspltisw`](vspltisw.md) — splat sources for uniform shift counts.
|
||||
- [`vrlimi128`](../vmx128/vrlimi128.md) — rotate + mask-insert (VMX128-exclusive).
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vrlw` (Vector Rotate Left Integer Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrlw-vector-rotate-left-integer-word-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
176
migration/project-root/ppc-manual/vmx/vrsqrtefp.md
Normal file
@@ -0,0 +1,176 @@
|
||||
# `vrsqrtefp` — Vector Reciprocal Square Root Estimate Floating Point
|
||||
|
||||
> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000014a`
|
||||
|
||||
<!-- GENERATED: BEGIN -->
|
||||
|
||||
## Assembler Mnemonics
|
||||
|
||||
| Mnemonic | XML entry | Flags | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `vrsqrtefp` | `vrsqrtefp` | — | Vector Reciprocal Square Root Estimate Floating Point |
|
||||
| `vrsqrtefp128` | `vrsqrtefp128` | — | Vector128 Reciprocal Square Root Estimate Floating Point |
|
||||
|
||||
## Syntax
|
||||
|
||||
```asm
|
||||
vrsqrtefp [VD], [VB]
|
||||
vrsqrtefp128 [VD], [VB]
|
||||
```
|
||||
|
||||
## Encoding
|
||||
|
||||
### `vrsqrtefp` — form `VX`
|
||||
|
||||
- **Opcode word:** `0x1000014a`
|
||||
- **Primary opcode (bits 0–5):** `4`
|
||||
- **Extended opcode:** `330`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (4) |
|
||||
| 6–10 | `VRT/VD` | destination vector register |
|
||||
| 11–15 | `VRA/VA` | source A vector register |
|
||||
| 16–20 | `VRB/VB` | source B vector register |
|
||||
| 21–31 | `XO` | extended opcode (11 bits) |
|
||||
|
||||
### `vrsqrtefp128` — form `VX128_3`
|
||||
|
||||
- **Opcode word:** `0x18000670`
|
||||
- **Primary opcode (bits 0–5):** `6`
|
||||
- **Extended opcode:** `1648`
|
||||
- **Synchronising:** no
|
||||
|
||||
| Bits | Field | Meaning |
|
||||
| --- | --- | --- |
|
||||
| 0–5 | `OPCD` | primary opcode (6) |
|
||||
| 6–10 | `VD128l` | destination low 5 bits |
|
||||
| 11–15 | `IMM` | 5-bit immediate |
|
||||
| 16–20 | `VB128l` | source B low 5 bits |
|
||||
| 21–27 | `XO` | extended opcode |
|
||||
| 28–29 | `VD128h` | destination high 2 bits |
|
||||
| 30–31 | `VB128h` | source B high 2 bits |
|
||||
|
||||
## Operands
|
||||
|
||||
| Field | Role | Description |
|
||||
| --- | --- | --- |
|
||||
| `VB` | vrsqrtefp: read; vrsqrtefp128: read | Source B vector register. |
|
||||
| `VD` | vrsqrtefp: write; vrsqrtefp128: write | Destination vector register. |
|
||||
|
||||
## Register Effects
|
||||
|
||||
### `vrsqrtefp`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
### `vrsqrtefp128`
|
||||
|
||||
- **Reads (always):** `VB`
|
||||
- **Reads (conditional):** _none_
|
||||
- **Writes (always):** `VD`
|
||||
- **Writes (conditional):** _none_
|
||||
|
||||
## Status-Register Effects
|
||||
|
||||
_No condition-register or status-register effects._
|
||||
|
||||
## Operation (pseudocode)
|
||||
|
||||
```
|
||||
; Pseudocode derives directly from the xenia-rs interpreter
|
||||
; arm (see Implementation References). Operation semantics:
|
||||
; - Read source operands from the fields listed under Operands.
|
||||
; - Apply the arithmetic / logical / memory action described
|
||||
; in the Description field above.
|
||||
; - Write results to the destination register(s); update any
|
||||
; status bits enumerated under Status-Register Effects.
|
||||
; Consult the IBM AIX reference link under IBM Reference for
|
||||
; canonical PPC-style pseudocode where xenia's expression is
|
||||
; terse.
|
||||
```
|
||||
|
||||
## C Translation Example
|
||||
|
||||
```c
/* C translation: the xenia-rs interpreter arm below in */
/* Implementation References is the authoritative semantic */
/* snapshot. Translate it line-by-line: */
/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */
/* - mem.read_u* / write_u* -> mem_read_u*_be / mem_write_u*_be */
/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */
/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */
/* The Register Effects and Status-Register Effects tables above */
/* enumerate every side effect a faithful translation must emit. */
```
|
||||
|
||||
## Implementation References
|
||||
|
||||
**`vrsqrtefp`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrsqrtefp"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1371`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1371)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:120`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L120)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:463`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L463)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2162-2170`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2162-L2170)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrsqrtefp | PpcOpcode::vrsqrtefp128 => {
|
||||
let vb = if matches!(instr.opcode, PpcOpcode::vrsqrtefp128) { instr.vb128() } else { instr.rb() };
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::vrsqrtefp128) { instr.vd128() } else { instr.rd() };
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = 1.0 / b[i].sqrt(); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
**`vrsqrtefp128`**
|
||||
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrsqrtefp128"`](../../xenia-canary/tools/ppc-instructions.xml)
|
||||
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1374`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1374)
|
||||
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:120`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L120)
|
||||
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:665`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L665)
|
||||
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2162-2170`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2162-L2170)
|
||||
<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>
|
||||
|
||||
```rust
|
||||
PpcOpcode::vrsqrtefp | PpcOpcode::vrsqrtefp128 => {
|
||||
let vb = if matches!(instr.opcode, PpcOpcode::vrsqrtefp128) { instr.vb128() } else { instr.rb() };
|
||||
let vd = if matches!(instr.opcode, PpcOpcode::vrsqrtefp128) { instr.vd128() } else { instr.rd() };
|
||||
let b = ctx.vr[vb].as_f32x4();
|
||||
let mut r = [0f32; 4];
|
||||
for i in 0..4 { r[i] = 1.0 / b[i].sqrt(); }
|
||||
ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r);
|
||||
ctx.pc += 4;
|
||||
}
|
||||
```
|
||||
</details>
|
||||
|
||||
<!-- GENERATED: END -->
|
||||
|
||||
## Special Cases & Edge Conditions
|
||||
|
||||
- **Lane-wise reciprocal-square-root *estimate*.** Each 32-bit float lane of `VD` receives an approximation of `1.0 / sqrt(VB[i])`. The PowerPC spec permits a 12-bit estimate; xenia-rs instead evaluates `1.0 / sqrt(VB[i])` at full `f32` precision. Games that depend on Xenon's low-precision estimate may need a helper that truncates bits to match hardware.
- **Standard Newton iteration (Quake-style):** `x₁ = x₀ * (1.5 − 0.5 * VB * x₀²)`. Starting from a 12-bit estimate, one pass yields close to the full 24 bits of single precision, essentially indistinguishable from a true `1/sqrt`.
- **Negative input does not trap.** It is a domain error mathematically, but in ISA terms the hardware simply returns a QNaN: `sqrt(−x)` for `x > 0` → QNaN. `+0` produces `+∞` and `−0` produces `−∞`; no sticky status bits are set.
- **IEEE-754 binary32; `VSCR[NJ]` honoured.**
- **No VSCR[SAT], no FPSCR update, no exception.**
- **Big-endian lane indexing.**
- **VMX128 sibling [`vrsqrtefp128`](vrsqrtefp128.md).**
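
The precision and special-value notes above can be explored with a small scalar sketch. `rsqrt_full` mirrors the expression in the interpreter snapshot; `rsqrt_estimate_12bit` is a hypothetical stand-in that keeps only the top 12 mantissa bits, which models the accuracy of a 12-bit estimate but is an assumption and not the actual Xenon estimate; `newton_refine` is the iteration quoted in the list.

```rust
/// Full-precision reciprocal square root, as in the interpreter snapshot.
fn rsqrt_full(b: f32) -> f32 {
    1.0 / b.sqrt()
}

/// Hypothetical low-precision model: clear the low 11 of the 23 mantissa bits.
/// This only approximates 12-bit accuracy; it does not reproduce hardware bits.
fn rsqrt_estimate_12bit(b: f32) -> f32 {
    f32::from_bits(rsqrt_full(b).to_bits() & !0x7FF)
}

/// One Newton-Raphson step: x1 = x0 * (1.5 - 0.5 * b * x0^2).
fn newton_refine(b: f32, x0: f32) -> f32 {
    x0 * (1.5 - 0.5 * b * x0 * x0)
}
```

Feeding the truncated estimate through `newton_refine` once brings it back to within rounding distance of `rsqrt_full`, which is why code that refines the estimate is usually insensitive to the emulator's extra precision, while code that consumes the raw estimate may not be.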
|
||||
|
||||
## Related Instructions
|
||||
|
||||
- [`vrefp`](vrefp.md) — plain reciprocal estimate.
|
||||
- [`vmaddfp`](vmaddfp.md), [`vnmsubfp`](vnmsubfp.md) — the Newton iteration primitives.
|
||||
- [`vexptefp`](vexptefp.md), [`vlogefp`](vlogefp.md) — the other transcendental estimates.
|
||||
|
||||
## IBM Reference
|
||||
|
||||
- [AIX 7.3 — `vrsqrtefp` (Vector Reciprocal Square Root Estimate Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrsqrtefp-vector-reciprocal-square-root-estimate-floating-point-instruction)
|
||||
- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)
|
||||
Some files were not shown because too many files have changed in this diff.