Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:
- claude-memory/               ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                               (103 files, 1.1 MB - MEMORY.md + every
                               project_xenia_rs_*.md from audits
                               addis_signext through audit-058)
- project-root/dot-claude/     <project-root>/.claude/settings.json
                               (Stop hook + permissions)
- project-root/ppc-manual/     <project-root>/ppc-manual/
                               (PowerPC reference docs, 397 files, 3.7 MB)
- project-root/run-canary.sh   <project-root>/run-canary.sh
- README.md                    Human-readable setup checklist
- setup.sh                     Idempotent installer (also reclones
                               xenia-canary at pinned HEAD 6de80dffe)
- MANIFEST.md                  Per-file mapping + per-file-not-bundled
                               restoration recipe
Excluded from bundle (not shippable via git):
- Sylpheed ISO (7.8 GB; copyright; manual copy required)
- sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
- target/ build artifacts (rebuild on target)
- audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
- audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
- xenia-canary checkout (setup.sh reclones from
  git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# `vperm` — Vector Permute

> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002b`

<!-- GENERATED: BEGIN -->

## Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| `vperm` | `vperm` | — | Vector Permute |
| `vperm128` | `vperm128` | — | Vector128 Permute |

## Syntax

```asm
vperm [VD], [VA], [VB], [VC]
vperm128 [VD], [VA], [VB], [VC]
```

## Encoding

### `vperm` — form `VA`

- **Opcode word:** `0x1000002b`
- **Primary opcode (bits 0–5):** `4`
- **Extended opcode:** `43`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (4) |
| 6–10 | `VRT` | destination vector register |
| 11–15 | `VRA` | source A |
| 16–20 | `VRB` | source B |
| 21–25 | `VRC` | source C / shift |
| 26–31 | `XO` | extended opcode (6 bits) |

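
The bit ranges in the table above use IBM numbering, where bit 0 is the most-significant bit of the 32-bit word. As a rough illustration, the sketch below pulls the six VA-form fields out of an instruction word and checks for `vperm`. The names `VaFields`, `bits`, `decode_va`, and `is_vperm` are made up for this example; they are not the xenia-rs decoder API.

```rust
/// Fields of a VA-form instruction word, named after the encoding table.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub struct VaFields {
    pub opcd: u32,
    pub vrt: u32,
    pub vra: u32,
    pub vrb: u32,
    pub vrc: u32,
    pub xo: u32,
}

/// Extract `len` bits starting at IBM (MSB-0) bit `start` of a 32-bit word.
fn bits(word: u32, start: u32, len: u32) -> u32 {
    (word >> (32 - start - len)) & ((1u32 << len) - 1)
}

/// Split a word into the VA-form fields listed in the table.
pub fn decode_va(word: u32) -> VaFields {
    VaFields {
        opcd: bits(word, 0, 6),
        vrt: bits(word, 6, 5),
        vra: bits(word, 11, 5),
        vrb: bits(word, 16, 5),
        vrc: bits(word, 21, 5),
        xo: bits(word, 26, 6),
    }
}

/// True when the word carries primary opcode 4 and extended opcode 43.
pub fn is_vperm(word: u32) -> bool {
    let f = decode_va(word);
    f.opcd == 4 && f.xo == 43
}
```

For instance, `decode_va(0x1000002b)` yields `opcd == 4`, `xo == 43`, and all register fields zero, which matches the opcode word listed above.
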
### `vperm128` — form `VX128_2`

- **Opcode word:** `0x14000000`
- **Primary opcode (bits 0–5):** `5`
- **Extended opcode:** `0`
- **Synchronising:** no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | `OPCD` | primary opcode (5) |
| 6–10 | `VD128l` | destination low 5 bits |
| 11–15 | `VA128l` | source A low 5 bits |
| 16–20 | `VB128l` | source B low 5 bits |
| 21 | `VA128H` | source A high bit |
| 23–25 | `VC` | source C 3-bit field |
| 26 | `VA128h` | source A middle bit |
| 28–29 | `VD128h` | destination high 2 bits |
| 30–31 | `VB128h` | source B high 2 bits |

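
The `VX128_2` form scatters each 7-bit register number across two or three sub-fields. The sketch below reassembles them, as a way to read the table rather than as a description of the real decoder. How the pieces combine (low 5 bits, then `VA128h` as bit 5 and `VA128H` as bit 6, with the 2-bit `VD128h` / `VB128h` fields as bits 5 and 6) is an assumption inferred from the "low / middle / high" labels in the table. `Vperm128Regs`, `bits`, and `decode_vx128_2` are hypothetical names.

```rust
/// Register numbers recovered from a VX128_2-form word.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub struct Vperm128Regs {
    pub vd: u32, // 7-bit destination register number
    pub va: u32, // 7-bit source A register number
    pub vb: u32, // 7-bit source B register number
    pub vc: u32, // 3-bit source C field
}

/// Extract `len` bits starting at IBM (MSB-0) bit `start` of a 32-bit word.
fn bits(word: u32, start: u32, len: u32) -> u32 {
    (word >> (32 - start - len)) & ((1u32 << len) - 1)
}

/// Reassemble the split register fields (combination rule is an assumption).
pub fn decode_vx128_2(word: u32) -> Vperm128Regs {
    let vd_lo = bits(word, 6, 5); // VD128l
    let va_lo = bits(word, 11, 5); // VA128l
    let vb_lo = bits(word, 16, 5); // VB128l
    let va_top = bits(word, 21, 1); // VA128H, assumed to be bit 6 of VA
    let vc = bits(word, 23, 3); // VC
    let va_mid = bits(word, 26, 1); // VA128h, assumed to be bit 5 of VA
    let vd_hi = bits(word, 28, 2); // VD128h
    let vb_hi = bits(word, 30, 2); // VB128h
    Vperm128Regs {
        vd: vd_lo | (vd_hi << 5),
        va: va_lo | (va_mid << 5) | (va_top << 6),
        vb: vb_lo | (vb_hi << 5),
        vc,
    }
}
```

Bits 22 and 27 do not appear in the table and are ignored by this sketch.
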
## Operands

| Field | Role | Description |
| --- | --- | --- |
| `VA` | vperm: read; vperm128: read | Source A vector register. |
| `VB` | vperm: read; vperm128: read | Source B vector register. |
| `VC` | vperm: read; vperm128: read | Source C vector register / 3-bit selector. |
| `VD` | vperm: write; vperm128: write | Destination vector register. |

## Register Effects

### `vperm`

- **Reads (always):** `VA`, `VB`, `VC`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

### `vperm128`

- **Reads (always):** `VA`, `VB`, `VC`
- **Reads (conditional):** _none_
- **Writes (always):** `VD`
- **Writes (conditional):** _none_

## Status-Register Effects

_No condition-register or status-register effects._

## Operation (pseudocode)

```
; Derived from the xenia-rs interpreter arm (see Implementation
; References). Byte 0 is the most-significant byte of each register.
for i = 0 to 15:
    sel <- VC.b[i] & 0x1F          ; low 5 bits; upper 3 bits ignored
    if sel < 16:
        tmp.b[i] <- VA.b[sel]
    else:
        tmp.b[i] <- VB.b[sel - 16]
VD <- tmp                          ; all sources are read before VD is written
; No status bits are updated (see Status-Register Effects).
; Consult the link under IBM Reference for canonical PPC-style
; pseudocode.
```

## C Translation Example

```c
/* C translation: the xenia-rs interpreter arm below in           */
/* Implementation References is the authoritative semantic         */
/* snapshot. Translate it line-by-line:                            */
/*   - ctx.gpr[N]                    -> r[N] (or f[]/v[] for FPRs/VRs)       */
/*   - mem.read_u* / write_u*        -> mem_read_u*_be / mem_write_u*_be     */
/*   - ctx.update_cr_signed(fld, v)  -> update_cr_signed(fld, v)             */
/*   - ctx.xer_ca / xer_ov / xer_so  -> xer.CA / xer.OV / xer.SO             */
/* The Register Effects and Status-Register Effects tables above   */
/* enumerate every side effect a faithful translation must emit.   */
```

## Implementation References

**`vperm`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vperm"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1199`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1199)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:586`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L586)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2278-2302`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2278-L2302)

<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vperm | PpcOpcode::vperm128 => {
    let (va, vb, vd);
    let vc;
    if matches!(instr.opcode, PpcOpcode::vperm128) {
        va = instr.va128();
        vb = instr.vb128();
        vd = instr.vd128();
        vc = instr.vc128_2();
    } else {
        va = instr.ra();
        vb = instr.rb();
        vd = instr.rd();
        vc = instr.rc();
    }
    let a_bytes = ctx.vr[va].as_bytes();
    let b_bytes = ctx.vr[vb].as_bytes();
    let c_bytes = ctx.vr[vc].as_bytes();
    let mut r = [0u8; 16];
    for i in 0..16 {
        let idx = (c_bytes[i] & 0x1F) as usize;
        r[i] = if idx < 16 { a_bytes[idx] } else { b_bytes[idx - 16] };
    }
    ctx.vr[vd] = xenia_types::Vec128::from_bytes(r);
    ctx.pc += 4;
}
```

</details>

**`vperm128`**
- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vperm128"`](../../xenia-canary/tools/ppc-instructions.xml)
- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1202`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1202)
- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112)
- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:605`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L605)
- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2278-2302`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2278-L2302)

<details><summary>xenia-rs interpreter body (frozen snapshot)</summary>

```rust
PpcOpcode::vperm | PpcOpcode::vperm128 => {
    let (va, vb, vd);
    let vc;
    if matches!(instr.opcode, PpcOpcode::vperm128) {
        va = instr.va128();
        vb = instr.vb128();
        vd = instr.vd128();
        vc = instr.vc128_2();
    } else {
        va = instr.ra();
        vb = instr.rb();
        vd = instr.rd();
        vc = instr.rc();
    }
    let a_bytes = ctx.vr[va].as_bytes();
    let b_bytes = ctx.vr[vb].as_bytes();
    let c_bytes = ctx.vr[vc].as_bytes();
    let mut r = [0u8; 16];
    for i in 0..16 {
        let idx = (c_bytes[i] & 0x1F) as usize;
        r[i] = if idx < 16 { a_bytes[idx] } else { b_bytes[idx - 16] };
    }
    ctx.vr[vd] = xenia_types::Vec128::from_bytes(r);
    ctx.pc += 4;
}
```

</details>

<!-- GENERATED: END -->

## Special Cases & Edge Conditions

- **Per-byte selector drives a cross-vector permute.** Each byte of `VC` is a selector of which only the low 5 bits are used. The most-significant of those 5 bits (value 16; bit 3 of the byte in IBM big-endian bit numbering) chooses the source: 0 selects `VA`, 1 selects `VB`. The low 4 bits index a byte within the chosen 16-byte operand.
- **`vperm` is the universal "16-byte reshuffle" primitive.** It can express any byte-level selection from the 32 source bytes (`VA ‖ VB`) into the 16 destination bytes, including duplicates and drops.
- **Big-endian byte indexing.** `VC.b[0]` controls `VD.b[0]` (the most-significant byte). Selector value 0 picks `VA.b[0]`, value 15 picks `VA.b[15]`, value 16 picks `VB.b[0]`, value 31 picks `VB.b[15]`.
- **Upper 3 bits of each `VC` byte are ignored.** Only bits 3..7 (the low 5, in IBM bit numbering) are consulted, so values like 0x1F and 0x5F both mean "byte 15 of VB". Software can use those upper bits for its own tagging.
- **Pair with [`lvsl`](lvsl.md) / [`lvsr`](lvsr.md) for unaligned 16-byte loads.** `lvsl` produces the selector that shifts "left" by `EA & 0xF` bytes; feeding that into `vperm` with two aligned `lvx` results yields the unaligned 16-byte view.
- **Aliasing is legal.** `VD` may equal `VA` or `VB`.
- **VMX128 sibling [`vperm128`](vperm128.md).** Same shape with the 7-bit register file. The VMX128 encoding carries `VC` in the 3-bit `VC` sub-field of the `VX128_2` form, which only lets `VC` select one of **8** specific registers, not 128. The xenia-rs interpreter snapshot above reads this field with `instr.vc128_2()`.
- **No flags, no VSCR side-effect.**

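
The selector rules and the `lvsl` pairing described above can be tried out on plain byte arrays. The following is a minimal sketch, not the emulator's code: `vperm`, `lvsl_mask`, and `load_unaligned` are hypothetical helper names, registers are modelled as `[u8; 16]` with index 0 as the most-significant byte, and `mem` is assumed to extend at least 32 bytes past the aligned base address.

```rust
/// Byte-wise model of `vperm`; index 0 is the most-significant byte.
pub fn vperm(va: [u8; 16], vb: [u8; 16], vc: [u8; 16]) -> [u8; 16] {
    let mut vd = [0u8; 16];
    for i in 0..16 {
        // Only the low 5 bits of each selector byte are consulted.
        let sel = (vc[i] & 0x1F) as usize;
        vd[i] = if sel < 16 { va[sel] } else { vb[sel - 16] };
    }
    vd
}

/// Model of the selector `lvsl` produces: bytes sh, sh+1, ..., sh+15,
/// where sh = EA & 0xF.
pub fn lvsl_mask(ea: usize) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    let mut mask = [0u8; 16];
    for i in 0..16 {
        mask[i] = sh + i as u8;
    }
    mask
}

/// Unaligned 16-byte read built from two aligned 16-byte reads plus `vperm`.
/// Assumes `mem` holds at least 32 bytes starting at the aligned base.
pub fn load_unaligned(mem: &[u8], ea: usize) -> [u8; 16] {
    let base = ea & !0xF;
    let mut lo = [0u8; 16];
    let mut hi = [0u8; 16];
    lo.copy_from_slice(&mem[base..base + 16]); // stands in for the first lvx
    hi.copy_from_slice(&mem[base + 16..base + 32]); // stands in for the second lvx
    vperm(lo, hi, lvsl_mask(ea))
}
```

With a buffer whose byte at offset `n` holds the value `n`, `load_unaligned(&mem, 5)` returns the bytes 5 through 20, and a selector of `0x5F` behaves exactly like `0x1F`.
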
## Related Instructions

- [`vsldoi`](vsldoi.md) — static-shift-by-`SHB` form; when the shift is a compile-time constant this is cheaper than `lvsl`+`vperm`.
- [`lvsl`](lvsl.md), [`lvsr`](lvsr.md) — generate the permute mask from an effective address.
- [`vmrghb`](vmrghb.md), [`vmrglb`](vmrglb.md), [`vmrghh`](vmrghh.md), [`vmrglh`](vmrglh.md), [`vmrghw`](vmrghw.md), [`vmrglw`](vmrglw.md) — dedicated merges that are a subset of `vperm`.
- [`vspltb`](vspltb.md), [`vsplth`](vsplth.md), [`vspltw`](vspltw.md) — splat-from-lane, also expressible via `vperm` + a constant mask.
- [`vpkuhum`](vpkuhum.md) and other `vpk*` — narrower-lane packs whose pattern can also be encoded in `vperm`.

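
To make the "subset of `vperm`" relationship concrete, the sketch below builds constant selector masks that reproduce a byte merge, a byte splat, and a static byte shift on a simple array model. It is illustrative only: the `mask_*` helpers are hypothetical names, the splat takes its source from the first operand, and the byte layouts assumed for `vmrghb` / `vmrglb` (interleaving the first or last 8 bytes of each source) come from the general AltiVec definition rather than from this page.

```rust
/// Byte-wise model of `vperm`; index 0 is the most-significant byte.
pub fn vperm(va: [u8; 16], vb: [u8; 16], vc: [u8; 16]) -> [u8; 16] {
    let mut vd = [0u8; 16];
    for i in 0..16 {
        let sel = (vc[i] & 0x1F) as usize;
        vd[i] = if sel < 16 { va[sel] } else { vb[sel - 16] };
    }
    vd
}

/// Mask equivalent to `vmrghb`: interleave bytes 0..8 of VA and VB.
pub fn mask_vmrghb() -> [u8; 16] {
    let mut m = [0u8; 16];
    for i in 0..8 {
        m[2 * i] = i as u8; // VA.b[i]
        m[2 * i + 1] = 16 + i as u8; // VB.b[i]
    }
    m
}

/// Mask equivalent to `vmrglb`: interleave bytes 8..16 of VA and VB.
pub fn mask_vmrglb() -> [u8; 16] {
    let mut m = [0u8; 16];
    for i in 0..8 {
        m[2 * i] = 8 + i as u8; // VA.b[8 + i]
        m[2 * i + 1] = 24 + i as u8; // VB.b[8 + i]
    }
    m
}

/// Mask that splats byte `n` (0..16) of the first operand to every lane.
pub fn mask_splat_byte(n: u8) -> [u8; 16] {
    [n & 0x0F; 16]
}

/// Mask equivalent to a static left shift of VA ‖ VB by `shb` bytes (0..16).
pub fn mask_shift_left(shb: u8) -> [u8; 16] {
    let mut m = [0u8; 16];
    for i in 0..16 {
        m[i] = (shb & 0x0F) + i as u8;
    }
    m
}
```

Taking `VA` as the bytes of `"abcdefghijklmnop"` and `VB` as `"ABCDEFGHIJKLMNOP"`, the high merge mask produces `"aAbBcCdDeEfFgGhH"` and a shift of 4 produces `"efghijklmnopABCD"`.
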
## IBM Reference

- [AIX 7.3 — `vperm` (Vector Permute)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vperm-vector-permute-instruction)
- [AltiVec Technology Programming Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf)