xenia-rs/migration/project-root/ppc-manual/vmx/vperm.md
MechaCat02 e6d43a23ac chore: add migration/ bundle for cross-machine setup
Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:

  - claude-memory/             ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                               (103 files, 1.1 MB - MEMORY.md + every
                                project_xenia_rs_*.md from audits
                                addis_signext through audit-058)
  - project-root/dot-claude/   <project-root>/.claude/settings.json
                               (Stop hook + permissions)
  - project-root/ppc-manual/   <project-root>/ppc-manual/
                               (PowerPC reference docs, 397 files, 3.7 MB)
  - project-root/run-canary.sh <project-root>/run-canary.sh
  - README.md                  Human-readable setup checklist
  - setup.sh                   Idempotent installer (also reclones
                               xenia-canary at pinned HEAD 6de80dffe)
  - MANIFEST.md                Per-file mapping + per-file-not-bundled
                               restoration recipe

Excluded from bundle (not shippable via git):
  - Sylpheed ISO (7.8 GB; copyright; manual copy required)
  - sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
  - target/ build artifacts (rebuild on target)
  - audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
  - audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
  - xenia-canary checkout (setup.sh reclones from
    git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:38:38 +02:00


vperm — Vector Permute

Category: VMX (Altivec) · Form: VA · Opcode: 0x1000002b

Assembler Mnemonics

| Mnemonic | XML entry | Flags | Description |
| --- | --- | --- | --- |
| vperm | vperm | | Vector Permute |
| vperm128 | vperm128 | | Vector128 Permute |

Syntax

vperm [VD], [VA], [VB], [VC]
vperm128 [VD], [VA], [VB], [VC]

Encoding

vperm — form VA

  • Opcode word: 0x1000002b
  • Primary opcode (bits 0–5): 4
  • Extended opcode: 43
  • Synchronising: no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | OPCD | primary opcode (4) |
| 6–10 | VRT | destination vector register |
| 11–15 | VRA | source A |
| 16–20 | VRB | source B |
| 21–25 | VRC | source C / shift |
| 26–31 | XO | extended opcode (6 bits) |
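The field layout above can be turned into a small decoder. The following is a minimal sketch, not the xenia-rs decoder: the `VaForm` struct and the `ibm_bits`, `decode_va` and `is_vperm` helpers are hypothetical names introduced here for illustration. It assumes the table uses IBM (MSB-0) bit numbering, where bit 0 is the most significant bit of the 32-bit word.

```rust
/// Fields of a VA-form instruction word, named after the encoding table.
/// (Hypothetical struct for illustration; not part of xenia-rs.)
#[derive(Debug, PartialEq, Eq)]
pub struct VaForm {
    pub opcd: u32, // bits 0-5:   primary opcode
    pub vrt: u32,  // bits 6-10:  destination vector register
    pub vra: u32,  // bits 11-15: source A
    pub vrb: u32,  // bits 16-20: source B
    pub vrc: u32,  // bits 21-25: source C
    pub xo: u32,   // bits 26-31: extended opcode
}

/// Extract `len` bits starting at IBM (MSB-0) bit `start` of a 32-bit word.
/// Assumes 1 <= len <= 31 and start + len <= 32.
fn ibm_bits(word: u32, start: u32, len: u32) -> u32 {
    (word >> (32 - start - len)) & ((1u32 << len) - 1)
}

/// Split an instruction word into its VA-form fields.
pub fn decode_va(word: u32) -> VaForm {
    VaForm {
        opcd: ibm_bits(word, 0, 6),
        vrt: ibm_bits(word, 6, 5),
        vra: ibm_bits(word, 11, 5),
        vrb: ibm_bits(word, 16, 5),
        vrc: ibm_bits(word, 21, 5),
        xo: ibm_bits(word, 26, 6),
    }
}

/// True when the word matches the vperm pattern: primary opcode 4, XO 43.
pub fn is_vperm(word: u32) -> bool {
    let f = decode_va(word);
    f.opcd == 4 && f.xo == 43
}
```

With all register fields zero, `decode_va(0x1000002b)` yields primary opcode 4 and extended opcode 43, matching the opcode word listed above.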

vperm128 — form VX128_2

  • Opcode word: 0x14000000
  • Primary opcode (bits 0–5): 5
  • Extended opcode: 0
  • Synchronising: no

| Bits | Field | Meaning |
| --- | --- | --- |
| 0–5 | OPCD | primary opcode (5) |
| 6–10 | VD128l | destination low 5 bits |
| 11–15 | VA128l | source A low 5 bits |
| 16–20 | VB128l | source B low 5 bits |
| 21 | VA128H | source A high bit |
| 23–25 | VC | source C 3-bit field |
| 26 | VA128h | source A middle bit |
| 28–29 | VD128h | destination high 2 bits |
| 30–31 | VB128h | source B high 2 bits |
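The split register fields above have to be reassembled into 7-bit register numbers. The sketch below shows one way to do that. It rests on an assumption read off the table's labels: the "low 5 bits" field supplies bits 0..4 of the register number, the "middle" bit (VA128h) is bit 5, and the "high" bit (VA128H) is bit 6. The `Vx128Regs` struct and the `ibm_bits` and `decode_vx128_2` helpers are hypothetical names, not the xenia-rs accessors.

```rust
/// Register numbers recovered from a VX128_2-form word.
/// (Hypothetical struct for illustration; not part of xenia-rs.)
#[derive(Debug, PartialEq, Eq)]
pub struct Vx128Regs {
    pub vd: u32, // 7-bit destination register number
    pub va: u32, // 7-bit source A register number
    pub vb: u32, // 7-bit source B register number
    pub vc: u32, // 3-bit source C register number (0..=7)
}

/// Extract `len` bits starting at IBM (MSB-0) bit `start` of a 32-bit word.
/// Assumes 1 <= len <= 31 and start + len <= 32.
fn ibm_bits(word: u32, start: u32, len: u32) -> u32 {
    (word >> (32 - start - len)) & ((1u32 << len) - 1)
}

/// Reassemble the split register fields of the VX128_2 form.
/// ASSUMPTION: low field = bits 0..4, "middle" bit = bit 5, "high" bit = bit 6.
pub fn decode_vx128_2(word: u32) -> Vx128Regs {
    let vd_lo = ibm_bits(word, 6, 5); // VD128l
    let va_lo = ibm_bits(word, 11, 5); // VA128l
    let vb_lo = ibm_bits(word, 16, 5); // VB128l
    let va_high = ibm_bits(word, 21, 1); // VA128H
    let vc = ibm_bits(word, 23, 3); // VC
    let va_mid = ibm_bits(word, 26, 1); // VA128h
    let vd_hi = ibm_bits(word, 28, 2); // VD128h
    let vb_hi = ibm_bits(word, 30, 2); // VB128h
    Vx128Regs {
        vd: vd_lo | (vd_hi << 5),
        va: va_lo | (va_mid << 5) | (va_high << 6),
        vb: vb_lo | (vb_hi << 5),
        vc,
    }
}
```

Because VC has only three bits, the decoded `vc` can never exceed 7, whereas `vd`, `va` and `vb` range over 0..=127.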

Operands

| Field | Role | Description |
| --- | --- | --- |
| VA | vperm: read; vperm128: read | Source A vector register. |
| VB | vperm: read; vperm128: read | Source B vector register. |
| VC | vperm: read; vperm128: read | Source C vector register / 3-bit selector. |
| VD | vperm: write; vperm128: write | Destination vector register. |

Register Effects

vperm

  • Reads (always): VA, VB, VC
  • Reads (conditional): none
  • Writes (always): VD
  • Writes (conditional): none

vperm128

  • Reads (always): VA, VB, VC
  • Reads (conditional): none
  • Writes (always): VD
  • Writes (conditional): none

Status-Register Effects

No condition-register or status-register effects.

Operation (pseudocode)

; Pseudocode derives directly from the xenia-rs interpreter
; arm (see Implementation References). Byte 0 is the most
; significant byte of a vector register.
;   src[0..31] <- VA || VB            ; 32-byte concatenation
;   for i in 0..15:
;     sel        <- VC.byte[i] & 0x1F ; low 5 bits only
;     VD.byte[i] <- src[sel]          ; sel < 16: VA, else VB
; No status bits are updated (see Status-Register Effects).
; Consult the IBM AIX reference link under IBM Reference for
; canonical PPC-style pseudocode.
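As a concrete, testable counterpart to the operation described in this section, here is a minimal standalone sketch of the byte selection. It mirrors the loop in the interpreter arm under Implementation References but works on plain `[u8; 16]` arrays instead of `ctx.vr` and `Vec128`. The free function `vperm` and its array-based signature are assumptions made for illustration.

```rust
/// Standalone model of vperm on 16-byte arrays.
/// Index 0 is the most significant byte of each register.
/// (Illustrative helper; xenia-rs operates on ctx.vr / Vec128 instead.)
pub fn vperm(va: [u8; 16], vb: [u8; 16], vc: [u8; 16]) -> [u8; 16] {
    let mut vd = [0u8; 16];
    for i in 0..16 {
        // Only the low 5 bits of each control byte take part in selection.
        let sel = (vc[i] & 0x1F) as usize;
        // Selectors 0..=15 pick from VA, 16..=31 pick from VB.
        vd[i] = if sel < 16 { va[sel] } else { vb[sel - 16] };
    }
    vd
}
```

Since all three inputs are taken by value, the result can be written back over any of the source registers without changing the outcome.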

C Translation Example

/* C translation: the xenia-rs interpreter arm below in           */
/* Implementation References is the authoritative semantic        */
/* snapshot. Translate it line-by-line:                            */
/*   - ctx.vr[N]  -> v[N]        (16-byte vector registers)       */
/*   - as_bytes() / from_bytes() -> plain 16-byte arrays, with    */
/*     index 0 as the most significant byte                       */
/* vperm reads no GPRs or memory and updates no CR or XER bits.   */
/* The Register Effects and Status-Register Effects tables above  */
/* enumerate every side effect a faithful translation must emit.  */

Implementation References

vperm

xenia-rs interpreter body (frozen snapshot)
        PpcOpcode::vperm | PpcOpcode::vperm128 => {
            let (va, vb, vd);
            let vc;
            if matches!(instr.opcode, PpcOpcode::vperm128) {
                va = instr.va128();
                vb = instr.vb128();
                vd = instr.vd128();
                vc = instr.vc128_2();
            } else {
                va = instr.ra();
                vb = instr.rb();
                vd = instr.rd();
                vc = instr.rc();
            }
            let a_bytes = ctx.vr[va].as_bytes();
            let b_bytes = ctx.vr[vb].as_bytes();
            let c_bytes = ctx.vr[vc].as_bytes();
            let mut r = [0u8; 16];
            for i in 0..16 {
                let idx = (c_bytes[i] & 0x1F) as usize;
                r[i] = if idx < 16 { a_bytes[idx] } else { b_bytes[idx - 16] };
            }
            ctx.vr[vd] = xenia_types::Vec128::from_bytes(r);
            ctx.pc += 4;
        }

vperm128

Shares the interpreter arm shown under vperm above; the same match arm covers both opcodes. vperm128 differs only in operand decoding, which uses va128(), vb128(), vd128() and vc128_2().

Special Cases & Edge Conditions

  • Per-byte selector drives a cross-vector permute. Each byte of VC is a 5-bit selector (low 5 bits used, upper 3 bits ignored). The most significant of those 5 bits (weight 16; bit 3 of the byte in IBM MSB-0 numbering) chooses the source: 0 selects VA, 1 selects VB. The low 4 bits index a byte within the chosen 16-byte operand.
  • vperm is the universal "16-byte reshuffle" primitive. It can express any byte-level selection of 16 destination bytes from the 32 source bytes (VA ‖ VB), including duplicates and drops.
  • Big-endian byte indexing. VC.b[0] controls VD.b[0] (the most significant byte). Selector value 0 picks VA.b[0], value 15 picks VA.b[15], value 16 picks VB.b[0], value 31 picks VB.b[15].
  • Upper 3 bits of each VC byte are ignored. Only bits 3..7 in IBM MSB-0 numbering (the low 5) are consulted, so values like 0x1F and 0x5F both mean "byte 15 of VB". Software can use the upper bits for its own tagging.
  • Pair with lvsl / lvsr for unaligned 16-byte loads. lvsl produces the selector that shifts "left" by EA & 0xF bytes; feeding that into vperm with the results of two aligned lvx loads yields the unaligned 16-byte view.
  • Aliasing is legal. VD may equal VA or VB.
  • VMX128 sibling vperm128. Same operation, but VD, VA and VB use 7-bit register numbers (128-entry register file). The VX128_2 form carries VC in a 3-bit sub-field, so VC can name only one of 8 specific registers, not 128. In the xenia-rs interpreter snapshot above this field is read with vc128_2().
  • No flags, no VSCR side effect.

Related Instructions

  • vsldoi: static-shift-by-SHB form; when the shift is a compile-time constant this is cheaper than lvsl + vperm.
  • lvsl, lvsr: generate the permute mask from an effective address.
  • vmrghb, vmrglb, vmrghh, vmrglh, vmrghw, vmrglw: dedicated merges, each a special case of vperm with a fixed mask.
  • vspltb, vsplth, vspltw: splat-from-lane, also expressible via vperm plus a constant mask.
  • vpkuhum, vpkuwum: modulo packs whose byte selection can also be encoded in a vperm mask. The saturating vpk* variants also clamp values, which a pure byte permute cannot do.
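The lvsl pairing and the "expressible via vperm" relationships listed above can be made concrete with a few mask builders. This is a minimal sketch under stated assumptions: memory is modelled as a plain byte slice, the second aligned load always reads the next 16-byte block (a simplification), and `vperm`, `lvsl_mask`, `vspltb_mask`, `vmrghb_mask` and `load_unaligned` are hypothetical helpers, not xenia-rs APIs.

```rust
/// Array-based model of vperm (index 0 = most significant byte).
fn vperm(va: [u8; 16], vb: [u8; 16], vc: [u8; 16]) -> [u8; 16] {
    let mut vd = [0u8; 16];
    for i in 0..16 {
        let sel = (vc[i] & 0x1F) as usize;
        vd[i] = if sel < 16 { va[sel] } else { vb[sel - 16] };
    }
    vd
}

/// Mask in the shape lvsl produces: sh, sh+1, ..., sh+15 with sh = ea & 0xF.
/// A vsldoi by a constant SHB corresponds to the same mask with sh = SHB.
pub fn lvsl_mask(ea: usize) -> [u8; 16] {
    let sh = (ea & 0xF) as u8;
    std::array::from_fn(|i| sh + i as u8)
}

/// Constant mask that replicates byte `lane` of VA into every byte (vspltb).
pub fn vspltb_mask(lane: u8) -> [u8; 16] {
    [lane & 0xF; 16]
}

/// Constant mask that interleaves the first 8 bytes of VA and VB (vmrghb).
pub fn vmrghb_mask() -> [u8; 16] {
    std::array::from_fn(|i| {
        let k = (i / 2) as u8;
        if i % 2 == 0 { k } else { 16 + k }
    })
}

/// Unaligned 16-byte read built from two aligned reads plus vperm.
/// ASSUMPTION: `mem` holds at least 32 bytes from the aligned base address.
pub fn load_unaligned(mem: &[u8], ea: usize) -> [u8; 16] {
    let base = ea & !0xF;
    let mut lo = [0u8; 16];
    let mut hi = [0u8; 16];
    lo.copy_from_slice(&mem[base..base + 16]);
    hi.copy_from_slice(&mem[base + 16..base + 32]);
    vperm(lo, hi, lvsl_mask(ea))
}
```

The largest selector `lvsl_mask` can emit is 15 + 15 = 30, so every mask byte stays inside the 0..=31 range that vperm interprets.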

IBM Reference