vpkd3d128 stored the pack codec output directly into vd128 without
applying the MakePermuteMask permutation, which merges the packed scalar(s)
into the previous register value according to pack (slot layout) and shift
(destination lane offset).
PPCBUG-363: vpkd3d128 was missing the post-pack lane-placement step.
PPCBUG-369: the vpkd3d128 pack field was not extracted; pack=0 still worked
(identity), but pack=1/2/3 always wrote raw out instead of blending it
into the previous vd128 value.
Fix: extract `pack = uimm & 3` and `shift = instr.vx128_4_z()` from the
VX128_4 IMM and z fields. For pack==0 (identity), store out directly as
before. For pack 1-3, read the existing vd128 value (prev) and select four
u32 words from {prev, out} using the static permutation tables (3 pack
modes × 4 shifts) from canary ppc_emit_altivec.cc:2126-2188.
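For illustration only, a minimal sketch of the table-driven selection
described above. The `Src`, `Perm`, `decode_pack` and `select_words` names
are hypothetical and are not the identifiers used in interpreter.rs; the
table row in the usage note is made up to show the mechanism, not copied
from canary.

```rust
/// Which register a destination word is read from (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Src {
    Prev, // value vd128 held before the instruction
    Out,  // raw output of the pack codec
}

/// One table row: for each destination lane, a (source register, source lane) pair.
type Perm = [(Src, usize); 4];

/// Extract the 2-bit pack field from the immediate, as in `pack = uimm & 3`.
fn decode_pack(uimm: u32) -> u32 {
    uimm & 3
}

/// Build the result by picking each destination word from prev or out.
fn select_words(prev: [u32; 4], out: [u32; 4], perm: Perm) -> [u32; 4] {
    let mut result = [0u32; 4];
    for (lane, &(src, idx)) in perm.iter().enumerate() {
        result[lane] = match src {
            Src::Prev => prev[idx],
            Src::Out => out[idx],
        };
    }
    result
}
```

With a row such as `[(Src::Prev, 0), (Src::Prev, 1), (Src::Prev, 2), (Src::Out, 3)]`,
only lane 3 is taken from out and the other three lanes keep their previous
contents.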
Tables derived from canary MakePermuteMask(r0,l0,…r3,l3):
  pack=1 (VPACK_32): out[3] placed at lane (3-shift), prev elsewhere
  pack=2 (64-bit):   out[2..3] placed at lanes (2-shift)..(3-shift)
  pack=3 (64-bit):   same as pack=2, except shift=3 puts out[2] at lane 3
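As a rough sketch of the pack=1 rule stated above (out[3] lands at lane
3-shift, every other lane keeps prev), assuming registers are modelled as
`[u32; 4]` with lane 0 first. `blend_pack1` is a hypothetical helper name,
not the function in interpreter.rs, and the pack=2/3 tables are
deliberately not reproduced here.

```rust
/// pack=1 (VPACK_32) blend: place out[3] at lane (3 - shift), keep prev elsewhere.
/// Hypothetical helper for illustration; `shift` must be in 0..=3.
fn blend_pack1(prev: [u32; 4], out: [u32; 4], shift: usize) -> [u32; 4] {
    assert!(shift < 4, "shift is a 2-bit field");
    let mut result = prev; // start from the previous vd128 contents
    result[3 - shift] = out[3]; // overwrite only the destination lane
    result
}
```

This mirrors what the pack1 test names suggest: shift=0 writes the packed
word into lane 3, and shift=3 writes it into lane 0.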
Tests: vpkd3d128_pack0_legacy_unchanged, vpkd3d128_pack1_shift0_d3d_vertex_pack,
vpkd3d128_pack1_shift3_puts_out3_at_lane0
interpreter.rs: vpkd3d128 arm (~line 3999)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>